SignacFast-Seurat_AMP_RA.Rmd
Sometimes we do not have time to run Signac and need a faster solution. Signac scales well to large data sets (>300,000 cells) and typically takes less than an hour even then, but we developed SignacFast to classify single cell data more quickly. Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, which speeds up classification roughly 5-10 fold. These models were generated from the HPCA training data like so:
# load the HPCA training data
training_HPCA = GetTrainingData_HPCA()
# generate models
Models_HPCA = ModelGenerator(R = training_HPCA, N = 100, num.cores = 4)
The resulting models (“Models_HPCA”) are accessed from within the R package:
# load pre-trained neural network ensemble model
Models = GetModels_HPCA()
This vignette demonstrates how to use SignacFast and shows that its results are broadly consistent with Signac, only faster. Here, we use SignacFast to annotate flow-sorted synovial cells by integrating SignacX with Seurat. We start with raw counts from this publication.
Read the CEL-seq2 data.
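# Read a tab-separated counts table (first column "gene", remaining columns cells)
# into a sparse gene-by-cell matrix. meta.file is accepted but not used here;
# the metadata is read separately below.
ReadCelseq <- function(counts.file, meta.file) {
E = suppressWarnings(readr::read_tsv(counts.file))
gns <- E$gene
E = E[, -1]
E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
rownames(E) <- gns
E
}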
counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"
E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)
M = suppressWarnings(readr::read_tsv(meta.file))
# filter data based on depth and number of genes detected
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[, kmu > 200 & kmu2 > 500]
# filter by mitochondrial percentage
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik, ])/Matrix::colSums(E) * 100
E = E[, MitoFrac < 20]
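The filtering steps above can be checked on a toy matrix. The sketch below is only an illustration: the helper name `FilterCells`, the toy counts, and the scaled-down thresholds are hypothetical and not part of SignacX or Seurat. It applies the depth, gene-count, and mitochondrial filters in a single pass, which keeps the same cells as the two-step version because each criterion is computed per cell.

```r
library(Matrix)

# Hypothetical helper (not part of SignacX): keep cells that pass all three QC criteria
FilterCells <- function(E, min.genes, min.counts, max.mito.pct) {
  n.genes  <- Matrix::colSums(E != 0)            # genes detected per cell
  n.counts <- Matrix::colSums(E)                 # total counts per cell
  is.mito  <- grepl("^MT-", rownames(E))         # mitochondrial genes by name prefix
  mito.pct <- Matrix::colSums(E[is.mito, , drop = FALSE]) / n.counts * 100
  keep <- n.genes > min.genes & n.counts > min.counts & mito.pct < max.mito.pct
  E[, keep, drop = FALSE]
}

# Toy counts, made up for illustration: 5 genes (rows) x 4 cells (columns)
toy <- Matrix::Matrix(
  matrix(
    c( 1, 30, 0,  0,    # MT-CO1
       0, 20, 0,  1,    # MT-ND1
      40, 10, 3,  0,    # CD3E
       0,  5, 0, 25,    # MS4A1
      20,  5, 0, 30),   # COL1A1
    nrow = 5, byrow = TRUE,
    dimnames = list(
      c("MT-CO1", "MT-ND1", "CD3E", "MS4A1", "COL1A1"),
      c("c1", "c2", "c3", "c4")
    )
  ),
  sparse = TRUE
)

# Thresholds are scaled down to suit the toy data (the vignette uses 200, 500 and 20)
filtered <- FilterCells(toy, min.genes = 2, min.counts = 10, max.mito.pct = 20)
colnames(filtered)
```

Here cell `c2` is dropped for its high mitochondrial fraction and `c3` for low depth, leaving `c1` and `c4`.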
Start with the standard pre-processing steps for a Seurat object.
Create a Seurat object, and then perform SCTransform normalization.
# load data
synovium <- CreateSeuratObject(counts = E, project = "FACs")
# run sctransform
synovium <- SCTransform(synovium)
Perform dimensionality reduction by PCA and UMAP embedding.
# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)
Generate Signac labels for the Seurat object.
labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)
Training the neural networks can take a long time: the classification above took 27 minutes. SignacFast avoids this training step by using pre-trained models.
# Run SignacFast
labels_fast <- SignacFast(synovium, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = synovium)
Compare results:
| Celltypes | B | MPh | NonImmune | Plasma.cells | TNK | Unclassified |
|---|---|---|---|---|---|---|
| B | 681 | 0 | 0 | 0 | 0 | 0 |
| MPh | 0 | 835 | 0 | 0 | 0 | 68 |
| NonImmune | 0 | 0 | 2487 | 0 | 0 | 0 |
| Plasma.cells | 0 | 0 | 0 | 263 | 0 | 6 |
| TNK | 0 | 0 | 0 | 0 | 1768 | 0 |
| Unclassified | 0 | 13 | 7 | 0 | 0 | 174 |

| | B.memory | B.naive | Fibroblasts | Macrophages | Mon.Classical | NonImmune | Plasma.cells | T.CD4.memory | T.CD4.naive | T.CD8.em | T.CD8.naive | T.regs | Unclassified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B.memory | 489 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| B.naive | 4 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DC | 0 | 0 | 0 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Fibroblasts | 0 | 0 | 2110 | 0 | 0 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Macrophages | 0 | 0 | 0 | 662 | 33 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 73 |
| Mon.Classical | 0 | 0 | 0 | 23 | 93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| NonImmune | 0 | 0 | 74 | 0 | 0 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| Plasma.cells | 0 | 0 | 0 | 0 | 0 | 1 | 259 | 0 | 0 | 0 | 0 | 0 | 8 |
| T.CD4.memory | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 504 | 112 | 17 | 29 | 15 | 0 |
| T.CD4.naive | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 309 | 4 | 18 | 0 | 1 |
| T.CD8.em | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 4 | 574 | 1 | 0 | 2 |
| T.CD8.naive | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 26 | 2 | 0 |
| T.regs | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 1 | 2 | 106 | 0 |
| Unclassified | 0 | 0 | 1 | 12 | 3 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | 179 |
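
One way to summarize the first (cell type) table above is the fraction of cells on the diagonal, that is, cells given the same label by both methods. The sketch below re-enters the counts from that table by hand and uses base R only; the variable names are illustrative, and which method sits on the rows versus the columns does not affect the overall fraction.

```r
# Cell type cross-tabulation, copied from the first table above
labels <- c("B", "MPh", "NonImmune", "Plasma.cells", "TNK", "Unclassified")
xt <- matrix(
  c(681,   0,    0,   0,    0,   0,
      0, 835,    0,   0,    0,  68,
      0,   0, 2487,   0,    0,   0,
      0,   0,    0, 263,    0,   6,
      0,   0,    0,   0, 1768,   0,
      0,  13,    7,   0,    0, 174),
  nrow = 6, byrow = TRUE,
  dimnames = list(labels, labels)
)

n.cells   <- sum(xt)          # total number of cells in the table
n.agree   <- sum(diag(xt))    # cells with identical labels from both methods
agreement <- n.agree / n.cells

# Per-label agreement, relative to the row totals
per.label <- diag(xt) / rowSums(xt)

round(agreement, 3)
round(per.label, 3)
```

With these counts, 6208 of 6302 cells lie on the diagonal, an overall agreement of about 98.5%. Most of the disagreement comes from cells that one method labels MPh and the other leaves Unclassified.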
Save results
saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds")
saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds")
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] knitr_1.31 magrittr_2.0.1 R6_2.5.0 ragg_1.1.1
## [5] rlang_0.4.10 fastmap_1.1.0 highr_0.8 stringr_1.4.0
## [9] tools_4.0.0 xfun_0.21 jquerylib_0.1.3 htmltools_0.5.1.1
## [13] systemfonts_1.0.1 yaml_2.2.1 assertthat_0.2.1 digest_0.6.27
## [17] rprojroot_2.0.2 pkgdown_1.6.1 crayon_1.4.1 textshaping_0.3.1
## [21] formatR_1.7 sass_0.3.1 fs_1.5.0 memoise_2.0.0
## [25] cachem_1.0.3 evaluate_0.14 rmarkdown_2.7 stringi_1.5.3
## [29] compiler_4.0.0 bslib_0.2.4 desc_1.2.0 jsonlite_1.7.2