Sometimes we don’t have time to run Signac and need a faster solution. Signac scales well to large data sets (>300,000 cells) and typically takes less than an hour even then, but we developed SignacFast to classify single cell data more quickly. Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, which speeds up classification roughly 5-10 fold. These models were generated from the HPCA training data like so:

# load the HPCA reference training data
ref = GetTrainingData_HPCA()

# generate models
Models_HPCA = ModelGenerator(R = ref, N = 100, num.cores = 4)

The pre-trained models (“Models_HPCA”) can be loaded from within the R package:

# load pre-trained neural network ensemble model
Models = GetModels_HPCA()

This vignette demonstrates SignacFast and shows that its results are broadly consistent with those of Signac, only faster. We annotate flow-sorted synovial cells by integrating SignacX with Seurat, starting with raw counts from this publication.

Load data

Read the CEL-seq2 data.

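# Read a CEL-seq2 count table (a "gene" column followed by one column per cell)
# into a sparse gene-by-cell matrix. meta.file is accepted but not used here.
ReadCelseq <- function(counts.file, meta.file) {
    E = suppressWarnings(readr::read_tsv(counts.file))
    gns <- E$gene
    E = E[, -1]
    E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
    rownames(E) <- gns
    E
}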

counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"

E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)
M = suppressWarnings(readr::read_tsv(meta.file))

# filter cells by number of genes detected (kmu) and total counts (kmu2)
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[, kmu > 200 & kmu2 > 500]

# filter by mitochondrial percentage
# (drop = FALSE keeps a matrix even if only one gene matches "^MT-")
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik, , drop = FALSE])/Matrix::colSums(E) * 100
E = E[, MitoFrac < 20]
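The filtering steps above can be sketched on a toy matrix to see which cells each threshold removes. FilterCells is a hypothetical helper written for this illustration (it is not part of SignacX), and the toy counts and lowered gene threshold are invented so that the example fits in a few lines.

```r
library(Matrix)

# Hypothetical helper (not part of SignacX): apply the three QC filters in one pass.
FilterCells <- function(E, min.genes = 200, min.counts = 500, max.mito = 20) {
  n.genes  <- Matrix::colSums(E != 0)   # genes detected per cell
  n.counts <- Matrix::colSums(E)        # total counts per cell
  is.mito  <- grepl("^MT-", rownames(E))
  # drop = FALSE keeps a matrix even when only one mitochondrial gene is present
  mito.pct <- Matrix::colSums(E[is.mito, , drop = FALSE]) / n.counts * 100
  keep <- n.genes > min.genes & n.counts > min.counts & mito.pct < max.mito
  E[, keep, drop = FALSE]
}

# Toy matrix: 4 genes x 3 cells, filled column by column (values invented)
toy <- Matrix(c(300, 250, 100,  50,   # cellA: 700 counts, ~7% mitochondrial
                  0,   1,   0,   0,   # cellB: 1 count in total
                  5, 400, 200, 395),  # cellC: 1000 counts, 39.5% mitochondrial
              nrow = 4, sparse = TRUE,
              dimnames = list(c("GENE1", "GENE2", "GENE3", "MT-CO1"),
                              c("cellA", "cellB", "cellC")))

# The gene threshold is lowered to suit the toy data
filtered <- FilterCells(toy, min.genes = 2, min.counts = 500, max.mito = 20)
colnames(filtered)
```

Only cellA survives: cellB fails the depth filters and cellC exceeds the mitochondrial cutoff.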

Seurat

Start with the standard pre-processing steps for a Seurat object.

Create a Seurat object, and then perform SCTransform normalization. Note:

  • You can use the legacy functions here (e.g., NormalizeData, ScaleData), SCTransform, or any other normalization method (including no normalization). We did not notice a significant difference in cell type annotations across normalization methods.
  • We consider SCTransform best practice, but it is not required; Signac works fine without it.

# create the Seurat object from the filtered counts
synovium <- CreateSeuratObject(counts = E, project = "FACs")

# run sctransform
synovium <- SCTransform(synovium)

Perform dimensionality reduction by PCA and UMAP embedding. Note:

  • Signac requires these steps because it uses the nearest neighbor graph generated by Seurat.

# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)

SignacX

Generate Signac labels for the Seurat object. Note:

  • Optionally, run the classification in parallel by setting num.cores > 1 in the Signac function.

labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)
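If the number of available cores is not known in advance, a value for num.cores can be chosen at run time. The sketch below uses the base parallel package; the cap of 4 mirrors the value used above, and leaving one core free is a convention assumed here, not a SignacX requirement.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to one core
n.avail <- detectCores()
num.cores <- if (is.na(n.avail)) 1L else max(1L, min(4L, n.avail - 1L))
num.cores
```

The result can then be passed on, for example as Signac(synovium, num.cores = num.cores).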

Training the neural networks can take a long time: the classification above took 27 minutes. For a faster alternative, we implemented SignacFast, which uses pre-trained models. Note:

  • SignacFast uses an ensemble of 1,800 neural networks pre-computed with the ModelGenerator function and the training_HPCA reference data set.
  • Features that are present in the neural network but absent from the single cell data are set to zero.

# Run SignacFast
labels_fast <- SignacFast(synovium, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = synovium)
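The zero-filling of missing features noted above can be pictured with a small base R sketch. AlignFeatures is a hypothetical helper written for this illustration and is not SignacX's internal code; the gene names and counts are invented.

```r
# Hypothetical helper (not SignacX's internal code): align an expression matrix
# to a model's feature list, filling features absent from the data with zeros.
AlignFeatures <- function(E, features) {
  out <- matrix(0, nrow = length(features), ncol = ncol(E),
                dimnames = list(features, colnames(E)))
  shared <- intersect(features, rownames(E))
  out[shared, ] <- as.matrix(E[shared, , drop = FALSE])
  out
}

# Toy data: two genes measured in two cells (values invented)
expr <- matrix(c(5, 0, 2, 7), nrow = 2,
               dimnames = list(c("CD3E", "CD19"), c("cell1", "cell2")))

# Features the (hypothetical) model expects; MS4A1 is missing from expr
model.features <- c("CD3E", "MS4A1", "CD19")

aligned <- AlignFeatures(expr, model.features)
aligned["MS4A1", ]  # all zeros: the gene is absent from the data
```

The aligned matrix has one row per model feature, in the model's order, so it can be fed to a model that expects a fixed input layout.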

Compare results:

Celltypes:
               B MPh NonImmune Plasma.cells  TNK Unclassified
B            681   0         0            0    0            0
MPh            0 835         0            0    0           68
NonImmune      0   0      2487            0    0            0
Plasma.cells   0   0         0          263    0            6
TNK            0   0         0            0 1768            0
Unclassified   0  13         7            0    0          174

Cellstates:
              B.memory B.naive Fibroblasts Macrophages Mon.Classical NonImmune Plasma.cells T.CD4.memory T.CD4.naive T.CD8.em T.CD8.naive T.regs Unclassified
B.memory           489       4           0           0             0         0            0            0           0        0           0      0            1
B.naive              4     184           0           0             0         0            0            0           0        0           0      0            0
DC                   0       0           0           4             3         0            0            0           0        0           0      0            5
Fibroblasts          0       0        2110           0             0       136            0            0           0        0           0      0            0
Macrophages          0       0           0         662            33         1            0            0           0        0           0      0           73
Mon.Classical        0       0           0          23            93         0            0            0           0        0           0      0            1
NonImmune            0       0          74           0             0       166            0            0           0        0           0      0            2
Plasma.cells         0       0           0           0             0         1          259            0           0        0           0      0            8
T.CD4.memory         0       0           0           0             0         0            0          504         112       17          29     15            0
T.CD4.naive          0       0           0           0             0         0            0            0         309        4          18      0            1
T.CD8.em             0       0           0           0             1         0            0            7           4      574           1      0            2
T.CD8.naive          0       0           0           0             0         0            0            2           1        0          26      2            0
T.regs               0       0           0           0             0         0            0            0          27        1           2    106            0
Unclassified         0       0           1          12             3         5            0            0           0        1           0      0          179
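Cross-tabulations like the ones above can be built with base R's table function. The sketch below uses invented toy labels so that it runs on its own; with the real objects one would presumably pass celltypes$CellTypes and celltypes_fast$CellTypes instead, which is an assumption about the structure returned by GenerateLabels.

```r
# Toy labels standing in for the two annotation vectors (values invented)
signac.labels <- c("B", "B", "TNK", "MPh", "MPh", "Unclassified")
fast.labels   <- c("B", "B", "TNK", "MPh", "Unclassified", "Unclassified")

# Use a shared set of levels so that the table is square
lv <- sort(union(signac.labels, fast.labels))
tab <- table(Signac = factor(signac.labels, levels = lv),
             SignacFast = factor(fast.labels, levels = lv))
print(tab)

# Fraction of cells given the same label by both methods
agreement <- sum(diag(tab)) / sum(tab)
agreement
```

Applying the same ratio to the Celltypes table above, the diagonal sums to 6,208 of 6,302 cells, an agreement of about 98.5%.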

Save results

saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds")
saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds")

Session Info
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.31        magrittr_2.0.1    R6_2.5.0          ragg_1.1.1       
##  [5] rlang_0.4.10      fastmap_1.1.0     highr_0.8         stringr_1.4.0    
##  [9] tools_4.0.0       xfun_0.21         jquerylib_0.1.3   htmltools_0.5.1.1
## [13] systemfonts_1.0.1 yaml_2.2.1        assertthat_0.2.1  digest_0.6.27    
## [17] rprojroot_2.0.2   pkgdown_1.6.1     crayon_1.4.1      textshaping_0.3.1
## [21] formatR_1.7       sass_0.3.1        fs_1.5.0          memoise_2.0.0    
## [25] cachem_1.0.3      evaluate_0.14     rmarkdown_2.7     stringi_1.5.3    
## [29] compiler_4.0.0    bslib_0.2.4       desc_1.2.0        jsonlite_1.7.2