In Figure 4 of the pre-print, we demonstrated that SignacX can map cell type labels from one single cell data set to another by learning CD56bright NK cells from CITE-seq data. In this vignette, we reproduce this analysis, which can be used to map cell populations (or clusters of cells) from one data set to another. We also provide interactive access to the single cell data annotated with the CD56bright NK cell model in the SignacX data portal (note: the CD56bright NK cells appear as red cells in the “CellStates” annotation layer).

This vignette shows how to use SignacX with Seurat and SPRING to learn a new cell type category from single cell data.

Load data

We start with CITE-seq data from 10X Genomics that were already processed with SPRING and classified with SignacX.

# load packages
library(SignacX)
library(Seurat)

# load CITE-seq data
data.dir = './CITESEQ_EXPLORATORY_CITESEQ_5K_PBMCS/FullDataset_v1_protein'
E = CID.LoadData(data.dir = data.dir)

# Load labels
json_data = rjson::fromJSON(file=paste0(data.dir,'/categorical_coloring_data.json'))

Create a Seurat object from the gene expression data and add the protein (ADT) data as a separate, CLR-normalized assay; we will use this object as the reference.

# separate protein (ADT feature names contain "Total", as in TotalSeqB) and gene expression data
logik = grepl("Total", rownames(E))
P = E[logik,]
E = E[!logik,]

# give both matrices the same cell names so Seurat can pair the assays
colnames(P) <- 1:ncol(P)
colnames(E) <- 1:ncol(E)

# build the reference object and CLR-normalize the ADT assay
reference <- CreateSeuratObject(counts = E)
reference[["ADT"]] <- CreateAssayObject(counts = P)
reference <- NormalizeData(reference, assay = "ADT", normalization.method = "CLR")
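For readers unfamiliar with CLR, the sketch below shows a per-feature centered log-ratio transform in base R. It is written to mirror, as far as we understand it, the formula Seurat applies for normalization.method = "CLR" with its default margin = 1 (each ADT feature normalized across cells); please check the Seurat documentation for your installed version. The helper clr_per_feature() and the toy count matrix are hypothetical.

```r
# Hypothetical helper: per-feature centered log-ratio (CLR) transform.
# Assumption: mirrors Seurat's "CLR" method with margin = 1 (rows = features).
clr_per_feature <- function(counts) {
  clr <- function(x) {
    # denominator: exp of the mean of log1p(x) across all cells
    # (zero counts contribute log1p(0) = 0 to the sum)
    denom <- exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x))
    log1p(x / denom)
  }
  out <- counts
  for (i in seq_len(nrow(counts))) {
    out[i, ] <- clr(counts[i, ])
  }
  out
}

# toy ADT counts: 2 proteins x 4 cells (made-up numbers)
toy_adt <- matrix(c(10, 0, 250, 40,
                    300, 5, 2, 0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("CD56", "CD16"), paste0("cell", 1:4)))
clr_adt <- clr_per_feature(toy_adt)
print(round(clr_adt, 3))
```

Because the transform is monotonic within each feature, the ranking of cells by a given protein is unchanged; only the scale is made comparable across proteins.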

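Identify CD56bright NK cells from the protein expression data: among cells that SignacX labeled as NK, keep those with high CD56 and low CD16 raw ADT counts (log2 count above 10 and below 7.5, respectively).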

# generate labels: keep the SignacX NK label and mark all other cells "Unclassified"
lbls = json_data$CellStates$label_list
lbls[lbls != "NK"] = "Unclassified"
# raw (not CLR-normalized) ADT counts for CD16 and CD56
adt = reference@assays$ADT@counts
CD16 = adt[rownames(adt) == "CD16-TotalSeqB-CD16",]
CD56 = adt[rownames(adt) == "CD56-TotalSeqB-CD56",]
# CD56bright gate: high CD56, low CD16, within the NK population
logik = log2(CD56) > 10 & log2(CD16) < 7.5 & lbls == "NK"
sum(logik)  # number of cells passing the gate
lbls[logik] = "NK.CD56bright"
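The gating rule above can be illustrated on a toy example. The sketch wraps the same logic in a hypothetical helper, gate_cd56bright(), using the vignette's thresholds (log2 raw ADT count above 10 for CD56 and below 7.5 for CD16) as defaults; the four cells and their counts are made up.

```r
# Hypothetical helper reproducing the CD56bright gate on log2 raw ADT counts.
gate_cd56bright <- function(lbls, cd56, cd16, cd56_min = 10, cd16_max = 7.5) {
  # every non-NK cell becomes "Unclassified"
  lbls[lbls != "NK"] <- "Unclassified"
  # NK cells with high CD56 and low CD16 become "NK.CD56bright"
  bright <- log2(cd56) > cd56_min & log2(cd16) < cd16_max & lbls == "NK"
  lbls[bright] <- "NK.CD56bright"
  lbls
}

# made-up labels and counts for four cells (not real data)
toy_lbls <- c("NK", "NK", "T", "NK")
toy_cd56 <- c(2000, 300, 5000, 1500)
toy_cd16 <- c(100, 50, 10, 400)
gated <- gate_cd56bright(toy_lbls, toy_cd56, toy_cd16)
print(gated)
```

Here the first cell passes both thresholds, the second has too little CD56, the third is not an NK cell, and the fourth has too much CD16, so only the first is relabeled NK.CD56bright.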

SignacX

Generate a training data set from the reference data and save it for later use. Note:

  • SignacBoot performs feature selection, bootstrapping, imputation and normalization to derive a training data set from single cell data.

# generate bootstrapped single cell data
R_learned = SignacBoot(E = E, spring.dir = data.dir, L = c("NK", "NK.CD56bright"), labels = lbls, logfc.threshold = 1)

# save the training data
save(R_learned, file = "training_NKBright_v207.rda")
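The exact bootstrapping procedure inside SignacBoot is not spelled out in this vignette. As a generic illustration only, the sketch below draws the same number of cells, with replacement, from each label of interest, which is one common way to keep a small population such as NK.CD56bright from being outnumbered by a larger one in the training data. The helper balanced_bootstrap() and its size and seed arguments are hypothetical, and SignacBoot may work differently.

```r
# Hypothetical helper: class-balanced bootstrap over cell labels.
# Draws `size` cell indices with replacement for each label in `classes`.
balanced_bootstrap <- function(labels, classes, size, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  idx <- lapply(classes, function(cl) {
    pool <- which(labels == cl)
    if (length(pool) == 0) stop("no cells labeled ", cl)
    # index into `pool` so that a single-cell pool is handled correctly
    pool[sample.int(length(pool), size, replace = TRUE)]
  })
  data.frame(cell = unlist(idx),
             label = rep(classes, each = size),
             stringsAsFactors = FALSE)
}

# toy labels: many NK cells, very few CD56bright cells (made up)
toy_labels <- c(rep("NK", 50), rep("NK.CD56bright", 3), rep("Unclassified", 20))
boot <- balanced_bootstrap(toy_labels, c("NK", "NK.CD56bright"), size = 100)
print(table(boot$label))
```

Both classes end up with 100 sampled cells even though only three toy cells are labeled NK.CD56bright; those three are simply resampled many times.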

Classify a new data set with the model

Load the expression data for a different data set (this one was also previously processed with SPRING and SignacX).

# Classify another data set with new model
# load new data
new.data.dir = "./PBMCs_5k_10X/FullDataset_v1"
E = CID.LoadData(data.dir = new.data.dir)
# load cell types identified with Signac
json_data = rjson::fromJSON(file=paste0(new.data.dir,'/categorical_coloring_data.json'))

Generate new labels. Note:

  • Signac trains an ensemble of 100 neural network classifiers on the new training data set built above (R_learned) and then classifies the unseen data (E).

# generate new labels
cr_learned = Signac(E = E, R = R_learned, spring.dir = new.data.dir)
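The note above says Signac trains an ensemble of 100 classifiers, but this vignette does not describe how their outputs are combined. The sketch below shows one simple, generic way to combine an ensemble: a per-cell majority vote over the label predicted by each model. The helper majority_vote() and the toy prediction matrix are hypothetical, and SignacX may combine its models differently (for example, by averaging class probabilities).

```r
# Hypothetical helper: per-cell majority vote across ensemble members.
# Rows of `pred` are cells; each column holds one model's predicted labels.
majority_vote <- function(pred) {
  apply(pred, 1, function(p) {
    counts <- table(p)
    names(counts)[which.max(counts)]
  })
}

# made-up predictions from 5 models for 3 cells
toy_pred <- rbind(
  c("NK", "NK", "NK.CD56bright", "NK", "NK"),
  c("NK.CD56bright", "NK.CD56bright", "NK", "NK.CD56bright", "NK"),
  c("NK", "NK.CD56bright", "NK", "NK", "NK")
)
votes <- majority_vote(toy_pred)
print(votes)
```

With an odd number of models and two candidate labels, ties cannot occur; with more labels, which.max() would break a tie in favor of the alphabetically first label.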

Next, we amend the existing labels (classified previously with SignacX) by replacing the NK labels with the newly learned ones, and generate a new SPRING layout. Note:

  • We usually copy the existing SPRING files from “FullDataset_v1” to “FullDataset_v1_Learned”, so that the new labels go into a new layout while the existing layout is preserved.

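# modify the existing labels
cr = lapply(json_data, function(x) x$label_list)
# replace the NK labels in the CellStates layer with the learned labels
logik = cr$CellStates == 'NK'
cr$CellStates[logik] = cr_learned[logik]
# do the same for the CellStates_novel layer
logik = cr$CellStates_novel == 'NK'
cr$CellStates_novel[logik] = cr_learned[logik]
# write to the copied "_Learned" SPRING directory so the original layout is preserved
new.data.dir = paste0(new.data.dir, "_Learned")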
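The copy described in the note above can also be done from R before the results are saved. The sketch below uses only base R file functions; the helper copy_spring_dir() is hypothetical, and the demonstration runs in a temporary directory rather than on the vignette's data.

```r
# Hypothetical helper: copy every file and subdirectory of an existing SPRING
# directory into a new directory, leaving the original untouched.
copy_spring_dir <- function(from, to) {
  if (!dir.exists(from)) stop("source directory not found: ", from)
  dir.create(to, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(from, full.names = TRUE, all.files = TRUE, no.. = TRUE)
  # recursive = TRUE also copies subdirectories; existing files are not overwritten
  ok <- file.copy(files, to, recursive = TRUE, overwrite = FALSE)
  invisible(all(ok))
}

# demonstration in a temporary directory, mimicking the vignette's folder names
src <- file.path(tempdir(), "FullDataset_v1")
dir.create(src, showWarnings = FALSE)
writeLines("{}", file.path(src, "categorical_coloring_data.json"))
dst <- paste0(src, "_Learned")
copy_spring_dir(src, dst)

# with the vignette's paths this would be:
# copy_spring_dir("./PBMCs_5k_10X/FullDataset_v1", "./PBMCs_5k_10X/FullDataset_v1_Learned")
```

Setting overwrite = FALSE means an existing "_Learned" directory is never silently clobbered; the function then returns FALSE for files that were already present.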

Save results

# save
dat = CID.writeJSON(cr, spring.dir = new.data.dir, new_colors = c('red'), new_populations = c('NK.CD56bright'))

Session Info
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] rprojroot_2.0.2   crayon_1.4.1      digest_0.6.27     assertthat_0.2.1 
##  [5] R6_2.5.0          jsonlite_1.7.2    magrittr_2.0.1    evaluate_0.14    
##  [9] stringi_1.5.3     rlang_0.4.10      cachem_1.0.3      fs_1.5.0         
## [13] jquerylib_0.1.3   bslib_0.2.4       ragg_1.1.1        rmarkdown_2.7    
## [17] pkgdown_1.6.1     textshaping_0.3.1 desc_1.2.0        tools_4.0.0      
## [21] stringr_1.4.0     yaml_2.2.1        xfun_0.21         fastmap_1.1.0    
## [25] compiler_4.0.0    systemfonts_1.0.1 memoise_2.0.0     htmltools_0.5.1.1
## [29] knitr_1.31        sass_0.3.1