vignettes/signac-SPRING_Learning.Rmd
In Figure 4 of the pre-print, we demonstrated that SignacX can map cell type labels from one single cell data set to another by learning CD56bright NK cells from CITE-seq data. In this vignette, we reproduce that analysis, which can be used to map cell populations (or clusters of cells) from one data set to another. We also provide interactive access to the single cell data that were annotated with the CD56bright NK cell model in the SignacX data portal (note: the CD56bright NK cells appear as red cells in the “CellStates” annotation layer).
This vignette shows how to use SignacX with Seurat and SPRING to learn a new cell type category from single cell data.
We start with CITE-seq data that were already classified with SignacX using the SPRING pipeline.
Load the CITE-seq data from 10X Genomics, which were already processed with SPRING and classified with SignacX.
# load CITE-seq data
data.dir = './CITESEQ_EXPLORATORY_CITESEQ_5K_PBMCS/FullDataset_v1_protein'
E = CID.LoadData(data.dir = data.dir)
# Load labels
json_data = rjson::fromJSON(file=paste0(data.dir,'/categorical_coloring_data.json'))
Create a Seurat object from the gene expression data and add the protein expression data as an "ADT" assay; we will use this object as a reference.
# separate protein and gene expression data
logik = grepl("Total", rownames(E))
P = E[logik,]
E = E[!logik,]
# name cells by column index, build the Seurat object, and CLR-normalize the protein (ADT) assay
colnames(P) <- 1:ncol(P)
colnames(E) <- 1:ncol(E)
reference <- CreateSeuratObject(E)
reference[["ADT"]] <- CreateAssayObject(counts = P)
reference <- NormalizeData(reference, assay = "ADT", normalization.method = "CLR")
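To illustrate the two ideas in the step above without loading any data, here is a minimal base R sketch on a toy matrix: rows are split into protein and gene features by matching "Total" in the row names, and a CLR-style transform is applied to each protein feature. The matrix values, the `toy_*` names and the `clr_sketch` helper are hypothetical. The transform is only an illustration of a centered log-ratio; whether it matches what `NormalizeData` computes, including whether it normalizes across features or across cells, is an assumption to check against the Seurat documentation.

```r
# Toy count matrix: rows are features, columns are cells (hypothetical values).
set.seed(1)
toy <- matrix(rpois(5 * 4, lambda = 20), nrow = 5,
              dimnames = list(c("GeneA", "GeneB", "CD16-TotalSeqB-CD16",
                                "CD56-TotalSeqB-CD56", "GeneC"), NULL))

# Rows whose names contain "Total" are treated as antibody (protein) features.
is_protein <- grepl("Total", rownames(toy))
toy_protein <- toy[is_protein, , drop = FALSE]
toy_rna <- toy[!is_protein, , drop = FALSE]

# CLR-style transform of one vector: scale by exp(mean of log1p over the
# positive entries, divided by the full length), then take log1p.
clr_sketch <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))
}

# Apply per feature (row); t() restores the features-by-cells orientation.
toy_clr <- t(apply(toy_protein, 1, clr_sketch))
dim(toy_clr)
```

Using `drop = FALSE` keeps the subsets as matrices even if only one row matches, which avoids a silent conversion to a plain vector.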
Identify CD56bright NK cells based on protein expression data: cells already labeled NK that have high CD56 and low CD16 antibody counts.
# generate labels
lbls = json_data$CellStates$label_list
lbls[lbls != "NK"] = "Unclassified"
CD16 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD16-TotalSeqB-CD16",]
CD56 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD56-TotalSeqB-CD56",]
logik = log2(CD56) > 10 & log2(CD16) < 7.5 & lbls == "NK"; sum(logik)
lbls[logik] = "NK.CD56bright"
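The gate above works on raw antibody counts: `log2(CD56) > 10` means more than 1024 counts, and `log2(CD16) < 7.5` means fewer than roughly 181 counts. The sketch below replays the same logic on six made-up cells so the effect of each condition is easy to follow. The count values and the `cd56`, `cd16`, `labels` and `new_labels` names are hypothetical and not taken from the CITE-seq data.

```r
# Hypothetical raw antibody counts and existing labels for six cells.
cd56 <- c(2048, 4096, 100, 3000, 1500, 5000)
cd16 <- c(50, 400, 30, 100, 20, 90)
labels <- c("NK", "NK", "NK", "T.CD8", "NK", "NK")

# Same gate as above: high CD56, low CD16, and already labeled NK.
is_bright <- log2(cd56) > 10 & log2(cd16) < 7.5 & labels == "NK"

# Collapse everything that is not NK, then mark the gated cells.
new_labels <- labels
new_labels[new_labels != "NK"] <- "Unclassified"
new_labels[is_bright] <- "NK.CD56bright"
table(new_labels)
```

Note that `log2(0)` is `-Inf` in R, so a cell with zero CD16 counts passes the CD16 condition, while a cell with zero CD56 counts fails the CD56 condition.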
Generate a training data set from the reference data and save it for later use.
# generate bootstrapped single cell data
R_learned = SignacBoot(E = E, spring.dir = data.dir, L = c("NK", "NK.CD56bright"), labels = lbls, logfc.threshold = 1)
# save the training data
save(R_learned, file = "training_NKBright_v207.rda")
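For readers unfamiliar with bootstrapped training data, the following base R sketch shows the general idea of resampling cells with replacement within each label to build a balanced training table. It is not the algorithm used by `SignacBoot`, whose internals are not described here; the toy matrix, the `n_per_class` value and all variable names are hypothetical.

```r
set.seed(42)
# Hypothetical genes-by-cells matrix with one label per cell.
expr <- matrix(rpois(6 * 10, lambda = 5), nrow = 6,
               dimnames = list(paste0("Gene", 1:6), NULL))
cell_labels <- rep(c("NK", "NK.CD56bright"), times = c(7, 3))

# Draw n_per_class cells with replacement from each class.
n_per_class <- 20
idx <- unlist(lapply(unique(cell_labels), function(l) {
  pool <- which(cell_labels == l)
  pool[sample.int(length(pool), n_per_class, replace = TRUE)]
}))

# Training table: one row per resampled cell, genes as columns, label as outcome.
training <- data.frame(t(expr[, idx]), celltype = cell_labels[idx])
table(training$celltype)
```

Sampling indices with `sample.int` rather than calling `sample` on the pool directly avoids the base R surprise where `sample(x, ...)` treats a length-one numeric `x` as the range `1:x`.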
Load expression data for a different data set, which was also previously processed with SPRING and classified with SignacX.
# Classify another data set with new model
# load new data
new.data.dir = "./PBMCs_5k_10X/FullDataset_v1"
E = CID.LoadData(data.dir = new.data.dir)
# load cell types identified with Signac
json_data = rjson::fromJSON(file=paste0(new.data.dir,'/categorical_coloring_data.json'))
Generate new labels with the learned model.
# generate new labels
cr_learned = Signac(E = E, R = R_learned, spring.dir = new.data.dir)
Now we amend the existing labels (classified previously with SignacX): we add the new labels and write the results to a new SPRING directory.
# modify the existing labels
cr = lapply(json_data, function(x) x$label_list)
logik = cr$CellStates == 'NK'
cr$CellStates[logik] = cr_learned[logik]
logik = cr$CellStates_novel == 'NK'
cr$CellStates_novel[logik] = cr_learned[logik]
new.data.dir = paste0(new.data.dir, "_Learned")
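The relabeling above only touches cells that were previously called NK; every other cell keeps its original annotation. A toy version of the same indexing pattern, with made-up label vectors (`previous` and `learned` are hypothetical stand-ins for an existing annotation layer and the output of the learned model), looks like this:

```r
# Hypothetical prior annotations and output of the newly learned model.
previous <- c("T.CD4", "NK", "B", "NK", "NK")
learned <- c("NK", "NK.CD56bright", "NK", "NK", "NK.CD56bright")

# Only cells previously called NK take the refined label; others are untouched.
is_nk <- previous == "NK"
amended <- previous
amended[is_nk] <- learned[is_nk]
amended
```

This pattern assumes both vectors are in the same cell order and have the same length, which is worth checking before overwriting labels.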
Save the results.
# save
dat = CID.writeJSON(cr, spring.dir = new.data.dir, new_colors = c('red'), new_populations = c('NK.CD56bright'))
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] rprojroot_2.0.2 crayon_1.4.1 digest_0.6.27 assertthat_0.2.1
## [5] R6_2.5.0 jsonlite_1.7.2 magrittr_2.0.1 evaluate_0.14
## [9] stringi_1.5.3 rlang_0.4.10 cachem_1.0.3 fs_1.5.0
## [13] jquerylib_0.1.3 bslib_0.2.4 ragg_1.1.1 rmarkdown_2.7
## [17] pkgdown_1.6.1 textshaping_0.3.1 desc_1.2.0 tools_4.0.0
## [21] stringr_1.4.0 yaml_2.2.1 xfun_0.21 fastmap_1.1.0
## [25] compiler_4.0.0 systemfonts_1.0.1 memoise_2.0.0 htmltools_0.5.1.1
## [29] knitr_1.31 sass_0.3.1