vignettes/signac-Seurat_pbmcs.Rmd
This vignette shows how to use Signac with Seurat. It has three parts: Seurat pre-processing, Signac classification, and visualization. We use an example PBMC scRNA-seq data set from 10X Genomics.
Start with the standard pre-processing steps for a Seurat object.
Download data from 10X Genomics.
dir.create("fls")
download.file("https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.h5",
destfile = "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5")
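Re-running the download step fetches the file again and makes `dir.create()` warn that the directory already exists. A minimal sketch of a guard for this is given here; `download_if_missing` is a hypothetical helper name, not part of Seurat or SignacX, and `mode = "wb"` is set on the assumption that the HDF5 file should be written as binary (this matters on Windows).

```r
# Hypothetical helper (not from Seurat or SignacX): download a file only
# if it is not already on disk, creating the target directory quietly.
download_if_missing <- function(url, destfile, mode = "wb") {
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(destfile)) {
    # mode = "wb" writes the file as binary, which HDF5 files need on Windows
    utils::download.file(url, destfile = destfile, mode = mode)
  }
  invisible(destfile)
}

# Usage with the URL from this vignette (commented out to avoid a download here):
# download_if_missing(
#   "https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.h5",
#   "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5"
# )
```

The helper returns the destination path invisibly, so it can be passed straight to a reader function.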
Create a Seurat object, and then perform SCTransform normalization.
# load data (Read10X_h5() requires the hdf5r package)
library(Seurat)
E <- Read10X_h5(filename = "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5")
pbmc <- CreateSeuratObject(counts = E, project = "pbmc")
# run sctransform
pbmc <- SCTransform(pbmc, verbose = FALSE)
Perform dimensionality reduction by PCA and UMAP embedding, then build the nearest-neighbor graph.
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)
First, make sure you have the SignacX package installed.
install.packages("SignacX")
Load the library:
library(SignacX)
Generate SignacX labels for the Seurat object.
# Run Signac
labels <- Signac(pbmc, num.cores = 4)
celltypes = GenerateLabels(labels, E = pbmc)
Training the neural networks can take a long time. To speed this up, we implemented SignacFast, which uses an ensemble of pre-trained neural network models.
# Run SignacFast
labels_fast <- SignacFast(pbmc)
celltypes_fast = GenerateLabels(labels_fast, E = pbmc)
SignacFast took only ~30 seconds. Relative to Signac, the main difference is that SignacFast tends to leave a few more cells “Unclassified.”
| | B | MPh | TNK | Unclassified |
|---|---|---|---|---|
| B | 186 | 0 | 0 | 0 |
| MPh | 0 | 362 | 0 | 54 |
| TNK | 0 | 0 | 573 | 3 |
| Unclassified | 0 | 0 | 0 | 44 |
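To put a number on the comparison, the counts in the table can be summarized as an overall agreement rate. This is a sketch under one assumption that the text does not state: rows are taken to be the Signac labels and columns the SignacFast labels, which is the orientation consistent with SignacFast leaving more cells unclassified. In a live session, a table like this could presumably be built with `table(celltypes$CellTypes, celltypes_fast$CellTypes)`.

```r
# Counts copied from the table above.
# ASSUMPTION: rows = Signac labels, columns = SignacFast labels.
lv <- c("B", "MPh", "TNK", "Unclassified")
conf <- matrix(
  c(186,   0,   0,  0,
      0, 362,   0, 54,
      0,   0, 573,  3,
      0,   0,   0, 44),
  nrow = 4, byrow = TRUE,
  dimnames = list(Signac = lv, SignacFast = lv)
)

n_cells   <- sum(conf)                  # 1222 cells in total
n_agree   <- sum(diag(conf))            # 1165 cells with identical labels
agreement <- n_agree / n_cells          # about 0.953

# Cells labeled by the row method but "Unclassified" by the column method
newly_unclassified <- sum(conf[lv != "Unclassified", "Unclassified"])  # 57

print(round(agreement, 3))
```

Under that assumption, every disagreement in the table is a cell that Signac labeled as MPh or TNK and SignacFast left unclassified; no cell switches between two named cell types.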
Now we can visualize the cell type classifications at several levels, starting with immune versus non-immune:
pbmc <- AddMetaData(pbmc, metadata = celltypes$Immune, col.name = "immune")
pbmc <- SetIdent(pbmc, value = "immune")
png(filename = "fls/plot1.png")
DimPlot(pbmc)
dev.off()
pbmc <- AddMetaData(pbmc, metadata = celltypes$L2, col.name = "L2")
pbmc <- SetIdent(pbmc, value = "L2")
png(filename = "fls/plot2.png")
DimPlot(pbmc)
dev.off()
lbls = factor(celltypes$CellTypes)
levels(lbls) <- sort(unique(lbls))
pbmc <- AddMetaData(pbmc, metadata = lbls, col.name = "celltypes")
pbmc <- SetIdent(pbmc, value = "celltypes")
png(filename = "./fls/plot3.png")
DimPlot(pbmc)
dev.off()
pbmc <- AddMetaData(pbmc, metadata = celltypes$CellTypes_novel, col.name = "celltypes_novel")
pbmc <- SetIdent(pbmc, value = "celltypes_novel")
png(filename = "./fls/plot4.png")
DimPlot(pbmc)
dev.off()
pbmc <- AddMetaData(pbmc, metadata = celltypes$CellStates, col.name = "cellstates")
pbmc <- SetIdent(pbmc, value = "cellstates")
png(filename = "./fls/plot5.png")
DimPlot(pbmc)
dev.off()
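The five plotting steps above repeat the same pattern: attach one level of labels as metadata, set it as the active identity, and write a UMAP plot to a PNG file. A minimal sketch of a wrapper for that pattern follows; `plot_label_level` is a hypothetical helper, not part of Seurat or SignacX, and it assumes Seurat is installed when the function is called. Note the explicit `print()`: ggplot objects are not drawn automatically inside a function or loop.

```r
# Hypothetical helper (not from Seurat or SignacX): add one level of labels
# to a Seurat object, make it the active identity, and save a DimPlot.
plot_label_level <- function(obj, labels, col_name, out_file) {
  obj <- Seurat::AddMetaData(obj, metadata = labels, col.name = col_name)
  obj <- Seurat::SetIdent(obj, value = col_name)
  grDevices::png(filename = out_file)
  # Close the device even if plotting fails
  on.exit(grDevices::dev.off(), add = TRUE)
  # Inside a function the plot must be printed explicitly to be drawn
  print(Seurat::DimPlot(obj))
  invisible(obj)
}

# Usage with objects from this vignette (commented out; needs pbmc and celltypes):
# pbmc <- plot_label_level(pbmc, celltypes$L2, "L2", "fls/plot2.png")
# pbmc <- plot_label_level(pbmc, celltypes$CellStates, "cellstates", "fls/plot5.png")
```

The function returns the updated object, so the metadata columns accumulate exactly as they do in the step-by-step version.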
Identify differentially expressed genes between cell types. Here, we see that Signac identified two novel cell populations that are positive for platelet and plasma cell markers, respectively.
pbmc <- SetIdent(pbmc, value = "celltypes_novel")
# Find marker genes for each cell type label, and draw a heatmap
markers <- FindAllMarkers(pbmc, only.pos = TRUE, verbose = FALSE, logfc.threshold = 1)
library(dplyr)
# Seurat >= 4.0 names the fold-change column avg_log2FC instead of avg_logFC
top5 <- markers %>% group_by(cluster) %>% top_n(n = 5, wt = avg_logFC)
png(filename = "./fls/plot6.png")
DoHeatmap(pbmc, features = unique(top5$gene), angle = 90)
dev.off()
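The marker selection above depends on the name of the fold-change column, which differs between Seurat versions (`avg_logFC` in older releases, `avg_log2FC` from 4.0 onward). The sketch below shows one way to select the top genes per cluster that works with either name. It uses base R and a toy data frame with made-up values standing in for the `FindAllMarkers()` output; `top_per_cluster` is a hypothetical helper.

```r
# Toy stand-in for FindAllMarkers() output; gene names and values are made up.
markers_toy <- data.frame(
  cluster    = c("B", "B", "B", "TNK", "TNK", "TNK"),
  gene       = c("g1", "g2", "g3", "g4", "g5", "g6"),
  avg_log2FC = c(2.1, 3.4, 1.2, 1.8, 2.9, 2.2),
  stringsAsFactors = FALSE
)

# Use whichever fold-change column is present in the table.
fc_col <- intersect(c("avg_log2FC", "avg_logFC"), colnames(markers_toy))[1]

# Hypothetical helper: keep the n rows with the largest fold change per cluster.
top_per_cluster <- function(df, fc_col, n = 5) {
  pieces <- lapply(split(df, df$cluster), function(d) {
    head(d[order(d[[fc_col]], decreasing = TRUE), , drop = FALSE], n)
  })
  do.call(rbind, pieces)
}

top2 <- top_per_cluster(markers_toy, fc_col, n = 2)
print(top2$gene)
```

Unlike `top_n()`, this version returns exactly `n` rows per cluster even when fold changes are tied, so the two approaches can differ slightly on real data.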
Save results
saveRDS(pbmc, file = "fls/pbmcs_signac.rds")
saveRDS(celltypes, file = "fls/celltypes.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast.rds")
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] knitr_1.31 magrittr_2.0.1 R6_2.5.0 ragg_1.1.1
## [5] rlang_0.4.10 fastmap_1.1.0 highr_0.8 stringr_1.4.0
## [9] tools_4.0.0 xfun_0.21 jquerylib_0.1.3 htmltools_0.5.1.1
## [13] systemfonts_1.0.1 yaml_2.2.1 assertthat_0.2.1 digest_0.6.27
## [17] rprojroot_2.0.2 pkgdown_1.6.1 crayon_1.4.1 textshaping_0.3.1
## [21] formatR_1.7 sass_0.3.1 fs_1.5.0 memoise_2.0.0
## [25] cachem_1.0.3 evaluate_0.14 rmarkdown_2.7 stringi_1.5.3
## [29] compiler_4.0.0 bslib_0.2.4 desc_1.2.0 jsonlite_1.7.2