KSoftImpute is an ultra-fast method for imputing missing gene expression values in single-cell data. It uses a k-nearest-neighbor graph to impute the expression of each gene in a cell as the weighted average of that cell and its first-degree neighbors. Weights for imputation are determined by the number of detected genes. This method works for large data sets (>100,000 cells) in under a minute.
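The weighted averaging described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: `impute_sketch` and the adjacency matrix `A` are hypothetical names, the matrices are dense, and the weights are assumed to be directly proportional to the number of detected (non-zero) genes per cell, which the description does not spell out.

```r
# Hypothetical helper (not part of the package): neighbor-weighted averaging.
# E: gene-by-cell matrix; A: symmetric 0/1 cell-by-cell adjacency matrix.
impute_sketch <- function(E, A) {
  stopifnot(ncol(E) == nrow(A), nrow(A) == ncol(A))
  # Include each cell in its own neighborhood
  diag(A) <- 1
  # Assumed weight: number of detected (non-zero) genes in each cell
  w <- colSums(E > 0)
  # W[i, j] is the weight of cell i when imputing cell j;
  # multiplying by w scales row i by w[i] (column-major recycling)
  W <- A * w
  # Normalize so the weights used for each cell sum to 1
  # (assumes every neighborhood contains at least one detected gene)
  W <- sweep(W, 2, colSums(W), "/")
  E %*% W
}

# Toy data: 3 genes (rows) by 3 cells (columns)
E <- matrix(c(1, 0, 2,
              0, 0, 4,
              3, 1, 0), nrow = 3)
# Chain graph: cell 1 - cell 2 - cell 3
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3)
Z <- impute_sketch(E, A)
```

In this toy example, gene 1 is undetected in cell 2 but detected in both of its neighbors, so its imputed value in cell 2 becomes non-zero. The real function takes the neighbor information through `dM` rather than a plain adjacency matrix.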
KSoftImpute(E, dM = NULL, genes.to.use = NULL, verbose = FALSE)
Argument | Description |
---|---|
E | A gene-by-sample count matrix (sparse matrix or matrix) with genes identified by their HUGO symbols. |
dM | See ?CID.GetDistMat. |
genes.to.use | A character vector of genes to impute. Default is NULL. |
verbose | If TRUE, code reports outputs. Default is FALSE. |
An expression matrix (sparse matrix) with imputed values.
Signac and SignacFast
    if (FALSE) {
      # download single cell data for classification
      file.dir = "https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/"
      file = "pbmc_1k_v3_filtered_feature_bc_matrix.h5"
      download.file(paste0(file.dir, file), "Ex.h5")

      # load data, process with Seurat
      library(Seurat)
      E = Read10X_h5(filename = "Ex.h5")
      pbmc <- CreateSeuratObject(counts = E, project = "pbmc")

      # run Seurat pipeline
      pbmc <- SCTransform(pbmc, verbose = FALSE)
      pbmc <- RunPCA(pbmc, verbose = FALSE)
      pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
      pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)

      # get edges from default assay from Seurat object
      default.assay <- Seurat::DefaultAssay(pbmc)
      edges = pbmc@graphs[[which(grepl(paste0(default.assay, "_nn"), names(pbmc@graphs)))]]

      # get distance matrix
      dM = CID.GetDistMat(edges)

      # run imputation
      Z = KSoftImpute(E = E, dM = dM, verbose = TRUE)
    }