KSoftImpute is a fast method for imputing missing gene expression values in single-cell data. It uses a k-nearest-neighbor graph and imputes the expression of each gene in each cell as a weighted average of that cell's own value and the values of its first-degree neighbors. The imputation weights are determined by the number of detected genes. The method can process large data sets (>100,000 cells) in under a minute.
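The weighted-average idea above can be illustrated with a small base-R sketch. This is not the package's implementation. The helper name `knn_soft_impute`, the cell-by-cell 0/1 adjacency matrix `A` standing in for `dM`, and the choice to weight each cell by its count of detected (non-zero) genes are assumptions based only on the one-paragraph description.

```r
# Hypothetical sketch of k-NN soft imputation; NOT the KSoftImpute source.
# Assumptions (not from the documentation):
#   - A is a cell-by-cell 0/1 adjacency matrix of first-degree neighbors
#   - each cell's weight is its number of detected genes (non-zero counts)
#   - each cell also contributes to its own estimate (self-loop)
knn_soft_impute <- function(E, A) {
  stopifnot(ncol(E) == nrow(A), nrow(A) == ncol(A))
  n_detected <- colSums(E > 0)             # detected genes per cell
  A_self <- A
  diag(A_self) <- 1                        # include each cell itself
  W <- sweep(A_self, 2, n_detected, `*`)   # W[i, j] = A_self[i, j] * n_detected[j]
  rs <- rowSums(W)
  rs[rs == 0] <- 1                         # avoid division by zero
  W <- W / rs                              # each row of weights sums to 1
  E %*% t(W)                               # genes x cells: weighted neighbor average
}

# Toy example: 2 genes x 3 cells; cells 1-2 and 2-3 are neighbors
E_toy <- matrix(c(1, 0,
                  0, 2,
                  3, 3), nrow = 2)
A_toy <- matrix(c(0, 1, 0,
                  1, 0, 1,
                  0, 1, 0), nrow = 3)
Z_toy <- knn_soft_impute(E_toy, A_toy)
# gene 1 -> 0.50 1.75 2.00 ; gene 2 -> 1.00 2.00 2.67 (rounded)
```

Because each row of weights sums to 1, every imputed value is a convex combination of observed values from the cell and its neighbors, so cells with more detected genes pull the estimate toward their own profile.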

Usage

KSoftImpute(E, dM = NULL, genes.to.use = NULL, verbose = FALSE)

Arguments

E

A gene-by-sample count matrix (sparse matrix or matrix) with genes identified by their HUGO symbols.

dM

The distance matrix returned by CID.GetDistMat; see ?CID.GetDistMat for details.

genes.to.use

A character vector of genes to impute. Default is NULL.

verbose

If TRUE, the function prints progress messages. Default is FALSE.

Value

An expression matrix (sparse matrix) with imputed values.

Examples

if (FALSE) {
  # download single-cell data for classification
  file.dir <- "https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/"
  file <- "pbmc_1k_v3_filtered_feature_bc_matrix.h5"
  # mode = "wb" keeps the binary HDF5 file intact on Windows
  download.file(paste0(file.dir, file), "Ex.h5", mode = "wb")

  # load data, process with Seurat
  library(Seurat)
  E <- Read10X_h5(filename = "Ex.h5")
  pbmc <- CreateSeuratObject(counts = E, project = "pbmc")

  # run Seurat pipeline
  pbmc <- SCTransform(pbmc, verbose = FALSE)
  pbmc <- RunPCA(pbmc, verbose = FALSE)
  pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
  pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)

  # get the nearest-neighbor graph (edges) for the default assay
  default.assay <- Seurat::DefaultAssay(pbmc)
  edges <- pbmc@graphs[[which(grepl(paste0(default.assay, "_nn"), names(pbmc@graphs)))]]

  # get distance matrix
  dM <- CID.GetDistMat(edges)

  # run imputation
  Z <- KSoftImpute(E = E, dM = dM, verbose = TRUE)
}