
This function removes low-quality cells and features. It provides two modes for defining the cutoffs:

  • "auto": in this mode, scMINER estimates the cutoffs based on Median ± 3*MAD (median absolute deviation). This mode works well for matrices of raw UMI counts or TPM (Transcripts Per Million) values.

  • "manual": in this mode, users manually specify the low and high cutoffs of all 5 metrics: nUMI, nFeature, pctMito and pctSpikeIn for cells, and nCell for genes. No cells or features are removed under the default cutoffs of each metric.
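
The Median ± 3*MAD rule used by the "auto" mode can be sketched in base R as follows. This is a minimal illustration, not scMINER's internal code: the helper `mad_cutoffs()` and the toy `nUMI` vector are hypothetical, and it is an assumption that the unscaled MAD (`constant = 1`) is used. The example output further down also suggests that scMINER applies only one side of the interval for some metrics (for instance, no upper cutoff for nFeature), which this sketch does not reproduce.

```r
# Hypothetical helper (not part of scMINER): Median +/- n_mads * MAD cutoffs
mad_cutoffs <- function(x, n_mads = 3) {
  med <- stats::median(x, na.rm = TRUE)
  # constant = 1 returns the raw median absolute deviation;
  # R's default (1.4826) rescales it to match the SD of a normal distribution
  dev <- stats::mad(x, center = med, constant = 1, na.rm = TRUE)
  c(low = med - n_mads * dev, high = med + n_mads * dev)
}

# Toy per-cell UMI totals, invented for illustration; the last cell is an outlier
nUMI <- c(900, 1000, 1100, 1200, 1300, 1400, 9000)

cutoffs <- mad_cutoffs(nUMI)
keep <- nUMI >= cutoffs[["low"]] & nUMI <= cutoffs[["high"]]

print(cutoffs)
print(keep)
```

Because both the median and the MAD are robust statistics, the outlying cell barely shifts the cutoffs and is the only one flagged for removal.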

Usage

filterSparseEset(
  input_eset,
  filter_mode = "auto",
  filter_type = "both",
  gene.nCell_min = NULL,
  gene.nCell_max = NULL,
  cell.nUMI_min = NULL,
  cell.nUMI_max = NULL,
  cell.nFeature_min = NULL,
  cell.nFeature_max = NULL,
  cell.pctMito_min = NULL,
  cell.pctMito_max = NULL,
  cell.pctSpikeIn_min = NULL,
  cell.pctSpikeIn_max = NULL
)

Arguments

input_eset

The sparse eset object to be filtered.

filter_mode

Character, mode to apply the filtration cutoffs: "auto" (the default) or "manual".

filter_type

Character, type of object to be filtered: "both" (the default), "cell" or "feature".

gene.nCell_min

Numeric, the minimum number of cells that the qualified genes are identified in. Default: 1.

gene.nCell_max

Numeric, the maximum number of cells that the qualified genes are identified in. Default: Inf.

cell.nUMI_min

Numeric, the minimum number of total UMI counts per cell that the qualified cells carry. Default: 1.

cell.nUMI_max

Numeric, the maximum number of total UMI counts per cell that the qualified cells carry. Default: Inf.

cell.nFeature_min

Numeric, the minimum number of non-zero Features per cell that the qualified cells carry. Default: 1.

cell.nFeature_max

Numeric, the maximum number of non-zero Features per cell that the qualified cells carry. Default: Inf.

cell.pctMito_min

Numeric, the minimum percentage of UMI counts of mitochondrial genes that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 0.

cell.pctMito_max

Numeric, the maximum percentage of UMI counts of mitochondrial genes that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 1.

cell.pctSpikeIn_min

Numeric, the minimum percentage of UMI counts of spike-in that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 0.

cell.pctSpikeIn_max

Numeric, the maximum percentage of UMI counts of spike-in that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 1.
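
To illustrate how the per-cell cutoffs combine, the sketch below applies the documented default values and a set of manual values to a toy table of cell metrics. It is only an illustration under stated assumptions: the helper `pass_cutoffs()` and the `cells` data frame are hypothetical and not part of scMINER, and it assumes that a cell passes only if it lies within the cutoffs of every metric, which is what the "Combined" column in the printed summary suggests.

```r
# Toy per-cell metrics; values are invented for illustration only
cells <- data.frame(
  nUMI       = c(1200, 300, 5000, 2500),
  nFeature   = c(600, 150, 1800, 900),
  pctMito    = c(0.02, 0.05, 0.30, 0.01),
  pctSpikeIn = c(0, 0, 0, 0)
)

# Hypothetical helper (not scMINER code):
# TRUE for cells whose metrics all lie within [min, max]
pass_cutoffs <- function(df, mins, maxs) {
  keep <- rep(TRUE, nrow(df))
  for (metric in names(mins)) {
    keep <- keep & df[[metric]] >= mins[[metric]] & df[[metric]] <= maxs[[metric]]
  }
  keep
}

# Default cutoffs as listed in the Arguments section: nothing is removed
default_min <- c(nUMI = 1, nFeature = 1, pctMito = 0, pctSpikeIn = 0)
default_max <- c(nUMI = Inf, nFeature = Inf, pctMito = 1, pctSpikeIn = 1)
keep_default <- pass_cutoffs(cells, default_min, default_max)

# Manual cutoffs override only the metrics of interest
manual_min <- replace(default_min, c("nUMI", "nFeature"), c(500, 200))
manual_max <- replace(default_max, c("nUMI", "nFeature", "pctMito"), c(6500, 2500, 0.1))
keep_manual <- pass_cutoffs(cells, manual_min, manual_max)

print(keep_default)
print(keep_manual)
```

In this toy table the second cell fails the nUMI and nFeature minimums and the third fails the pctMito maximum, so only the first and fourth cells pass the manual cutoffs.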

Value

A filtered sparse eset object. It also prints the summary of filtration to the screen.

Examples

data("pbmc14k_rawCount")
pbmc14k_raw.eset <- createSparseEset(input_matrix = pbmc14k_rawCount, projectID = "PBMC14k", addMetaData = TRUE)
#> Creating sparse eset from the input_matrix ...
#> 	Adding meta data based on input_matrix ...
#> Done! The sparse eset has been generated: 17986 genes, 14000 cells.

## 1. using the cutoffs automatically calculated by scMINER
pbmc14k_filtered_auto.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "auto", filter_type = "both")
#> Checking the availability of the 5 metrics ('nCell', 'nUMI', 'nFeature', 'pctMito', 'pctSpikeIn') used for filtration ...
#> Checking passed! All 5 metrics are available.
#> Filtration is done!
#> Filtration Summary:
#> 	8846/17986 genes passed!
#> 	13605/14000 cells passed!
#> 
#> For more details:
#> 	Gene filtration statistics:
#> 		Metrics		nCell
#> 		Cutoff_Low	70
#> 		Cutoff_High	Inf
#> 		Gene_total	17986
#> 		Gene_passed	8846(49.18%)
#> 		Gene_failed	9140(50.82%)
#> 
#> 	Cell filtration statistics:
#> 		Metrics		nUMI		nFeature	pctMito		pctSpikeIn	Combined
#> 		Cutoff_Low	458		221		0		0		NA
#> 		Cutoff_High	3694		Inf		0.0408		0.0000		NA
#> 		Cell_total	14000		14000		14000		14000		14000
#> 		Cell_passed	13826(98.76%)	14000(100.00%)	13778(98.41%)	14000(100.00%)	13605(97.18%)
#> 		Cell_failed	174(1.24%)	0(0.00%)	222(1.59%)	0(0.00%)	395(2.82%)

## 2. using the cutoffs manually specified
pbmc14k_filtered_manual.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "manual", filter_type = "both",
                                                 gene.nCell_min = 10,
                                                 cell.nUMI_min = 500,
                                                 cell.nUMI_max = 6500,
                                                 cell.nFeature_min = 200,
                                                 cell.nFeature_max = 2500,
                                                 cell.pctMito_max = 0.1)
#> Checking the availability of the 5 metrics ('nCell', 'nUMI', 'nFeature', 'pctMito', 'pctSpikeIn') used for filtration ...
#> Checking passed! All 5 metrics are available.
#> Filtration is done!
#> Filtration Summary:
#> 	12945/17986 genes passed!
#> 	13974/14000 cells passed!
#> 
#> For more details:
#> 	Gene filtration statistics:
#> 		Metrics		nCell
#> 		Cutoff_Low	10
#> 		Cutoff_High	Inf
#> 		Gene_total	17986
#> 		Gene_passed	12945(71.97%)
#> 		Gene_failed	5041(28.03%)
#> 
#> 	Cell filtration statistics:
#> 		Metrics		nUMI		nFeature	pctMito		pctSpikeIn	Combined
#> 		Cutoff_Low	500		200		0		0		NA
#> 		Cutoff_High	6500		2500		0.1000		1.0000		NA
#> 		Cell_total	14000		14000		14000		14000		14000
#> 		Cell_passed	13985(99.89%)	14000(100.00%)	13989(99.92%)	14000(100.00%)	13974(99.81%)
#> 		Cell_failed	15(0.11%)	0(0.00%)	11(0.08%)	0(0.00%)	26(0.19%)