Filter the cells and/or features of a sparse eset object using automatic or user-defined cutoffs
filterSparseEset.Rd
This function removes low-quality cells and features. It provides two modes for defining the cutoffs:
"auto": in this mode, scMINER estimates the cutoffs based on Median ± 3*MAD (median absolute deviation). This mode works well for matrices of raw UMI counts or TPM (Transcripts Per Million) values.
"manual": in this mode, users specify the low and high cutoffs for all 5 metrics: nUMI, nFeature, pctMito and pctSpikeIn for cells, and nCell for genes. Under the default cutoff of each metric, no cells or features are removed.
Usage
filterSparseEset(
input_eset,
filter_mode = "auto",
filter_type = "both",
gene.nCell_min = NULL,
gene.nCell_max = NULL,
cell.nUMI_min = NULL,
cell.nUMI_max = NULL,
cell.nFeature_min = NULL,
cell.nFeature_max = NULL,
cell.pctMito_min = NULL,
cell.pctMito_max = NULL,
cell.pctSpikeIn_min = NULL,
cell.pctSpikeIn_max = NULL
)
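The "auto" mode is described as Median ± 3*MAD. A minimal base-R sketch of that formula for a single metric is shown here. The helper `estimate_cutoffs` and its `n_mad` argument are hypothetical and not part of scMINER. Whether scMINER uses the raw MAD (`constant = 1`) or the scaled default of `stats::mad()` is not stated on this page, so the raw form below is an assumption. The example output further down also shows some bounds left open (for instance `Inf`), so the package evidently does not apply both bounds to every metric.

```r
# Hypothetical helper (not part of scMINER): low/high cutoffs for one
# QC metric, computed as median -/+ n_mad * MAD.
estimate_cutoffs <- function(x, n_mad = 3) {
  med <- stats::median(x, na.rm = TRUE)
  # Assumption: raw median absolute deviation (constant = 1).
  # stats::mad() defaults to constant = 1.4826 instead.
  dev <- stats::mad(x, constant = 1, na.rm = TRUE)
  c(low = med - n_mad * dev, high = med + n_mad * dev)
}

# Made-up nUMI values with one obvious outlier
nUMI <- c(900, 1000, 1100, 1200, 1300, 5000)
cutoffs <- estimate_cutoffs(nUMI)
print(cutoffs)  # median = 1150, MAD = 150, so low = 700 and high = 1600
```

Because both the median and the MAD are robust to outliers, the single extreme value (5000) barely shifts the cutoffs, which is the usual reason to prefer this rule over mean ± standard deviation.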
Arguments
- input_eset
The sparse eset object to be filtered
- filter_mode
Character, mode to apply the filtration cutoffs: "auto" (the default) or "manual".
- filter_type
Character, type of object to be filtered: "both" (the default), "cell" or "feature".
- gene.nCell_min
Numeric, the minimum number of cells that the qualified genes are identified in. Default: 1.
- gene.nCell_max
Numeric, the maximum number of cells that the qualified genes are identified in. Default: Inf.
- cell.nUMI_min
Numeric, the minimum number of total UMI counts per cell that the qualified cells carry. Default: 1.
- cell.nUMI_max
Numeric, the maximum number of total UMI counts per cell that the qualified cells carry. Default: Inf.
- cell.nFeature_min
Numeric, the minimum number of non-zero features per cell that the qualified cells carry. Default: 1.
- cell.nFeature_max
Numeric, the maximum number of non-zero features per cell that the qualified cells carry. Default: Inf.
- cell.pctMito_min
Numeric, the minimum percentage of UMI counts of mitochondrial genes that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 0.
- cell.pctMito_max
Numeric, the maximum percentage of UMI counts of mitochondrial genes that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 1.
- cell.pctSpikeIn_min
Numeric, the minimum percentage of UMI counts of spike-in that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 0.
- cell.pctSpikeIn_max
Numeric, the maximum percentage of UMI counts of spike-in that the qualified cells carry, expressed as a fraction between 0 and 1. Default: 1.
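To make the metrics behind these arguments concrete, the sketch below derives them from a toy count matrix in base R. This is only an illustration of the definitions given above, not scMINER's implementation: the matrix values are made up, and recognising mitochondrial genes by an "MT-" prefix is an assumption.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(5, 0, 2,
    0, 0, 1,
    3, 4, 0,
    2, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "MT-CO1", "GeneC"),
                  c("cell1", "cell2", "cell3"))
)

# Per-cell metrics
nUMI     <- colSums(counts)      # total UMI counts per cell
nFeature <- colSums(counts > 0)  # number of non-zero features per cell

# Assumption: mitochondrial genes carry an "MT-" prefix.
is_mito <- grepl("^MT-", rownames(counts))
# Fraction (0 to 1) of each cell's UMI counts that come from mito genes
pctMito <- colSums(counts[is_mito, , drop = FALSE]) / nUMI

# Per-gene metric
nCell <- rowSums(counts > 0)     # number of cells each gene is detected in

print(rbind(nUMI, nFeature, pctMito))
print(nCell)
```

Note that `pctMito` here is a fraction rather than a value out of 100, which matches the 0 to 1 defaults of `cell.pctMito_min` and `cell.pctMito_max`.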
Examples
data("pbmc14k_rawCount")
pbmc14k_raw.eset <- createSparseEset(input_matrix = pbmc14k_rawCount, projectID = "PBMC14k", addMetaData = TRUE)
#> Creating sparse eset from the input_matrix ...
#> Adding meta data based on input_matrix ...
#> Done! The sparse eset has been generated: 17986 genes, 14000 cells.
## 1. using the cutoffs automatically calculated by scMINER
pbmc14k_filtered_auto.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "auto", filter_type = "both")
#> Checking the availability of the 5 metrics ('nCell', 'nUMI', 'nFeature', 'pctMito', 'pctSpikeIn') used for filtration ...
#> Checking passed! All 5 metrics are available.
#> Filtration is done!
#> Filtration Summary:
#> 8846/17986 genes passed!
#> 13605/14000 cells passed!
#>
#> For more details:
#> Gene filtration statistics:
#> Metrics nCell
#> Cutoff_Low 70
#> Cutoff_High Inf
#> Gene_total 17986
#> Gene_passed 8846(49.18%)
#> Gene_failed 9140(50.82%)
#>
#> Cell filtration statistics:
#> Metrics      nUMI           nFeature        pctMito        pctSpikeIn      Combined
#> Cutoff_Low   458            221             0              0               NA
#> Cutoff_High  3694           Inf             0.0408         0.0000          NA
#> Cell_total   14000          14000           14000          14000           14000
#> Cell_passed  13826(98.76%)  14000(100.00%)  13778(98.41%)  14000(100.00%)  13605(97.18%)
#> Cell_failed  174(1.24%)     0(0.00%)        222(1.59%)     0(0.00%)        395(2.82%)
## 2. using the cutoffs manually specified
pbmc14k_filtered_manual.eset <- filterSparseEset(
  pbmc14k_raw.eset,
  filter_mode = "manual",
  filter_type = "both",
  gene.nCell_min = 10,
  cell.nUMI_min = 500,
  cell.nUMI_max = 6500,
  cell.nFeature_min = 200,
  cell.nFeature_max = 2500,
  cell.pctMito_max = 0.1
)
#> Checking the availability of the 5 metrics ('nCell', 'nUMI', 'nFeature', 'pctMito', 'pctSpikeIn') used for filtration ...
#> Checking passed! All 5 metrics are available.
#> Filtration is done!
#> Filtration Summary:
#> 12945/17986 genes passed!
#> 13974/14000 cells passed!
#>
#> For more details:
#> Gene filtration statistics:
#> Metrics nCell
#> Cutoff_Low 10
#> Cutoff_High Inf
#> Gene_total 17986
#> Gene_passed 12945(71.97%)
#> Gene_failed 5041(28.03%)
#>
#> Cell filtration statistics:
#> Metrics      nUMI           nFeature        pctMito        pctSpikeIn      Combined
#> Cutoff_Low   500            200             0              0               NA
#> Cutoff_High  6500           2500            0.1000         1.0000          NA
#> Cell_total   14000          14000           14000          14000           14000
#> Cell_passed  13985(99.89%)  14000(100.00%)  13989(99.92%)  14000(100.00%)  13974(99.81%)
#> Cell_failed  15(0.11%)      0(0.00%)        11(0.08%)      0(0.00%)        26(0.19%)
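In the summaries above, the "Combined" column counts cells that pass every per-cell metric at once, which is why it can be lower than any single column. A minimal base-R sketch of that logic follows. The toy table, the helper `in_range`, and the use of inclusive bounds are assumptions made for illustration, not details confirmed by this page.

```r
# Toy per-cell QC table; column names mirror the metrics in the summary.
qc <- data.frame(
  nUMI     = c(450, 1200, 7000, 2500),
  nFeature = c(150, 600, 2400, 900),
  pctMito  = c(0.02, 0.03, 0.05, 0.25),
  row.names = paste0("cell", 1:4)
)

# Hypothetical helper: TRUE where a value lies within [lo, hi].
# Inclusive bounds are an assumption.
in_range <- function(x, lo = -Inf, hi = Inf) x >= lo & x <= hi

# One logical vector per metric, using the manual cutoffs from example 2
pass <- data.frame(
  nUMI     = in_range(qc$nUMI, 500, 6500),
  nFeature = in_range(qc$nFeature, 200, 2500),
  pctMito  = in_range(qc$pctMito, 0, 0.1),
  row.names = rownames(qc)
)

# A cell is kept only if it passes all metrics
pass$Combined <- pass$nUMI & pass$nFeature & pass$pctMito

print(colSums(pass))                # cells passing each metric and all combined
print(rownames(qc)[pass$Combined])  # cells that would be kept
```

Here three cells pass the nFeature range but only one passes every range, mirroring how the Combined count in the real output is smaller than each individual count.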