
Reads gene expression data from a directory containing the three files generated by 10x Genomics: matrix.mtx, barcodes.tsv and features.tsv (or genes.tsv). The function handles the following cases:

  • Alternative file names for the feature data: features.tsv (Cell Ranger 3.0 and later) or genes.tsv (Cell Ranger earlier than 3.0);

  • One or more of the input files are compressed, usually in ".gz" format;

  • Data with multiple modalities, such as single-cell multiome data. In this case, only the "Gene Expression" data are retained.
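The cases listed above can be illustrated with a minimal base-R sketch built on the Matrix package. It is not the scMINER source: the function name `read10xSketch` is hypothetical, and it assumes the usual 10x layout of the feature file (column 1 gene ID, column 2 gene symbol, optional column 3 modality). It relies on R's `file()` connections, which transparently decompress gzip input when reading, so plain and ".gz" files are handled the same way.

```r
library(Matrix)

# Hypothetical sketch (NOT the scMINER implementation): read a 10x directory
# into a genes-by-cells sparse matrix, covering the cases listed above.
read10xSketch <- function(input_dir, featureType = "gene_symbol") {
  # Return the first existing path among the candidates, tried plain then ".gz"
  pick <- function(candidates) {
    paths <- file.path(input_dir, c(rbind(candidates, paste0(candidates, ".gz"))))
    hit <- paths[file.exists(paths)]
    if (length(hit) == 0) stop("Missing file(s): ", paste(candidates, collapse = ", "))
    hit[1]
  }
  # Read a headerless TSV as character columns, without quote or NA handling
  readTsv <- function(path) {
    read.delim(path, header = FALSE, colClasses = "character",
               quote = "", na.strings = character(0))
  }
  mtx_path  <- pick("matrix.mtx")
  bc_path   <- pick("barcodes.tsv")
  # features.tsv (Cell Ranger >= 3.0) is tried before genes.tsv (< 3.0)
  feat_path <- pick(c("features.tsv", "genes.tsv"))

  # readMM() and read.delim() open files via file(), which decompresses gzip
  counts   <- as(readMM(mtx_path), "CsparseMatrix")
  barcodes <- readTsv(bc_path)[[1]]
  features <- readTsv(feat_path)

  # Assumed layout: column 1 = gene ID, column 2 = gene symbol
  col <- switch(featureType, gene_id = 1, gene_symbol = 2,
                stop("featureType must be 'gene_symbol' or 'gene_id'"))
  dimnames(counts) <- list(features[[col]], barcodes)

  # Multimodal data: column 3 names the modality; keep "Gene Expression" rows only
  if (ncol(features) >= 3) {
    counts <- counts[features[[3]] == "Gene Expression", , drop = FALSE]
  }
  counts
}
```

Checking for the ".gz" variant per file, rather than per directory, is what lets a mix of compressed and uncompressed inputs work.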

Usage

readInput_10x.dir(
  input_dir,
  featureType = "gene_symbol",
  removeSuffix = TRUE,
  addPrefix = NULL
)

Arguments

input_dir

Path to the directory containing the three files generated by 10x Genomics: matrix.mtx, barcodes.tsv and features.tsv (or genes.tsv).

featureType

Character, the feature type used as the gene names (row names) of the expression matrix: "gene_symbol" (the default) or "gene_id".

removeSuffix

Logical, whether to remove the suffix "-1" from the cell barcodes when all barcodes carry it. Default: TRUE.

addPrefix

Character or NULL, a prefix to add to the cell barcodes, such as a sample ID. A prefix that contains only letters and digits and does not start with a digit is strongly recommended. Default: NULL.
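The effect of removeSuffix and addPrefix on barcodes can be sketched with a small helper. This is a hypothetical illustration, not the scMINER code: the name `formatBarcodes`, the "_" separator between prefix and barcode, and the warning for a non-recommended prefix are all assumptions.

```r
# Hypothetical helper (NOT the scMINER implementation). Assumption: the prefix
# is joined to each barcode with "_".
formatBarcodes <- function(barcodes, removeSuffix = TRUE, addPrefix = NULL) {
  # Strip "-1" only when every barcode ends with it
  if (removeSuffix && all(endsWith(barcodes, "-1"))) {
    barcodes <- sub("-1$", "", barcodes)
  }
  if (!is.null(addPrefix)) {
    # Recommended form: letters/digits only, not starting with a digit
    if (!grepl("^[A-Za-z][A-Za-z0-9]*$", addPrefix)) {
      warning("addPrefix should contain only letters/digits and not start with a digit")
    }
    barcodes <- paste(addPrefix, barcodes, sep = "_")
  }
  barcodes
}

formatBarcodes(c("AAACCTGA-1", "TTTGGTCA-1"), addPrefix = "demoSample")
# [1] "demoSample_AAACCTGA" "demoSample_TTTGGTCA"
```

Requiring every barcode to end in "-1" before stripping avoids producing duplicate names when a matrix holds barcodes from several GEM wells (for example "-1" and "-2").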

Value

A sparse matrix of raw UMI counts, with genes as rows and cells as columns.
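Because the matrix is genes by cells, per-cell summaries come from column sums and per-gene summaries from row sums. The sketch below uses a made-up toy matrix built with the Matrix package rather than real readInput_10x.dir output.

```r
library(Matrix)

# Toy genes-by-cells count matrix (made-up values, for illustration only)
counts <- sparseMatrix(i = c(1, 2, 2, 3), j = c(1, 1, 2, 3), x = c(4, 1, 2, 6),
                       dims = c(3, 3),
                       dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                       c("cell1", "cell2", "cell3")))
colSums(counts)      # total UMIs per cell (columns are cells)
rowSums(counts)      # total UMIs per gene (rows are genes)
colSums(counts > 0)  # number of detected genes per cell
```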

Examples

input_dir <- system.file("extdata/demo_inputs/cell_matrix_10x", package = "scMINER") # path to input data
list.files(input_dir, full.names = FALSE) # you should see three files: matrix.mtx, barcodes.tsv and features.tsv (or genes.tsv)
#> [1] "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"  
sparseMatrix <- readInput_10x.dir(input_dir,
                                  featureType = "gene_symbol",
                                  removeSuffix = TRUE,
                                  addPrefix = "demoSample")
#> Reading 10x Genomcis data from: /private/var/folders/v0/njhqcmrs32xgrjgx2wz8d50r0000gp/T/Rtmpf8JULY/temp_libpath11ae7194479/scMINER/extdata/demo_inputs/cell_matrix_10x ...
#> 	Multiple data modalities were found: Gene Expression, Peaks . Only the gene expression data (under "Gene Expression") was kept.
#> Done! The sparse gene expression matrix has been generated: 500 genes, 100 cells.