Chapter 6 Data normalization

This chapter introduces data normalization in scMINER.

We recommend the log2CPM method for normalization: the raw counts in each cell are scaled to a library size of 1 million (counts per million, CPM), a pseudocount of 1 is added, and the result is log2-transformed.
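To make the arithmetic concrete, here is a minimal base-R sketch of log2CPM on a toy matrix. The helper `log2cpm()` and the toy counts are hypothetical and only illustrate the formula described above. This is not scMINER's implementation, and it assumes the pseudocount is added after scaling to CPM.

```r
# Toy raw count matrix: 3 genes (rows) x 2 cells (columns); values are made up.
counts <- matrix(c(1, 0, 3,
                   0, 2, 2),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Hypothetical helper (not part of scMINER): scale each cell to `scale_factor`
# total counts, add a pseudocount, then log-transform.
log2cpm <- function(counts, scale_factor = 1e6, log_base = 2, log_pseudoCount = 1) {
  lib_size <- colSums(counts)                              # total counts per cell
  cpm <- sweep(counts, 2, lib_size, "/") * scale_factor    # counts per million
  log(cpm + log_pseudoCount, base = log_base)              # log2(CPM + 1) by default
}

norm <- log2cpm(counts)
norm
```

With a pseudocount of 1, a raw count of zero maps to log2(0 + 1) = 0, so zero entries stay zero after the transformation.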

pbmc14k_log2cpm.eset <- normalizeSparseEset(pbmc14k_filtered.eset, scale_factor = 1000000, log_base = 2, log_pseudoCount = 1)

## Done! The data matrix of eset has been normalized and log-transformed!
## The returned eset contains: 8846 genes, 13605 cells.

exprs(pbmc14k_log2cpm.eset)[1:5,1:5]

## 5 x 5 sparse Matrix of class "dgCMatrix"
##           CACTTTGACGCAAT GTTACGGAAACGAA CACTTATGAGTCGT GCATGTGATTCTGT
## LINC00115              .        .                    .              .
## NOC2L                  .        .                    .              .
## HES4                   .        .                    .              .
## ISG15                  .       10.05794              .              .
## C1orf159               .        .                    .              .
##           TAGAATACGTATCG
## LINC00115              .
## NOC2L                  .
## HES4                   .
## ISG15                  .
## C1orf159               .

This normalized and log-transformed SparseEset object can be used directly for mutual information-based clustering, network inference, and other downstream analyses.

Don’t forget to save the SparseEset object after data normalization.

saveRDS(pbmc14k_log2cpm.eset, file = "/your-path/PBMC14k/DATA/pbmc14k_log2CPM_annotated.rds")
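
In a later session, the saved object can be read back with base R's `readRDS()`. The sketch below shows the save-and-reload round trip on a small stand-in object written to a temporary file. `toy_obj` and `rds_path` are hypothetical names used only for illustration; in practice you would pass the same path you gave to `saveRDS()`.

```r
# Stand-in for a normalized object; any R object can be serialized this way.
toy_obj <- list(genes = c("geneA", "geneB"), n_cells = 2L)

# Write to a temporary file so the example does not touch real project paths.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_obj, file = rds_path)

# Reload the object and confirm it matches what was saved.
restored <- readRDS(rds_path)
identical(toy_obj, restored)
```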