Chapter 1 Introduction
This chapter introduces the principal concepts and unique features of scMINER.
1.1 A few concepts
There are a few concepts that may help you understand scMINER better.
SparseEset
The SparseExpressionSet (or SparseEset for short) is a new class created by scMINER to handle the sparsity of scRNA-seq data. It is derived from ExpressionSet and allows sparse expression data to be compressed, stored, and accessed efficiently and conveniently.
The SparseEset object is the center of scRNA-seq data analysis by scMINER.
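To illustrate why a sparse representation helps with scRNA-seq data, here is a minimal Python sketch of the general idea: keep only the non-zero entries of a gene-by-cell count matrix. It is not the SparseEset implementation; the helper names `to_sparse` and `get_value` and the toy matrix are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

SparseMatrix = Dict[Tuple[int, int], float]


def to_sparse(dense: List[List[float]]) -> SparseMatrix:
    """Keep only non-zero entries, keyed by (gene_index, cell_index)."""
    return {
        (i, j): value
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    }


def get_value(sparse: SparseMatrix, gene: int, cell: int) -> float:
    """Entries that were not stored are implicitly zero."""
    return sparse.get((gene, cell), 0.0)


# Toy count matrix (hypothetical): 3 genes x 4 cells, mostly zeros.
counts = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [5, 0, 0, 1],
]

sparse = to_sparse(counts)
stored = len(sparse)
total = len(counts) * len(counts[0])
print(f"{stored} of {total} entries stored")  # 3 of 12 entries stored
```

The fewer non-zero entries a matrix has, the larger the saving, which is why this kind of storage suits scRNA-seq count data.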
Mutual Information
Mutual information is a measure of the mutual dependence between two random variables. It quantifies the amount of information obtained about one variable through the other variable. In other words, it measures how much knowing the value of one variable reduces uncertainty about the value of the other variable. It’s widely used in probability theory and information theory.
Compared with the linear correlation used by most existing tools for scRNA-seq data clustering, mutual information provides a more general measure of dependence that captures both linear and non-linear relationships, and hence may increase the accuracy and sensitivity of scRNA-seq data clustering.
Property | Linear Correlation | Mutual Information |
---|---|---|
Definition | Measures linear relationship | Measures mutual dependence (both linear and non-linear) |
Range | -1 to 1 | 0 to Inf |
Sensitivity to outliers | Sensitive | Less sensitive |
Captures Non-linear Relationships | No | Yes |
Common Applications | Regression, finance, science | Feature selection, clustering, network inference |
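The difference between the two measures can be seen on a small example. The Python sketch below compares Pearson correlation with a simple histogram-based mutual information estimate on the purely non-linear relationship y = x^2. The equal-width binning estimator, the bin count, and the function names are assumptions made for illustration; this is not the estimator scMINER uses.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def discretize(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=5):
    """Histogram-based mutual information estimate, in nats."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# A symmetric, purely non-linear relationship: y = x^2.
x = [i / 10 for i in range(-50, 51)]
y = [v * v for v in x]

r = pearson(x, y)
mi = mutual_information(x, y)
print(f"Pearson r = {r:.3f}, MI = {mi:.3f} nats")
```

Here the Pearson coefficient is essentially zero because the relationship is symmetric around x = 0, while the mutual information estimate is clearly positive, reflecting that y is fully determined by x.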
Gene Activity
Gene activity estimation is one of the most important features of scMINER. Mathematically, the activity of a gene is a type of mean of the expression of its targets. Biologically, the activity can be interpreted as a measure of how actively the driver functions, such as enzymes digesting their substrates or kinases activating their downstream genes. Given the gene expression profiles and networks, scMINER can estimate the activities of predefined drivers, including not only transcription factors (TFs) but also signaling genes (SIGs).
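As a rough illustration of the "mean of target expression" idea, the Python sketch below computes a per-cell activity for one driver as the plain arithmetic mean of its targets' expression. The text above only says "a type of mean", so the unweighted mean is an assumption, and the gene names, the network, and the function `estimate_activity` are hypothetical and not part of the scMINER API.

```python
from statistics import mean

# Hypothetical expression profiles: gene -> expression in each of 3 cells.
expression = {
    "TargetA": [2.0, 0.0, 4.0],
    "TargetB": [1.0, 0.0, 3.0],
    "TargetC": [3.0, 0.0, 5.0],
    "Other": [9.0, 9.0, 9.0],  # not a target of DriverX, so it is ignored
}

# Hypothetical network: driver -> list of its target genes.
network = {"DriverX": ["TargetA", "TargetB", "TargetC"]}


def estimate_activity(driver, network, expression):
    """Per-cell activity as the unweighted mean of target expression (assumption)."""
    targets = network[driver]
    n_cells = len(expression[targets[0]])
    return [
        mean(expression[target][cell] for target in targets)
        for cell in range(n_cells)
    ]


activity = estimate_activity("DriverX", network, expression)
print(activity)  # [2.0, 0.0, 4.0]
```

Note that the driver's own expression does not enter the calculation: the activity is inferred entirely from its targets, so a driver can be scored even when its own transcript is not detected.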
1.2 Why use scMINER
(more details to be added)
scMINER includes the following key functions:
Mutual information-based clustering: scMINER measures cell-cell similarities with mutual information derived from the full set of features. It can capture both linear and non-linear correlations and performs better in cell clustering, especially for cells in closely related states.
Gene activity estimation: scMINER rewires cell type-specific gene networks solely from the scRNA-seq data, and then estimates the gene activities of not only transcription factors (TFs) but also signaling genes (SIGs). The gene activity-based analysis can expose the main regulators of various biological activities, such as cellular lineage differentiation and tissue specificity.
SparseEset-centered full-feature tool: scMINER provides a wide range of functions for data intake, quality control and filtration, MI-based clustering, network inference, gene activity estimation, cell type annotation, differential expression/activity analysis, and data visualization and sharing. Most of these functions are developed in an object-oriented manner for the SparseEset object.
1.3 Citation
Please consider citing this paper if you find scMINER useful in your research.
1.4 Support
We welcome your feedback! The scMINER software is developed and maintained by the Yu Lab @ St. Jude Children’s Research Hospital and is released under the Apache License (Version 2.0). Feel free to open an issue or send us an email if you encounter a bug, need help, or want to make a comment or suggestion.