Chapter 10 Activity-based analysis
Driver activity estimation is one of the most important features of scMINER. Mathematically, the activity of a driver is a type of mean of the expression of its targets. Biologically, the activity can be interpreted as a measure of how actively the driver functions, such as an enzyme digesting its substrates or a kinase activating its downstream genes. Given the gene expression profiles and networks, scMINER can estimate the activities of predefined drivers, including not only transcription factors (TFs) but also signaling genes (SIGs). scMINER provides a few functions to calculate the activities, identify hidden drivers, and visualize them in multiple ways.
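To make the "mean of the expression of its targets" idea concrete, here is a minimal base-R sketch. It is not scMINER's implementation: the expression matrix, the gene names and the `targets` list are all made up for illustration, and it uses a plain unweighted mean, whereas the package functions shown later expose options such as `activity_method` and `do.z_normalization`.

```r
## Toy sketch only: unweighted mean of target expression per cell.
## The matrix, gene names and target lists are invented for illustration.
expr <- matrix(
  c(1, 2, 3,
    3, 4, 5,
    0, 0, 6,
    9, 9, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("cell1", "cell2", "cell3"))
)

## Hypothetical network: each driver maps to its target genes
targets <- list(TF1  = c("geneA", "geneB"),
                SIG1 = c("geneB", "geneC", "geneD"))

## Activity of a driver in a cell = mean expression of its targets in that cell.
## sapply() returns cells x drivers, so transpose to drivers x cells.
activity <- t(sapply(targets, function(g) colMeans(expr[g, , drop = FALSE])))
activity
```

The result is a drivers-by-cells matrix, which has the same orientation as an expression matrix and can therefore be analyzed with the same kind of downstream tools.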
10.1 Calculate the activities
scMINER provides two functions, getActivity_individual() and getActivity_inBatch(), to calculate the driver activities.
10.1.1 Calculate activities per group
getActivity_individual() is designed to calculate the activities per group. It takes the network files as the input:
## let's use B cell as an example
activity_B.eset <- getActivity_individual(input_eset = pbmc14k_log2cpm.eset[, pData(pbmc14k_log2cpm.eset)$trueLabel == "B"],
                                          network_file.tf = system.file("extdata/demo_pbmc14k/SJARACNe/B/TF/bt100_pc001/consensus_network_ncol_.txt", package = "scMINER"),
                                          network_file.sig = system.file("extdata/demo_pbmc14k/SJARACNe/B/SIG/bt100_pc001/consensus_network_ncol_.txt", package = "scMINER"),
                                          driver_type = "TF_SIG")
10.1.2 Calculate activities in batch
If you need to calculate the activities for multiple groups, which is usually the case, you can run getActivity_individual() on each group one by one, as shown above, and then merge the resulting esets. Alternatively, scMINER provides another function, getActivity_inBatch(), to calculate the activities in batch:
## calculate the activities of all groups in batch
activity.eset <- getActivity_inBatch(input_eset = pbmc14k_log2cpm.eset, sjaracne_dir = system.file("extdata/demo_pbmc14k/SJARACNe", package = "scMINER"), group_name = "trueLabel", driver_type = "TF_SIG", activity_method = "mean", do.z_normalization = TRUE)
## 7 groups were found in trueLabel ...
## Checking network files for each group ...
## Group 1 / 7 : Monocyte ...
## TF network check passed!
## SIG network check passed!
## Group 2 / 7 : B ...
## TF network check passed!
## SIG network check passed!
## Group 3 / 7 : CD4Treg ...
## TF network check passed!
## SIG network check passed!
## Group 4 / 7 : CD4TN ...
## TF network check passed!
## SIG network check passed!
## Group 5 / 7 : CD4TCM ...
## TF network check passed!
## SIG network check passed!
## Group 6 / 7 : NK ...
## TF network check passed!
## SIG network check passed!
## Group 7 / 7 : CD8TN ...
## TF network check passed!
## SIG network check passed!
## Calculating activity for each group ...
## Group 1 / 7 : Monocyte ...
## Activity calculation is completed successfully!
## Group 2 / 7 : B ...
## Activity calculation is completed successfully!
## Group 3 / 7 : CD4Treg ...
## Activity calculation is completed successfully!
## Group 4 / 7 : CD4TN ...
## Activity calculation is completed successfully!
## Group 5 / 7 : CD4TCM ...
## Activity calculation is completed successfully!
## Group 6 / 7 : NK ...
## Activity calculation is completed successfully!
## Group 7 / 7 : CD8TN ...
## Activity calculation is completed successfully!
## NAs were found in the activity matrix and have been replaced by the minimum value: -0.3968794 .
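The last log line reports that NAs in the merged activity matrix were replaced by the minimum value. The base-R sketch below illustrates how such NAs can arise and how that kind of fill works. It rests on an assumption: that a driver present in one group's network but absent from another's has no activity for the cells of the latter group. The matrices, driver names and the `expand()` helper are hypothetical, and this is not scMINER's code.

```r
## Toy sketch: combine per-group activity matrices (drivers x cells).
## All values and names are invented for illustration.
act_B  <- matrix(c(0.2, 0.4, -0.1, 0.3), nrow = 2,
                 dimnames = list(c("TF1_TF", "SIG1_SIG"), c("b1", "b2")))
act_NK <- matrix(c(0.5, 0.1, -0.3, 0.0), nrow = 2,
                 dimnames = list(c("TF1_TF", "TF2_TF"), c("nk1", "nk2")))
per_group <- list(B = act_B, NK = act_NK)

## Union of drivers across groups
all_drivers <- sort(unique(unlist(lapply(per_group, rownames))))

## Hypothetical helper: pad a group's matrix to the full driver set with NA
expand <- function(m) {
  out <- matrix(NA_real_, nrow = length(all_drivers), ncol = ncol(m),
                dimnames = list(all_drivers, colnames(m)))
  out[rownames(m), ] <- m
  out
}
merged <- do.call(cbind, lapply(per_group, expand))

## Replace NAs with the minimum observed value, as the log message describes
min_val <- min(merged, na.rm = TRUE)
merged[is.na(merged)] <- min_val
merged
```

Filling with the global minimum keeps the matrix complete for downstream tests, at the cost of treating "driver not in this group's network" as "lowest observed activity", which is worth keeping in mind when interpreting group-specific drivers.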
10.2 Differential activity analysis
Similar to getDE(), scMINER provides a function, getDA(), to perform differential activity analysis and identify group-specific drivers.
## 1. To perform differential activity analysis in a 1-vs-rest manner for all groups
da_res1 <- getDA(input_eset = activity.eset, group_by = "cell_type", use_method = "t.test")
## 7 groups were found in group_by column [ cell_type ].
## Since no group was specified, the differential analysis will be conducted among all groups in the group_by column [ cell_type ] in the 1-vs-rest manner.
## 1 / 7 : group 1 ( B ) vs the rest...
## 1912 cells were found for g1.
## 11693 cells were found for g0.
## 2 / 7 : group 1 ( CD4TCM ) vs the rest...
## 2022 cells were found for g1.
## 11583 cells were found for g0.
## 3 / 7 : group 1 ( CD4TN ) vs the rest...
## 2505 cells were found for g1.
## 11100 cells were found for g0.
## 4 / 7 : group 1 ( CD4Treg ) vs the rest...
## 1448 cells were found for g1.
## 12157 cells were found for g0.
## 5 / 7 : group 1 ( CD8TN ) vs the rest...
## 2014 cells were found for g1.
## 11591 cells were found for g0.
## 6 / 7 : group 1 ( Monocyte ) vs the rest...
## 1786 cells were found for g1.
## 11819 cells were found for g0.
## 7 / 7 : group 1 ( NK ) vs the rest...
## 1918 cells were found for g1.
## 11687 cells were found for g0.
## feature g1_tag g0_tag g1_avg
## 4 AASDH_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.008071658
## 6 AATF_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.051767485
## 12 ABCB8_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.077615607
## 8 ABCA2_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.081643603
## 10 ABCB1_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.134357577
## 3 AARSD1_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.126010447
## g0_avg g1_pct g0_pct log2FC Pval FDR
## 4 -0.1025141 0.43043933 0.13991277 0.094442475 2.225074e-308 0.000000e+00
## 6 -0.1084165 0.21652720 0.08757376 0.056649005 3.918924e-189 5.878386e-189
## 12 -0.1094585 0.35251046 0.14153767 0.031842866 3.623209e-12 3.952592e-12
## 8 -0.1101867 0.10198745 0.15676045 0.028543082 1.914570e-58 2.418404e-58
## 10 -0.1559384 0.04393305 0.06114770 0.021580866 8.079661e-27 9.233898e-27
## 3 -0.1225746 0.04184100 0.08192936 -0.003435892 4.213744e-02 4.213744e-02
## Zscore
## 4 37.53784
## 6 29.33316
## 12 6.95115
## 8 16.11775
## 10 10.72137
## 3 -2.03216
## 2. To perform differential activity analysis in a 1-vs-rest manner for one specific group
da_res2 <- getDA(input_eset = activity.eset, group_by = "cell_type", g1 = c("B"), use_method = "t.test")
## 3. To perform differential activity analysis in a rest-vs-1 manner for one specific group
da_res3 <- getDA(input_eset = activity.eset, group_by = "cell_type", g0 = c("B"), use_method = "t.test")
## 4. To perform differential activity analysis in a 1-vs-1 manner for any two groups
da_res4 <- getDA(input_eset = activity.eset, group_by = "cell_type", g1 = c("CD4Treg"), g0 = c("CD4TCM"), use_method = "t.test")
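For readers who want to see what a 1-vs-rest comparison amounts to, the following base-R sketch runs a t-test per driver on a toy activity matrix and builds a table with column names borrowed from the output above. It is an illustration under assumptions, not the getDA() implementation: the data are simulated, `t.test()` is used with its default (Welch) settings, and the `log2FC` and `Zscore` definitions are inferred from the printed numbers (for example, 0.0944 is approximately -0.0081 minus -0.1025, and a two-sided p-value of 0.0421 corresponds to |z| of about 2.03).

```r
set.seed(1)
## Toy activity matrix: 3 drivers x 40 cells, values simulated for illustration
act <- matrix(rnorm(120), nrow = 3,
              dimnames = list(c("D1", "D2", "D3"), paste0("c", 1:40)))
group <- rep(c("B", "NK"), each = 20)
## Shift D1 upward in B cells so that it behaves like a B-specific driver
act["D1", group == "B"] <- act["D1", group == "B"] + 2

is_g1 <- group == "B"   # g1 = group of interest, g0 = the rest
res <- do.call(rbind, lapply(rownames(act), function(d) {
  x1 <- act[d, is_g1]
  x0 <- act[d, !is_g1]
  p  <- t.test(x1, x0)$p.value
  delta <- mean(x1) - mean(x0)
  data.frame(feature = d,
             g1_avg  = mean(x1),
             g0_avg  = mean(x0),
             log2FC  = delta,   # assumed: difference of group averages
             Pval    = p,
             ## assumed: signed z-score derived from the two-sided p-value
             Zscore  = sign(delta) * qnorm(p / 2, lower.tail = FALSE))
}))
## Benjamini-Hochberg correction across drivers
res$FDR <- p.adjust(res$Pval, method = "BH")
res[order(res$log2FC, decreasing = TRUE), ]
```

The 1-vs-1 and rest-vs-1 variants differ only in how `is_g1` and its complement are defined, which is the role the `g1` and `g0` arguments play in the calls above.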
The getTopFeatures() function can also be used to extract the group-specific drivers from the differential activity result:
top_drivers <- getTopFeatures(input_table = da_res1, number = 10, group_by = "g1_tag", sort_by = "log2FC", sort_decreasing = TRUE)
dim(top_drivers)
## [1] 16 11
## feature g1_tag g0_tag g1_avg
## 4 AASDH_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.008071658
## 6 AATF_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.051767485
## 12 ABCB8_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.077615607
## 8 ABCA2_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.081643603
## 10 ABCB1_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.134357577
## 3 AARSD1_SIG B CD4TCM,CD4TN,CD4Treg,CD8TN,Monocyte,NK -0.126010447
## g0_avg g1_pct g0_pct log2FC Pval FDR
## 4 -0.1025141 0.43043933 0.13991277 0.094442475 2.225074e-308 0.000000e+00
## 6 -0.1084165 0.21652720 0.08757376 0.056649005 3.918924e-189 5.878386e-189
## 12 -0.1094585 0.35251046 0.14153767 0.031842866 3.623209e-12 3.952592e-12
## 8 -0.1101867 0.10198745 0.15676045 0.028543082 1.914570e-58 2.418404e-58
## 10 -0.1559384 0.04393305 0.06114770 0.021580866 8.079661e-27 9.233898e-27
## 3 -0.1225746 0.04184100 0.08192936 -0.003435892 4.213744e-02 4.213744e-02
## Zscore
## 4 37.53784
## 6 29.33316
## 12 6.95115
## 8 16.11775
## 10 10.72137
## 3 -2.03216
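The selection performed above (top features per group, ranked by one column) can be sketched in base R as follows. The `top_n_per_group()` helper and the toy table are hypothetical and only mimic the `number`, `group_by` and `sort_by` arguments seen in the call. The real getTopFeatures() may apply additional filtering that this sketch does not reproduce.

```r
## Toy differential-activity table; values invented for illustration
da <- data.frame(
  feature = c("D1", "D2", "D3", "D4", "D5", "D6"),
  g1_tag  = c("B", "B", "B", "NK", "NK", "NK"),
  log2FC  = c(0.10, 0.50, -0.20, 0.30, 0.05, 0.40),
  stringsAsFactors = FALSE
)

## Hypothetical helper: keep the top `number` rows of each group after sorting
top_n_per_group <- function(tab, number, group_by, sort_by, decreasing = TRUE) {
  pieces <- lapply(split(tab, tab[[group_by]]), function(g) {
    g <- g[order(g[[sort_by]], decreasing = decreasing), , drop = FALSE]
    head(g, number)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top2 <- top_n_per_group(da, number = 2, group_by = "g1_tag", sort_by = "log2FC")
top2
```

Sorting within each group rather than globally ensures that every group contributes candidates, even when its effect sizes are smaller than those of other groups.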