Chapter 9 Network inference

SJARACNe is a scalable software tool for gene network reverse engineering from big data. As an improved implementation of the ARACNe, SJARACNe achieves a dramatic improvement in computational performance in both time and memory usage and implements new features while preserving the network inference accuracy of the original algorithm.

Similar to MICA, SJARACNe is also a component of scMINER framework, and can work seamlessly with the SparseExpressionSet object. The input for MICA can be easily generated from the SparseExpressionSet object by generateSJARACNeInput(), and the output of SJARACNe, the networks, can be effortlessly assessed by drawNetworkQC() and directly taken for driver activity estimation by getActivity_individual() and getActivity_inBatch().

9.1 Generate SJARACNe input files

The network inference is usually conducted in a cluster- or cell type-specific basis. Given the column names for grouping, generateSJARACNeInput() will create a folder for each group named by the group label.

IMPORTANT NOTE: Any illegal characters in path in group labels may cause issues in subsequent analysis. To avoid it, scMINER only accept letters(A-Za-z), numbers(0-9), underscores(’_‘) and periods(’.’).

In this case, the true labels of cell type are available, so we use them to define the groups for network inference.

## Columns with any illegal characters can not be used for groupping
generateSJARACNeInput(input_eset = pbmc14k_log2cpm.eset, group_name = "trueLabel", sjaracne_dir = "/work-path/PBMC14k/SJARACNe", species_type = "hg", driver_type = "TF_SIG", downSample_N = NULL)

For big datasets, generateSJARACNeInput() provides an argument, downSample_N, to allow users to down sample size of each group. The default value of downSample_N is 1,000, any group with >= 1,000 cells will be down-sample to 1,000.

## one folder for each group
list.dirs(system.file("extdata/demo_pbmc14k/PBMC14k/SJARACNe", package = "scMINER"), full.names = FALSE, recursive = FALSE)
## character(0)
## file structure of each folder
list.files(system.file("extdata/demo_pbmc14k/PBMC14k/SJARACNe/B", package = "scMINER"), full.names = FALSE, recursive = TRUE, include.dirs = FALSE, pattern = "[^consensus_network_ncol_.txt]")
## character(0)

The standard input files of SJARACNe, for each group, include:

  • a “.exp.txt” file: a tab-separated genes/transcripts/proteins by cells/samples expression matrix with the first two columns being ID and symbol.
  • a “TF” folder containing a “.tf.txt” file: a list of significant gene/transcript/protein IDs of TF drivers.
  • a “SIG” folder containing a “.sig.txt” file: a list of significant gene/transcript/protein IDs of SIG drivers.
  • a bash script (runSJARACNe.sh) to run SJARACNe. Further modification is needed to run it.
  • a json file (config_cwlexec.json) containing parameters to run SJARACNe.

Usually, the ground truth of cell types is not available. Then the cluster labels, or cell type annotations of the clusters, can be used for grouping in network rewiring, since it’s expected that cells with same cluster label/annotated cell type are of similar gene expression profiles. To generate from annotated cell types, you can run:

generateSJARACNeInput(input_eset = pbmc14k_log2cpm.eset, group_name = "cell_type", sjaracne_dir = "/work-path/PBMC14k/SJARACNe/bycelltype", species_type = "hg", driver_type = "TF_SIG")

9.2 Run SJARACNe

By default, the generateSJARACNeInput() function also generates a runSJARACNe.sh file in the folder of each group. This file much be modified before you can run it:

  • removed unneeded lines: There are usually 4 lines in this file: the lines starting with “sjaracne lsf” are the command lines to run on IBM LSF cluster, while the lines starting with “sjaracne local” are the command lines runing on a single machine (Linux/OSX). Please select the lines based on your situation and remove the others.
  • -n: number of bootstrap networks to generate. Default: 100.
  • -pc: p value threshold to select edges in building consensus network. Default: e-2 for single-cell data, e-3 for meta-cell data, and e-5 for bulk sample data. Please use “sjaracne lsf -h” or “sjaracne local -h” to check more details of arguments available in SJARACNe.

There is another file, config_cwlexec.json, available in the folder. It contains the information (e.g. memory request for each step of SJARACNe run) used for LSF job submission. This file is only needed for LSF runs and the default values works well in most cases. If you are running SJARACNe on a big dataset, you may need to request more memory from it.

In this case, we use LSF to run the SJARACNe:

## let's use B cell as an example
# for TF
sjaracne lsf -e /work-path/PBMC14k/SJARACNe/B/B.8572_1902.exp.txt -g /work-path/PBMC14k/SJARACNe/B/TF/B.835_1902.tf.txt -o /work-path/PBMC14k/SJARACNe/B/TF/bt100_pc001 -n 100 -pc 0.01 -j /work-path/PBMC14k/SJARACNe/B/config_cwlexec.json

# for SIG
sjaracne lsf -e /work-path/PBMC14k/SJARACNe/B/B.8572_1902.exp.txt -g /work-path/PBMC14k/SJARACNe/B/SIG/B.4148_1902.sig.txt -o /work-path/PBMC14k/SJARACNe/B/SIG/bt100_pc001 -n 100 -pc 0.01 -j /work-path/PBMC14k/SJARACNe/B/config_cwlexec.json

We created a folder named “bt100_pc001” in both TF and SIG folders of each group, to save the networks generated under 100 bootstraps (-n 100) and 0.01 consensus p value (-pc 0.01).

To run SJARACNe on a local machine:

## let's use B cell as an example
# for TF
sjaracne local -e /work-path/PBMC14k/SJARACNe/B/B.8572_1902.exp.txt -g /work-path/PBMC14k/SJARACNe/B/TF/B.835_1902.tf.txt -o /work-path/PBMC14k/SJARACNe/B/TF/bt100_pc001 -n 100 -pc 0.01

# for SIG
sjaracne local -e /work-path/PBMC14k/SJARACNe/B/B.8572_1902.exp.txt -g /work-path/PBMC14k/SJARACNe/B/SIG/B.4148_1902.sig.txt -o /work-path/PBMC14k/SJARACNe/B/SIG/bt100_pc001 -n 100 -pc 0.01

9.3 Assess the quality of networks

9.3.1 Introduction to the network file by SJARACNe

The core output of SJARACNe is the network file named consensus_network_ncol_.txt. It contains 9 columns:

  • source: ID of the source gene, can be the gene symbol;
  • target: ID of the target gene, can be the gene symbol;
  • source.symbol: symbol of the source gene;
  • target.symbol: symbol of the target gene;
  • MI: mutual information of source-gene pair;
  • pearson: Pearson correlation coefficient, [-1,1]
  • pearson: Spearman correlation coefficient, [-1,1]
  • slope: slop of the regression line, returned by stats.linregression()
  • p.value: p-value for a hypothesis test whose null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic
network_format <- read.table(system.file("extdata/demo_pbmc14k/SJARACNe/B/TF/bt100_pc001/consensus_network_ncol_.txt", package = "scMINER"),
                             header = T, sep = "\t", quote = "", stringsAsFactors = F)
head(network_format)
##   source     target source.symbol target.symbol     MI pearson spearman   slope
## 1   AATF      ACBD3          AATF         ACBD3 0.0509 -0.0310  -0.0311 -0.0193
## 2   AATF       ADD3          AATF          ADD3 0.0486  0.0228   0.0258  0.0307
## 3   AATF        AES          AATF           AES 0.0511  0.0311   0.0289  0.0668
## 4   AATF     AKR7A2          AATF        AKR7A2 0.0498  0.0319   0.0366  0.0421
## 5   AATF AL928768.3          AATF    AL928768.3 0.0447  0.0247   0.0293  0.0335
## 6   AATF       ALG8          AATF          ALG8 0.0479  0.0358   0.0373  0.0234
##   p.value
## 1  0.1761
## 2  0.3204
## 3  0.1756
## 4  0.1646
## 5  0.2815
## 6  0.1183

9.3.2 Generate network QC report

There is no simple standards to tell the reliability of networks. Empirically, a network with 50-300 target size is good. scMINER provides a function, drawNetworkQC(), to quickly assess the quality of networks in batch.

network_stats <- drawNetworkQC(sjaracne_dir = "/work-path/PBMC14K/SJARACNe", generate_html = FALSE) # Set `generate_html = TRUE` to generate html-format QC report for each network file
## The network QC statistics table is saved seperately, for demonstration purposes.
network_stats <- readRDS(system.file("extdata/demo_pbmc14k/SJARACNe/network_stats.rds", package = "scMINER"))
head(network_stats)
##              network_tag network_node network_edge driver_count targetSize_mean
## 1      B.SIG.bt100_pc001         8572       391889         4148        94.47662
## 2       B.TF.bt100_pc001         8572        95341          835       114.18084
## 3 CD4TCM.SIG.bt100_pc001         8660       382153         4209        90.79425
## 4  CD4TCM.TF.bt100_pc001         8660        94319          838       112.55251
## 5  CD4TN.SIG.bt100_pc001         8612       401658         4180        96.09043
## 6   CD4TN.TF.bt100_pc001         8612        95152          831       114.50301
##   targetSize_median targetSize_minimum targetSize_maximum
## 1              94.0                 33                396
## 2              96.0                 64                913
## 3              91.0                 31                281
## 4              95.5                 60                689
## 5              95.0                 43                303
## 6              99.0                 64                743
##                                                                                                                                                                                               network_path
## 1      /Volumes/projects/scRNASeq/yu3grp/scMINER/NG_Revision/QPan/scminer_R/Datasets/PBMC14K/SJARACNe/B/SIG/bt100_pc001/sjaracne_workflow-df798096-8dee-4baf-8f70-891c689dc769/consensus_network_ncol_.txt
## 2       /Volumes/projects/scRNASeq/yu3grp/scMINER/NG_Revision/QPan/scminer_R/Datasets/PBMC14K/SJARACNe/B/TF/bt100_pc001/sjaracne_workflow-fb2a69b9-f98e-47ff-87a0-6d538822fc6e/consensus_network_ncol_.txt
## 3 /Volumes/projects/scRNASeq/yu3grp/scMINER/NG_Revision/QPan/scminer_R/Datasets/PBMC14K/SJARACNe/CD4TCM/SIG/bt100_pc001/sjaracne_workflow-424f1068-13d1-4f0e-9c26-56acd9a2027c/consensus_network_ncol_.txt
## 4  /Volumes/projects/scRNASeq/yu3grp/scMINER/NG_Revision/QPan/scminer_R/Datasets/PBMC14K/SJARACNe/CD4TCM/TF/bt100_pc001/sjaracne_workflow-52b3cdf5-5914-4c8c-a77a-05f17c755d83/consensus_network_ncol_.txt
## 5  /Volumes/projects/scRNASeq/yu3grp/scMINER/NG_Revision/QPan/scminer_R/Datasets/PBMC14K/SJARACNe/CD4TN/SIG/bt100_pc001/sjaracne_workflow-7b5bb68e-1de5-4d0e-80ec-8d8aa037866f/consensus_network_ncol_.txt
## 6   /Volumes/projects/scRNASeq/yu3grp/scMINER/NG_Revision/QPan/scminer_R/Datasets/PBMC14K/SJARACNe/CD4TN/TF/bt100_pc001/sjaracne_workflow-89716541-eb53-435c-8a45-bab63d6b5198/consensus_network_ncol_.txt