Generate the standard input files for SJARACNe from a sparse eset object
generateSJARACNeInput.Rd
This function is used to generate the standard input files for SJARACNe, a scalable software tool for gene network reverse engineering from big data.
Usage
generateSJARACNeInput(
  input_eset,
  group_name = "clusterID",
  group_name.refine = FALSE,
  sjaracne_dir,
  species_type = "hg",
  driver_type = "TF_SIG",
  customDriver_TF = NULL,
  customDriver_SIG = NULL,
  downSample_N = 1000,
  seed = 123,
  superCell_N = NULL,
  superCell_count = 100,
  superCell_gamma = 10,
  superCell_knn = 5,
  superCell_nHVG = 1000,
  superCell_nPC = 10,
  superCell_save = TRUE,
  print_command = FALSE,
  save_command = TRUE
)
Arguments
- input_eset
The expression set object that has been filtered, normalized and log-transformed.
- group_name
Character, name of the column for grouping, usually the column of cell types or clusters. Default: "clusterID".
- group_name.refine
Logical, whether to replace non-word characters in group names with an underscore ("_"). Improper filename characters may cause trouble, since scMINER creates a folder for each group using the group names. Setting this argument to TRUE can help avoid this issue. Default: FALSE.
- sjaracne_dir
The path to the folder for SJARACNe runs. Both the inputs and outputs will be saved here.
- species_type
Character, species of the pre-defined driver list to use: "hg" for human or "mm" for mouse. Default: "hg".
- driver_type
Character, type of the pre-defined driver list to use: "TF" for transcription factors only, "SIG" for signaling genes only, or "TF_SIG" for both. Default: "TF_SIG".
- customDriver_TF
A character vector or NULL, genes used to replace the pre-defined transcription factor driver list. This allows the user to customize the TF driver list. Default: NULL.
- customDriver_SIG
A character vector or NULL, genes used to replace the pre-defined signaling gene driver list. This allows the user to customize the SIG driver list. Default: NULL.
- downSample_N
Integer or NULL. If an integer is given, groups with more cells than this integer are down-sampled to this integer. A number between 500 and 3000 gives a good balance between robustness and computational efficiency. If NULL, down-sampling is skipped. Default: 1000.
- seed
Non-negative integer, seed of random sampling. Default: 123.
- superCell_N
Integer or NULL. If an integer is given, the SuperCell package is used to build metacells for groups with more cells than this integer. If NULL, no metacells are built. Default: NULL.
- superCell_count
Integer, number of metacells to generate by SuperCell. Default: 100. Ignored if superCell_N = NULL.
- superCell_gamma
Integer, graining level of data by SuperCell (ratio of the number of single cells in the initial dataset to the number of metacells in the final dataset). Default: 10. Ignored if superCell_N = NULL.
- superCell_knn
Integer, the k value used by SuperCell to compute the single-cell kNN network. Default: 5. Ignored if superCell_N = NULL.
- superCell_nHVG
Integer, number of the most variable genes for SuperCell to use. Default: 1000. Ignored if superCell_N = NULL.
- superCell_nPC
Integer, number of principal components to use for construction of the single-cell kNN network by SuperCell. Default: 10. Ignored if superCell_N = NULL.
- superCell_save
Logical, whether to save the results generated by SuperCell, including membership and other components. Default: TRUE. Ignored if superCell_N = NULL.
- print_command
Logical, whether to print the command to run SJARACNe to the screen. Default: FALSE.
- save_command
Logical, whether to save the command to run SJARACNe. Default: TRUE.
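The behavior described for group_name.refine and downSample_N can be illustrated with a minimal base R sketch. This is not scMINER's implementation: the helper names refine_group_names and downsample_groups are hypothetical, and treating "non-word characters" as the regex class \W is an assumption.

```r
# Hypothetical helpers for illustration only; not part of scMINER.

# Replace every non-word character with an underscore.
# Assumption: "non-word" means the regex class \W.
refine_group_names <- function(x) {
  gsub("\\W", "_", x)
}

# For each group, keep at most n_max cell indices, sampled reproducibly.
# n_max = NULL skips down-sampling, mirroring downSample_N = NULL.
downsample_groups <- function(groups, n_max, seed = 123) {
  set.seed(seed)
  idx_by_group <- split(seq_along(groups), groups)
  lapply(idx_by_group, function(idx) {
    if (!is.null(n_max) && length(idx) > n_max) {
      # Index into idx rather than calling sample(idx, ...), which
      # misbehaves when idx is a single number.
      sort(idx[sample.int(length(idx), n_max)])
    } else {
      idx
    }
  })
}

# Toy input: 5 cells in one group, 2 in another.
groups <- c(rep("CD4+ T cells", 5), rep("B cells", 2))
kept <- downsample_groups(groups, n_max = 3)
lengths(kept)
refine_group_names(names(kept))
```

Only the larger group is reduced; groups at or below the threshold are left untouched, and the same seed reproduces the same selection.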
Value
This function generates several folders and files in the directory specified by "sjaracne_dir":
- a folder for each group in the column specified by "group_name";
In each folder:
- a ".exp.txt" file: expression matrix, features by cells.
- a "TF" folder containing a ".tf.txt" file: this file contains the TF driver list.
- a "SIG" folder containing a ".sig.txt" file: this file contains the SIG driver list.
- a bash script (runSJARACNe.sh) to run SJARACNe. Further modification is needed to run it.
- a json file (config_cwlexec.json) containing parameters to run SJARACNe.
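After a run, the layout described above can be sanity-checked with a short base R sketch. The helper check_sjaracne_inputs is hypothetical and not part of scMINER; it assumes the files can be recognized by the suffixes listed above, and the mock directory and file names below are placeholders created only so the example runs on its own.

```r
# Hypothetical helper for illustration only; not part of scMINER.
# For each group folder, report whether the expected inputs are present.
check_sjaracne_inputs <- function(sjaracne_dir) {
  group_dirs <- list.dirs(sjaracne_dir, recursive = FALSE)
  has_file <- function(dir, pattern) {
    length(list.files(dir, pattern = pattern, all.files = TRUE)) > 0
  }
  res <- lapply(group_dirs, function(d) {
    c(
      exp = has_file(d, "\\.exp\\.txt$"),
      tf  = has_file(file.path(d, "TF"), "\\.tf\\.txt$"),
      sig = has_file(file.path(d, "SIG"), "\\.sig\\.txt$")
    )
  })
  names(res) <- basename(group_dirs)
  do.call(rbind, res)
}

# Mock layout with placeholder names, standing in for real output.
demo_dir <- file.path(tempdir(), "SJARACNe_demo")
dir.create(file.path(demo_dir, "B", "TF"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "B", "SIG"), showWarnings = FALSE)
file.create(file.path(demo_dir, "B", "B.exp.txt"))
file.create(file.path(demo_dir, "B", "TF", "B.tf.txt"))

# One row per group; the missing SIG driver list shows up as FALSE.
check_sjaracne_inputs(demo_dir)
```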
Examples
if (FALSE) { # \dontrun{
data(pbmc14k_expression.eset)
## 1. The most commonly used command: pre-defined driver lists, automatic down-sampling, no metacell method
generateSJARACNeInput(input_eset = pbmc14k_expression.eset,
group_name = "cell_type",
sjaracne_dir = "./SJARACNe",
species_type = "hg",
driver_type = "TF_SIG")
## 2. Disable the down-sampling
generateSJARACNeInput(input_eset = pbmc14k_expression.eset,
group_name = "cell_type",
sjaracne_dir = "./SJARACNe",
species_type = "hg",
driver_type = "TF_SIG",
downSample_N = NULL)
## 3. Use a customized driver list (TUBB4A is the gene of interest but is not currently in the pre-defined driver list)
# when the driver-to-add is known as a transcription factor
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
customDriver_TF = c(getDriverList(species_type = "hg", driver_type = "TF"), "TUBB4A"))
# when the driver-to-add is known as a non-transcription factor
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
customDriver_SIG = c(getDriverList(species_type = "hg", driver_type = "SIG"), "TUBB4A"))
# when it is unclear whether the driver-to-add is a transcription factor
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
customDriver_TF = c(getDriverList(species_type = "hg", driver_type = "TF"), "TUBB4A"),
customDriver_SIG = c(getDriverList(species_type = "hg", driver_type = "SIG"), "TUBB4A"))
## 4. Use the metacell method
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
superCell_N = 1000, superCell_count = 100, seed = 123)
} # }
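When extending a driver list as in example 3, it may help to confirm that the gene is actually missing and to avoid duplicate entries. The sketch below uses a short placeholder vector in place of the output of getDriverList so that it runs without scMINER; the helper add_drivers is hypothetical.

```r
# Hypothetical helper for illustration only; not part of scMINER.
# Append genes to a driver list, dropping duplicates and keeping order.
add_drivers <- function(driver_list, genes) {
  unique(c(driver_list, genes))
}

# Placeholder standing in for getDriverList(species_type = "hg", driver_type = "TF").
predefined_tf <- c("STAT1", "IRF8")

# Check whether the gene of interest is already a driver.
"TUBB4A" %in% predefined_tf

# Build the vector that would be passed as customDriver_TF.
custom_tf <- add_drivers(predefined_tf, c("TUBB4A", "STAT1"))
custom_tf
```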