Generate the standard input files for SJARACNe from a sparse eset object
This function generates the standard input files for SJARACNe, a scalable software tool for reverse engineering gene networks from big data.
Usage
generateSJARACNeInput(
  input_eset,
  group_name = "clusterID",
  group_name.refine = FALSE,
  sjaracne_dir,
  species_type = "hg",
  driver_type = "TF_SIG",
  customDriver_TF = NULL,
  customDriver_SIG = NULL,
  downSample_N = 1000,
  seed = 123,
  superCell_N = NULL,
  superCell_count = 100,
  superCell_gamma = 10,
  superCell_knn = 5,
  superCell_nHVG = 1000,
  superCell_nPC = 10,
  superCell_save = TRUE,
  print_command = FALSE,
  save_command = TRUE
)
Arguments
- input_eset
The expression set object, which should be filtered, normalized and log-transformed.
- group_name
Character, name of the column used for grouping, usually the column of cell types or clusters. Default: "clusterID".
- group_name.refine
Logical, whether to replace the non-word characters in group names with the underscore symbol ("_"). Improper filename characters may cause trouble, since scMINER creates a folder for each group using the group names. Setting this argument to TRUE can help avoid this issue. Default: FALSE.
- sjaracne_dir
The path to the folder for SJARACNe runs. Both the inputs and outputs will be saved here.
- species_type
Character, species of the pre-defined driver list to use: "hg" for human or "mm" for mouse. Default: "hg".
- driver_type
Character, type of the pre-defined driver list to use: "TF" for transcription factors only, "SIG" for signaling genes only, or "TF_SIG" for both. Default: "TF_SIG".
- customDriver_TF
A character vector or NULL, genes used to replace the pre-defined transcription factor driver list. This allows the user to customize the TF driver list. Default: NULL.
- customDriver_SIG
A character vector or NULL, genes used to replace the pre-defined signaling gene driver list. This allows the user to customize the SIG driver list. Default: NULL.
- downSample_N
Integer or NULL. If an integer is given, groups with more cells than this integer will be down-sampled to this integer. A number between 500 and 3000 gives a good balance between robustness and computational efficiency. If NULL, down-sampling is skipped. Default: 1000.
- seed
Non-negative integer, seed for random sampling. Default: 123.
- superCell_N
Integer or NULL. If an integer is given, the metacell method is applied by the SuperCell package to groups with more cells than this integer. If NULL, the metacell method is not applied. Default: NULL.
- superCell_count
Integer, number of metacells to generate by SuperCell. Default: 100. Ignored if superCell_N = NULL.
- superCell_gamma
Integer, graining level of data by SuperCell (ratio of the number of single cells in the initial dataset to the number of metacells in the final dataset). Default: 10. Ignored if superCell_N = NULL.
- superCell_knn
Integer, the k value used by SuperCell to compute the single-cell kNN network. Default: 5. Ignored if superCell_N = NULL.
- superCell_nHVG
Integer, number of genes with the largest variation to be used by SuperCell. Default: 1000. Ignored if superCell_N = NULL.
- superCell_nPC
Integer, number of principal components used by SuperCell to construct the single-cell kNN network. Default: 10. Ignored if superCell_N = NULL.
- superCell_save
Logical, whether to save the results generated by SuperCell, including membership and other components. Default: TRUE. Ignored if superCell_N = NULL.
- print_command
Logical, whether to print the command to run SJARACNe to the screen. Default: FALSE.
- save_command
Logical, whether to save the command to run SJARACNe. Default: TRUE.
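The effect of group_name.refine and downSample_N can be illustrated with a small base R sketch. Both helpers below (refine_group_names and downsample_groups) are hypothetical and written only for illustration; they are not part of scMINER, and the package's internal implementation may differ. The sketch assumes that "non-word characters" means anything matched by the regular expression \W, and that down-sampling is a simple random draw without replacement within each group.

```r
# Hypothetical helper (not an scMINER function): mimic group_name.refine = TRUE
# by replacing every non-word character with an underscore.
refine_group_names <- function(x) {
  # "\\W" matches any character that is not a letter, a digit or an underscore
  gsub("\\W", "_", x)
}

# Hypothetical helper (not an scMINER function): keep at most n cells per group.
# Groups with n cells or fewer are left untouched; n = NULL skips down-sampling.
downsample_groups <- function(groups, n = 1000, seed = 123) {
  set.seed(seed)
  idx_by_group <- split(seq_along(groups), groups)
  kept <- lapply(idx_by_group, function(idx) {
    if (!is.null(n) && length(idx) > n) {
      idx[sample.int(length(idx), n)]
    } else {
      idx
    }
  })
  sort(unlist(kept, use.names = FALSE))
}

# Toy group labels containing characters that are awkward in folder names
groups  <- c(rep("CD4+ T cell", 5), rep("NK/NKT", 2))
refined <- refine_group_names(groups)   # "CD4+ T cell" becomes "CD4__T_cell"
keep    <- downsample_groups(refined, n = 3)
table(refined[keep])
```

With a real eset, the same idea would apply to the grouping column of the phenotype data rather than to a toy vector.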
Value
This function will generate several folders and files in the directory specified by "sjaracne_dir":
- a folder for each group in the column specified by "group_name";
In each folder:
- a ".exp.txt" file: expression matrix, features by cells.
- a "TF" folder containing a ".tf.txt" file: this file contains the TF driver list.
- a "SIG" folder containing a ".sig.txt" file: this file contains the SIG driver list.
- a bash script (runSJARACNe.sh) to run SJARACNe. Further modification is needed to run it.
- a json file (config_cwlexec.json) containing parameters to run SJARACNe.
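After the function has run, the layout described above can be checked with base R. The sketch below is only an illustration: check_group_dir is a hypothetical helper, not an scMINER function, and it assumes that all five items sit inside each group folder and that the script and json file use exactly the names given above.

```r
# Assumed: the same path that was passed as sjaracne_dir
sjaracne_dir <- "./SJARACNe"

# One sub-folder per group is expected directly under sjaracne_dir
group_dirs <- list.dirs(sjaracne_dir, full.names = TRUE, recursive = FALSE)

# Hypothetical helper: report which of the documented outputs exist in one group folder
check_group_dir <- function(d) {
  # Paths are returned relative to d, e.g. "TF/<name>.tf.txt"
  files <- list.files(d, recursive = TRUE)
  c(
    exp    = any(grepl("\\.exp\\.txt$", files)),
    tf     = any(grepl("^TF/.*\\.tf\\.txt$", files)),
    sig    = any(grepl("^SIG/.*\\.sig\\.txt$", files)),
    script = "runSJARACNe.sh" %in% files,
    config = "config_cwlexec.json" %in% files
  )
}

# One named logical vector per group folder
status <- lapply(group_dirs, check_group_dir)
names(status) <- basename(group_dirs)
status
```

Depending on driver_type, some entries may legitimately be FALSE (for example, a "SIG" folder would not be expected when only TF drivers are requested).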
Examples
if (FALSE) { # \dontrun{
data(pbmc14k_expression.eset)
## 1. The most commonly used command: pre-defined driver lists, automatic down-sampling, no metacell method
generateSJARACNeInput(input_eset = pbmc14k_expression.eset,
group_name = "cell_type",
sjaracne_dir = "./SJARACNe",
species_type = "hg",
driver_type = "TF_SIG")
## 2. Disable the down-sampling
generateSJARACNeInput(input_eset = pbmc14k_expression.eset,
group_name = "cell_type",
sjaracne_dir = "./SJARACNe",
species_type = "hg",
driver_type = "TF_SIG",
downSample_N = NULL)
## 3. Use a customized driver list (TUBB4A is the gene of interest but is not currently in the pre-defined driver list)
# when the driver-to-add is known as a transcription factor
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
customDriver_TF = c(getDriverList(species_type = "hg", driver_type = "TF"), "TUBB4A"))
# when the driver-to-add is known as a non-transcription factor
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
customDriver_SIG = c(getDriverList(species_type = "hg", driver_type = "SIG"), "TUBB4A"))
# when it is unclear whether the driver-to-add is a transcription factor
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
customDriver_TF = c(getDriverList(species_type = "hg", driver_type = "TF"), "TUBB4A"),
customDriver_SIG = c(getDriverList(species_type = "hg", driver_type = "SIG"), "TUBB4A"))
## 4. Use the metacell method
generateSJARACNeInput(input_eset = pbmc14k_expression.eset, group_name = "trueLabel", sjaracne_dir = "./SJARACNe", species_type = "hg", driver_type = "TF_SIG",
superCell_N = 1000, superCell_count = 100, seed = 123)
} # }