Part III: Test Data Collection (optional)
We originally curated this test data set, containing six samples of different conditions (see table below), for testing and troubleshooting this pipeline. We also use the samples to showcase the input formats accepted by the pipeline and to benchmark its performance (e.g., Run Time, CPU Time, and Max Memory Usage). Note, however, that all six samples were prepared by downsampling from real data.
| Sample | Format | Library Type | Phred Encoding | Species | Number of Reads | Preparation |
|---|---|---|---|---|---|---|
| sample1 | FASTQ | Paired-end | Phred+33 | human | 19,410,373 * 2 | Downsampled from real data |
| sample2 | FASTQ | Single-end | Phred+33 | mouse | 16,005,450 | Downsampled from real data |
| sample3 | FASTQ, multiple lanes | Paired-end | Phred+33 | human | 19,410,373 * 2 | Split by lane from sample1 |
| sample4 | FASTQ, multiple lanes | Single-end | Phred+33 | mouse | 16,005,450 | Split by lane from sample2 |
| sample5 | BAM/SAM (to genome) | Paired-end | Phred+33 | human | 13,856,075 * 2 | Downsampled from real data |
| sample6 | BAM/SAM (to transcriptome) | Single-end | Phred+33 | mouse | 13,533,162 | Downsampled from real data |
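All six samples use Phred+33 quality encoding, in which each quality character stores a score Q as the ASCII character with code Q + 33, and Q corresponds to an error probability of 10^(-Q/10). A minimal sketch of decoding such a quality string follows; the function names `phred33_to_scores` and `error_probability` are illustrative and not part of this pipeline.

```python
def phred33_to_scores(quality_string):
    """Decode a Phred+33 quality string into integer quality scores.

    Hypothetical helper for illustration; not part of the pipeline.
    """
    # Each character encodes Q as chr(Q + 33), so subtract the offset.
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(q):
    """Return the base-calling error probability for a Phred score q."""
    return 10 ** (-q / 10)


# "I" has ASCII code 73, "5" has 53, and "!" has 33.
scores = phred33_to_scores("II5!")
print(scores)  # [40, 40, 20, 0]
```

A score of 20 therefore corresponds to an error probability of 0.01, that is, one expected miscall per 100 bases.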
There is only one situation in which you need to collect the test data: you would like to walk through this pipeline but do not have your own bulk RNA-seq data available. In this case, you can download the test data for one or more samples to get started.
- For those who can access the pre-built conda environment, the test data is available at:
  `/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata`
- For other users, please download them from Zenodo.
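After downloading a FASTQ sample, it can be worth checking that the file is complete by comparing its read count with the table above. The sketch below assumes standard four-line FASTQ records and treats files ending in `.gz` as gzip-compressed; the function name `count_fastq_reads` and the file name in the usage note are hypothetical, since the actual file names in the test data may differ.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record.

    Hypothetical helper for illustration; not part of the pipeline.
    Files whose names end in ".gz" are read as gzip-compressed text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A truncated download often leaves an incomplete final record.
        raise ValueError(
            f"{path}: {n_lines} lines is not a multiple of 4; "
            "the file may be truncated or not in four-line FASTQ format"
        )
    return n_lines // 4
```

For example, calling `count_fastq_reads("sample2.fastq.gz")` (an assumed file name) on an intact copy of sample2 should return the 16,005,450 reads listed in the table. For paired-end samples, both mate files should give the same count.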