BS Document - BIG Data Center - National Genomics Data Center (CNCB


BS-RNA

Installation

BS-RNA is written in Perl and is executed from the command line on a Linux system. To install BS-RNA, copy the BS-RNA_v1.0.tar.gz file (available from http://bs-rna.big.ac.cn) into an installation folder and extract all the files by typing:

tar xzf BS-RNA_v1.0.tar.gz

BS-RNA requires working installations of Perl, Python (at least Python 2.7.8), HISAT2 (at least hisat2-2.0.1-beta) and SAMtools (at least SAMtools-1.0), so these must be installed on your machine. BS-RNA assumes that these programs are all in the PATH unless their paths are specified manually. bowtie2 should also be in the PATH, as HISAT2 uses the bowtie2 implementation to handle most of the operations on the FM index.
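Before running BS-RNA it can help to confirm that the prerequisites are visible on the PATH. The following is a minimal sketch, not part of the BS-RNA package; the executable names in REQUIRED_TOOLS are assumptions and may differ on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["perl", "python", "hisat2", "hisat2-build", "samtools", "bowtie2"]


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:14s} {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND either needs to be added to the PATH or, for Python, HISAT2 and SAMtools, passed to BS-RNA through the corresponding --pathTo... option.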

Manual

Either paired-end or single-end reads with variable read length from strand-specific libraries are supported by BS-RNA. The input sequence format should be uncompressed FastQ.

First, download the reference genome sequence files for your species of interest and place them in a folder. Only single-entry files are supported. BS-RNA supports reference genome sequences in FastA format. Each name must begin with "chr", and the only allowed file extension is .fa. Second, download a gene model annotation file, which should be in GTF format.
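The requirements on the reference folder can be checked before starting a run. This sketch is not part of BS-RNA; the function name check_reference_dir is hypothetical, and it assumes that the "chr" rule applies to the file name and that "single-entry" means exactly one FastA header line per file.

```python
from pathlib import Path


def check_reference_dir(ref_dir):
    """Return a list of problems found in a reference folder (empty if none)."""
    problems = []
    for path in sorted(Path(ref_dir).iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".fa":
            problems.append(f"{path.name}: extension is not .fa")
            continue
        # Assumption: the "chr" prefix rule is applied to the file name.
        if not path.name.startswith("chr"):
            problems.append(f"{path.name}: name does not begin with 'chr'")
        # Count FastA header lines; a single-entry file has exactly one.
        with path.open() as handle:
            n_entries = sum(1 for line in handle if line.startswith(">"))
        if n_entries != 1:
            problems.append(f"{path.name}: {n_entries} FastA entries, expected 1")
    return problems
```

An empty list means the folder passed all three checks under the assumptions above.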

In addition, two configuration files can be supplied to customize the parameters used for indexing the reference genome sequences and for mapping the RNA sequencing data to them. Instructions on how to write the configuration file for the hisat2-build indexer or for hisat2 can be found in the downloaded package. Each option should be specified on its own line.

Usage: BS-RNA_v1.0 [Options]
Options:
--perlDir	Path	Full path of the perl scripts
--reads1	File	Input T-rich reads file
--reads2	File	Input A-rich reads file
--gene	File	Supply BS-RNA with a set of gene model annotations, a GTF format file
--rawRef	Path	Directory of raw reference genome sequences
--convertRef	Path	Directory of converted reference genome sequences
--pathToPython	Path	Full path </.../.../> to the Python installation on your system
		If not specified it is assumed that Python is in the PATH
--pathToHISAT2	Path	Full path </.../.../> to the HISAT2 installation on your system
		If not specified it is assumed that HISAT2 is in the PATH
--pathToSAMtools	Path	Full path </.../.../> to the SAMtools installation on your system
		If not specified it is assumed that SAMtools is in the PATH
--phred64		Qualities are ASCII chars equal to the Phred quality plus 64
		"off" means qualities are ASCII chars equal to the Phred quality plus 33
		Default: off
--specBuild	File	Configure file for hisat2-build indexer
--specHisat2	File	Configure file for HISAT2
--outDir	Path	Result output directory
--h or help		Display this message
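Whether --phred64 is needed depends on how the quality strings in the FastQ file are encoded. The sketch below is a common heuristic and not part of BS-RNA; the function name guess_phred_offset is hypothetical, and it assumes uncompressed FastQ with four lines per record.

```python
def guess_phred_offset(fastq_path, max_reads=10000):
    """Return 33 or 64 if the quality encoding can be inferred, else None."""
    lowest = None
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 3:  # every fourth line holds the quality string
                qual = line.rstrip("\n")
                if qual:
                    low = min(ord(c) for c in qual)
                    lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    if lowest < 59:
        # Characters this low cannot occur in offset-64 data.
        return 33
    if lowest >= 64:
        # Consistent with offset 64; uniformly high-quality offset-33
        # reads could look the same, so treat this as a guess.
        return 64
    return None  # ambiguous range
```

A result of 64 suggests adding --phred64 to the command line; a result of 33 matches the default.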

A typical command for analyzing paired-end RBS-seq data is as follows:

BS-RNA_v1.0 --perlDir script --reads1 test_T-rich.fq --reads2 test_A-rich.fq --gene Homo_sapiens.GRCh37.75.gtf \
        --rawRef hg19_ref --phred64 --pathToPython /.../python2.7.8/bin \
        --pathToHISAT2 /.../hisat2-2.0.1-beta --pathToSAMtools /.../samtools-0.1.16 --outDir /.../demo_result

For a single-end T-rich reads file, the command is:

BS-RNA_v1.0 --perlDir script --reads1 test_T-rich.fq --gene Homo_sapiens.GRCh37.75.gtf --rawRef hg19_ref \
        --phred64 --pathToPython /.../python2.7.8/bin --pathToHISAT2 /.../hisat2-2.0.1-beta \
        --pathToSAMtools /.../samtools-0.1.16 --outDir /.../demo_result

Or for a single-end A-rich reads file:

BS-RNA_v1.0 --perlDir script --reads2 test_A-rich.fq --gene Homo_sapiens.GRCh37.75.gtf --rawRef hg19_ref \
        --phred64 --pathToPython /.../python2.7.8/bin --pathToHISAT2 /.../hisat2-2.0.1-beta \
        --pathToSAMtools /.../samtools-0.1.16 --outDir /.../demo_result
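When many samples are processed, the three command forms above can be assembled programmatically. This is a hedged sketch that only builds the argument list from the documented options; the helper build_bsrna_command and its parameter names are hypothetical. It passes --rawRef and --convertRef only when they are given, which is an assumption about which of them BS-RNA requires.

```python
def build_bsrna_command(perl_dir, gene, out_dir, reads1=None, reads2=None,
                        raw_ref=None, convert_ref=None, phred64=False):
    """Build a BS-RNA_v1.0 argument list from the documented options."""
    if reads1 is None and reads2 is None:
        raise ValueError("provide reads1 (T-rich), reads2 (A-rich), or both")
    cmd = ["BS-RNA_v1.0", "--perlDir", perl_dir]
    if reads1 is not None:
        cmd += ["--reads1", reads1]
    if reads2 is not None:
        cmd += ["--reads2", reads2]
    cmd += ["--gene", gene]
    if raw_ref is not None:
        cmd += ["--rawRef", raw_ref]
    if convert_ref is not None:
        cmd += ["--convertRef", convert_ref]
    if phred64:
        cmd.append("--phred64")
    cmd += ["--outDir", out_dir]
    return cmd
```

The resulting list can be handed to subprocess.run without a shell, which avoids quoting problems with unusual file names.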

If the reference genome sequences were already converted in a previous analysis, you can skip the conversion step and save time by adding the option "--convertRef path_of_converted_reference_genome". In this case, BS-RNA generates three folders in the specified output directory:

Map: contains the mapping result file in SAM format and another file with splice sites.

Filter: contains the filtered mapping result file in SAM format and a statistics file called "filter_mapping.sam.maprate" containing the following information:

total: total number of reads

map: number of mapped reads

uniq: number of uniquely mapped reads

cor: number of reads correctly mapped to the corresponding strand

used%: percentage of reads correctly mapped to the corresponding strand

Note: The reads are mapped to the converted reference genome sequences, so the chromosome names in the SAM file contain "C-T" (the chromosome with all cytosines converted to thymines) or "G-A" (the chromosome with all guanines converted to adenines).
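The two conversions named in the note can be illustrated on a short sequence. This is a minimal sketch of the in-silico conversion only, not the code BS-RNA uses; the function name convert_sequence is hypothetical, and preserving lower-case bases is an assumption.

```python
# Translation tables for the two conversions; case is preserved (assumption).
C_TO_T = str.maketrans("Cc", "Tt")
G_TO_A = str.maketrans("Gg", "Aa")


def convert_sequence(seq):
    """Return the (C-T converted, G-A converted) versions of a sequence."""
    return seq.translate(C_TO_T), seq.translate(G_TO_A)


c_t, g_a = convert_sequence("ACGTacgt")
print(c_t)  # ATGTatgt
print(g_a)  # ACATacat
```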

Level: contains BED files, which present the following information for each covered cytosine site:

chr: name of the chromosome

start: start coordinate of the cytosine on the chromosome (0-based)

end: end coordinate of the cytosine on the chromosome (1-based)

strand: "+" means forward strand and "-" means Crick strand

mCtype: methylation site type, one of the following [CG, CHG, CHH]

depth: total number of reads mapped to the cytosine site

mCdep: total number of reads that supported a methylated cytosine at this position

level: methylation level at the cytosine position
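The fields listed above can be read into a small record for downstream analysis. This sketch assumes the BED lines are tab-separated with the columns in the order listed, and that the level is the fraction mCdep / depth; neither assumption is confirmed by this manual, and the names Site, parse_site and recompute_level are hypothetical.

```python
from collections import namedtuple

# Field order follows the list above (assumed to match the column order).
Site = namedtuple("Site", "chr start end strand mCtype depth mCdep level")


def parse_site(line):
    """Parse one tab-separated line of a Level BED file into a Site."""
    f = line.rstrip("\n").split("\t")
    return Site(f[0], int(f[1]), int(f[2]), f[3], f[4],
                int(f[5]), int(f[6]), float(f[7]))


def recompute_level(site):
    """Methylation level as mCdep / depth (assumed definition)."""
    return site.mCdep / site.depth if site.depth else 0.0
```

Comparing recompute_level with the reported level column on a few lines is a quick way to verify both assumptions against real output.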

If the "--convertRef" option is not specified, an extra folder named "ref_C-T_G-A" is also created in the output directory. This folder contains the concatenated raw genome sequences and converted genome sequences in FastA format, as well as the corresponding bowtie2 index files.