STC-Seq HLA-typing and analysis pipeline

Manual

1. Requirements

bowtie
bowtie2
BBMap
R (version 3.4.0 or later)

Install R, BBMap, bowtie, and bowtie2 on your computer and add their installation directories to your PATH environment variable.
For example, if you use bash, add the following line to your .bashrc:

#example
export PATH=$PATH:/path_to_bowtie:/path_to_bowtie2:/path_to_BBMap:/path_to_R
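
After editing .bashrc and opening a new shell, it can help to confirm that the tools are actually found. The following is a minimal sketch, not part of STC-Seq. The function name check_tools is hypothetical, and the default program names are assumptions: BBMap is probed through its bbmap.sh wrapper script and R through Rscript (used in step 3.2.2).

```shell
# Report which required programs can be found on the current PATH.
# With no arguments, a default list is checked (names are assumptions,
# adjust them to the executables your installation provides).
check_tools() {
    [ "$#" -gt 0 ] || set -- bowtie bowtie2 bbmap.sh Rscript
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero status if at least one program was not found.
    return "$missing"
}
```

Calling check_tools with no arguments checks the default list; check_tools bowtie2 checks a single program. The R version requirement can be checked separately with Rscript -e 'stopifnot(getRversion() >= "3.4.0")'.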


2. Preparation

2.1 Add the STC-Seq bin directory to your PATH

export PATH=$PATH:/path_to_STC-Seq/bin

2.2 Copy all files in /STC-Seq/data to the current working directory

#example
cp -r /STC-Seq/data/* ./

2.3 Before running the program for the first time, generate all possible artificial reads (70 bp) from the large exons (>= 70 bp) of all alleles of the fourteen HLA genes.

Usage: [path_to getArtifical_reads] [path_to hla.exon.fasta] exon-70bp.fastq
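
To illustrate what "all possible 70 bp reads" of an exon means, the sketch below prints every window of a given length from each FASTA record that is at least that long. This is an illustration only and is not the getArtifical_reads implementation: the function name sliding_reads, the read naming scheme, and the placeholder quality string are all assumptions.

```shell
# Illustration only: read FASTA on stdin and print every window of length
# LEN (default 70) from each record of at least LEN bases, as FASTQ.
# The quality line is a placeholder made of "I" characters.
sliding_reads() {
    awk -v len="${1:-70}" '
        function emit(    i, n, q) {
            n = length(seq)
            if (name == "" || n < len) return      # skip short records
            q = ""
            for (i = 1; i <= len; i++) q = q "I"
            for (i = 1; i + len - 1 <= n; i++)
                printf "@%s_%d\n%s\n+\n%s\n", name, i, substr(seq, i, len), q
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; next }
             { seq = seq $0 }                      # join multi-line sequences
        END  { emit() }
    '
}
```

For example, sliding_reads 70 < hla.exon.fasta would print one read per start position; a 72 bp exon yields 3 reads and a 69 bp exon yields none.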

2.4 Check the program and data file permissions
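
One way to carry out the permission check is sketched below. The function name ensure_executable is hypothetical, and which files need the executable bit depends on your installation; the paths in the usage note are placeholders.

```shell
# For each file given, add the executable bit for the current user if it
# is not already set. Returns non-zero if a file is missing or chmod fails.
ensure_executable() {
    rc=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "no such file: $f" >&2
            rc=1
            continue
        fi
        [ -x "$f" ] || chmod u+x "$f" || rc=1
    done
    return "$rc"
}
```

A typical call would be ensure_executable /path_to_STC-Seq/bin/* /path_to_STC-Seq/script/*.sh. Data files only need to be readable, which can be checked with test -r.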


3. Running

3.1 Removing alleles that are not properly covered

Usage: [path_to step_1.sh] [prefix of x.fq]

#example
[path_to step_1.sh] x

3.2 Initial screening

3.2.1

Usage: [path_to step_2.sh] [prefix of x.fq]

#example
[path_to step_2.sh] x

3.2.2

Usage: [path_to Rscript] [path_to Initial_screening.R] [input_directory/input_name] [window_size default:7] [gap_size default:15] [output_name]

#example
[absolute_path_to Rscript] [absolute_path_to Initial_screening.R] x_position-array.txt 7 15 x_second_retain_allele.txt

3.3 Reciprocative screening

Usage: [path_to step_3.sh] [prefix of x.fq]

#example
[path_to step_3.sh] x
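
The steps in this section can be chained so that a failure stops the run early. The wrapper below is a sketch under stated assumptions, not part of STC-Seq: run_stc_seq, STC_SCRIPT_DIR and INITIAL_SCREENING_R are hypothetical names introduced here, Rscript is assumed to be on your PATH, and the intermediate file names are taken from the 3.2.2 example.

```shell
# Run steps 3.1 to 3.3 in order for one sample prefix; stop at first failure.
# STC_SCRIPT_DIR (hypothetical): directory that holds step_1.sh .. step_3.sh.
# INITIAL_SCREENING_R (hypothetical): path to Initial_screening.R; assumed
# to sit in STC_SCRIPT_DIR when unset.
run_stc_seq() {
    if [ "$#" -lt 1 ] || [ -z "${STC_SCRIPT_DIR:-}" ]; then
        echo "usage: STC_SCRIPT_DIR=/path_to_scripts run_stc_seq PREFIX [WINDOW] [GAP]" >&2
        return 2
    fi
    prefix=$1
    window=${2:-7}     # window_size default from 3.2.2
    gap=${3:-15}       # gap_size default from 3.2.2
    screening_r=${INITIAL_SCREENING_R:-$STC_SCRIPT_DIR/Initial_screening.R}

    "$STC_SCRIPT_DIR/step_1.sh" "$prefix" &&
    "$STC_SCRIPT_DIR/step_2.sh" "$prefix" &&
    Rscript "$screening_r" "${prefix}_position-array.txt" "$window" "$gap" \
            "${prefix}_second_retain_allele.txt" &&
    "$STC_SCRIPT_DIR/step_3.sh" "$prefix"
}
```

With STC_SCRIPT_DIR set and exported, run_stc_seq x processes x.fq with the default window and gap sizes, and run_stc_seq x 7 15 passes them explicitly.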


4. Output

HLA allele pairs called by STC-Seq are written to x_report_null.txt

#example

Locus    Genotype    Alternative_genotype    Quality

Gene1    Allele_pair    Alternative_pair1    PASS or NOT PASS
Gene2    Allele_pair    Alternative_pair1;Alternative_pair2......    PASS or NOT PASS
Gene3    Allele_pair    null    PASS or NOT PASS

If no allele pair is called for a gene, that gene is not listed in x_report_null.txt.
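
The report can be summarized with standard tools. The two helper functions below are a sketch, not part of STC-Seq; their names are hypothetical. They assume that columns are separated by tabs or spaces, that the Quality value ("PASS" or "NOT PASS") is the last thing on each line, and that the locus name is the first column.

```shell
# Print the loci whose Quality is PASS (and not "NOT PASS").
pass_loci() {
    awk 'NF >= 2 && $NF == "PASS" && $(NF-1) != "NOT" { print $1 }' "$1"
}

# Print each of the given locus names that has no row in the report,
# i.e. loci for which no allele pair was called.
missing_loci() {
    report=$1
    shift
    for locus in "$@"; do
        awk -v l="$locus" '$1 == l { found = 1 } END { exit !found }' "$report" ||
            echo "$locus"
    done
}
```

For example, pass_loci x_report_null.txt lists the passing loci, and missing_loci x_report_null.txt followed by the locus names you expect lists those absent from the report.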


5. Cautions

If the program does not run correctly, check that the required software is installed and that its paths are set correctly.
If necessary, change the program paths in /STC-Seq/script/*.sh to absolute paths.
Before running, make sure that x.fq, cigar.txt, connect_exon_70bp.fasta, hla.exon.fasta and nullAllele_list.txt are present in the current working directory.
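
The presence of the input files can be verified before starting a run. This is a minimal sketch; the function name check_inputs is hypothetical, and the file list is the one given in this section.

```shell
# Check that the input files for sample PREFIX exist and are readable in
# the current working directory. Returns non-zero if any file is missing.
check_inputs() {
    rc=0
    for f in "$1.fq" cigar.txt connect_exon_70bp.fasta hla.exon.fasta nullAllele_list.txt; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For a sample stored as x.fq, run check_inputs x in the working directory before section 3.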