CandiHap: A haplotype analysis toolkit for natural variation study
Manual
CandiHap: a haplotype analysis toolkit for natural variation study.
CandiHap is a user-friendly local program that rapidly preselects candidate causal SNPs from Sanger or next-generation sequencing data and reports the results as tables and vector graphics within a minute. Investigators can use CandiHap to specify a gene or linkage sites based on GWAS and explore favourable haplotypes of candidate genes for target traits. CandiHap runs on Windows, Mac OS X, or Linux, through a graphical user interface or the command line, and can be applied to any plant, animal, or microbial species. CandiHap is publicly available at https://github.com/xukaili/CandiHap or https://bigd.big.ac.cn/biocode/tools/BT007080 as open-source software. CandiHap supports the following analyses:
1). Convert the VCF file to the hapmap format for CandiHap (vcf2hmp);
2). Haplotype analysis for a gene (CandiHap);
3). Haplotype analysis for all genes in the LD region of a significant SNP, one by one (GWAS_LD2haplotypes);
4). Haplotype analysis for Sanger sequencing data of population variation (sanger_CandiHap.sh).
License
Academic users may download and use the application free of charge according to the accompanying license. Commercial users must obtain a commercial license from Xukai Li.
If you have used the program to obtain results, please cite the following paper:
Xukai Li☯* (李旭凯), Zhiyong Shi☯ (石志勇), Jianhua Gao (高建华), Xingchun Wang (王兴春), Kai Guo* (郭凯). CandiHap: a haplotype analysis toolkit for natural variation study. Molecular Breeding, 2023, 43:21. https://doi.org/10.1007/s11032-023-01366-4
(☯ Equal contributors; * Correspondence)
For Linux (command line)
First, install the R software environment and three packages.
To install R for Linux and the packages
1. Open an internet browser and go to https://www.r-project.org
2. Click the 'download R' link in the middle of the page under 'Getting Started'.
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the 'Download R for Linux' link at the top of the page.
5. Click on Download 'R-3.5.0' (or a newer version).
6. Install R and leave all default settings in the installation options.
7. Open R and install the three packages with the command:
install.packages(c("ggplot2","agricolae", "ggbeeswarm"))
Getting started
The CandiHap command-line analysis consists of three main steps. The test data files can be downloaded freely as test_data.zip.
Put vcf2hmp.pl, test.gff, test.vcf, and genome.fa in the same directory, then run:
# 1. To annotate the vcf by ANNOVAR (Version: 2019-10-24 00:05:27 -0400):
gffread test.gff -T -o test.gtf
gtfToGenePred -genePredExt test.gtf si_refGene.txt
retrieve_seq_from_fasta.pl --format refGene --seqfile genome.fa si_refGene.txt --outfile si_refGeneMrna.fa
table_annovar.pl test.vcf ./ --vcfinput --outfile test --buildver si --protocol refGene --operation g -remove
# 2. To convert the txt result of annovar to hapmap format:
perl vcf2hmp.pl test.vcf test.si_multianno.txt
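Before running steps 1 and 2, it can help to confirm that the input files and external programs are actually in place. The following is a minimal sketch and not part of CandiHap: the function names missing_files and missing_tools are illustrative, and it assumes that gffread, gtfToGenePred, and the ANNOVAR scripts are expected to be on PATH.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with CandiHap): report what steps 1 and 2
# need but cannot find.

# Print the name of every file argument that does not exist.
missing_files() {
    for f in "$@"; do
        [ -f "$f" ] || printf '%s\n' "$f"
    done
}

# Print the name of every command argument that is not found on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
    done
}

# Input files named in this manual; run from the working directory.
missing_files vcf2hmp.pl test.gff test.vcf genome.fa
# Programs called in steps 1 and 2 (assumed to be on PATH).
missing_tools perl gffread gtfToGenePred retrieve_seq_from_fasta.pl table_annovar.pl
```

If both calls print nothing, every listed file and program was found.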
Put CandiHap.pl, Phenotype.txt, Your.hmp, and genome.gff in the same directory, then run:
# 3. To run CandiHaplotypes
perl CandiHap.pl -m Your.hmp -f genome.gff -p Phenotype.txt -g Your_gene_ID
e.g. perl CandiHap.pl -m haplotypes.hmp -f test.gff -p Phenotype.txt -g Si9g49990
perl CandiHap.pl -m haplotypes.hmp -f test.gff -p Phenotype.txt -g Si9g49990 -s 0.5 -u 2000 -d 500 -l 1 -n Structure.txt
The command parameters are:
-m input hmp file name (required).
-p input phenotype file name (required).
-f input gff file name (required).
-g your gene ID (required).
-s p value for the Wilcoxon test. Default is 1.
-u length of the upstream region of the gene. Default is 2000 bp.
-d length of the downstream region of the gene. Default is 500 bp.
-l plot an LD heatmap (1) or not (0). Default is 0. Requires the R packages "LDheatmap" and "genetics".
-n input pop file name; also plots the haploNet figure. Default is NULL. Requires the R packages "pegas" and "sf".
-k keep all temporary files.
-h this (help) message.
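When several candidate genes are of interest, the required flags above can be applied to each gene in turn. The sketch below only prints the commands so that they can be reviewed first. It is not part of CandiHap: the function name candihap_cmds and the gene list file (one gene ID per line) are assumptions made for illustration, and only the documented -m, -f, -p, and -g flags are used.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CandiHap): print one CandiHap.pl
# command per gene ID listed in a text file, one ID per line.
candihap_cmds() {
    hmp=$1      # hapmap file, passed to -m
    gff=$2      # annotation file, passed to -f
    pheno=$3    # phenotype file, passed to -p
    genes=$4    # assumed gene list file, one gene ID per line
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r gene || [ -n "$gene" ]; do
        # Skip blank lines.
        [ -n "$gene" ] || continue
        printf 'perl CandiHap.pl -m %s -f %s -p %s -g %s\n' \
            "$hmp" "$gff" "$pheno" "$gene"
    done < "$genes"
}

# Example (genes.txt is a hypothetical file name):
#   candihap_cmds haplotypes.hmp test.gff Phenotype.txt genes.txt
```

Printing the commands instead of running them keeps the sketch side-effect free; the output can be redirected to a file and executed once it has been checked.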
To analyze all genes in the LD region of a position, run:
perl GWAS_LD2haplotypes.pl -f genome.gff -m Your.hmp -p Phenotype.txt -l LDkb -c Chr:position
e.g. perl GWAS_LD2haplotypes.pl -f test.gff -m haplotypes.hmp -p Phenotype.txt -l 50kb -c 9:54583294
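To make the -l and -c arguments concrete, the sketch below works out the interval that an LD size and a position would span. It rests on an assumption this manual does not state: that the LD region is a symmetric window of the given size on each side of the position. The function name ld_window is illustrative and is not part of CandiHap.

```shell
#!/bin/sh
# Hypothetical helper: print "chr start end" for a window of +/- LD around a
# position. ASSUMPTION: the window is symmetric; GWAS_LD2haplotypes.pl may
# define the region differently.
ld_window() {
    ld=$1               # LD size as given to -l, e.g. 50kb
    site=$2             # site as given to -c, e.g. 9:54583294
    chr=${site%%:*}     # text before the colon
    pos=${site##*:}     # text after the colon
    kb=${ld%kb}         # strip the "kb" suffix
    start=$((pos - kb * 1000))
    end=$((pos + kb * 1000))
    # Coordinates are 1-based, so clamp the start at 1.
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    printf '%s %s %s\n' "$chr" "$start" "$end"
}

ld_window 50kb 9:54583294   # prints: 9 54533294 54633294
```

Under that assumption, genes overlapping chromosome 9 between 54533294 and 54633294 would be the ones analyzed in the example above.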
Haplotype analysis for Sanger .ab1 files on Linux
First, install GATK (GenomeAnalysisTK.jar), Picard (picard.jar), bwa, samtools, bcftools, bgzip, java, and R (with sangerseqR).
To install R and the sangerseqR package
1. Open an internet browser and go to https://www.r-project.org
2. Click the 'download R' link in the middle of the page under 'Getting Started'.
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the 'Download R for Linux' link at the top of the page.
5. Click on Download 'R-3.5.0' (or a newer version).
6. Install R and leave all default settings in the installation options.
7. Open R and install the package with the commands:
if (! requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (! require("sangerseqR")) BiocManager::install("sangerseqR")
Getting started
Put sanger_CandiHap.sh, Gene_VCF2haplotypes.pl, ab1-fastq.pl, and all .ab1 files in the same directory, then run:
sh sanger_CandiHap.sh Gene_ref.fa
e.g. sh sanger_CandiHap.sh PHYC.txt
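A quick check of the inputs before starting the Sanger pipeline might look like the sketch below. It is not part of CandiHap: the function name sanger_inputs_ok is illustrative, and it assumes that the reference passed to sanger_CandiHap.sh is a FASTA file (as the placeholder name Gene_ref.fa suggests) and that the .ab1 traces sit in the working directory.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not shipped with CandiHap). Succeeds when the
# reference looks like FASTA and at least one .ab1 trace file is present.
sanger_inputs_ok() {
    ref=$1          # reference sequence, ASSUMED to be FASTA
    dir=${2:-.}     # directory holding the .ab1 files, default: current
    # A FASTA file starts with a ">" header line.
    first=$(head -n 1 "$ref" 2>/dev/null)
    case $first in
        '>'*) ;;
        *) echo "reference does not look like FASTA: $ref" >&2
           return 1 ;;
    esac
    # An unmatched glob stays literal, so confirm that a match really exists.
    for f in "$dir"/*.ab1; do
        if [ -e "$f" ]; then
            return 0
        fi
    done
    echo "no .ab1 files found in $dir" >&2
    return 1
}

# Example: sanger_inputs_ok PHYC.txt && sh sanger_CandiHap.sh PHYC.txt
```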
For Windows
The Windows installation package bundles all the modules needed to run independently, so no additional software needs to be installed.
For Mac OS X
First, install the R software environment and three packages.
To install R for Mac OS X and the packages
1. Open an internet browser and go to https://www.r-project.org
2. Click the 'download R' link in the middle of the page under 'Getting Started'.
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the 'Download R for (Mac) OS X' link at the top of the page.
5. Click on Download 'R-3.5.0.pkg' (or a newer version).
6. Install R and leave all default settings in the installation options.
7. Open R and install the three packages with the command:
install.packages(c("ggplot2","agricolae", "ggbeeswarm"))
Contact information
CandiHap will be updated regularly and extended with more functions and more user-friendly options.
For any questions, please contact xukai_li@sxau.edu.cn or WeChat ID: Li_XuKai