CandiHap: a haplotype analysis toolkit for natural variation study

Manual


 CandiHap is user-friendly local software that rapidly preselects candidate causal SNPs from Sanger or next-generation sequencing data and reports the results as tables and high-quality vector graphics within a minute. Investigators can use CandiHap to specify a gene or linkage sites identified by GWAS and to explore favourable haplotypes of candidate genes for target traits. CandiHap runs on Windows, Mac OS X, and Linux, through either a graphical user interface or the command line, and can be applied to any plant, animal, or microbial species. CandiHap is open-source software, publicly available at https://github.com/xukaili/CandiHap or https://bigd.big.ac.cn/biocode/tools/BT007080. CandiHap can perform the following analyses:

    1) Convert a VCF file to the hapmap format used by CandiHap (vcf2hmp);
    2) Haplotype analysis of a single gene (CandiHap);
    3) Haplotype analysis of every gene in the LD region of a significant SNP, one gene at a time (GWAS_LD2haplotypes);
    4) Haplotype analysis of population variation from Sanger sequencing data (sanger_CandiHap.sh).

 

License

Academic users may download and use the application free of charge according to the accompanying license.
Commercial users must obtain a commercial license from Xukai Li.
If you have used the program to obtain results, please cite the following paper:

Xukai Li☯* (李旭凯), Zhiyong Shi☯ (石志勇), Jianhua Gao (高建华), Xingchun Wang (王兴春), Kai Guo* (郭凯). CandiHap: a haplotype analysis toolkit for natural variation study. Molecular Breeding, 2023, 43:21. https://doi.org/10.1007/s11032-023-01366-4 
(☯ Equal contributors; * Correspondence)

 

For Linux system (command lines)

First, install the R software environment and three R packages.

To install R and the packages on Linux

      1. Open an internet browser and go to link: https://www.r-project.org
      2. Click the 'download R' link in the middle of the page under 'Getting Started'.
      3. Select a CRAN location (a mirror site) and click the corresponding link.
      4. Click on the 'Download R for Linux' link at the top of the page.
      5. Download 'R-3.5.0' (or a newer version).
      6. Install R and leave all default settings in the installation options.
      7. Open R and install the three packages with the command:
          install.packages(c("ggplot2","agricolae", "ggbeeswarm")) 

Getting started

A command-line CandiHap analysis consists of three main steps. The test data files can be downloaded free of charge as test_data.zip.
Put vcf2hmp.pl, test.gff, test.vcf, and genome.fa in the same directory, then run:

     # 1. Annotate the VCF with ANNOVAR (version: 2019-10-24 00:05:27 -0400):
     gffread  test.gff   -T -o test.gtf
     gtfToGenePred -genePredExt test.gtf  si_refGene.txt
     retrieve_seq_from_fasta.pl --format refGene --seqfile  genome.fa  si_refGene.txt --outfile si_refGeneMrna.fa
     table_annovar.pl  test.vcf  ./  --vcfinput --outfile  test  --buildver  si  --protocol refGene --operation g -remove

     # 2. Convert the ANNOVAR text output to hapmap format:
     perl  vcf2hmp.pl  test.vcf  test.si_multianno.txt
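Steps 1 and 2 can be wrapped in one bash function so that a run stops at the first failing tool and can be repeated for another data set by changing only the file prefix. This is a minimal sketch, not part of CandiHap: the function name, its arguments (PREFIX, GENOME_FA, BUILDVER) and the DRY_RUN switch are hypothetical, while the commands themselves are exactly those shown above.

```shell
#!/usr/bin/env bash
# Minimal sketch wrapping steps 1 and 2 above. PREFIX, GENOME_FA, BUILDVER and
# DRY_RUN are names introduced for this example only; they are not CandiHap options.

# Print a command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: annotate_and_convert PREFIX GENOME_FA [BUILDVER]
# Expects PREFIX.gff, PREFIX.vcf, GENOME_FA and vcf2hmp.pl in the current directory.
annotate_and_convert() (
    # Stop at the first failing step (subshell only). Note that bash ignores
    # -e when the call itself is tested with if, && or ||.
    set -eu
    prefix=$1
    genome=$2
    build=${3:-si}                   # ANNOVAR build name; "si" as in the example above
    if [ "${DRY_RUN:-0}" != 1 ]; then
        for f in "$prefix.gff" "$prefix.vcf" "$genome" vcf2hmp.pl; do
            [ -f "$f" ] || { echo "missing input: $f" >&2; exit 1; }
        done
    fi
    # Step 1: build the ANNOVAR gene database from the GFF and annotate the VCF
    run gffread "$prefix.gff" -T -o "$prefix.gtf"
    run gtfToGenePred -genePredExt "$prefix.gtf" "${build}_refGene.txt"
    run retrieve_seq_from_fasta.pl --format refGene --seqfile "$genome" \
        "${build}_refGene.txt" --outfile "${build}_refGeneMrna.fa"
    run table_annovar.pl "$prefix.vcf" ./ --vcfinput --outfile "$prefix" \
        --buildver "$build" --protocol refGene --operation g -remove
    # Step 2: convert the ANNOVAR multianno table to hapmap format
    run perl vcf2hmp.pl "$prefix.vcf" "$prefix.${build}_multianno.txt"
)
```

For example, `DRY_RUN=1 annotate_and_convert test genome.fa` only prints the five commands, and `annotate_and_convert test genome.fa` runs them on the test data.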

Put CandiHap.pl, Phenotype.txt, Your.hmp, and genome.gff in the same directory, then run:

     # 3. Run CandiHap
     perl  CandiHap.pl  -m Your.hmp  -f genome.gff  -p Phenotype.txt  -g Your_gene_ID
e.g. perl  CandiHap.pl  -m haplotypes.hmp  -f test.gff  -p Phenotype.txt  -g Si9g49990
     perl  CandiHap.pl  -m haplotypes.hmp  -f test.gff  -p Phenotype.txt  -g Si9g49990 -s 0.5 -u 2000 -d 500 -l 1 -n Structure.txt
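When several candidate genes need to be checked, CandiHap.pl can simply be called once per gene ID. The bash function below is a minimal sketch under that assumption: the function name, the gene-list file (one ID per line) and DRY_RUN are hypothetical, and only the -m, -f, -p and -g options come from this manual. Any further arguments are passed through unchanged, so the optional flags documented below can be added.

```shell
#!/usr/bin/env bash
# Minimal sketch: run CandiHap.pl once for every gene ID listed, one per line,
# in a text file. The function name, the gene-list file and DRY_RUN are
# hypothetical; only the -m/-f/-p/-g options come from the CandiHap manual.

# Usage: candihap_batch HMP GFF PHENOTYPE GENE_LIST [extra CandiHap options...]
candihap_batch() (
    set -u
    hmp=$1 gff=$2 pheno=$3 list=$4
    shift 4                                   # anything left is passed on to CandiHap.pl
    while IFS= read -r gene || [ -n "$gene" ]; do
        gene=${gene%$'\r'}                    # tolerate Windows line endings
        case $gene in ''|'#'*) continue ;; esac   # skip blank and comment lines
        # With DRY_RUN set, "echo" is prepended and the command is only printed.
        ${DRY_RUN:+echo} perl CandiHap.pl -m "$hmp" -f "$gff" -p "$pheno" \
            -g "$gene" "$@" < /dev/null || echo "CandiHap.pl failed for $gene" >&2
    done < "$list"
)
```

For example, with a file genes.txt containing Si9g49990 on its own line, `candihap_batch haplotypes.hmp test.gff Phenotype.txt genes.txt -s 0.5` reproduces the second example command above. A failure for one gene is reported and the loop moves on to the next ID.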

The command parameters are:

          -m    input hapmap (.hmp) file name (required).

          -p    input phenotype file name (required).

          -f    input GFF file name (required).

          -g    ID of the gene to analyse (required).

          -s    p-value cutoff for the Wilcoxon test; default is 1.

          -u    length of the gene upstream region to include; default is 2000 bp.

          -d    length of the gene downstream region to include; default is 500 bp.

          -l    plot an LD heatmap (1) or not (0); default is 0. Requires the R packages "LDheatmap" and "genetics".

          -n    input population file name; if given, a haplotype network (haploNet) figure is plotted. Default is NULL. Requires the R packages "pegas" and "sf".

          -k    keep all temporary files.

          -h    print this help message.


To analyse all genes in the LD region around a position, run:

     perl  GWAS_LD2haplotypes.pl  -f genome.gff  -m Your.hmp  -p Phenotype.txt   -l LDkb  -c Chr:position
e.g. perl  GWAS_LD2haplotypes.pl  -f test.gff  -m haplotypes.hmp  -p Phenotype.txt  -l 50kb  -c 9:54583294
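Because GWAS_LD2haplotypes.pl analyses every gene in the region one by one, a mistyped -l or -c value can waste a long run. The bash function below is a minimal sketch of a pre-flight check; the accepted forms (an integer followed by "kb", and "Chr:position" with an integer position) are an assumption inferred from the example above, not taken from the script, which may accept other forms.

```shell
#!/usr/bin/env bash
# Minimal sketch: sanity-check the -l and -c values before a long
# GWAS_LD2haplotypes.pl run. The accepted forms (an integer followed by "kb",
# and "Chr:position" with an integer position) are inferred from the example
# in the manual; GWAS_LD2haplotypes.pl itself may accept other forms.
check_ld_args() {
    local ld=$1 site=$2
    if ! [[ $ld =~ ^[0-9]+kb$ ]]; then
        echo "bad -l value '$ld' (expected e.g. 50kb)" >&2
        return 1
    fi
    if ! [[ $site =~ ^[^:]+:[0-9]+$ ]]; then
        echo "bad -c value '$site' (expected Chr:position, e.g. 9:54583294)" >&2
        return 1
    fi
    # Report how the values were split: chromosome, position, LD size in kb.
    echo "chr=${site%%:*} pos=${site##*:} ld_kb=${ld%kb}"
}
```

For example, `check_ld_args 50kb 9:54583294 && perl GWAS_LD2haplotypes.pl -f test.gff -m haplotypes.hmp -p Phenotype.txt -l 50kb -c 9:54583294` starts the analysis only if both values look well formed.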

 

Haplotype analysis for Sanger .ab1 files on Linux system

First, install GATK (GenomeAnalysisTK.jar), Picard (picard.jar), bwa, samtools, bcftools, bgzip, Java, and R (with the sangerseqR package).

To install R and the sangerseqR package

      1. Open an internet browser and go to link: https://www.r-project.org
      2. Click the 'download R' link in the middle of the page under 'Getting Started'.
      3. Select a CRAN location (a mirror site) and click the corresponding link.
      4. Click on the 'Download R for Linux' link at the top of the page.
      5. Download 'R-3.5.0' (or a newer version).
      6. Install R and leave all default settings in the installation options.
      7. Open R and install the package with the commands:
          if (! requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
          if (! require("sangerseqR")) BiocManager::install("sangerseqR")
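Before running the Sanger workflow it can help to confirm that every prerequisite listed above is actually installed. This is a minimal bash sketch, not part of CandiHap: the three function names are hypothetical, the jar paths are placeholders for wherever GATK and Picard were installed, and the sangerseqR check assumes Rscript is on PATH.

```shell
#!/usr/bin/env bash
# Minimal sketch: check the prerequisites of sanger_CandiHap.sh.
# Function names are hypothetical; jar paths must be supplied by the user.

# Print "missing: NAME" for each command not found on PATH; return 1 if any.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Print "missing: PATH" for each jar file that does not exist; return 1 if any.
missing_jars() {
    local jar status=0
    for jar in "$@"; do
        [ -f "$jar" ] || { echo "missing: $jar"; status=1; }
    done
    return "$status"
}

# Return 0 if the R package sangerseqR can be loaded through Rscript.
have_sangerseqR() {
    Rscript -e 'quit(status = if (requireNamespace("sangerseqR", quietly = TRUE)) 0 else 1)' \
        >/dev/null 2>&1
}
```

A typical check would be `missing_tools java bwa samtools bcftools bgzip Rscript`, then `missing_jars /path/to/GenomeAnalysisTK.jar /path/to/picard.jar` (placeholder paths), then `have_sangerseqR || echo "sangerseqR is not installed"`.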

Getting started

Put sanger_CandiHap.sh, Gene_VCF2haplotypes.pl, ab1-fastq.pl, and all .ab1 files in the same directory, then run:

     sh  sanger_CandiHap.sh  Gene_ref.fa
e.g. sh  sanger_CandiHap.sh  PHYC.txt
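A quick pre-flight check can confirm that the reference file, the three scripts named above and at least one .ab1 file are in place before sanger_CandiHap.sh starts. This is a minimal bash sketch; the function name and its optional DIR argument are hypothetical and not part of CandiHap.

```shell
#!/usr/bin/env bash
# Minimal sketch of a pre-flight check for the Sanger workflow: confirms that
# the reference file, the three scripts named in the manual and at least one
# .ab1 file are present. The function name and its DIR argument are hypothetical.

# Usage: sanger_preflight REF_FASTA [DIR]
sanger_preflight() (
    ref=${1:?usage: sanger_preflight REF_FASTA [DIR]}
    dir=${2:-.}
    status=0
    [ -f "$ref" ] || { echo "missing reference: $ref"; status=1; }
    for f in sanger_CandiHap.sh Gene_VCF2haplotypes.pl ab1-fastq.pl; do
        [ -f "$dir/$f" ] || { echo "missing script: $f"; status=1; }
    done
    set -- "$dir"/*.ab1              # an unmatched glob stays as the literal pattern
    if [ -e "$1" ]; then n=$#; else n=0; fi
    echo "found $n .ab1 file(s)"
    [ "$n" -gt 0 ] || status=1
    exit "$status"
)
```

For example, `sanger_preflight PHYC.txt && sh sanger_CandiHap.sh PHYC.txt` starts the analysis only when all inputs are present.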

 

 

For Windows

The Windows installation package bundles all the necessary modules, so CandiHap runs on its own and no additional software needs to be installed.

 

For Mac OS X

First, install the R software environment and three R packages.

To install R and the packages on Mac OS X

      1. Open an internet browser and go to link: https://www.r-project.org
      2. Click the 'download R' link in the middle of the page under 'Getting Started'.
      3. Select a CRAN location (a mirror site) and click the corresponding link.
      4. Click on the 'Download R for (Mac) OS X' link at the top of the page.
      5. Download 'R-3.5.0.pkg' (or a newer version).
      6. Install R and leave all default settings in the installation options.
      7. Open R and install the three packages with the command:
          install.packages(c("ggplot2","agricolae", "ggbeeswarm")) 

 

 

Contact information

CandiHap will be updated regularly and extended with more functions and more user-friendly options.
For any questions, please contact xukai_li@sxau.edu.cn or WeChat (ID: Li_XuKai).