• 4.4 Variation annotation
  • Software: VEP
    The software was downloaded from https://codeload.github.com/Ensembl/ensembl-tools/zip/release/84 and installed locally.
    Input file: the VCF file produced by GATK was used as input.
    Caches and databases:
    1. Pre-built caches can be downloaded from ftp://ftp.ensembl.org/pub/release-84/variation/VEP/ and stored in the directory ./vep.
    2. For species that don't have a publicly available cache, it is possible to build a VEP cache using the gtf2vep.pl script. This requires a GTF or GFF file and a FASTA reference sequence.
     Command:

     perl gtf2vep.pl -i my_species_genes.gtf -f my_species_seq.fa -d 84 -s my_species

     VEP parameters and command:

     perl variant_effect_predictor.pl -offline -i my_species_variants.vcf -s my_species
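
    The two commands above can be assembled programmatically when the annotation step is part of a larger pipeline. The following is a minimal sketch, not the authors' pipeline code: the function names gtf2vep_command and vep_command are hypothetical, and only the flags shown in the commands above are used. The sketch prints the command lines rather than running them.

```python
# Minimal sketch: build the two command lines shown above as argument lists.
# The helper names are hypothetical; the flags are copied from the commands above.

def gtf2vep_command(gtf, fasta, db_version, species):
    """Argument list for building a custom cache with gtf2vep.pl."""
    return [
        "perl", "gtf2vep.pl",
        "-i", gtf,              # GTF or GFF gene annotation
        "-f", fasta,            # FASTA reference sequence
        "-d", str(db_version),  # release number, 84 in this section
        "-s", species,          # species name
    ]


def vep_command(vcf, species):
    """Argument list for an offline VEP run against a local cache."""
    return [
        "perl", "variant_effect_predictor.pl",
        "-offline",
        "-i", vcf,              # VCF file from GATK
        "-s", species,
    ]


commands = [
    gtf2vep_command("my_species_genes.gtf", "my_species_seq.fa", 84, "my_species"),
    vep_command("my_species_variants.vcf", "my_species"),
]
for cmd in commands:
    # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

    Keeping each command as a list of arguments, rather than one shell string, avoids quoting problems when file names contain spaces.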

  • Consequence Type and Effects
    Consequence Type Effect SO accession SO description
    transcript_ablation HIGH SO:0001893 A feature ablation whereby the deleted region includes a transcript feature
    splice_acceptor_variant HIGH SO:0001574 A splice variant that changes the 2 base region at the 3' end of an intron
    splice_donor_variant HIGH SO:0001575 A splice variant that changes the 2 base region at the 5' end of an intron
    stop_gained HIGH SO:0001587 A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript
    frameshift_variant HIGH SO:0001589 A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three
    stop_lost HIGH SO:0001578 A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript
    start_lost HIGH SO:0002012 A codon variant that changes at least one base of the canonical start codon
    transcript_amplification HIGH SO:0001889 A feature amplification of a region containing a transcript
    splice_region_variant LOW SO:0001630 A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron
    incomplete_terminal_codon_variant LOW SO:0001626 A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed
    stop_retained_variant LOW SO:0001567 A sequence variant where at least one base in the terminator codon is changed, but the terminator remains
    synonymous_variant LOW SO:0001819 A sequence variant where there is no resulting change to the encoded amino acid
    inframe_insertion MODERATE SO:0001821 An inframe non synonymous variant that inserts bases into the coding sequence
    inframe_deletion MODERATE SO:0001822 An inframe non synonymous variant that deletes bases from the coding sequence
    missense_variant MODERATE SO:0001583 A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved
    protein_altering_variant MODERATE SO:0001818 A sequence_variant which is predicted to change the protein encoded in the coding sequence
    regulatory_region_ablation MODERATE SO:0001894 A feature ablation whereby the deleted region includes a regulatory region
    coding_sequence_variant MODIFIER SO:0001580 A sequence variant that changes the coding sequence
    mature_miRNA_variant MODIFIER SO:0001620 A transcript variant located within the sequence of the mature miRNA
    5_prime_UTR_variant MODIFIER SO:0001623 A UTR variant of the 5' UTR
    3_prime_UTR_variant MODIFIER SO:0001624 A UTR variant of the 3' UTR
    non_coding_transcript_exon_variant MODIFIER SO:0001792 A sequence variant that changes non-coding exon sequence in a non-coding transcript
    intron_variant MODIFIER SO:0001627 A transcript variant occurring within an intron
    NMD_transcript_variant MODIFIER SO:0001621 A variant in a transcript that is the target of NMD
    non_coding_transcript_variant MODIFIER SO:0001619 A transcript variant of a non coding RNA gene
    upstream_gene_variant MODIFIER SO:0001631 A sequence variant located 5' of a gene
    downstream_gene_variant MODIFIER SO:0001632 A sequence variant located 3' of a gene
    TFBS_ablation MODIFIER SO:0001895 A feature ablation whereby the deleted region includes a transcription factor binding site
    TFBS_amplification MODIFIER SO:0001892 A feature amplification of a region containing a transcription factor binding site
    TF_binding_site_variant MODIFIER SO:0001782 A sequence variant located within a transcription factor binding site
    regulatory_region_amplification MODIFIER SO:0001891 A feature amplification of a region containing a regulatory region
    feature_elongation MODIFIER SO:0001907 A sequence variant that causes the extension of a genomic feature, with regard to the reference sequence
    regulatory_region_variant MODIFIER SO:0001566 A sequence variant located within a regulatory region
    feature_truncation MODIFIER SO:0001906 A sequence variant that causes the reduction of a genomic feature, with regard to the reference sequence
    intergenic_variant MODIFIER SO:0001628 A sequence variant located in the intergenic region, between genes
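
    The table above can be used as a lookup when summarising VEP results. The following is a minimal sketch under stated assumptions: only a subset of the table is encoded, the names IMPACT_BY_CONSEQUENCE, most_severe_impact and tally_impacts are hypothetical, and the Consequence value is assumed to sit in the seventh tab-separated column of VEP's default output, with multiple terms separated by "," (or by "&" in VCF-style output).

```python
from collections import Counter

# Effect classes ordered from most to least severe.
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]

# Subset of the consequence table above (extend with the remaining rows as needed).
IMPACT_BY_CONSEQUENCE = {
    "stop_gained": "HIGH",
    "frameshift_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "missense_variant": "MODERATE",
    "splice_region_variant": "LOW",
    "synonymous_variant": "LOW",
    "intron_variant": "MODIFIER",
    "upstream_gene_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}


def most_severe_impact(consequence_field):
    """Return the most severe effect among the listed consequence terms.

    Terms that are not in the lookup are ignored; None is returned when
    no term is recognised.
    """
    terms = consequence_field.replace("&", ",").split(",")
    impacts = [IMPACT_BY_CONSEQUENCE[t] for t in terms if t in IMPACT_BY_CONSEQUENCE]
    if not impacts:
        return None
    return min(impacts, key=IMPACT_ORDER.index)


def tally_impacts(lines, consequence_column=6):
    """Count annotation lines per effect class.

    Assumes tab-separated lines with header lines starting with '#';
    consequence_column is the 0-based index of the Consequence field.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        impact = most_severe_impact(fields[consequence_column])
        counts[impact or "UNKNOWN"] += 1
    return counts
```

    For example, most_severe_impact("missense_variant,splice_region_variant") selects MODERATE, because MODERATE ranks above LOW in IMPACT_ORDER.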