Variation Identification

Functions

      ‘Fastq-to-Variants’ aligns raw NGS sequencing data to the 2019nCoV reference genome, identifies 2019nCoV genetic sequences in the sequencing sample(s), reports genomic coverage, sequencing depth and sequencing error rate, and detects and annotates SNPs and indels.
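To make the coverage and depth metrics concrete, here is a minimal Python sketch. It is not the tool's own implementation. It assumes a list of per-position read depths has already been obtained from an alignment, and it assumes "coverage" means the fraction of reference positions with depth at or above a threshold. The function name `coverage_summary` and the `min_depth` parameter are hypothetical.

```python
def coverage_summary(depths, min_depth=1):
    """Summarise per-position read depth over a reference genome.

    depths    -- list of integer read depths, one per reference position
    min_depth -- assumed threshold for calling a position "covered"
    """
    genome_length = len(depths)
    if genome_length == 0:
        raise ValueError("depths must not be empty")
    # Count positions whose depth reaches the threshold.
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "genome_length": genome_length,
        "coverage_fraction": covered / genome_length,
        "mean_depth": sum(depths) / genome_length,
    }


if __name__ == "__main__":
    # Toy example: four reference positions, one of them uncovered.
    print(coverage_summary([0, 2, 4, 6]))
```

In practice the depth list would come from the aligned reads; how the service derives it is not stated here.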

Upload Reference FASTA

NC_045512

Upload FASTQ Files

Sequencing Type:

Upload Single-end Sequencing file

Sample Name:

Sample Name

This process currently supports only Illumina sequencing data. Pipelines for third-generation sequencing data (PacBio/Nanopore) will be deployed later. To reduce upload time, ‘gzip’-compressed files are recommended.
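If the command-line `gzip` utility is not at hand, a FASTQ file can be compressed before upload with Python's standard library. This is a minimal sketch; the function name `gzip_fastq` and the example file name are hypothetical.

```python
import gzip
import shutil


def gzip_fastq(src_path, dest_path=None):
    """Write a gzip-compressed copy of src_path and return the new path."""
    if dest_path is None:
        dest_path = src_path + ".gz"
    # Stream the file in chunks so large FASTQ files need not fit in memory.
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_path


# Example (hypothetical file name):
# gzip_fastq("sample_R1.fastq")  # writes sample_R1.fastq.gz
```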

Or Input GSA's CRR Accession

CRR Accession

SARS-CoV-2 data list in GSA: https://ngdc.cncb.ac.cn/gsa/browse/run/?tag=Coronaviridae

Email (Results will be notified via email when the calculating time is long)
Reminder: refer to the table below for the estimated processing time.
Help
1. Reference Running Time

      Reference running times were measured on real data sets while the server was idle, and they exclude upload time. The running time of an actual task depends on the server workload, the data volume, the data quality and other factors.

                   Data1        Data2        Data3        Data4        Data5
Data Volume        118Mb        1.0Gb        1.5Gb        2.2Gb        8.0Gb
Calculating Time*  0m37s        0m55s        1m10s        1m36s        3m42s
Accession          SRR11247077  SRR11092064  SRR11092057  SRR11092058  SRR10971381

*Run on 24 CPU cores

Functions

This tool detects mutation sites in viral genome sequences. Either SARS-CoV-2 (NC_045512) or a FASTA file that you upload can serve as the reference for online variation identification. The input can be whole-genome or partial virus sequence(s) in FASTA format.
You must upload a single FASTA file containing one or more viral genomic sequences.

Upload Reference FASTA

NC_045512
Upload file
NC
SC
Input file format (FASTA)
>BetaCoV/Wuhan/IVDC-HB-01/2019 EPI_ISL_402119
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGA
TCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG
TGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAAC
TCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGC
ACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTG
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGA
TCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG
TGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAAC
TCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGC
ACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTG
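To check a file against this format before uploading, a small reader such as the following may help. It is a minimal sketch rather than the service's own parser. It assumes each record starts with a `>` header line followed by one or more sequence lines, and the function name `parse_fasta` is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples parsed from FASTA text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined and upper-cased.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">seq1 demo\nATTAAAGG\nTTTATACC\n>seq2\nacgtn\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq))
```

Returning a list of tuples instead of a dictionary keeps records that happen to share a header.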
Parameter specification

1). NC - Maximum number of N's allowed in an input FASTA sequence; the default value is 15. Sequences whose genome contains more than NC N's are discarded;
2). SC - Maximum number of degenerate bases allowed in an input FASTA sequence; the default value is 50. Sequences whose genome contains more than SC degenerate bases are discarded.
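The two thresholds can be illustrated with a short Python sketch. This is an interpretation, not the service's code. It assumes that "degenerate bases" means the IUPAC ambiguity codes other than N (R, Y, S, W, K, M, B, D, H, V) and that N's are counted only towards NC. The names `DEGENERATE_BASES` and `passes_filters` are hypothetical; the defaults 15 and 50 come from the parameter descriptions above.

```python
# Assumption: IUPAC ambiguity codes other than N count as "degenerate".
DEGENERATE_BASES = set("RYSWKMBDHV")


def passes_filters(sequence, nc=15, sc=50):
    """Return True if the sequence would be kept under the NC and SC limits."""
    seq = sequence.upper()
    n_count = seq.count("N")
    degenerate_count = sum(1 for base in seq if base in DEGENERATE_BASES)
    # A sequence is discarded only when a count is greater than its limit.
    return n_count <= nc and degenerate_count <= sc


if __name__ == "__main__":
    print(passes_filters("ACGT" + "N" * 15))  # at the NC limit, kept
    print(passes_filters("ACGT" + "N" * 16))  # over the NC limit, discarded
```

If the service also counts N as a degenerate base, the set above would need to include "N".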