Standards
2. Sequencing File
2.1 File Type

This page describes the file formats currently accepted for GSA submissions and gives submitters guidance on the related submission policies.

| File type | File suffix | Applicable platforms | Recommended |
| Fastq | .fastq.gz, .fq.gz, .fastq.bz2, .fq.bz2 | All Platforms | Yes |
| Bam | .bam | All Platforms | Yes |
| Sff | .sff | LS454, ION_TORRENT, BGISEQ-100, DA8600 | |
| Complete Genomics Native | .tar.gz, .tar | Complete Genomics, BGISEQ-500, BGISEQ-1000 | |
| Solid Native | .tar.gz, .tar | ABI SOLID | |
| PacBio_HDF5 | .tar, .tar.gz | PacBio RS, PacBio RS II | Recommended for PacBio RS / PacBio RS II |
| PacBio Sequel Native | .tar, .tar.gz | PacBio Sequel | Recommended for PacBio Sequel |
| Ab1 | .ab1 | CAPILLARY | |
| Oxford Nanopore Native | .tar, .tar.gz | Oxford Nanopore | |
| 10x Genomics | .tar, .tar.gz | | |
| Bnx | .bnx.gz, .bnx.bz2 | Bionano Genomics | |
| Fasta | .fasta.gz, .fasta.bz2, .fa.gz, .fa.bz2 | | |
| Helicos Native | .tar, .tar.gz | Helicos BioSciences Corporation | |

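
As an illustration of how the suffixes in the table above could be checked before upload, here is a minimal Python sketch. The helper name `candidate_file_types` is hypothetical and not part of any GSA tool, and the case-sensitive matching is an assumption. Because several platform-native types share the .tar and .tar.gz suffixes, the function returns every candidate type rather than a single answer.

```python
# Suffix-to-type lookup transcribed from the file type table.
# `candidate_file_types` is a hypothetical helper, not a GSA tool.
SUFFIX_TO_TYPES = {
    ".fastq.gz": ["Fastq"],
    ".fq.gz": ["Fastq"],
    ".fastq.bz2": ["Fastq"],
    ".fq.bz2": ["Fastq"],
    ".bam": ["Bam"],
    ".sff": ["Sff"],
    ".ab1": ["Ab1"],
    ".bnx.gz": ["Bnx"],
    ".bnx.bz2": ["Bnx"],
    ".fasta.gz": ["Fasta"],
    ".fasta.bz2": ["Fasta"],
    ".fa.gz": ["Fasta"],
    ".fa.bz2": ["Fasta"],
}

# Platform-native submissions are all packaged as .tar or .tar.gz archives,
# so the suffix alone cannot tell them apart.
ARCHIVE_TYPES = [
    "Complete Genomics Native",
    "Solid Native",
    "PacBio_HDF5",
    "PacBio Sequel Native",
    "Oxford Nanopore Native",
    "10x Genomics",
    "Helicos Native",
]
for _suffix in (".tar", ".tar.gz"):
    SUFFIX_TO_TYPES[_suffix] = ARCHIVE_TYPES


def candidate_file_types(filename):
    """Return the file types whose suffix matches `filename` (case-sensitive)."""
    # Try the longest suffixes first so the most specific match wins.
    for suffix in sorted(SUFFIX_TO_TYPES, key=len, reverse=True):
        if filename.endswith(suffix):
            return list(SUFFIX_TO_TYPES[suffix])
    return []
```

An empty result means the suffix does not appear in the table; a result with several entries means the submitter still has to state which native format the archive contains.
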
2.2 File Formats

Read data can be submitted in several standard and platform-specific formats. We recommend submitting read data in Fastq or BAM format.

Fastq format

Single and paired reads are accepted as Fastq files that meet the following requirements:

1) Quality scores must be on the Phred scale. Both ASCII and space-delimited decimal encodings of quality scores are supported. The Phred quality offset (33 or 64) is detected automatically.

2) No technical reads (adapters, linkers, barcodes) are allowed.

3) Single reads must be submitted using a single Fastq file and can be submitted with or without read names.

4) Paired reads must be submitted using two Fastq files.

5) Paired read names must have a suffix identifying the first and second read from the pair, for example '/1' and '/2' (regular expression for the reads: "^@([a-zA-Z0-9_-]+:[0-9]+:[a-zA-Z0-9]+:[0-9]+:[0-9]+:[0-9-]+:[0-9-]+) ([12]):[YN]:[0-9]*[02468]:[ACGTN]+$").

6) The first line for each read must start with '@'.

7) The base calls and quality scores must be separated by a line starting with '+'.

8) The Fastq files must be compressed using gzip or bzip2.

9) The regular expression for bases is "^([ACGTNactgn.]*?)$".
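
Several of the requirements above can be checked locally before submission. The following Python sketch covers requirements 6 to 9 and guesses the Phred offset. It is an illustration under stated assumptions, not the GSA's own validator: the function names are hypothetical, records are assumed to span exactly four lines, only ASCII-encoded quality strings are handled, and the offset guess is a simple heuristic based on the lowest character seen.

```python
import bz2
import gzip
import re
from itertools import islice

# Regular expression for bases, as quoted in requirement 9.
BASES_RE = re.compile(r"^([ACGTNactgn.]*?)$")


def open_fastq(path):
    """Open a gzip- or bzip2-compressed Fastq file as text (requirement 8)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    raise ValueError("Fastq files must be compressed with gzip or bzip2")


def check_fastq(path):
    """Return (problems, offset) for a Fastq file with four-line records.

    `problems` is a list of messages; `offset` is 33, 64, or None when no
    quality characters were seen.
    """
    problems = []
    lowest = None  # lowest ASCII code found in any quality string
    number = 0
    with open_fastq(path) as handle:
        while True:
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                break
            number += 1
            if len(record) < 4:
                problems.append(f"record {number}: incomplete record")
                break
            header, bases, plus, quality = record
            if not header.startswith("@"):  # requirement 6
                problems.append(f"record {number}: first line lacks '@'")
            if not plus.startswith("+"):  # requirement 7
                problems.append(f"record {number}: separator lacks '+'")
            if not BASES_RE.match(bases):  # requirement 9
                problems.append(f"record {number}: unexpected base symbol")
            if quality:
                code = min(map(ord, quality))
                lowest = code if lowest is None else min(lowest, code)
    # Heuristic (assumption): any character below '@' (ASCII 64) implies offset 33.
    offset = None if lowest is None else (33 if lowest < 64 else 64)
    return problems, offset
```

A file that contains only high-quality Phred+33 scores can be misclassified as offset 64 by this heuristic, so the result is a hint rather than a guarantee.
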

BAM format

Submitted BAM files must be readable with Samtools and Picard.

BAM file names must end with the .bam suffix (e.g. 'a.bam').
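
As a quick pre-check of the two BAM rules above, the following Python sketch tests the file name suffix and the BAM magic bytes. It relies on two facts from the SAM/BAM specification: BAM files are BGZF-compressed, which the standard gzip module can read, and the decompressed stream begins with the bytes "BAM" followed by 0x01. The function name `looks_like_bam` is hypothetical, and passing this check does not show that Samtools and Picard can read the whole file.

```python
import gzip


def looks_like_bam(path):
    """Cheap pre-check: '.bam' suffix plus BAM magic bytes after decompression."""
    if not path.endswith(".bam"):
        return False
    try:
        with gzip.open(path, "rb") as handle:
            # The decompressed BAM stream starts with b"BAM\x01".
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip/BGZF data, unreadable, or truncated.
        return False
```

For a fuller check, running the file through the tools themselves (for example `samtools quickcheck` or Picard's ValidateSamFile) remains the more reliable option.
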