a catalog of biological databases
|Full name:||DNA Data Bank of Japan|
|Description:||DDBJ Center collects nucleotide sequence data as a member of INSDC (International Nucleotide Sequence Database Collaboration) and provides freely available nucleotide sequence data and supercomputer system, to support research activities in life science.|
|University/Institution:||National Institute of Genetics|
|Contact name (PI/Team):||Toshihisa Takagi|
|Contact email (PI/Helpdesk):||firstname.lastname@example.org|
DDBJ Database updates and computational infrastructure enhancement. [PMID: 31724722]
The Bioinformation and DDBJ Center (https://www.ddbj.nig.ac.jp) in the National Institute of Genetics (NIG) maintains a primary nucleotide sequence database as a member of the International Nucleotide Sequence Database Collaboration (INSDC) in partnership with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The NIG operates the NIG supercomputer as a computational basis for the construction of DDBJ databases and as a large-scale computational resource for Japanese biologists and medical researchers. In order to accommodate the rapidly growing amount of deoxyribonucleic acid (DNA) nucleotide sequence data, NIG replaced its supercomputer system, which is designed for big data analysis of genome data, in early 2019. The new system is equipped with 30 PB of DNA data archiving storage; large-scale parallel distributed file systems (13.8 PB in total) and 1.1 PFLOPS computation nodes and graphics processing units (GPUs). Moreover, as a starting point of developing multi-cloud infrastructure of bioinformatics, we have also installed an automatic file transfer system that allows users to prevent data lock-in and to achieve cost/performance balance by exploiting the most suitable environment from among the supercomputer and public clouds for different workloads.
DDBJ update: the Genomic Expression Archive (GEA) for functional genomics data. [PMID: 30357349]
The Genomic Expression Archive (GEA) for functional genomics data from microarray and high-throughput sequencing experiments has been established at the DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp), which is a member of the International Nucleotide Sequence Database Collaboration (INSDC) with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center collects nucleotide sequence data and associated biological information from researchers and also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center for collecting human data. To automate the submission process, we have implemented the DDBJ BioSample validator which checks submitted records, auto-corrects their format, and issues error messages and warnings if necessary. The DDBJ Center also operates the NIG supercomputer, prepared for analyzing large-scale genome sequences. We now offer a secure platform specifically to handle personal human genomes. This report describes database activities for INSDC and JGA over the past year, the newly launched GEA, submission, retrieval, and analysis services available in our supercomputer system and their recent developments.
DNA Data Bank of Japan: 30th anniversary. [PMID: 29040613]
The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect genotype and phenotype data of human individuals. Here, we outline our database activities for INSDC and JGA over the past year, and introduce submission, retrieval and analysis services running on our supercomputer system and their recent developments. Furthermore, we highlight our responses to the amended Japanese rules for the protection of personal information and the launch of the DDBJ Group Cloud service for sharing pre-publication data among research groups.
DNA Data Bank of Japan. [PMID: 27924010]
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). The DDBJ Center also services Japanese Genotype-phenotype Archive (JGA), with the National Bioscience Database Center to collect human-subjected data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center improves semantic web technologies to integrate and to share biological data, for providing the RDF version of the sequence data. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
DNA data bank of Japan (DDBJ) progress report. [PMID: 26578571]
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
The DDBJ Japanese Genotype-phenotype Archive for genetic and phenotypic human data. [PMID: 25477381]
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency. DDBJ Center provides the JGA database system which securely stores genotype and phenotype data collected from individuals whose consent agreements authorize data release only for specific research use. NBDC has established guidelines and policies for sharing human-derived data and reviews data submission and usage requests from researchers. In addition to the JGA project, DDBJ Center develops Semantic Web technologies for data integration and sharing in collaboration with the Database Center for Life Science. This paper describes the overview of the JGA project, updates to the DDBJ databases, and services for data retrieval, analysis and integration. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
DDBJ progress report: a new submission system for leading to a correct annotation. [PMID: 24194602]
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. This database content is shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). DDBJ launched a new nucleotide sequence submission system for receiving traditional nucleotide sequence. We expect that the new submission system will be useful for many submitters to input accurate annotation and reduce the time needed for data input. In addition, DDBJ has started a new service, the Japanese Genotype-phenotype Archive (JGA), with our partner institute, the National Bioscience Database Center (NBDC). JGA permanently archives and shares all types of individual human genetic and phenotypic data. We also introduce improvements in the DDBJ services and databases made during the past year.
DDBJ new system and service refactoring. [PMID: 23180790]
The DNA data bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) maintains a primary nucleotide sequence database and provides analytical resources for biological information to researchers. This database content is exchanged with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Resources provided by the DDBJ include traditional nucleotide sequence data released in the form of 27 316 452 entries or 16 876 791 557 base pairs (as of June 2012), and raw reads of new generation sequencers in the sequence read archive (SRA). A Japanese researcher published his own genome sequence via DDBJ-SRA on 31 July 2012. To cope with the ongoing genomic data deluge, in March 2012, our previous computer system was totally replaced by a commodity cluster-based system that boasts 122.5 TFlops of CPU capacity and 5 PB of storage space. During this upgrade, it was considered crucial to replace and refactor substantial portions of the DDBJ software systems as well. As a result of the replacement process, which took more than 2 years to perform, we have achieved significant improvements in system performance.
The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments. [PMID: 22110025]
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
DDBJ progress report. [PMID: 21062814]
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) provides a nucleotide sequence archive database and accompanying database tools for sequence submission, entry retrieval and annotation analysis. The DDBJ collected and released 3,637,446 entries/2,272,231,889 bases between July 2009 and June 2010. A highlight of the released data was archive datasets from next-generation sequencing reads of the Japanese rice cultivar Koshihikari, submitted by the National Institute of Agrobiological Sciences. In this period, we started a new archive for quantitative genomics data, the DDBJ Omics aRchive (DOR). The DOR stores quantitative data both from the microarray and high-throughput new sequencing platforms. Moreover, we improved the content of the DDBJ patent sequence, released a new submission tool of the DDBJ Sequence Read Archive (DRA) which archives massive raw sequencing reads, and enhanced a cloud computing-based analytical system from sequencing reads, the DDBJ Read Annotation Pipeline. In this article, we describe these new functions of the DDBJ databases and support tools.
DDBJ launches a new archive database with analytical tools for next-generation sequence data. [PMID: 19850725]
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has collected and released 1,701,110 entries/1,116,138,614 bases between July 2008 and June 2009. A few highlighted data releases from DDBJ were the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mouse deposited from the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a novel user announcement service using Really Simple Syndication (RSS) to deliver a list of data released from DDBJ on a daily basis. Comprehensive visualization of DDBJ release data was attempted by using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the 'DDBJ Read Archive' (DRA), was launched. Concurrently, for read data registered in DRA, a semi-automatic annotation tool called the 'DDBJ Read Annotation Pipeline' was released as a preliminary step. The pipeline consists of two parts: basic analysis for reference genome mapping and de novo assembly and high-level analysis of structural and functional annotations. These new services will aid users' research and provide easier access to DDBJ databases.
DDBJ dealing with mass data produced by the second generation sequencer. [PMID: 18927114]
DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) collected and released 2 368 110 entries or 1 415 106 598 bases in the period from July 2007 to June 2008. The releases in this period include genome scale data of Bombyx mori, Oryzias latipes, Drosophila and Lotus japonicus. In addition, from this year we collected and released trace archive data in collaboration with National Center for Biotechnology Information (NCBI). The first release contains those of O. latipes and bacterial metagenomes in human gut. To cope with the current progress of sequencing technology, we also accepted and released more than 100 million short reads of parasitic protozoa and their hosts that were produced by using a Solexa sequencer.
DDBJ with new system and face. [PMID: 17962300]
DDBJ (http://www.ddbj.nig.ac.jp) collected and released 1 880 115 entries or 1 134 086 245 bases in the period from July 2006 to June 2007. The released data contains the high-throughput cDNAs of cricket and high-quality draft genome of medaka among others. Our computer system has been upgraded since March 2007. Another new aspect is an efficient data retrieval tool that has recently been equipped and served at DDBJ. It is called All-round Retrieval for Sequence and Annotation, which enables the user to search for keywords also in the Feature/Qualifier of the International Nucleotide Sequence Database Collaboration (http://www.insdc.org/). We will also replace our home page with a more efficient one by the end of 2007.
DDBJ working on evaluation and classification of bacterial genes in INSDC. [PMID: 17108353]
DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) newly collected and released 12,927,184 entries or 13,787,688,598 bases in the period from July 2005 to June 2006. The released data contain honeybee expressed sequence tags (ESTs), re-examined and re-annotated complete genome data of Escherichia coli K-12 W3110, medaka WGS and human MGA. We also systematically evaluated and classified the genes in the complete bacterial genomes submitted to the International Nucleotide Sequence Database Collaboration (INSDC, http://insdc.org) that is composed of DDBJ, EMBL Bank and GenBank. The examination and classification selected 557,000 genes as reliable ones among all the bacterial genes predicted by us.
DDBJ in preparation for overview of research activities behind data submissions. [PMID: 16381940]
In the past year, DDBJ (http://www.ddbj.nig.ac.jp) collected and released 1,956,826 entries or 1,741,313,111 bases. The released data include approximately 90,000 ESTs and cDNAs of Macaca fascicularis, and 280 million bases of mouse GSS. In addition to the data collection, we have indexed the submitted data to the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org) to classify the entries into research projects behind data submissions. They are expected to be useful to the data submitters and users for enhancing the data submission, retrieval and systematic data analyses at INSDC. The results of indexing also allow one to grasp research projects in life sciences that promoted and produced the DNA sequences submitted to INSDC.
DDBJ in collaboration with mass-sequencing teams on annotation. [PMID: 15608189]
In the past year, we at DDBJ (DNA Data Bank of Japan; http://www.ddbj.nig.ac.jp) collected and released 1,066,084 entries or 718,072,425 bases including the whole chromosome 22 of chimpanzee, the whole-genome shotgun sequences of silkworm and various others. On the other hand, we hosted workshops for human full-length cDNA annotation and participated in jamborees of mouse full-length cDNA annotation. The annotated data are made public at DDBJ. We are also in collaboration with a RIKEN team to accept and release the CAGE (Cap Analysis Gene Expression) data under a new category, MGA (Mass Sequences for Genome Annotation). The data will be useful for studying gene expression control in many aspects.
DDBJ in the stream of various biological data. [PMID: 14681352]
In the past year we at DDBJ (http://www.ddbj.nig.ac.jp) have made a steady increase in the number of data submissions with a 50.6% increment in the number of bases or 46.5% increment in the number of entries. Among them the genome data of man, ascidian and rice hold the top three. Our activity has extended to providing a tool that enables sequence retrieval using regular expressions, and to launching our SOAP server and web services to facilitate the acquisition of proper data and tools from a huge number of biological data resources on websites worldwide. We have also opened our public gene expression database, CIBEX.
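As a rough illustration of what sequence retrieval with regular expressions can look like, here is a minimal Python sketch. It is not the DDBJ tool: the function name `find_motif`, the record identifiers and the sequences are all made up for this example, and it uses only Python's standard `re` module.

```python
import re


def find_motif(records, pattern):
    """Return (record_id, start, matched_text) for each non-overlapping match.

    `records` maps a record identifier to a nucleotide sequence string.
    Matching is case-insensitive, so upper- and lower-case bases are treated alike.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for record_id, sequence in records.items():
        for match in regex.finditer(sequence):
            hits.append((record_id, match.start(), match.group()))
    return hits


# Hypothetical sequences, for illustration only (not real database entries).
records = {
    "seq1": "GGCTATAAAAGGCGAATTC",
    "seq2": "ccgaattcgtataTAtag",
}

# A TATA-box-like pattern: TATA, then A or T, then A, then A or T.
tata_hits = find_motif(records, r"TATA[AT]A[AT]")

# The EcoRI recognition site as a plain literal pattern.
ecori_hits = find_motif(records, r"GAATTC")

for record_id, start, text in tata_hits + ecori_hits:
    print(f"{record_id}\t{start}\t{text}")
```

Start positions are 0-based offsets into each sequence. A character class such as `[AT]` is what lets a single query cover several sequence variants at once, which a plain substring search cannot do.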
DNA Data Bank of Japan (DDBJ) in XML. [PMID: 12519938]
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has collected and released more entries and bases than last year. This is mainly due to large-scale submissions from Japanese sequencing teams on mouse, rice, chimpanzee, nematoda and other organisms. The contributions of DDBJ over the past year are 17.3% (entries) and 10.3% (bases) of the combined outputs of the International Nucleotide Sequence Databases (INSD). Our complete genome sequence database, Genome Information Broker (GIB), has been improved by incorporating XML. It is now possible to perform a more sophisticated database search against the new GIB than the ordinary BLAST or FASTA search.
DNA Data Bank of Japan (DDBJ) for genome scale research in life science. [PMID: 11752245]
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has made an effort to collect as much data as possible mainly from Japanese researchers. The increase rates of the data we collected, annotated and released to the public in the past year are 43% for the number of entries and 52% for the number of bases. The increase rates are accelerated even after the human genome was sequenced, because sequencing technology has been remarkably advanced and simplified, and research in life science has been shifted from the gene scale to the genome scale. In addition, we have developed the Genome Information Broker (GIB, http://gib.genes.nig.ac.jp) that now includes more than 50 complete microbial genome and Arabidopsis genome data. We have also developed a database of the human genome, the Human Genomics Studio (HGS, http://studio.nig.ac.jp). HGS provides one with a set of sequences being as continuous as possible in any one of the 24 chromosomes. Both GIB and HGS have been updated incorporating newly available data and retrieval tools.
DNA data bank of Japan (DDBJ) in collaboration with mass sequencing teams. [PMID: 10592172]
We at DDBJ (http://www.ddbj.nig.ac.jp) process and publicise the massive amounts of data submitted mainly by Japanese genome projects and sequencing teams. It is emphasised that the collaboration between data producing teams and the data bank is crucial in carrying out these processes smoothly. The amount of data submitted in 1999 is so large that it alone exceeds the total amount submitted in the preceding 10 years. To cope with this situation, we have developed tools not only for processing such massive amounts of data but also for efficiently retrieving data on demand.
DNA Data Bank of Japan dealing with large-scale data submission. [PMID: 9847134]
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has developed a software system for mass submissions to cope with a recent expansion of EST and genome data submissions. The system is composed of four parts, the WWW data submission, large-scale submission, submission management and storing. Using this system one can submit data on a large number of sequences or a very long sequence while checking the consistency between the annotation and sequence without much effort. DDBJ has received large scale data of Homo sapiens, Arabidopsis and Pyrococcus from Japanese researchers who made full use of the new submission system.
Formal design and implementation of an improved DDBJ DNA database with a new schema and object-oriented library. [PMID: 9694985]
The DNA Data Bank of Japan (DDBJ) has developed a new DNA database system with a new schema design to accommodate rapid change and growth of requirements on the system. The new schema and systems were created using an object-oriented design approach. The design was accomplished in accordance with ANSI/SPARC three-level schema architecture. First, the conceptual schema was designed using a functional model named AIS (associative information structure) and was visualized in extended diagram format. The model is a natural extension of an ER (entity relationship) model and describes real-world objects in binary associations between entities with the concept of order. Second, the schema was mapped on a relational database as a physical schema. All details are concentrated in this schema and the layer lying above enjoys physical independence. Finally, as another layer, external modeling was introduced for the database applications interface. It provides set-at-a-time basis operations and was implemented as a C++ object-oriented library. On this common framework of a new schema, a new annotator's workbench named Yamato II and a World Wide Web (WWW) submission system named Sakura have been successfully developed to drastically improve daily transactions in the DDBJ. Sakura is available at the following address: http://sakura.ddbj.nig.ac.jp.
DNA Data Bank of Japan at work on genome sequence data. [PMID: 9399792]
We at the DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) have recently begun receiving, processing and releasing EST and genome sequence data submitted by various Japanese genome projects. The data include those for human, Arabidopsis thaliana, rice, nematode, Synechocystis sp. and Escherichia coli. Since the quantity of data is very large, we organized teams to conduct preliminary discussions with project teams about data submission and handling for release to the public. We also developed a mass submission tool to cope with a large quantity of data. In addition, to provide genome data on WWW, we developed a genome information system using Java. This system (http://mol.genes.nig.ac.jp/ecoli/) can in theory be used for any genome sequence data. These activities will facilitate processing of large quantities of EST and genome data.
DNA Data Bank of Japan in the age of information biology. [PMID: 9016494]
DNA Data Bank of Japan (DDBJ) began its activities in 1986 in collaboration with EMBL in Europe and GenBank in the United States. DDBJ developed a data submission tool called Sakura, by which researchers can submit their newly sequenced data on WWW from every corner of the world. The data bank also built a database management system (Yamato II), incorporating the techniques and functions of the object-oriented database, in order to efficiently process the data it has collected. A number of research activities in information biology are also going on at DDBJ. Two such activities are also briefly introduced in this report.