URL: | https://www.ebi.ac.uk/metagenomics |
Full name: | |
Description: | MGnify offers an automated pipeline for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples. Users can submit their own data for analysis or freely browse all of the analysed public datasets held within the repository. In addition, users can request analysis of any appropriate dataset within the European Nucleotide Archive (ENA). User-submitted or ENA-derived datasets can also be assembled on request, prior to analysis. This resource was previously known as the EBI Metagenomics Portal. |
Year founded: | 2014 |
Last update: | |
Version: | version 5.0 |
Accessibility: | |
Country/Region: | United Kingdom |
Data type: | |
Data object: | |
Database category: | |
Major species: | |
Keywords: | |
University/Institution: | European Bioinformatics Institute |
Address: | |
City: | |
Province/State: | |
Country/Region: | United Kingdom |
Contact name (PI/Team): | Robert D. Finn |
Contact email (PI/Helpdesk): | rdf@ebi.ac.uk |
MGnify: the microbiome sequence data analysis resource in 2023. [PMID: 36477304]
The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment. |
MGnify: the microbiome analysis resource in 2020. [PMID: 31696235]
MGnify (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations. |
A new genomic blueprint of the human gut microbiota. [PMID: 30745586]
The composition of the human gut microbiota is linked to health and disease, but knowledge of individual microbial species is needed to decipher their biological roles. Despite extensive culturing and sequencing efforts, the complete bacterial repertoire of the human gut microbiota remains undefined. Here we identify 1,952 uncultured candidate bacterial species by reconstructing 92,143 metagenome-assembled genomes from 11,850 human gut microbiomes. These uncultured genomes substantially expand the known species repertoire of the collective human gut microbiota, with a 281% increase in phylogenetic diversity. Although the newly identified species are less prevalent in well-studied populations compared to reference isolate genomes, they improve classification of understudied African and South American samples by more than 200%. These candidate species encode hundreds of newly identified biosynthetic gene clusters and possess a distinctive functional capacity that might explain their elusive nature. Our work expands the known diversity of uncultured gut bacteria, which provides unprecedented resolution for taxonomic and functional characterization of the intestinal microbiota. |
EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies. [PMID: 29069476]
EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include both new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons. |
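The programmatic interface mentioned in the abstract above can be illustrated with a minimal Python sketch. It only builds a request URL and reads record identifiers from a response body; it makes no network call. Everything specific here is an assumption rather than something stated in this record: the base URL `https://www.ebi.ac.uk/metagenomics/api/v1`, the `studies` resource name, the `page_size` parameter, the JSON:API style response layout, and the sample identifiers, which are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: base URL of the REST API; it is not given in the record above.
API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def build_list_url(resource, **params):
    """Return the URL of a list endpoint, e.g. 'studies', with optional query parameters."""
    query = urlencode(sorted(params.items()))
    url = f"{API_BASE}/{resource}"
    return f"{url}?{query}" if query else url


def extract_ids(payload):
    """Collect the 'id' of every record under the top-level 'data' key."""
    return [record["id"] for record in payload.get("data", [])]


# Hypothetical response body, shaped like a JSON:API document.
SAMPLE_RESPONSE = json.loads("""
{"data": [
  {"type": "studies", "id": "EXAMPLE0001", "attributes": {"samples-count": 3}},
  {"type": "studies", "id": "EXAMPLE0002", "attributes": {"samples-count": 7}}
]}
""")

if __name__ == "__main__":
    print(build_list_url("studies", page_size=2))
    print(extract_ids(SAMPLE_RESPONSE))
```

Keeping URL construction and response parsing as separate pure functions means either can be checked offline before being wired to an HTTP client.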
Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments. [PMID: 29762668]
Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. Quantitative Insights Into Microbial Ecology (QIIME) and mothur have been the most widely used taxonomic analysis tools for this purpose, with MAPseq and QIIME 2 being two recently released alternatives. However, no independent and direct comparison between these four main tools has been performed. Here, we compared the default classifiers of MAPseq, mothur, QIIME, and QIIME 2 using synthetic simulated datasets comprised of some of the most abundant genera found in the human gut, ocean, and soil environments. We evaluate their accuracy when paired with both different reference databases and variable sub-regions of the 16S rRNA gene. We show that QIIME 2 provided the best recall and F-scores at genus and family levels, together with the lowest distance estimates between the observed and simulated samples. However, MAPseq showed the highest precision, with miscall rates consistently <2%. Notably, QIIME 2 was the most computationally expensive tool, with CPU time and memory usage almost 2 and 30 times higher than MAPseq, respectively. Using the SILVA database generally yielded a higher recall than using Greengenes, while assignment results of different 16S rRNA variable sub-regions varied up to 40% between samples analysed with the same pipeline. Our results support the use of either QIIME 2 or MAPseq for optimal 16S rRNA gene profiling, and we suggest that the choice between the two should be based on the level of recall, precision, and/or computational performance required. |
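The abstract above compares tools by precision, recall and F-score. As a reminder of how these metrics are conventionally computed from classification counts, here is a minimal Python sketch. It assumes the F-score is the balanced F1 (harmonic mean of precision and recall), which the abstract does not specify, and the counts in the usage example are hypothetical, not values from the study.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from raw classification counts.

    Assumption: F1 is the harmonic mean of precision and recall.
    A metric whose denominator is zero is reported as 0.0.
    """
    predicted = true_pos + false_pos   # everything the classifier assigned
    actual = true_pos + false_neg      # everything truly present
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 90 correct assignments, 10 wrong assignments, 30 missed.
    p, r, f = precision_recall_f1(90, 10, 30)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

A tool can score high on precision while scoring lower on recall, which is the trade-off the abstract describes between MAPseq and QIIME 2.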
EBI metagenomics in 2016--an expanding and evolving resource for the analysis and archiving of metagenomic data. [PMID: 26582919]
EBI metagenomics (https://www.ebi.ac.uk/metagenomics/) is a freely available hub for the analysis and archiving of metagenomic and metatranscriptomic data. Over the last 2 years, the resource has undergone rapid growth, with an increase of over five-fold in the number of processed samples and consequently represents one of the largest resources of analysed shotgun metagenomes. Here, we report the status of the resource in 2016 and give an overview of new developments. In particular, we describe updates to data content, a complete overhaul of the analysis pipeline, streamlining of data presentation via the website and the development of a new web based tool to compare functional analyses of sequence runs within a study. We also highlight two of the higher profile projects that have been analysed using the resource in the last year: the oceanographic projects Ocean Sampling Day and Tara Oceans. |
EBI metagenomics--a new resource for the analysis and archiving of metagenomic data. [PMID: 24165880]
Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive. |