c, The vast majority of OTUs with >1 genome from the GEM catalog were restricted to individual biomes and sub-biomes, although over a third were found in multiple geographic locations. This is exemplified for certain phyla like Thermoplasmatota, where a virus was linked to only 1.6% of the 624 assembled MAGs. ... to avoid the formation of spurious OTUs that can result from incomplete genomes (ref. 6). Clades are colored according to the origin of the host information, and new host groups identified exclusively from the GEM catalog are highlighted in bold. Here we applied large-scale genome-resolved metagenomics to recover 52,515 medium- and high-quality metagenome-assembled genomes (MAGs), which form the Genomes from Earth's Microbiomes (GEM) catalog. As we have illustrated with the large repertoire of new secondary metabolite BGCs and putative virus–host connections, we anticipate that the GEM catalog will become a valuable resource for future metabolic and genome-centric data mining and experimental validation. If accurate, this implies that specific chemistry is not limited or amplified by environment, and that most classes of secondary metabolites can be found nearly anywhere. The GEM catalog was constructed from 10,450 metagenomes sampled from diverse microbial habitats and geographic locations. Assembled metagenomes from IMG/M were generated using a variety of quality-control and assembly methods, as described by Huntemann et al. (ref. 62).
We conclude that, in contrast to earlier metagenome binning studies that uncovered vast new lineages of life, the majority of deep-branching lineages are represented by current genome sequences. Phylum names were derived from the GTDB, and the numbers to the right represent MAGs from each phylum. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth's continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. To minimize spurious predictions, we dropped arrays with fewer than three spacers, those with nonconserved repeats (<97% average identity to consensus repeat) or those in MAGs containing fewer than four CRISPR-associated proteins. Taxonomic annotations of contigs were obtained based on protein-level alignments against the IMG/M database (downloaded 07 December 2017) using the Last aligner (v876; ref. 66) and taking the lowest common ancestor of taxonomically classified genes. Gains in phylogenetic diversity were relatively consistent across taxonomic groups, but especially high for certain large clades that included Planctomycetota (79% gain), Verrucomicrobiota (68% gain) and Patescibacteria (also referred to as the Candidate Phyla Radiation) (60% gain). Given this, we hypothesized that phage genomes are much more likely to encode small genes than microbial genomes. a, MAGs from the current study were compared to 524,046 publicly available reference genomes found in IMG/M and NCBI. Received 2019 Dec 24; Accepted 2020 Sep 28.
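The three CRISPR array filters described above (fewer than three spacers, repeats below 97% average identity to the consensus, and MAGs with fewer than four CRISPR-associated proteins) can be sketched as a simple predicate. This is a minimal illustration, not the authors' pipeline: the `CrisprArray` record, its field names and the `cas_counts` lookup are hypothetical, and only the three thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CrisprArray:
    """Hypothetical record for one predicted CRISPR array."""
    array_id: str
    mag_id: str
    n_spacers: int
    repeat_identity: float  # average % identity of repeats to the consensus repeat


def filter_crispr_arrays(
    arrays: List[CrisprArray],
    cas_counts: Dict[str, int],
    min_spacers: int = 3,
    min_repeat_identity: float = 97.0,
    min_cas_proteins: int = 4,
) -> List[CrisprArray]:
    """Keep only arrays that pass all three filters stated in the text."""
    kept = []
    for array in arrays:
        if array.n_spacers < min_spacers:
            continue  # fewer than three spacers
        if array.repeat_identity < min_repeat_identity:
            continue  # nonconserved repeats (<97% identity to consensus)
        if cas_counts.get(array.mag_id, 0) < min_cas_proteins:
            continue  # host MAG has fewer than four Cas proteins
        kept.append(array)
    return kept
```

Here `cas_counts` maps each MAG identifier to its number of predicted CRISPR-associated proteins; a MAG missing from the mapping is treated as having none, which is an assumption of this sketch.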
This revealed that an average of 30.5% (interquartile range (IQR) = 5.9–49.3%) and 14.6% (IQR = 0.9–15.8%) of metagenomic reads per sample were assigned to one or more GEMs or isolate genomes, respectively (Supplementary Fig. 2a,b). Despite an overall 44% increase in phylogenetic diversity of bacteria and archaea, we found little evidence of new deep-branching lineages representing new phyla, consistent with recent studies of microbial diversity (refs. 30, 61). To maximize the number of prophages identified in MAGs, we used VirSorter (v1.0.3; ref. 58) to perform de novo prediction, retaining all predictions of categories 4 and 5. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes. For example, Firmicutes had unusually high numbers of RiPPs (more than half of their BGCs were RiPP clusters), while Thermoplasmatota and Verrucomicrobiota contained relatively high numbers of terpene clusters (68% and 50% of their BGCs, respectively). Up to 500,000 reads from each metagenome were aligned to a database containing 52,515 GEMs and another database containing 151,730 genomes from NCBI RefSeq (release 93; ref. 71). This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. These genomes represent 12,556 novel candidate species-level operational taxonomic units (OTUs), providing a resource that captures a broad phylogenetic and functional diversity of uncultivated bacteria and archaea.
The remaining 159,444 reference genomes were clustered into 27,571 additional OTUs based on 95% ANI using MUMmer. First, RefineM (v0.0.20; ref. 10) was used to remove contigs with aberrant read depth, GC content and/or tetranucleotide frequencies. The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. By comparison, for a catalog of 270 million genes from 76,000 reference bacterial and archaeal genomes available through IMG/M (ref. 42), these percentages are approximately 70%, 50% and 20%, respectively. Vast regions of the tree are represented only by uncultivated genomes. This work was also supported as part of the Genomic Sciences Program DOE Systems Biology KBase. ... due to membership of new archaeal phyla like the Halobacterota, Hadesarchaea (including Archaeoglobi and Syntrophoarchaeia) and lineages within the Crenarchaeota (for example, Thermoprotei, Korarchaeia and Bathyarchaeia) (refs. 43–46). For DJR and Microviridae, phylogenies were built as follows: a multiple alignment was computed with MAFFT (v7.407; ref. 86) using the einsi mode; the alignment was automatically trimmed with trimAl (v1.4.rev15; ref. 79) using the gappyout option; and the tree was built with IQ-TREE (v1.5.5; ref. 87) with 1,000 ultrafast bootstraps and automatic selection of the evolutionary model. Notably, these analyses also revealed that 75% of the phylogenetic diversity of cataloged microbial diversity is exclusively represented by uncultured genomes (that is, MAGs or SAGs).
Although many modular clusters are fragmented, we identified over 3,000 BGC regions >50 kb in length and more than 17,000 >30 kb. Gray bars indicate percentage of total phylogenetic diversity represented by each taxonomic group (left) or environment (right). Nature Biotechnology volume 39, pages 499–509 (2021). An Author Correction to this article was published on 01 April 2021. A Publisher Correction to this article was published on 18 November 2020. Additionally, inoviruses were identified in MAGs based on a custom approach recently developed to identify inovirus-like sequences in the same metagenome assemblies before genome binning (ref. 85). ... (Fig. 5a and Supplementary Table 17), increasing the total number of IMG/VR viruses with a predicted host by >2.5-fold (from 36,976 to 92,872).
This resulted in identification of 567,316 CRISPR spacers longer than 25 bp in 23,851 arrays in 13,540 MAGs. HMMER (v3.1b2; ref. 77) was used to identify homologs of the marker genes in the genomes of each OTU using marker-gene-specific bit-score thresholds. In addition to the assembly of microbial genomes, recent studies have highlighted how metagenomes can be mined for novel viral genomes (ref. 55). To mitigate missing data in incomplete genomes, we pooled homologs across genomes from the same OTU (using a maximum of ten genomes, selected on the basis of CheckM quality) for each of the 30 marker genes. We additionally compared MAGs independently assembled by Parks et al. (ref. 10) for a subset of GEM samples, which further reinforced the reproducibility of our composite genome bins (Supplementary Table 3 and Supplementary Note). This resulted in a final dataset of 45,599 OTUs representing all GEMs and reference genomes. Nature Biotechnology, [online] 39 (4), 499–509. http://dx.doi.org/10.1038/s41587-020-0718-6 Plain language summary: Recent advances in sequencing technologies have led to a glut of sequence information, much of which has not been fully explored. The bar plot displays the percentage of MAGs linked to viruses from each phylum containing 100 or more MAGs.
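The pooling step described above (at most ten genomes per OTU, chosen on the basis of CheckM quality) can be sketched as follows. This is an illustrative sketch only: the dictionary layout and function names are hypothetical, and the quality score used here (completeness minus five times contamination) is an assumed definition, since the text above says only "CheckM quality".

```python
from collections import defaultdict
from typing import Dict, List


def quality_score(genome: dict) -> float:
    """Assumed quality score: completeness - 5 * contamination (both in %)."""
    return genome["completeness"] - 5.0 * genome["contamination"]


def select_representatives(
    genomes: List[dict], max_per_otu: int = 10
) -> Dict[str, List[dict]]:
    """Group genomes by OTU and keep the top-scoring ones from each OTU."""
    by_otu = defaultdict(list)
    for genome in genomes:
        by_otu[genome["otu"]].append(genome)
    return {
        otu: sorted(members, key=quality_score, reverse=True)[:max_per_otu]
        for otu, members in by_otu.items()
    }
```

Marker-gene homologs would then be searched only in the selected genomes of each OTU, so that a gene missing from one incomplete genome can be supplied by another member of the same OTU.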
... indicating that additional species remain to be discovered across biomes, which is also suggested by the low percentage of mapped reads. This work was conducted by the US DOE Joint Genome Institute, a DOE Office of Science User Facility (contract no. DE-AC02-05CH11231). The next four strip charts indicate the environmental distribution of the orders; the last plot indicates the number of MAGs from the GEM catalog recovered from each order. DOI: https://doi.org/10.1038/s41587-020-0718-6. Although several Acidobacteria are known to contain PKS and NRPS clusters, this MAG contains an additional 66 BGC regions, indicating a level of biosynthetic potential that may have been underestimated within this phylum. First, we assigned 364,602 reference genomes to one of the 5,472 reference OTUs from the GEM dataset based on >95% ANI over >30% of the genome. Authors: Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, Wu D, Paez-Espino D, Chen IM, Huntemann M, Palaniappan K, Ladau J, Mukherjee S, Reddy TBK, Nielsen T, Kirton E, Faria JP, et al. d, Geographic distribution of MAGs within each biome. Using a combination of the two approaches, we predicted connections between 81,449 IMG/VR viruses and 23,082 GEMs. Here we present a catalog of 52,515 metagenome-assembled genomes from over 10,000 metagenomes collected from diverse microbiomes to capture extant microbial metabolic and functional potential.
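The assignment rule stated above (a reference genome joins an OTU when it shares >95% ANI over >30% of the genome) can be sketched on precomputed pairwise comparisons. This is a hedged illustration: the `hits` tuple layout and the function name are hypothetical, and breaking ties by highest ANI is an assumption of this sketch rather than something the text specifies.

```python
from typing import Dict, Iterable, Tuple

# Each hit: (reference_genome_id, otu_id, ani_percent, aligned_fraction_percent)
Hit = Tuple[str, str, float, float]


def assign_to_otus(
    hits: Iterable[Hit], min_ani: float = 95.0, min_aligned: float = 30.0
) -> Dict[str, str]:
    """Assign each reference genome to the qualifying OTU with the highest ANI."""
    best: Dict[str, Tuple[str, float]] = {}
    for genome_id, otu_id, ani, aligned in hits:
        # Both thresholds are strict, matching ">95% ANI over >30% of the genome".
        if ani > min_ani and aligned > min_aligned:
            if genome_id not in best or ani > best[genome_id][1]:
                best[genome_id] = (otu_id, ani)
    return {genome_id: otu_id for genome_id, (otu_id, _) in best.items()}
```

Genomes absent from the returned mapping met neither threshold for any OTU and would remain to be clustered separately.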
... performed large-scale assembly and binning of all environmental metagenomes available in the NCBI Sequence Read Archive in an unprecedented effort to expand genomic representation of uncultivated lineages (refs. 10, 30). The Microbial Dark Matter (MDM) Phase II study, an extension of the GEBA-MDM project (ref. 12), contributed the most novelty with 790 new OTUs derived from 1,124 MAGs found in 80 metagenomes. Approximately 66% of GEM BGCs intersected with one or more contig boundaries, indicating that a majority may be incomplete. All available metagenomic data, bins and annotations are available through the IMG/M portal (https://img.jgi.doe.gov/). ... (Fig. 3b), which were previously analyzed in recent MAG studies (ref. 46). A genomic catalog of Earth's microbiomes.
Consistent with this result, metagenomes with the highest k-mer diversity (ref. 24) tended to have the lowest mapping rates (Spearman's r = −0.68; P value = 0).