Origins of major archaeal clades correspond to gene acquisitions from bacteria




The mechanisms that underlie the origin of major prokaryotic groups are poorly understood. In principle, the origin of both species and higher taxa among prokaryotes should entail similar mechanisms: ecological interactions with the environment paired with natural genetic variation involving lineage-specific gene innovations and lineage-specific gene acquisitions^1,2,3,4. To investigate the origin of higher taxa in archaea, we have determined gene distributions and gene phylogenies for the 267,568 protein-coding genes of 134 sequenced archaeal genomes in the context of their homologues from 1,847 reference bacterial genomes. Archaeal-specific gene families define 13 traditionally recognized archaeal higher taxa in our sample. Here we report that the origins of these 13 groups unexpectedly correspond to 2,264 group-specific gene acquisitions from bacteria. Interdomain gene transfer is highly asymmetric: transfers from bacteria to archaea are more than fivefold more frequent than vice versa. Gene transfers identified at major evolutionary transitions among prokaryotes specifically implicate gene acquisitions for metabolic functions from bacteria as key innovations in the origin of higher archaeal taxa.


We gratefully acknowledge funding from the European Research Council (ERC 232975 to W.F.M.), the graduate school E-Norm of the Heinrich-Heine University (W.F.M.), the DFG (Scho 316/11-1 to P.S.; SI 642/10-1 to B.S.) and the BMBF (0316188A, B.S.). G.L. is supported by an ERC grant (281357 to Tal Dagan); D.B. thanks the Alexander von Humboldt Foundation for a Fellowship. Computational support from the Zentrum für Informations- und Medientechnologie (ZIM) at the Heinrich-Heine University is gratefully acknowledged.


Institute of Molecular Evolution, Heinrich-Heine University, 40225 Düsseldorf, Germany

Shijulal Nelson-Sathi, Filipa L. Sousa, Mayo Roettger, Nabor Lozada-Chávez, Thorsten Thiergart & William F. Martin

Mathematisches Institut, Heinrich-Heine University, 40225 Düsseldorf, Germany

Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand

Genomic Microbiology Group, Institute of Microbiology, Christian-Albrechts-Universität Kiel, 24118 Kiel, Germany

Institut für Allgemeine Mikrobiologie, Christian-Albrechts-Universität Kiel, 24118 Kiel, Germany

Faculty of Chemistry, Biofilm Centre, Molecular Enzyme Technology and Biochemistry, University of Duisburg-Essen, 45117 Essen, Germany

Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland

Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal


S.N.-S., F.L.S., M.R., N.L.-C. and T.T. performed bioinformatic analyses; A.J., D.B. and G.L. performed statistical analyses; P.S., B.S., J.O.M. and W.F.M. interpreted results; S.N.-S., F.L.S., G.L., J.O.M. and W.F.M. wrote the paper; S.N.-S., G.L. and W.F.M. designed the study. All authors discussed the results and commented on the manuscript.


Each cell in the matrix indicates the number of genes (e-value ≤ 10⁻¹⁰ and ≥ 25% global identity) shared between 134 archaeal and 1,847 bacterial genomes in each pairwise inter-domain comparison (scale bar at lower right). Archaeal genomes are listed as in Fig. 1. Bacterial genomes are presented in 23 groups corresponding to phylum or class in the GenBank nomenclature: a = Clostridia; b = Erysipelotrichi, Negativicutes; c = Bacilli; d = Firmicutes; e = Chlamydiae; f = Verrucomicrobia, Planctomycetes; g = Spirochaetes; h = Gemmatimonadetes, Synergistetes, Elusimicrobia, Dictyoglomi, Nitrospirae; i = Actinobacteria; j = Fibrobacteres, Chlorobi; k = Bacteroidetes; l = Fusobacteria; Thermotogae, Aquificae, Chloroflexi; m = Deinococcus-Thermus; n = Cyanobacteria; o = Acidobacteria; δ, ε, α, β, γ = Delta-, Epsilon-, Alpha-, Beta- and Gammaproteobacteria; P = Thermodesulfobacteria, Caldiserica, Chrysiogenetes, Ignavibacteria. Bacterial genome size in number of proteins is indicated at the top.
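How a matrix cell could be filled can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' pipeline: only the two thresholds (e-value ≤ 10⁻¹⁰, ≥ 25% global identity) come from the caption, while the hit records, the hypothetical `Hit` and `shared_gene_matrix` names, and the rule of counting each archaeal gene once per bacterial genome are assumptions, since the caption does not say from which side shared genes are counted.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

# Thresholds taken from the caption; everything else here is illustrative.
MAX_EVALUE = 1e-10
MIN_GLOBAL_IDENTITY = 25.0  # percent identity over the full alignment


class Hit(NamedTuple):
    """Hypothetical record of one archaeal-vs-bacterial sequence comparison."""
    archaeal_genome: str
    archaeal_gene: str
    bacterial_genome: str
    evalue: float
    global_identity: float


def shared_gene_matrix(hits: Iterable[Hit]) -> Dict[Tuple[str, str], int]:
    """Count, per (archaeal genome, bacterial genome) pair, the archaeal genes
    that have at least one hit passing both thresholds (assumed counting rule)."""
    genes: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for h in hits:
        if h.evalue <= MAX_EVALUE and h.global_identity >= MIN_GLOBAL_IDENTITY:
            # A set ensures several hits for the same gene are counted once.
            genes[(h.archaeal_genome, h.bacterial_genome)].add(h.archaeal_gene)
    return {pair: len(names) for pair, names in genes.items()}


hits = [
    Hit("A1", "g1", "B1", 1e-30, 40.0),
    Hit("A1", "g1", "B1", 1e-12, 30.0),  # second hit for g1: not double-counted
    Hit("A1", "g2", "B1", 1e-5, 60.0),   # fails the e-value cutoff
    Hit("A1", "g3", "B2", 1e-20, 20.0),  # fails the identity cutoff
    Hit("A2", "g4", "B1", 1e-10, 25.0),  # exactly at both cutoffs: kept
]
print(shared_gene_matrix(hits))  # {('A1', 'B1'): 1, ('A2', 'B1'): 1}
```

Genome pairs with no qualifying hit are simply absent from the returned dictionary and correspond to empty (zero) cells in the matrix.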


Archaeal export families are sorted according to the reference tree on the left. The figure shows the 391 cases of archaea-to-bacteria export (≥ 2 archaea and ≥ 2 bacteria from one phylum only) and 662 cases of bacterial singleton trees (≥ 3 archaea, one bacterium). The 25,762 clusters were classified into the following categories (Supplementary Table 2): 16,983 archaeal-specific, 3,315 imports, 391 exports, 662 cases of bacterial singletons with ≥ 3 archaea in the tree, 308 cases with three sequences (a bacterial singleton and 2 archaea) in the cluster, 4,074 trees in which archaea were non-monophyletic, and 29 ambiguous cases among trees showing archaeal monophyly. The bacterial taxonomic distribution is shown in the lower panel. Gene identifiers and trees are given in Supplementary Table 3.
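As a quick arithmetic check, the seven category counts listed in the caption add up exactly to the 25,762 clusters, consistent with each cluster being assigned to exactly one category. The dictionary below only restates the caption's numbers; the category labels and the `is_partition` helper are our own shorthand.

```python
# Cluster categories and counts exactly as listed in the caption.
CATEGORY_COUNTS = {
    "archaeal specific": 16_983,
    "imports": 3_315,
    "exports": 391,
    "bacterial singleton, >= 3 archaea": 662,
    "three sequences (1 bacterium, 2 archaea)": 308,
    "archaea non-monophyletic": 4_074,
    "ambiguous, archaea monophyletic": 29,
}
TOTAL_CLUSTERS = 25_762


def is_partition(counts: dict, total: int) -> bool:
    """True if the category counts add up exactly to the number of clusters."""
    return sum(counts.values()) == total


print(sum(CATEGORY_COUNTS.values()))                  # 25762
print(is_partition(CATEGORY_COUNTS, TOTAL_CLUSTERS))  # True
```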


Cumulative distribution functions for scores of tree compatibility with the recipient data set. Values are P values of the two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit test comparing the recipient (blue) data set with the imports (green) data set and with three synthetic data sets: one-LGT (red), two-LGT (pink) and random (cyan). a, Thermoproteales. b, Desulfurococcales. c, Sulfolobales. d, Thermococcales. e, Methanobacteriales. f, Methanococcales. g, Thermoplasmatales. h, Archaeoglobales. i, Methanococcales. j, Methanosarcinales. k, Haloarchaea.
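The two-sample Kolmogorov–Smirnov test can be illustrated with a pure-Python sketch of what it computes, not the authors' code. The statistic D is the largest vertical gap between the two empirical CDFs. The P value here comes from the asymptotic Kolmogorov distribution with the effective-sample-size correction popularized by Numerical Recipes, which is only approximate for small samples. The function names are hypothetical. In practice `scipy.stats.ks_2samp` offers the same test, with exact P values for small samples. The caption does not say which implementation was used.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample KS statistic D = max_t |F_x(t) - F_y(t)| of the empirical CDFs.

    Both ECDFs are step functions that change only at observed values, so it is
    enough to evaluate them at every pooled data point."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / n  # fraction of x values <= t
        fy = bisect_right(ys, t) / m  # fraction of y values <= t
        d = max(d, abs(fx - fy))
    return d


def ks_pvalue_asymptotic(d: float, n: int, m: int) -> float:
    """Approximate two-sided P value from the Kolmogorov limiting distribution."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # Q(lam) equals 1 to many decimal places here, and the series below
        # converges too slowly to be useful.
        return 1.0
    q = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
            for j in range(1, 101))
    return min(max(q, 0.0), 1.0)


x = [1, 2, 3, 4]
y = [3, 4, 5, 6]
print(ks_statistic(x, y))  # 0.5
```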


Archaeal families that did not yield monophyly for archaeal sequences in ML trees are plotted according to the reference tree on the left; the distribution across bacterial genome groups is shown in the lower panel. These trees include 693 cases in which archaeal non-monophyly was caused by the misplacement of a single archaeal branch. Gene identifiers and trees are given in Supplementary Tables 4 and 5.
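Archaeal monophyly in the ML trees is the criterion behind this figure. A Python sketch can show one way to test it on an unrooted tree, plus a check for non-monophyly caused by one misplaced archaeal leaf. It is illustrative only: the nested-tuple tree encoding and all function names are hypothetical. Reading "misplacement of a single archaeal branch" as a single misplaced leaf, rather than a misplaced multi-leaf subtree, is our assumption.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

# Hypothetical tree encoding: a leaf is a string, an internal node a tuple of children.
Tree = Union[str, tuple]


def clusters(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the leaf set of `tree` and the leaf sets of all of its subtrees."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    all_leaves: FrozenSet[str] = frozenset()
    found: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        all_leaves |= child_leaves
        found.extend(child_clusters)
    found.append(all_leaves)
    return all_leaves, found


def is_monophyletic(tree: Tree, group: Iterable[str]) -> bool:
    """Unrooted monophyly: some edge separates `group` from all other leaves.

    Each edge corresponds to one subtree of the arbitrary rooting, so `group`
    qualifies if it, or its complement, is the leaf set of some subtree."""
    leaves, found = clusters(tree)
    g = frozenset(group)
    rest = leaves - g
    return any(c == g or c == rest for c in found)


def monophyletic_after_removing_one(tree: Tree, group: Iterable[str]) -> bool:
    """True if `group` is non-monophyletic but becomes monophyletic once a
    single leaf of `group` is pruned (assumed reading of the caption)."""
    g = frozenset(group)
    if is_monophyletic(tree, g):
        return False
    leaves, found = clusters(tree)
    rest = leaves - g
    for x in g:
        target = g - {x}
        # Pruning x turns every subtree leaf set c into c - {x}.
        if any(c - {x} == target or c - {x} == rest for c in found):
            return True
    return False


tree = ((("A1", "A2"), "A3"), ("B1", ("B2", "A4")))
archaea = {"A1", "A2", "A3", "A4"}
print(is_monophyletic(tree, archaea))                  # False: A4 groups with B2
print(monophyletic_after_removing_one(tree, archaea))  # True: pruning A4 fixes it
```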


Archaeal families and their homologue distribution in 1,847 bacterial genomes are sorted by archaeal (top) and bacterial (bottom) gene distributions for direct comparison. a–f, Distributions of archaeal imports sorted by archaeal groups (a) and by bacterial groups (b); distributions of archaeal exports sorted by archaeal groups (c) and by bacterial groups (d); distributions of archaeal non-monophyletic gene families sorted by archaeal groups (e) and by bacterial groups (f).


Comparison of pairwise Euclidean distance distributions between real and conditional random archaeal gene family patterns using the two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit test. a, Archaeal-specific families: distribution of 2,471 archaeal-specific families present in at least 2 and fewer than 11 groups (top); comparison between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately, with others (Nanoarchaea, Thaumarchaea and Korarchaeota) included in Crenarchaeota (mean P = 0.0071, middle) or in Euryarchaeota (mean P = 0.02591, bottom). b, Archaeal import families: distribution of 989 archaeal import families present in at least 2 and fewer than 11 groups (top); comparison between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately, with others (Nanoarchaea, Thaumarchaea and Korarchaeota) included in Crenarchaeota (mean P = 0.0795, middle) or in Euryarchaeota (mean P = 0.0098, bottom).
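The conditional randomization can be sketched in Python under assumptions the caption does not spell out. Each gene family is treated as a 0/1 presence vector over archaeal groups. "Shuffling the entries within Crenarchaeota and Euryarchaeota separately" is read as permuting each family's entries within each superphylum block, which preserves how many groups in each block carry the family. The block layout, toy data and function names are hypothetical. The resulting real and random distance lists would then be compared with a two-sample KS test.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence


def shuffle_within_blocks(pattern: Sequence[int], blocks: Sequence[Sequence[int]],
                          rng: random.Random) -> List[int]:
    """Permute a family's presence/absence entries within each block of group
    indices independently, so each block keeps its number of occupied groups."""
    out = list(pattern)
    for block in blocks:
        values = [out[i] for i in block]
        rng.shuffle(values)
        for i, v in zip(block, values):
            out[i] = v
    return out


def pairwise_euclidean(patterns: Sequence[Sequence[int]]) -> List[float]:
    """Euclidean distances between all pairs of family patterns."""
    return [math.dist(a, b) for a, b in combinations(patterns, 2)]


# Hypothetical layout: groups 0-2 form one block (e.g. Crenarchaeota plus
# 'others'), groups 3-6 another (e.g. Euryarchaeota). Real data have more groups.
BLOCKS = [[0, 1, 2], [3, 4, 5, 6]]
real = [
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
]
rng = random.Random(42)  # fixed seed for reproducibility
real_d = pairwise_euclidean(real)
random_d = [
    d
    for _ in range(100)  # 100 conditional random data sets, as in the caption
    for d in pairwise_euclidean([shuffle_within_blocks(p, BLOCKS, rng) for p in real])
]
```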


The numbers of archaeal-specific and import families corresponding to each node in the reference tree are shown in the order 'specific/imports'. Numbers at internal nodes indicate the number of archaeal-specific families and families with bacterial homologues that correspond to the reference tree topology. Values at the far left indicate the number of archaeal-specific families and families with bacterial homologues that are present in all archaeal groups.


Proportion of archaeal families whose distributions are congruent with the reference tree and with all possible trees. Filled circles indicate the proportion of archaeal families that are congruent with the reference tree allowing no losses (a single origin) and allowing increasing numbers of losses. Red, blue, green, magenta and black circles represent the proportions of families that can be explained by a single origin (849, 11.5%), a single origin plus 1 loss (22.4%), a single origin plus 2 losses (15%), a single origin plus 3 losses (13%) and a single origin plus ≥ 4 losses (38%), respectively. Lines indicate the proportion of families that can be explained by each of the 6,081,075 possible trees that preserve euryarchaeote and crenarchaeote monophyly. Note that, on average, any given tree can explain 569 (8%) of the archaeal families by a single origin event in the tree, and the best tree can explain only 1,180 families (16%). In the present data, 208,019 trees explain the gene distributions better than the archaeal reference tree without loss events, underscoring the discordance between core-gene phylogeny and gene distributions in the remainder of the genome.
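The caption does not give the scoring procedure as code. A minimal sketch of one natural reading assumes a Dollo-like model, in which a family is gained once and may then be lost independently. It counts the minimum number of losses needed to fit a presence set to a rooted tree. With the origin placed at the last common ancestor of the taxa carrying the family, each maximal subtree lacking the family needs exactly one loss. A family therefore fits a single origin with no losses exactly when its taxa form a clade. The tree encoding and function names are hypothetical, and this is not necessarily the authors' implementation.

```python
from typing import FrozenSet, Iterable, Optional, Union

# Hypothetical encoding: a leaf is a string, an internal node a tuple of children.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf names below `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def _losses_below(node: Tree, present: FrozenSet[str]) -> int:
    """Number of maximal subtrees under `node` in which the family is absent."""
    if not (leaf_set(node) & present):
        return 1  # one loss on the branch leading to this absent subtree
    if isinstance(node, str):
        return 0
    return sum(_losses_below(child, present) for child in node)


def min_losses_single_origin(tree: Tree, present: Iterable[str]) -> Optional[int]:
    """Minimum losses needed to explain `present` on a rooted tree if the family
    arose once, at the last common ancestor of the taxa that carry it."""
    p = frozenset(present)
    if not p:
        return None
    node = tree
    # Walk down while a single child still contains every taxon carrying the family.
    while not isinstance(node, str):
        child = next((c for c in node if p <= leaf_set(c)), None)
        if child is None:
            break
        node = child
    return _losses_below(node, p)


ref = ((("a", "b"), "c"), ("d", "e"))
print(min_losses_single_origin(ref, {"a", "b", "c"}))  # 0: a clade, single origin
print(min_losses_single_origin(ref, {"a", "c"}))       # 1: loss in b
print(min_losses_single_origin(ref, {"a", "d"}))       # 3: losses in b, c and e
```

The sketch recomputes leaf sets naively; scoring millions of candidate trees this way would need caching of leaf sets per tree to be practical.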


This file contains Supplementary Methods and Supplementary References. (PDF 728 kb)


This file contains Supplementary Tables 1-8 and a Supplementary Table Guide. (ZIP 32480 kb)




Lateral (or horizontal) gene transfer between individual cells is recognized as an important factor in genome evolution and species formation in prokaryotes such as cyanobacteria and proteobacteria. This study of gene distributions and phylogenies in 134 archaeal genomes shows that the origins of the 13 traditionally recognized higher taxa in the archaea correspond to 2,264 group-specific lateral gene acquisitions from bacteria. Transfers from bacteria to archaea are more than fivefold more frequent than vice versa. Gene acquisitions for metabolic functions from bacteria represent key innovations in the origin of higher archaeal taxa.