The ISME Journal (2016) 10, 2702–2714 OPEN
© 2016 International Society for Microbial Ecology All rights reserved 1751-7362/16
www.nature.com/ismej
ORIGINAL ARTICLE
RubisCO of a nucleoside pathway known from
Archaea is found in diverse uncultivated phyla in
bacteria
Kelly C Wrighton1,7, Cindy J Castelle2,7, Vanessa A Varaljay1, Sriram Satagopan1,
Christopher T Brown3, Michael J Wilkins1,4, Brian C Thomas2, Itai Sharon2,
Kenneth H Williams5, F Robert Tabita1 and Jillian F Banfield2,6
1Department of Microbiology, The Ohio State University, Columbus, OH, USA; 2Department of Earth and
Planetary Science, UC Berkeley, Berkeley, CA, USA; 3Department of Plant and Microbial Biology, UC Berkeley,
Berkeley, CA, USA; 4School of Earth Sciences, The Ohio State University, Columbus, OH, USA; 5Earth
Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA and 6Department of
Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
Metagenomic studies recently uncovered form II/III RubisCO genes, originally thought to only occur in archaea, from uncultivated bacteria of the candidate phyla radiation (CPR). There are no isolated CPR bacteria and these organisms are predicted to have limited metabolic capacities. Here we expand the known diversity of RubisCO from CPR lineages. We report a form of RubisCO, distantly similar to the archaeal form III RubisCO, in some CPR bacteria from the Parcubacteria (OD1), WS6 and Microgenomates (OP11) phyla. In addition, we significantly expand the Peregrinibacteria (PER) II/III RubisCO diversity and report the first II/III RubisCO sequences from the Microgenomates and WS6 phyla. To provide a metabolic context for these RubisCOs, we reconstructed near-complete (>93%) PER genomes and the first closed genome for a WS6 bacterium, for which we propose the phylum name Dojkabacteria. Genomic and bioinformatic analyses suggest that the CPR RubisCOs function in a nucleoside pathway similar to that proposed in Archaea. Detection of form II/III RubisCO and nucleoside metabolism gene transcripts from a PER supports the operation of this pathway in situ. We demonstrate that the PER form II/III RubisCO is catalytically active, fixing CO2 to physiologically complement phototrophic growth in a bacterial photoautotrophic RubisCO deletion strain. We propose that the identification of these RubisCOs across a radiation of obligately fermentative, small-celled organisms hints at a widespread, simple metabolic platform in which ribose may be a prominent currency.
The ISME Journal (2016) 10, 2702–2714; doi:10.1038/ismej.2016.53; published online 3 May 2016
Correspondence: JF Banfield, Department of Earth and Planetary Sciences, and Department of Environmental Science, Policy, and Management, University of California at Berkeley, 336 Hilgard Hall, Berkeley, CA 94720, USA.
E-mail: [email protected]
7These authors contributed equally to this work.
Received 25 August 2015; revised 29 February 2016; accepted 4 March 2016; published online 3 May 2016

Introduction

The vast majority of the organisms in the environment have not been cultivated, obscuring our knowledge of their physiology. Recent metagenomic investigations have revealed the presence of a large diversity of uncultivated organisms in marine and terrestrial subsurface environments (Castelle et al., 2013, 2015; Lloyd et al., 2013; Baker et al., 2015). For example, in a metagenomic study of a shallow alluvial aquifer, we determined that many of the uncultivated bacteria were associated with the candidate phyla radiation (CPR), a monophyletic group of at least 35 phyla that accounts for >15% of all bacterial diversity (Brown et al., 2015). Metagenomic sampling of almost 800 CPR bacteria suggested that members of this radiation consistently have small genomes with many metabolic limitations, including lack of an electron transport chain, no more than a partial tricarboxylic acid cycle and mostly incomplete nucleotide and amino-acid biosynthesis pathways. Using metabolic predictions from complete and near-complete genomes (Wrighton et al., 2012), we inferred that members of the CPR are fermenters whose primary biogeochemical impact is on subsurface organic carbon and hydrogen cycling (Wrighton et al., 2014). However, many of the genes in these genomes remain poorly annotated and the metabolic platform of CPR bacteria remains uncertain. Further, neither transcription nor enzyme activities have been validated, hindering knowledge of the actual reactions catalyzed by these organisms.
RubisCO in uncultivated bacteria of the CPR
KC Wrighton et al
The incorporation of CO2 into organic carbon is a critical step in the carbon cycle. Ribulose-1,5-bisphosphate carboxylase-oxygenase (RubisCO) is the most abundant enzyme on earth and is integral to the fixation of carbon dioxide. Currently four forms of RubisCO can fix atmospheric carbon dioxide (I, II, II/III and III) (Tabita et al., 2007, 2008). Forms I and II are used by plants, algae and some chemoautotrophic and phototrophic bacteria for CO2 fixation during primary production via the Calvin Benson Bassham (CBB) cycle. Forms III and II/III, found primarily in archaea, enable light-independent CO2 incorporation into sugars derived from nucleotides like adenosine monophosphate (AMP) (Sato et al., 2007; Tabita et al., 2007). Form II/III RubisCOs, which were primarily known in methanogens, have greater overall similarity to bacterial form II RubisCO but have specific residues, structural and catalytic features that more closely resemble archaeal form III (Alonso et al., 2009).

Previously, we reported the first evidence of form II/III RubisCO genes in partial genomes from the Peregrinibacteria (PER) phylum in the CPR (Wrighton et al., 2012). Additionally, three bacterial form II/III RubisCOs were recovered from genomes of members of the SR1 phylum that also are inferred to be fermentative (Wrighton et al., 2012; Campbell et al., 2013; Kantor et al., 2013). The presence of these annotated RubisCO genes in small genomes that house few ancillary genes suggests that RubisCO may have an important role in the metabolism of some of these bacteria.

To more comprehensively explore the diversity and potential role of CPR RubisCOs, we reproduced the prior experiment from which we first identified bacterial form II/III RubisCO sequences (Wrighton et al., 2012). Size filtration was used to increase the relative abundance of the small-celled CPR organisms in samples used for sequencing (Luef et al., 2015). Here we investigate CPR genomes that encode RubisCO genes, report a new RubisCO sequence type diverged from archaeal form III and propose a broader metabolic context in which these genes occur. By detecting in situ transcripts and by directly cloning one CPR form II/III RubisCO sequence into an expression host, we suggest the potential functionality of this bacterial enzyme. This study represents the first enzymatic activity demonstrated from these broadly distributed but metabolically enigmatic candidate phyla bacteria.

Materials and methods

Groundwater field experiment and sample collection
The field experiment was carried out between 25 August and 12 December 2011 at the US Department of Energy Rifle Integrated Field Research Challenge site adjacent to the Colorado River, CO, USA as recently described (Brown et al., 2015; Castelle et al., 2015; Luef et al., 2015). In brief, biostimulation experiments were performed by adding approximately 15 mM acetate to the groundwater over the course of 72 days to maintain an average in situ concentration of ~3 mM acetate during peak stimulation. Six microbial community samples (A–F) were collected over 93 days, including 3 days prior to acetate stimulation and after acetate addition ceased (amendment ceased on day 72). Ferrous iron and sulfide concentrations were analyzed immediately after sampling using the HACH phenanthroline assay and methylene blue sulfide reagent kit, respectively (HACH, Loveland, CO, USA). Acetate and sulfate concentrations were determined using a Dionex ICS-2100 ion chromatograph (Sunnyvale, CA, USA) equipped with an AS-18 guard and analytical column (Williams et al., 2011). Microbial cells from pumped groundwater that passed through a 1.2-μm prefilter (Pall, Port Washington, NY, USA) were retained on 0.2- and 0.1-μm filters (Pall). Filters were flash-frozen in liquid nitrogen immediately upon collection.

Nucleic acid extraction and sequencing
An approximately 1.5-g sample from each frozen filter was used for genomic DNA extraction. DNA was extracted using the PowerSoil DNA Isolation Kit (MoBio Laboratories, Inc., Carlsbad, CA, USA). Illumina HiSeq 2000 2×150 paired-end sequencing was conducted by the Joint Genome Institute on extracted DNA. Sequencing amounts for samples A–F from 0.1- and 0.2-μm filters ranged from 9.5 to 40.7 Gbp, as described in Castelle et al. (2015). RNA was extracted using Invitrogen TRIzol Reagent (Carlsbad, CA, USA) followed by genomic DNA removal and cleaning using Qiagen RNase-Free DNase Set Kit (Valencia, CA, USA) and Qiagen Mini RNeasy Kit (Valencia, CA, USA). An Agilent 2100 Bioanalyzer (Santa Clara, CA, USA) was used to assess the integrity of the RNA samples. Only RNA samples with an RNA Integrity Number between 8 and 10 were used. The Applied Biosystems SOLiD Total RNA-Seq Kit (Invitrogen/Life Technologies, Carlsbad, CA, USA) generated the cDNA template library. The SOLiD EZ Bead system was used to perform emulsion clonal bead amplification to generate bead templates for SOLiD platform sequencing. Samples were sequenced on the 5500XL SOLiD platform (Life Technologies). The 75-bp sequences produced were mapped in color space using the SOLiD LifeScope software version 2.5 (Life Technologies/Invitrogen) using the default parameters against the reference genome set of 2 302 715 separate gene FASTA entries. Corresponding GTF files were built to record the position of each gene on the set of artificial chromosomes.

Genome assembly, binning and functional annotations
Methods for assembly, binning and annotation follow those described in detail in prior publications (Wrighton et al., 2012;
Castelle et al., 2015). Reads were quality trimmed using Sickle (https://github.com/najoshi/sickle) with default settings. Trimmed paired-end reads were assembled using IDBA_UD (Peng et al., 2012) with default parameters. Genes on scaffolds >5 kb in length were predicted using Prodigal (Hyatt et al., 2012) with the metagenome procedure (-p meta), and then USEARCH (-ublast) (Edgar, 2010) was used to search protein sequences against UniRef90 (Suzek et al., 2007), KEGG (Kanehisa and Goto, 2000; Kanehisa et al., 2012) and an in-house database composed of open reading frames from candidate phyla genomes. The in-house database includes previously published genomes (Wrighton et al., 2012; Castelle et al., 2013; Hug et al., 2013; Kantor et al., 2013; Wrighton et al., 2014) and genomes from ongoing work, and primarily aided in identifying putative CPR scaffolds. For each scaffold, we determined the GC content, coverage and profile of phylogenetic affiliation based on the best hit for each gene against the UniRef90 (Suzek et al., 2007) database. Scaffolds were binned to specific organisms by using coverage across the samples, phylogenetic identity and GC content, both automatically with the ABAWACA algorithm and manually using ggKbase (http://ggkbase.berkeley.edu/). ABAWACA is an algorithm that generates preliminary genome bins based on different characteristics of assembled scaffolds (https://github.com/CK7/abawaca). Six genome bins generated by ABAWACA were manually inspected within ggKbase. Binning purity was confirmed using an Emergent Self-Organizing Map (Dick et al., 2009). Our primary method for assessing genome completeness was based on the presence or absence of orthologous genes representing a core gene set that are widely conserved as single-copy genes among bacterial candidate phyla, but we also assessed global markers. All single copy and metabolic analyses reported here and summarized in the Supplementary Figures are reported in ggKbase (FASTA files can be accessed via links provided in the Supplementary Figures). Manual assembly curation improved some genome bins by extending and joining scaffolds, removing local scaffolding errors and, in some cases, bringing in small previously unbinned fragments. One genome was curated to closure and completion.

Inventory of the RubisCO genes
All putative RubisCO genes on genome fragments assembled from the 12 separate groundwater filter samples were recovered first by BLAST search (Altschul et al., 1990) against an in-house database that included candidate phyla sequences from prior studies (Wrighton et al., 2012; Campbell et al., 2013; Kantor et al., 2013; Castelle et al., 2015). The taxonomic affiliation of these putative RubisCO CPR scaffolds was confirmed based on phylogenetic analysis of genes included on the same genome fragment as the RubisCO or in the same genome bin. We then used these RubisCO sequences to recruit additional CPR sequences from currently unreported groundwater and sediment genomic data sets from the same field site.

16S rRNA gene recovery and phylogenetic analyses
16S rRNA gene sequences were identified in assembled scaffolds based on Hidden Markov Model searches using the cmsearch program from the Infernal package (cmsearch --hmmonly --acc --noali -T -1) (Nawrocki et al., 2009). This protocol is described in detail in Brown et al. (2015). Searches were conducted using the manually curated structural alignment of the 16S rRNA provided with SSU-Align (Nawrocki, 2009). Sequences corresponding with CPR genomes were aligned using SSU-Align along with the best hits of these sequences in the Silva database (v1.2.11) (Quast et al., 2013), a curated set of reference sequences associated with candidate phyla of interest, and a set of archaea sequences to serve as a phylogenetic root. Sequences with ⩾800 bp aligned in the 1582-bp alignment were used to infer a maximum-likelihood phylogeny using RAxML (Stamatakis, 2014) with the GTRCAT model of evolution selected via ProtTest and 100 bootstrap re-samplings.

Protein phylogenetic analyses and modeling
For single protein trees (for example, RubisCO large subunit), the phylogenetic pipeline was performed as previously described (Wrighton et al., 2012). Briefly, protein sequences were aligned using MUSCLE version 3.8.31 (Edgar, 2004) with default settings. Problematic regions of the alignment were removed using very liberal curation standards with the program GBlocks (Talavera and Castresana, 2007). Alignments with and without GBlocks were confirmed manually. Best models of amino-acid substitution for each protein alignment were estimated using ProtTest3 (Abascal et al., 2005). Phylogenetic trees were generated using RAxML with the PROTCAT setting for the rate model and the best model of amino-acid substitution specified by ProtTest. Nodal support was estimated based on 100 bootstrap replications using the rapid bootstrapping option implemented in RAxML. Three-dimensional structure predictions were generated by SWISS-MODEL (an automated protein homology-modeling server) and I-TASSER based on protein alignment and secondary structure prediction. The alignment mode was based on a user-defined target-template alignment. Conservation of key catalytic residues and the secondary structure for each model was confirmed by manual inspection.

Plasmids and cloning
Rifle metagenomic DNA was used as the template to amplify the exact coding regions for the RubisCO
genes from PER genome PER_GWF2_38_29 (PER-2), the OD-1 genome GWE1_OD1_1_1_39 and WS6 genomes GWC1_1_418 and GWC1_33_22_1_93. Primer sequences used for RubisCO amplification are included (Supplementary Table S1). PCR was performed using the PrimeStar GXL (Clontech, Mountain View, CA, USA) high-fidelity polymerase. PCR products for all genes were cloned between NdeI and BamHI sites in plasmid pET11a (Novagen/EMD Millipore, Billerica, MA, USA), which does not incorporate an N-terminal hexa-histidine tag. For the PER, the PCR product was also cloned between KpnI and BamHI sites in plasmid p80, a high copy plasmid 3716-based vector (a derivative of pLO11 obtained from Oliver Lenz and Bärbel Friedrich, Berlin, Germany), which contained the Rhodospirillum rubrum CbbR/cbbM promoter region, for expression in vivo. This promoter has been used previously in the broad-host range vector pRPS-MCS3 to drive the expression of RubisCO genes to complement the RubisCO deletion strain Rhodobacter capsulatus SB I/II− (Smith and Tabita, 2003). Constructs for the coding regions of archaeal form III Archaeoglobus fulgidus RbcL2 (Kreel and Tabita, 2007), archaeal form II/III Methanococcoides burtonii rbcL and bacterial form II R. rubrum cbbM (Falcone and Tabita, 1993) were used for comparison and served as positive controls. The sequences of all constructs were verified at the Plant-Microbe Genomics Facility at The Ohio State University.

Growth and culture conditions
To test for in vitro activity, RubisCO proteins from pET11a constructs were expressed in Escherichia coli strain Rosetta 2 (DE3) pLysS (Novagen), which uses the pRARE plasmid to accommodate expression of rare codons such as those from candidate phyla and archaeal RubisCO sequences. Cells were grown in 50 ml lysogeny broth supplemented with 150 μg ml−1 ampicillin and 12.5 μg ml−1 chloramphenicol in 125-ml flasks. After reaching an optical density of 0.6–0.8 (at 600 nm), RubisCO gene expression was induced by adding IPTG to a final concentration of 1 mM. After 5–16 h of shaking at room temperature, cells were harvested, flash frozen and stored at −80 °C prior to analysis. To ensure that candidate phyla RubisCOs were exposed to minimal amounts of oxygen, cell lysis and assays were performed under strict anaerobic conditions. Using an anaerobic chamber, cell pellets were resuspended in 50 mM bicine-NaOH, 10 mM MgCl2, at pH 8.0, supplemented with 2 mM NaHCO3 and 1 mM dithiothreitol; crude protein lysates were obtained via bead beating with 0.1-mm beads for 3–4-min intervals in tightly capped tubes.

To test for in vitro activity from complemented autotrophic cultures, p80::PER, p80::R. rubrum and pRPS-MCS3::A. fulgidus RubisCO constructs were transformed into E. coli strain S17-1 (ATCC47055) and mobilized into R. capsulatus SB I/II− via biparental matings (Smith and Tabita, 2003; Satagopan et al., 2009, 2014). Plasmid-complemented strain SB I/II− colonies were first selected on peptone yeast extract plates followed by streaking onto Ormerod’s minimal medium (Ormerod et al., 1961) agar plates with no added organic carbon. To test for photoautotrophic growth complementation, plates were incubated in illuminated anaerobic jars, which were periodically flushed with a gas mixture of 5% CO2/H2 as previously described (Satagopan et al., 2009, 2014). Anaerobic CO2-dependent photoautotrophic growth was obtained using Ormerod’s minimal medium agar plates in a sealed jar flushed with a 5% CO2/95% H2 gas mixture. Colonies on minimal plates under these conditions were used to inoculate liquid photoautotrophic cultures bubbled continuously with 5% CO2/H2 for growth analysis; cells were harvested anaerobically for RubisCO activity assays, as described above.

In vitro RubisCO assays
RubisCO assays with activated enzyme preparations were performed according to Satagopan et al. (2014) with modifications to ensure anaerobiosis. All crude protein lysates, reagents and supplies were prepared in an anaerobic chamber and crimped in sealed glass vials prior to conducting the assays in a hood using gas-tight Hamilton syringes (Hamilton Robotics, Reno, NV, USA). Assay reaction mixtures contained 50 mM NaHCO3 (~2 μCi of NaH14CO3), 10 mM MgCl2 and 0.8 mM ribulose 1,5 bisphosphate (RuBP) in 50 mM bicine-NaOH buffer, pH 8.0. Reactions were initiated by adding crude protein lysate to the mixture containing the physiological substrate, RuBP, at 30 °C and incubated for 5–30 min. Negative control reactions consisted of no RuBP or no crude protein lysate added. Carboxylase activities were determined following the equimolar conversion of 14C from NaHCO3 into acid-stable 3-phosphoglycerate (3-PGA), as measured via radioactive scintillation counting. Protein was determined via the Bradford method and specific activities (nmol CO2 fixed per min per mg protein) were calculated consistent with prior publications (Satagopan et al., 2009).

Results and discussion

Recovery of CPR genomes that contain RubisCO
We reconstructed CPR genomes from sequence data sets for samples collected over 93 days (Brown et al., 2015; Castelle et al., 2015). As previously reported, bacteria from CPR phyla WS6, OD1 and OP11 were enriched (>75% 16S rRNA relative abundance) on the 0.1-μm filter, and some members were shown to have ultra-small cells (median cell volume of 0.009±0.002 μm3) (Luef et al., 2015). Multiple genomes belonging to the CPR phylum PER were recovered from all six 0.2-μm filter data sets,
suggesting that these cells may be larger in size than other CPR bacteria (for example, OD1, WS6) enriched on the 0.1-μm filters.

Taxonomic identifications of the CPR organisms from which we recovered RubisCO genes are provided in Supplementary Figures S1 and S2. The emergent self-organizing map clustered genome fragments based on their tetranucleotide sequence composition (Supplementary Figure S3) and validated 16 unique CPR genomes that ranged in estimated completion from near-complete (>94%) to complete (Supplementary Figure S4). Here we report RubisCO sequences from five genomic bins assigned to the PER phylum, two to the Parcubacteria (OD1), five to the Microgenomates (OP11) and six to the WS6 phylum (Table 1). We also report RubisCO from additional scaffolds assigned to these CPR phyla.

The five WS6 genomes reported here span multiple taxonomic classes (Supplementary Figure S1 and Supplementary Table S2) (Brown et al., 2015). The manually curated and closed WS6 genome is 0.896 Mbp in length and is the first complete genome reconstructed for a member of this lineage. The mapped read file confirming the assembly and the annotations can be accessed via ggKbase (see Methods section). Given the level of genome completion and the breadth of phylogenetic diversity sampled here (Konstantinidis and Rosselló-Móra, 2015), we propose the name Candidatus Dojkabacteria for the WS6 phylum to honor the memory of Michael A Dojka, who first reported this lineage based on 16S rRNA gene sequence data (Dojka et al., 2000).

We recovered five distinct PER genomes that we estimated to be >93% complete (Table 1; Supplementary Figure S4). This data set augments the three partial PER genomes analyzed from samples collected from the same aquifer 5–7 days after acetate stimulation, two of which (ACD51, ACD65) encoded RubisCO genes (Wrighton et al., 2012). Phylogenetic analyses that include these and other recently reported sequences (Rinke et al., 2014; Brown et al., 2015) support the identification of this lineage as a separate phylum (Candidatus Peregrinibacteria). Based on sequence divergence, the five Peregrinibacterial genomes reported here represent multiple taxonomic classes in this newly designated phylum.

CPR genomes contain novel and diverse RubisCO sequences
Fifty-one new RubisCO genes were identified from these CPR genomes. Some sequences, for example, from the WS6-1 and OD1-1 genomes, were independently reconstructed from multiple data sets. Ultimately, 18 representative unique sequences were manually confirmed by read mapping to verify their accuracy (Figure 1). These sequences were most similar to previously reported CPR or archaeal RubisCO sequences (and not bacterial RubisCO sequences) (Supplementary Table S3) and the majority harbor the key active-site residues (Figure 2).

Phylogenetic analyses revealed that the seven bacterial CPR RubisCO sequences reported here are members of the previously defined archaeal intermediate form II/III RubisCOs (Figure 1) (Tabita et al., 2007, 2008). Recent metagenomic studies reported the form II/III RubisCOs from PER and SR1 uncultivated bacterial lineages (Wrighton et al., 2012; Campbell et al., 2013; Kantor et al., 2013). Although the overall sequence similarity is most closely related to type II, protein sequence alignment
Table 1 Summary of CPR genomes reported in this study

Genome ID ggKbase ID            Taxonomic affiliation  Completeness (%)  No. of scaffolds  Relative GC content (%)  Bin length (Mbp)  No. of protein-coding genes  RubisCO form
PER-1     PER_GWA2_38_35        Peregrinibacteria      95                5                 38.2                     1.11              1020                         II/III
PER-2*    PER_GWF2_38_29        Peregrinibacteria      93                23                38                       1.23              1164                         II/III
PER-3     PER_GWF2_39_17        Peregrinibacteria      100               15                38.8                     1.31              1129                         II/III
PER-4     PER_GWC2_39_14        Peregrinibacteria      98                21                38.5                     1.32              1247                         II/III
PER-5     PER_GWF2_43_17        Peregrinibacteria      100               16                43.1                     1.20              1136                         II/III
WS6-1     WS6_GWF2_39_15        Dojkabacteria          100 (closed)      1                 38.7                     0.896             891                          II/III
WS6-2     WS6_GWC1_33_20        Dojkabacteria          98                23                32.7                     0.65              666                          III-like
WS6-3     GWF1_WS6_37_7         Dojkabacteria          100               18                36.7                     1.22              1153                         III-like
WS6-4     WS6_GWE2_33_157       Dojkabacteria          100               29                32.8                     0.61              628                          III-like
WS6-5     WS6_GWE1_33_547       Dojkabacteria          95                34                32.5                     0.61              649                          III-like
WS6-6     WS6_GWF1_33_233       Dojkabacteria          98                23                32.7                     0.61              640                          III-like
OD1-1     OD1_GWE2_42_8         Parcubacteria          93                34                42.5                     0.743             781                          III-like
OD1-2     OD1_RIFOXA1_OD1_43_6  Parcubacteria          84                183               31.7                     0.67              850                          III-like
OP11-1    OP11_GWA2_42_18       Microgenomates         95                53                42.3                     1.24              1324                         III-like
OP11-2    GWA2_OP11_43_14       Microgenomates         100               24                43.2                     1.65              1699                         III-like
OP11-3    RBG_16_OP11_37_8      Microgenomates         95                84                37.5                     1.31              1506                         III-like

Abbreviation: CPR, candidate phyla radiation. Bolded text includes manually curated and confirmed near-complete or complete genomes, while asterisk (*) denotes RubisCO that was cloned and functionally confirmed.
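
As a rough cross-check of Table 1, the following sketch tallies RubisCO forms per phylum and computes coding density (protein-coding genes per Mbp). The values are transcribed from the table; the tuple layout and the function names `form_counts` and `coding_density` are illustrative assumptions and not part of the original analysis.

```python
from collections import Counter

# (genome ID, phylum, bin length in Mbp, protein-coding genes, RubisCO form),
# transcribed from Table 1. The tuple layout is an assumption for this sketch.
TABLE_1 = [
    ("PER-1", "Peregrinibacteria", 1.11, 1020, "II/III"),
    ("PER-2", "Peregrinibacteria", 1.23, 1164, "II/III"),
    ("PER-3", "Peregrinibacteria", 1.31, 1129, "II/III"),
    ("PER-4", "Peregrinibacteria", 1.32, 1247, "II/III"),
    ("PER-5", "Peregrinibacteria", 1.20, 1136, "II/III"),
    ("WS6-1", "Dojkabacteria", 0.896, 891, "II/III"),
    ("WS6-2", "Dojkabacteria", 0.65, 666, "III-like"),
    ("WS6-3", "Dojkabacteria", 1.22, 1153, "III-like"),
    ("WS6-4", "Dojkabacteria", 0.61, 628, "III-like"),
    ("WS6-5", "Dojkabacteria", 0.61, 649, "III-like"),
    ("WS6-6", "Dojkabacteria", 0.61, 640, "III-like"),
    ("OD1-1", "Parcubacteria", 0.743, 781, "III-like"),
    ("OD1-2", "Parcubacteria", 0.67, 850, "III-like"),
    ("OP11-1", "Microgenomates", 1.24, 1324, "III-like"),
    ("OP11-2", "Microgenomates", 1.65, 1699, "III-like"),
    ("OP11-3", "Microgenomates", 1.31, 1506, "III-like"),
]


def form_counts(rows):
    """Count genomes per (phylum, RubisCO form) pair."""
    return Counter((phylum, form) for _, phylum, _, _, form in rows)


def coding_density(rows):
    """Return protein-coding genes per Mbp, keyed by genome ID."""
    return {gid: genes / mbp for gid, _, mbp, genes, _ in rows}


if __name__ == "__main__":
    for (phylum, form), n in sorted(form_counts(TABLE_1).items()):
        print(f"{phylum:<18} {form:<8} {n}")
    densities = coding_density(TABLE_1)
    print(f"WS6-1 coding density: {densities['WS6-1']:.0f} genes per Mbp")
```

Keeping the rows as plain tuples makes it easy to compare the tally against the counts quoted in the text.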
[Figure 1: phylogenetic tree and protein models, image not reproduced. Recoverable labels: Intermediate type II/III; Bacterial type II; Type IA/IB; Type IC/ID; To type IV RuBisCO-like; Archaeal Type III; Bacterial Type III-like; Firmicutes; OD1 SCG. Legend "Bacterial Candidate Phyla": SR1, OP11, WS6, OD1, PER. Scale bar 0.1. Annotations: "Insertion of 48 amino acids" and "Insertion of 24 amino acids".]

Figure 1 Maximum likelihood phylogenetic tree constructed for the RubisCO large subunit. Key bootstrap values >50 are shown based on 100 resamplings. Protein models of the PER and Dojkabacteria (WS6) type II/III RubisCOs and the Dojkabacteria and Parcubacterium (OD1) type III-like RubisCOs are included. The asterisk (*) refers to genomes that are complete or nearly complete and hand-curated. The additional inserts of the form II/III RubisCO from the closed Dojkabacteria genome (48 residues) or of the form III-like from the Parcubacterium genome (24 residues) are highlighted in red on the protein models (Expanded view shown in Supplementary Figure S6).
confirmed the presence of form III-specific residues (Supplementary Figure S5), validating the form II/III phylogenetic placement of these bacterial sequences. Our results expand the bacterial members of this form by four additional PER sequences, add the first three Dojkabacteria sequences and add a single-cell unpublished Microgenomates (OP11) sequence. Originally, two form II/III RubisCO subgroups were described based on the absence (PER) or the presence (SR1 and archaea) of a 29 amino-acid N-terminal insert sequence (Wrighton et al., 2012; Campbell et al., 2013; Supplementary Figures S5 and S6). Here our additional sampling of the PER RubisCO confirms this earlier discovery, as all PER genomes to date lack the insertion. The Microgenomates and two of the Dojkabacteria RubisCOs reported here have the 29 amino-acid insertion found in the SR1 and archaeal II/III form, while the closed WS6 genome (WS6-1) encodes a RubisCO with a novel 48 amino-acid insertion (Figure 1, Supplementary Figures S5 and S6). Both the 29 and 48 amino-acid insertion sequences are located in the same position as previously identified insertions. Similar to the methanogen sequences, the insertion is sandwiched between catalytic sites of the enzyme (Supplementary Figure S5), yet the impact of these insertions on enzyme biochemical properties is currently unknown.
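
One way to check whether insertions of different lengths sit at the same position is to scan a pairwise alignment for runs where the reference is gapped and the query is not. The sketch below assumes two equal-length aligned strings with `-` as the gap character; the function name `find_insertions` and the toy sequences in the usage note are hypothetical and are not taken from the alignments in the Supplementary Figures.

```python
def find_insertions(ref_aln, query_aln, gap="-"):
    """Return (reference_position, length) for each insertion in the query.

    reference_position is the number of ungapped reference residues that
    precede the insertion. Columns gapped in both sequences are ignored,
    so they neither extend nor split an insertion.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    insertions = []
    ref_pos = 0   # ungapped reference residues seen so far
    run_len = 0   # length of the insertion currently being read
    for ref_res, query_res in zip(ref_aln, query_aln):
        if ref_res == gap:
            if query_res != gap:
                run_len += 1  # query residue opposite a reference gap
            continue
        if run_len:
            insertions.append((ref_pos, run_len))
            run_len = 0
        ref_pos += 1
    if run_len:  # insertion at the very end of the alignment
        insertions.append((ref_pos, run_len))
    return insertions
```

For example, with the made-up pair `find_insertions("MKT---AYL-G", "MKTPPPAYLQG")`, the function reports one insertion of length 3 after reference residue 3 and one of length 1 after reference residue 6. Two sequences whose insertions share the same reference position but differ in length would be reported with the same first element and different second elements.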
Figure 2 Conservation of RubisCO active-site residues. Alignment of active-site residues from representative RubisCO and RLP sequences as noted previously by Tabita et al., 2007, along with equivalent residues from the new sequences described in this study. Residues are noted in single-letter IUPAC code. Coloring scheme is based on the identities of residues relative to the Synechococcus elongatus PCC 6301 reference sequence. Amino-acid numbering follows the Synechococcus elongatus PCC 6301 reference sequence. Green shading refers to conserved residues, yellow indicates semi-conserved residues and red represents non-conserved residues. Catalytic (C) and RuBP-binding (R) residues are indicated on the top; X denotes missing residue. The RubisCO motif refers to a stretch of conserved residues that are proximal in the primary structure and is identified by consensus residue identities shaded gray. Residues K, D and E from this motif participate in catalysis.
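
The coloring scheme described in the Figure 2 caption can be approximated as a comparison of each residue with the reference residue at the same position. The caption does not define "semi-conserved", so the physicochemical classes in `RESIDUE_CLASSES` below are an assumption made purely for illustration, as are the function names; this is a sketch of the idea, not the procedure used to draw the figure.

```python
# Hypothetical physicochemical classes. The caption does not say how
# "semi-conserved" was decided, so this grouping is an assumption.
RESIDUE_CLASSES = ("AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP")


def residue_class(residue):
    """Return the index of the class containing residue, or None."""
    for index, members in enumerate(RESIDUE_CLASSES):
        if residue in members:
            return index
    return None


def classify(reference, residue):
    """Label one residue relative to the reference residue."""
    if residue == "X":
        return "missing"  # X denotes a missing residue in the caption
    if residue == reference:
        return "conserved"  # green in the caption
    ref_class = residue_class(reference)
    if ref_class is not None and ref_class == residue_class(residue):
        return "semi-conserved"  # yellow, under the assumed grouping
    return "non-conserved"  # red in the caption


def color_row(reference_residues, query_residues):
    """Classify each active-site residue of one sequence."""
    return [classify(r, q) for r, q in zip(reference_residues, query_residues)]
```

Under this assumed grouping, a lysine replaced by arginine would be labeled semi-conserved, while a lysine replaced by glutamate would be labeled non-conserved.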
InadditiontotheformII/IIIsequences,nineofour insertions between alpha-helix 6 and beta-sheet 7
newly recovered bacterial sequences form a distinct (Figure 1, Supplementary Figures S5 and S6). The
and strongly supported monophyletic group that WS6 genome containing divergent form III also
branchesdeeplyfrom,yetismostsimilarto,theform contains the eight amino-acid insertion. To further
III RubisCO sequences found in archaea. This confirm analyses indicating that these insertions
currentCPRcladeincludesRubisCOsequencesfrom were not due to sequencing or assembly error, we
Dojkabacteria (3), Parcubacterium (1) and Microge- designedspecificPCRprimersforthegeneencoding
nomates (5). Although the extent of sequence for the large RubisCO subunit and confirmed the
divergence between these bacterial sequences and insertions via cloning and sequencing. Prior
the archaeal form III sequences is considerable, sequence analyses demonstrated that this region
additional taxon sampling is needed to resolve this was involved in interactions with the neighboring
cladeascompletelydistinctRubisCOform(Figure1, subunit (Satagopan et al., 2014); however, these
Supplementary Figure S7). In our analyses, we have insertions have not been previously identified in
identified other bacterial genomes that contain other RubisCO sequences.
homologs to archaeal form III sequences in public
databases (for example, from a partial single-cell
Parcubacteria genome and in two Firmicutes draft CPR RubisCO is inferred to support a heterotrophic
genomes; Supplementary Figure S7). Notably, these nucleoside pathway
bacterialformIIIarenotmonophyleticwiththeCPR Previously studied archaeal form III (Finn and
RubisCOs described here but clade with the canoni- Tabita, 2004; Sato et al., 2007; Estelmann et al.,
cal archaeal form III. 2011) and form II/III (Tabita et al., 2008; Alonso
The parcubacterial sequence from the OD1-1 et al., 2009) RubisCOs are not associated with the
genome contains two, an 8 and 24 amino acid, CBB pathway but instead function in a nucleoside-
TheISMEJournal
RubicCOinuncultivatedbacteriaoftheCPR
KCWrightonetal
2709
salvaging pathway. Consistent with the archaeal genomes that contain RubisCO, all bacterial CPR genomes containing RubisCOs sampled to date lack the gene encoding phosphoribulokinase, a key gene of the CBB pathway (Wrighton et al., 2012; Campbell et al., 2013; Kantor et al., 2013). However, we note that these genomes contain many genes that are too novel to be identified based on sequence similarity, and thus we cannot eliminate the possibility that they contain new enzymes or divergent phosphoribulokinase homologs (Ashida et al., 2008). Consistent with our hypothesis that these RubisCOs function in an archaeal-type nucleoside pathway, these RubisCO proteins have their closest homologs, outside other CPR sequences, among archaeal and not bacterial RubisCOs (Supplementary Table S3). Thus we suggest that the RubisCOs studied here also function in a nucleoside/nucleotide metabolism pathway analogous to that of archaea.
Two pathways that do not use phosphoribulokinase have been proposed to generate RuBP, the substrate for RubisCO, in Archaea. In the first pathway, identified in Thermococcus kodakarensis, nucleoside monophosphates are converted into 3-PGA. The first step in this short pathway is the phosphorolysis of the nucleotide (AMP, CMP, UMP) by AMP phosphorylase, which releases the base (for example, adenine) and yields ribose-1,5-bisphosphate, which an isomerase converts to ribulose-1,5-bisphosphate (RuBP) (that is, AMP→ribose 1,5-bisphosphate→RuBP). From RuBP and CO2, the type III or II/III archaeal RubisCO generates two molecules of 3-PGA, a central intermediate of core carbon metabolism (Aono et al., 2012).
All CPR genomes with RubisCO reported here and studied previously possess the two archaeal AMP pathway homologs (Figure 4). For both proteins, the AMP phosphorylase and the isomerase, the most similar sequences outside the CPR are largely from archaeal genomes (Supplementary Table S3), oftentimes from methanogens (for PER) or from uncultivated archaeal lineages such as the DPANN (Castelle et al., 2015). We used an amino-acid alignment and protein modeling to confirm that the CPR isomerase has all residues for catalytic activity and is specific for ribose 1,5-bisphosphate, and is not, for example, a 5-methylthioribose-1-phosphate isomerase (Supplementary Figures S8 and S9). Based on the genomic evidence to date, we propose that the CPR bacteria likely use a RubisCO pathway analogous to that of T. kodakarensis (Figure 3, denoted in green).
In contrast to the Thermococcus pathway, other thermophilic archaea (Methanocaldococcus jannaschii, Methanosarcina acetivorans and Archaeoglobus lithotrophicus) use a second pathway. In this pathway, 5-phosphoribosyl-1-pyrophosphate (PRPP) is suggested to be dephosphorylated to ribose 1,5-bisphosphate via an abiotic reaction that is facilitated by high temperatures. In archaea, the non-enzymatically produced ribose 1,5-bisphosphate is then converted to RuBP by an enzyme annotated to function in thiamine monophosphate formation (Figure 3, unique pathway denoted in yellow; Finn and Tabita, 2004) (that is, PRPP→ribose 1,5-bisphosphate→RuBP). Given that only one Peregrinibacterium genome contains a distant homolog of this RuBP-forming enzyme, and all of these CPR were recovered from an ~14°C aquifer, this pathway seems less likely in most CPR bacteria whose genomes encode RubisCO. However, we note that many PER genomes encode an enzyme that can convert PRPP to AMP. This enzyme, adenine phosphoribosyltransferase, may represent an alternative mechanism by which PER genomes can use PRPP via the AMP pathway and not the pathway previously reported in thermophilic archaea (that is, PRPP→AMP→ribose 1,5-bisphosphate→RuBP). Thus we cannot rule out that PRPP may also be used as a source for RuBP in some members of the CPR via a T. kodakarensis-like pathway (Figure 3).
Very recently, Aono et al. (2015) identified alternative enzymes involved in the nucleoside metabolism pathway in T. kodakarensis, linking nucleoside moieties to central carbon metabolism. Instead of AMP phosphorylase (step 1 of the T. kodakarensis pathway), three nucleoside phosphorylases use phosphate to convert the ribose moieties of adenosine, guanosine and uridine to ribose-1-phosphate (R1P). R1P is then converted to ribose-1,5-bisphosphate (R15P) by an ADP-dependent ribose-1-phosphate kinase (ADP-R1P kinase) (Aono et al., 2015) (Figure 3, blue dotted line). Finally, R15P can be directed to glycolysis via the functions of R15P isomerase and RubisCO (Aono et al., 2015). We examined the CPR genomes for this alternative T. kodakarensis pathway and found some (albeit weak) support for it in Microgenomates and most PER genomes. Thus it cannot be ruled out that some CPR bacteria could use an alternative route through which R15P is generated from the phosphorolysis of nucleosides (Figure 3).
CPR organisms that contain RubisCO are inferred to be obligate fermenters (Supplementary Figure S10), consistent with prior metabolic analyses (Wrighton et al., 2012; Campbell et al., 2013; Kantor et al., 2013; Wrighton et al., 2014). All of the genomes lack a complete tricarboxylic acid cycle, NADH dehydrogenase (complex I) and most other components of the electron transport chain for oxidative phosphorylation (for example, complex II, complex III, complex IV and quinones). All CPR genomes included in this study, besides the Dojkabacteria (WS6), have complete or near-complete glycolysis and/or pentose phosphate pathway(s) and the capacity to produce acetate, lactate and/or hydrogen as byproducts of fermentation. Notably, the PER, in contrast to bacteria of the
other CPR genomes with RubisCO described here, seem to have the capacity to synthesize nucleotides and some amino acids (Supplementary Figure S11).
The metabolic information gleaned from the first complete WS6 genome is sparse compared with that for the other CPR genomes. Enzymes that were identified include those required for the last steps of glycolysis from PGA to pyruvate and ultimately fermentation to acetate and ATP. Notably, PGA is the product of RuBP conversion by RubisCO (Figure 3).

Figure 3 Proposed metabolic role of RubisCO, with genes identified in the PER-1 (blue line) and WS6-1 genomes (purple line). Black lines indicate genes not identified in either organism. Thick blue arrows represent genes identified by metatranscriptomics (Supplementary Table S4). Yellow lines denote the proposed PRPP pathway indicated in the text. Abbreviations not indicated in the text: G6PD, glucose-6-phosphate dehydrogenase; PGLS, 6-phosphogluconolactonase; hxlB, 6-phospho-3-hexuloisomerase; hxlA, 3-hexulose-6-phosphate synthase; Pgm, phosphoglucomutase; GPI, glucose-6-phosphate isomerase; FBP, fructose-1,6-bisphosphatase; Aldo, fructose-bisphosphate aldolase; GAPDH, glyceraldehyde 3-phosphate dehydrogenase; Pgk, phosphoglycerate kinase; TIM, triosephosphate isomerase; GpmI, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase; Eno, enolase; PK, pyruvate kinase; PFOR, pyruvate/2-oxoacid-ferredoxin oxidoreductase; ACS, acetyl-CoA synthetase; PTA, phosphotransacetylase; ackA, acetate kinase; RpiA, ribose 5-phosphate isomerase A; RPE, ribulose-phosphate 3-epimerase; Tkt, transketolase; Tal, transaldolase; APRT, adenine phosphoribosyltransferase; PRPP synthetase, ribose-phosphate pyrophosphokinase; DAK, glycerone kinase; DAS, dihydroxyacetone synthase; AMPase, AMP phosphorylase; isomerase, ribose 1,5-bisphosphate isomerase; NMP, nucleoside 5′-monophosphate.
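The routes to RuBP discussed in this section can be summarized as a simple presence/absence check. The sketch below is only an illustration of that reasoning, not the annotation workflow used in the study: the step labels, the function names and the example annotation set are all hypothetical, and a real analysis would rely on homology searches and manual curation rather than exact label matching.

```python
# Enzymatic steps of each route to RuBP named in the text. The step labels are
# shorthand invented for this sketch, not identifiers from the paper's data.
ROUTES_TO_RUBP = {
    # AMP -> R15P -> RuBP (T. kodakarensis-type AMP pathway)
    "AMP pathway": ["AMP phosphorylase", "R15P isomerase"],
    # PRPP -> R15P (abiotic at high temperature, so no gene) -> RuBP
    "PRPP pathway": ["RuBP-forming enzyme"],
    # PRPP -> AMP -> R15P -> RuBP
    "PRPP via AMP": ["APRT", "AMP phosphorylase", "R15P isomerase"],
    # nucleoside -> R1P -> R15P -> RuBP (Aono et al., 2015)
    "nucleoside phosphorylase route": [
        "nucleoside phosphorylase", "ADP-R1P kinase", "R15P isomerase",
    ],
}


def complete_routes(annotated):
    """Return the routes whose enzymatic steps are all present in `annotated`."""
    found = set(annotated)
    return [name for name, steps in ROUTES_TO_RUBP.items()
            if all(step in found for step in steps)]


def summarize(annotated):
    """Report RubisCO, phosphoribulokinase and complete routes for one genome."""
    found = set(annotated)
    return {
        "rubisco": "RubisCO" in found,
        "phosphoribulokinase": "phosphoribulokinase" in found,
        "routes_to_rubp": complete_routes(found),
    }


# Hypothetical annotation set for illustration; not a genome from the study.
example = {"RubisCO", "AMP phosphorylase", "R15P isomerase", "APRT"}
print(summarize(example))
```

A genome that encodes RubisCO, lacks phosphoribulokinase and completes at least one of these routes would fit the nucleoside-pathway interpretation proposed in the text, under the assumptions stated above.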
We failed to identify genes for upper glycolysis, gluconeogenesis and the pentose phosphate pathway in the WS6 genomes (Supplementary Figure S10). Gene and protein sequences discussed in this manuscript can be accessed through links in Supplementary Figures S10 and S11. The absence of these pathways is similar to reports of bacteria of the SR1 phylum (placement within the CPR is uncertain) that also encode form II/III RubisCO and lack these pathways (Campbell et al., 2013; Kantor et al., 2013) (Supplementary Figure S10). Complete and closed SR1 and WS6 genomes also lack pathways for biosynthesis of nucleotides (purines and pyrimidines) and amino acids (Kantor et al., 2013). Consistent with reports for SR1 and some small CPR genomes (Kantor et al., 2013; Gong et al., 2014; Brown et al., 2015; He et al., 2015; Luef et al., 2015), we suggest that WS6 bacteria reported here have an obligatory symbiotic or parasitic lifestyle and are not free-living.
WS6 and other CPR that encode RubisCO may rely on other members of the community for external ribose, which they would use as a carbon and energy source. A bacterial or archaeal cell is on average 20% RNA by weight, and RNA is 40% ribose by weight, meaning that a cell is roughly 8% ribose by weight (0.20 × 0.40 = 0.08). A recent paper used this information to suggest that ribose was likely an abundant sugar available on early earth for fermentation and that the archaeal type III RubisCO pathway of nucleoside monophosphate conversion to 3-PGA might be a relic of ancient heterotrophy (Schönheit et al., 2015). In a similar manner, we suggest that in WS6 the ribose moiety can be funneled into RuBP via the two-gene AMP pathway and converted to PGA via RubisCO to ultimately yield ATP and acetate via substrate-level phosphorylation. This use of ribose would minimize the energetic cost of upper glycolysis, maximizing energy yields for organisms such as WS6 with tiny genomes.

CPR RubisCO transcripts detected in the environment
We examined the metatranscriptomic sequence data set collected from the 0.2-μm filter samples for reads that mapped uniquely to the PER-1 genome (Supplementary Table S4), given that these organisms were enriched on this filter (there was insufficient biomass after metagenomics to investigate the 0.1-μm filters). In the sample collected prior to acetate amendment (sample A), we confirmed the expression of RubisCO (Scaffold 3_gene102) and the surrounding gene neighborhood (3_105, 3_107, 3_109) (Figure 4a). Notably, besides RubisCO, all of these transcribed genes were annotated as hypothetical. Also in sample A, we detected transcripts for step 1 of the AMP pathway (AMP phosphorylase (Scaffold 5_gene7)) and energy generation via acetate production (phosphotransacetylase (Scaffold 5_gene5)), as well as a poorly annotated phosphoesterase (Scaffold 5_gene4) (Figure 4b). The expression of RubisCO, a key gene in the AMP pathway, and of genes for acetate production from the same genome prior to stimulation suggests a role for the RubisCO and likely the AMP pathway in bacteria under natural aquifer conditions.

The PER RubisCO can support autotrophic CO2-dependent growth
We synthesized in E. coli a recombinant protein of the PER-1 form II/III RubisCO that was transcribed in situ. We demonstrated CO2 fixation activity from the crude extract using RuBP as a substrate
Figure 4 Gene neighborhoods near the form II/III RubisCO (a) as well as AMP phosphorylase and R15P isomerase (b) from the PER-1 genome. (a) RubisCO (red) clusters with genes for pyrimidine and purine metabolism. (b) Genes for producing acetate (blue) are near two homologs from the AMP pathway (dark green). Gene clusters were organized and visualized using Geneious v7.0.6. Legend categories: pyrimidine metabolism, purine metabolism, hypothetical protein, acetate production, cell wall-related protein, peptidase.