Downloaded from genome.cshlp.org on March 6, 2023 - Published by Cold Spring Harbor Laboratory Press
Article
The Bacterial Replicative Helicase DnaB Evolved
from a RecA Duplication
Detlef D. Leipe,1 L. Aravind,2,3 Nick V. Grishin,1,4 and Eugene V. Koonin1,5
1National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, Bethesda,
Maryland 20894 USA; 2Department of Biology, Texas A&M University, College Station, Texas 70843 USA
The RecA/Rad51/DMC1 family of ATP-dependent recombinases plays a crucial role in genetic recombination
and double-stranded DNA break repair in Archaea, Bacteria, and Eukaryota. DnaB is the replication fork
helicase in all Bacteria. We show here that DnaB shares significant sequence similarity with RecA and
Rad51/DMC1 and two other related families of ATPases, Sms and KaiC. The conserved region spans the entire
ATP- and DNA-binding domain that consists of about 250 amino acid residues and includes 7 distinct motifs.
Comparison with the three-dimensional structure of Escherichia coli RecA and phage T7 DnaB (gp4) reveals that
the area of sequence conservation includes the central parallel β-sheet and most of the connecting helices and
loops as well as a smaller domain that consists of an amino-terminal helix and a carboxy-terminal β-meander.
Additionally, we show that animals, plants, and the malarial Plasmodium but not Saccharomyces cerevisiae encode a
previously undetected DnaB homolog that might function in the mitochondria. The DnaB homolog from
Arabidopsis also contains a DnaG–primase domain and the DnaB homolog from the nematode seems to contain an
inactivated version of the primase. This domain organization is reminiscent of bacteriophage primases–helicases
and suggests that DnaB might have been horizontally introduced into the nuclear eukaryotic genome via a
phage vector. We hypothesize that DnaB originated from a duplication of a RecA-like ancestor after the
divergence of the bacteria from Archaea and eukaryotes, which indicates that the replication fork helicases in
Bacteria and Archaea/Eukaryota have evolved independently.
3Present address: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894 USA.
4Present address: Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75235 USA.
5Corresponding author.
[email protected]; FAX (301) 435-7794.
Genome Research 10:5–16 ©2000 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/99 $5.00; www.genome.org
Genetic recombination is an essential process for both recombinational repair and sexual reproduction. In Bacteria, the central role in recombination is played by the RecA recombinase enzyme (Radding 1989; Kowalczykowski and Eggleston 1994; Seitz et al. 1998). RecA is a DNA-dependent ATPase that promotes homologous pairing and strand exchange between different double-stranded (ds) DNA molecules and is therefore necessary for homologous recombination and DNA repair (Kowalczykowski et al. 1994). The biochemical activities of RecA include the ability to form regular helical filaments, bind single-stranded (ss) and dsDNA, and bind and hydrolyze nucleoside triphosphates (Kowalczykowski et al. 1994). In addition to its direct role in recombination, RecA functions as a cofactor in the cleavage reaction for LexA, the repressor of the SOS regulon (Little and Mount 1982; Witkin 1991). There are two types of RecA-like proteins in many eukaryotes, namely Rad51 and DMC1/Lim15. Rad51 is expressed in both meiotic and mitotic cells and mainly participates in recombinational repair of double-strand breaks (Shinohara et al. 1992; Doutriaux et al. 1998). DMC1 is expressed in meiotic cells, its null mutants show a meiotic arrest phenotype, and it probably functions in the formation of synaptonemal complexes and also in double-strand break repair (Bishop et al. 1992; Dresser et al. 1997; Yoshida et al. 1998). Thus, there is functional overlap between Rad51 and DMC1 (Shinohara et al. 1997) and Caenorhabditis elegans seems to have only a single Rad51/DMC1 homolog (Takanami et al. 1998). A Rad51/DMC1 homolog (termed RadA) that catalyzes DNA pairing and strand exchange (Seitz et al. 1998) is also found in the Archaea (Sandler et al. 1996).
The RecA/RadA/DMC1 recombinases are closely related to three other groups of ATPases, namely bacterial Sms (also called RadA), bacterial DnaB, and archaeal and bacterial KaiC. The Sms protein is a poorly characterized bacterial homolog of RecA in which the RecA ATPase domain is fused to a Zn ribbon and a predicted serine protease domain (Koonin et al. 1996; Aravind et al. 1999) (hereafter we use the designation Sms to avoid confusion with the archaeal RadA). Escherichia coli sms mutants show increased sensitivity to X rays, UV radiation, and methyl methanesulfonate, suggesting a role in repair for the Sms protein (Neuwald et al. 1992; Song and Sargentini 1996).
The cyanobacterial KaiABC gene cluster constitutes the circadian clock in the cyanobacterium Synechococcus (Ishiura et al. 1998; Iwasaki et al. 1999). The KaiC protein generates a circadian oscillation by negative feedback control on its own expression (Ishiura et al. 1998). The Synechococcus KaiC protein is composed of two RecA-like domains joined head to tail. Highly conserved homologs of KaiC are found in the cyanobacterium Synechocystis, the bacterium Thermotoga, and in all Archaea but are absent from other bacteria and eukaryotes (Makarova et al. 1999).
The DnaB helicase is a crucial protein in bacterial DNA replication. It unwinds the DNA duplex ahead of the replication fork and is also responsible for attracting the DnaG primase to the replication fork (Tougu et al. 1994; Lu et al. 1996). The active form of the protein is a hexamer of identical 52.3-kD subunits that can form rings with threefold (C3) and sixfold (C6) symmetry (Yu et al. 1996) and it has been hypothesized that the amino-terminal ATPase domains of two adjacent protomers dimerize to make the C6–C3 conversion (Fass et al. 1999). The crystal structure of the helicase domain of phage T7 helicase–primase (gp4) has recently been solved (Sawaya et al. 1999) and it has been found that the structure of the T7 helicase domain and its interactions with neighboring subunits in the crystal resemble those of the RecA and F1 ATPase (Sawaya et al. 1999). In addition to the ATPase domain, E. coli DnaB comprises a globular amino-terminal domain (proteolytic fragment III) that is essential for interaction with other proteins involved in DNA replication like DnaA, DnaC, and the DnaG primase (Nakayama et al. 1984; Biswas et al. 1994; Sutton et al. 1998). The domain consists of six α helices (Weigelt et al. 1998; Fass et al. 1999; Weigelt et al. 1999) that are attached to the carboxy-terminal ATPase domain by a flexible hinge (Miles et al. 1997).
In addition to RecA, DMC1/Rad51/RadA, DnaB, Sms, and KaiC, there is a large number of proteins with more limited phylogenetic distribution that contain the core RecA ATPase domain. These include, among others, Rad51-interacting proteins Rad55 and Rad57 in yeast (Game 1993), XRCC2 (Tambini et al. 1997), R5H2 and R5H3 (Cartwright et al. 1998), and TRAD (Kawabata and Sacki 1998) in mammals, and several other distinct RecA homologs found in Archaea and some bacteria (Aravind et al. 1999). Some of these orphan RecA homologs appear to contain an inactivated ATPase domain (Aravind et al. 1999). Additional domains associated with the RecA core include a modified amino-terminal helix–hairpin–helix (HhH) domain in the archaeoeukaryotic RadA/DMC1, an amino-terminal zinc finger and a carboxy-terminal Lon-type protease domain in Sms, and a GTPase in one of the archaeal RecA homologs (Aravind et al. 1999).
Here, using a combination of sequence database searches, sequence alignments, phylogenetic analysis, and structural comparison, we show that (1) DnaB, RecA, DMC1/RadA, Sms, and KaiC share significant sequence similarity along a region of 250 amino acids that includes both the ATP-binding domain and the DNA-binding site; (2) DnaB likely evolved from RecA by a gene duplication event at the onset of the evolution of the Bacteria; (3) RecA and DnaB are likely to perform their function by a similar mechanism of conformational change; (4) eukaryotes encode diverged homologs of DnaB, some of which also contain a DnaG-type primase domain; these genes might have been introduced into the eukaryotic genome by a horizontal transfer event involving a bacteriophage. We hypothesize that the common ancestor of the RecA/DnaB superfamily functioned as a recombinase in the last common ancestor (LCA) of all extant cells and that a RecA homolog (DnaB) was recruited for the helicase function at the replication fork once DNA replication evolved in bacteria. This interpretation lends further support to the hypothesis that the DNA replication machinery evolved independently in bacteria and archaea/eukaryotes (Leipe et al. 1999).
Figure 1 (See pages 7–9.) Multiple alignment of the core domain of the RecA/DnaB superfamily of ATPases. From top to bottom (separated by horizontal lines) the alignment contains sequences from bacterial and chloroplast DnaB, DnaB proteins and primase–helicase proteins from bacteriophages and eukaryotes, bacterial Sms proteins, KaiC from Archaea and Bacteria, RecA recombinase from Bacteria and phage T4, and RadA and Rad51/DMC1 recombinases from Archaea and Eukaryota. The 80% consensus for these proteins is shown below the aligned sequences. Numbers indicate the distance to the amino-terminal methionine and the carboxyl terminus of each protein and residues omitted within the alignment. (&) The position of inteins that have not been included in the alignment. The secondary structure elements derived from the X-ray structures of phage T7 gp4 and E. coli RecA are shown above the respective sequence. Helices are represented as cylinders, strands as arrows, and the unordered or mobile loops 1 and 2 as lines. Key residues that are discussed in the text are marked by arrowheads; the numbers identify the position of the residue in gp4 and RecA according to the original publications (Story et al. 1993; Sawaya et al. 1999). Highly conserved residues are color coded and indicated in the consensus line for the following groups. (Purple) Negatively charged (D,E); (red) positively charged (H,K,R), charged (c=D,E,H,K,R); (green) tiny (u=G,A,S); (yellow) hydrophobic (h=A,C,F,I,L,M,V,W,Y) or aliphatic (l=I,L,V); (pale yellow) alcohol (o=S,T,Y); (light blue) polar (p=D,E,H,K,N,Q,R,S,T); (reddish-brown) small (s=A,C,D,G,N,P,S,T,V); (gray) big (b=not small). Also colored are residues conserved only within the DnaB family. Where applicable, source organisms are identified by four-letter abbreviations. (Aepe) Aeropyrum pernix; (Aqae) A. aeolicus; (Arfu) Archaeoglobus fulgidus; (Arth) A. thaliana; (Basu) Bacillus subtilis; (Bobu) Borrelia burgdorferi; (T7) bacteriophage T7; (T4) bacteriophage T4; (Cael) C. elegans; (CDnaB_Odsi) Odontella sinensis chloroplast; (CDnaB_Popu) Porphyra purpurea chloroplast; (Chtr) Chlamydia trachomatis; (Ecol) E. coli; (Glma) Glycine max; (Hain) Haemophilus influenzae; (Hepy) H. pylori; (Hosa) Homo sapiens; (Lema) Leishmania major; (Meja) Methanococcus jannaschii; (Meth) Methanobacterium thermoautotrophicum; (Mumu) Mus musculus; (Myge) Mycoplasma genitalium; (Mytu) Mycobacterium tuberculosis; (Plch) P. chabaudi; (Rhma) Rhodothermus marinus; (Sace) Saccharomyces cerevisiae; (SPP1) Bacillus subtilis bacteriophage SPP1; (Suso) Sulfolobus solfataricus; (Sy68) Synechocystis PCC6803; (Teth) Tetrahymena thermophila; (Thma) T. maritima; (Trpa) Treponema pallidum.
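The consensus line described in the legend can be illustrated with a minimal Python sketch. The residue groups and their one-letter symbols are taken from the legend; everything else is an assumption made for illustration: the symbols "-" and "+" for the negatively and positively charged groups (the legend gives no symbol for them), the rule of reporting the smallest group that reaches the 80% threshold, the use of "." for columns without consensus, and the requirement that columns are gap-free. It is not the procedure used to produce Figure 1.

```python
from collections import Counter

# Residue groups from the Figure 1 legend. The symbols "-" and "+" are
# assumptions; the legend names these two groups but assigns no letter.
RESIDUE_CLASSES = {
    "-": set("DE"),          # negatively charged
    "+": set("HKR"),         # positively charged
    "u": set("GAS"),         # tiny
    "l": set("ILV"),         # aliphatic
    "o": set("STY"),         # alcohol
    "c": set("DEHKR"),       # charged
    "h": set("ACFILMVWY"),   # hydrophobic
    "p": set("DEHKNQRST"),   # polar
    "s": set("ACDGNPSTV"),   # small
}


def consensus_symbol(column, threshold=0.8):
    """Return the consensus symbol for one non-empty, gap-free alignment column."""
    counts = Counter(column.upper())
    total = len(column)

    # A single residue that reaches the threshold is reported as itself.
    residue, top = counts.most_common(1)[0]
    if top / total >= threshold:
        return residue

    # Otherwise report the smallest group reaching the threshold
    # (assumed tie-breaking rule: smaller groups are tried first).
    for symbol, members in sorted(RESIDUE_CLASSES.items(), key=lambda kv: len(kv[1])):
        covered = sum(n for aa, n in counts.items() if aa in members)
        if covered / total >= threshold:
            return symbol

    # "big" is defined in the legend as "not small".
    big = sum(n for aa, n in counts.items() if aa not in RESIDUE_CLASSES["s"])
    if big / total >= threshold:
        return "b"
    return "."  # no consensus at this threshold


def consensus_line(sequences, threshold=0.8):
    """Build a consensus string from equal-length, gap-free aligned sequences."""
    return "".join(consensus_symbol("".join(col), threshold) for col in zip(*sequences))
```

For the toy alignment `["GKT", "GKS", "GKT", "GKS", "GKT"]`, `consensus_line` returns `GKo`: the first two columns are invariant, and the third is covered only by the alcohol group.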
RESULTS AND DISCUSSION
The Core ATPase Domains of RecA and DnaB Are Specifically Related
BLAST searches seeded with the E. coli DnaB sequence retrieve the replicative helicases from a wide range of Bacteria and several bacteriophages with highly significant E values (<10⁻¹⁴⁰) and the helicase–primase proteins from bacteriophages T3/T7 and T4 with less significant E values (between 10⁻¹⁵ and 10⁻¹³). The first iteration of the PSI-BLAST search unexpectedly retrieved, with highly significant E values, a number of members of the RecA superfamily, namely bacterial Sms proteins and archaeal and eukaryotic RadA/Rad51 proteins. For example, the sequence of the Sms protein from the bacterium Aquifex aeolicus was detected with an E value of 10⁻¹⁹ and the sequence of the murine Trad protein with an E value of 7 × 10⁻¹⁹. In addition, previously undetected eukaryotic homologs of DnaB from C. elegans, Arabidopsis thaliana, and Plasmodium chabaudi were retrieved with E values between 10⁻¹⁷ and 10⁻¹⁵; a human homolog of these proteins was detected among EST products by searching the database of expressed sequence tags (dbEST) (see discussion below). Subsequent search iterations retrieve the entire RecA family. Conversely, searches seeded with E. coli RecA retrieve members of the DnaB family starting with an E value of 0.001 for Helicobacter pylori DnaB in the first PSI-BLAST iteration, with all the other members of the DnaB family retrieved in subsequent iterations. In all of these searches, RecA family members and DnaB family members, respectively, were consistently retrieved from the database before any other ATPases. This suggests that within the class of P-loop ATPases, there is a specific structural and, by inference, evolutionary, relationship between the RecA, DMC1/RadA, Sms, KaiC, and DnaB families; hereafter, we refer to them collectively as the RecA/DnaB superfamily.
Figure 2 A molscript diagram of E. coli RecA structure. Areas with sequence conservation between DnaB and RecA/DMC1/RadA are highlighted, the nonconserved carboxy- and amino-terminal domains are shown in light gray. The central parallel β-sheet is blue and the elements that are involved in coordinating loop 1 (between strand 4 and helix F) and loop 2 (between strand 5 and helix G) are green. Areas within the core domain that show no obvious sequence similarity between RecA and DnaB (helix D, strand 3 and helix E) are shown in light blue. The subdomain composed of helix B and strands 6–8 is shown in yellow. ADP diffused into the crystal (Story and Steitz 1992) is shown in ball-and-stick representation. Conserved amino acid residues that are discussed in the text and indicated in the alignment (Fig. 1) are shown in ball-and-stick representation: Lys-72 (in the P-loop), Glu-96, Asp-144 (Walker B), Gln-194, Arg-227, Lys-248, Lys-250, and Tyr-264 are at or near the carboxyl terminus of strands 2, 4, 5, 6, and 7, respectively. Amino acid coordinates are from PDB file 2REB, location of ADP is from PDB file 1REA. The orientation of the monomer, labels of strands, helices, loops, and residue enumeration are in accordance with the original publications (Story et al. 1992; Story et al. 1993).
Sequence and Structure Conservation in the RecA/DnaB Superfamily
The structure of the E. coli RecA protein consists of a major central domain flanked by two smaller domains at the amino and carboxy termini (Story et al. 1992; see also Fig. 2). The central domain can be subdivided into a large subdomain encompassing strands 1–5 and the connecting helices and loops and a small subdomain that represents two noncontiguous regions of the sequence including helix B and strands 6–8 (Figs. 1 and 2). As detailed above, we found that the sequence of this 250-amino-acid central domain is specifically conserved between DnaB, RecA, DMC1/RadA, KaiC, and Sms protein families. The importance of this core for RecA function is underscored by the fact that Pk-REC, a truncated, 210-amino-acid DMC1/RadA homolog from Pyrococcus, which consists of the core domain alone, can complement UV-sensitive RecA mutants in E. coli (Rashid et al. 1996).
A multiple sequence alignment of the RecA and DnaB sequences was constructed on the basis of the PSI-BLAST output and refined manually using structural information on RecA and DnaB (Fig. 1). The region of sequence conservation between RecA, RadA/DMC1, Sms, KaiC, and DnaB extends for ~250 amino acids and includes the P-loop and the Mg2+-binding site (Walker A and B motifs, respectively), which are involved in NTP binding and hydrolysis. Although the Walker A motif shows the typical G..GKT pattern conserved in a vast variety of ATPases and GTPases (Saraste et al. 1990), it is noteworthy that the second carboxylate typically found in the Walker B motif of several large groups of ATPases, for example, the AAA+ class of chaperone-like ATPases (Neuwald et al. 1999) and superfamily I and II helicases (Gorbalenya and Koonin 1993), is replaced by an alcohol residue in the RecA/DnaB superfamily (Fig. 1).
Motif 3 corresponds to E. coli RecA strand 2 and the following loop and is characterized by a completely conserved glutamate (hhh[SD].E) that has earlier been described as a conserved feature of the DnaB family (Ilyina et al. 1992). The conserved glutamate is assumed to activate the nucleophilic water molecule for an in-line attack of the ATP γ-phosphate (Story and Steitz 1992), and an E96D mutation in E. coli RecA results in a 100-fold reduction in the ATP hydrolysis rate (Campbell and Davis 1999a,b). The catalytic glutamate is highly conserved not only in the entire RecA/DnaB superfamily, but it is found in the same location (carboxy-terminal of the strand that follows the P-loop) in a large number of Walker-type ATPases, for example, F0/F1 ATPases and Rho helicase (Yoshida and Amano 1995). Interestingly, however, this motif is not detectable in NTPases, for example, the AAA+ class and the superfamily 1 and 2 helicases, where the conserved aspartate in the Walker B motif (motif 4) is followed by another negatively charged residue (so-called DEXX box). As the conserved aspartate in motif 4 is followed by a noncharged residue in the RecA/DnaB superfamily, it has been suggested that the second charged residue of the Walker B motif is functionally replaced by the conserved glutamate in motif 3 in the RecA/DnaB superfamily (Sawaya et al. 1999).
In addition to the catalytic glutamate in motif 3 and the Walker A and B motifs (motifs 2 and 4) that are found in a wide variety of ATPases, there are four other motifs (1, 5, 6, and 7 in Fig. 1) that show significant sequence conservation among the members of the RecA/DnaB superfamily and that can be correlated with elements known from the crystal structure of RecA and T7 gp4 (Story and Steitz 1992; Story et al. 1992; Sawaya et al. 1999) (Figs. 1 and 2).
Motif 1 is amino-terminal of the P-loop and corresponds to helix B and a glycine-rich loop containing a conserved negative charge with the consensus pattern h.[ST]G...h[DE]...G (where h stands for a hydrophobic residue, residues in square brackets are alternatives, and a dot stands for any residue). In E. coli RecA, the tight turn completed by helix B and the neighboring carboxy- and amino-terminal sequences is stabilized by hydrogen bonds between Thr-42 and Asp-48 side chains and Asp-48 and Gly-54 backbone atoms (Story et al. 1993); all four residues involved in these interactions are highly conserved within the entire RecA/DnaB superfamily (Fig. 1). No function has yet been assigned to motif 1, but it has been noted that this region points towards the outside of the RecA polymer and is thus distant from the (presumed) ATP and DNA binding sites (Story et al. 1993).
The most conserved RecA residue in motif 5 (Gln-194) is found at the carboxy-terminal end of strand 5. In the structure, this residue is adjacent to the ATP γ-phosphate and it has been proposed to mediate a structural change on binding of ATP that stabilizes a conformation in the following loop 2 and/or helix G with high affinity for DNA (Story and Steitz 1992). Similarly, the corresponding residue of phage T7 gp4 (His-465) is in a position to act as γ-phosphate sensor or conformational switch by forming a hydrogen bond with the ATP γ-phosphate (Sawaya et al. 1999). In addition to the conservation of the putative γ-phosphate sensor itself (glutamine in all bacterial DnaBs and histidine in the eukaryotic DnaB homologs, phage T7 gp4, phage T4 UvsX, and the Sms family), considerable sequence conservation is also found in the preceding helix F and strand 5 in all members of the RecA/DnaB superfamily (Fig. 1). This suggests that the general mode of ATP-binding/hydrolysis-mediated conformational change is conserved at least between RecA, RadA/DMC1, and DnaB. Whether that holds true for the entire superfamily is doubtful because the putative γ sensor (His-465/Gln-194) is not conserved in the double-domain KaiC proteins and because the loop between motifs 5 and 6 (loop 2) seems to be missing in KaiC and Sms (Fig. 1).
In addition to mediating a conformational change within a subunit, binding and hydrolysis of ATP is likely to induce the rotation of subunits within the T7 gp4 hexamer (Sawaya et al. 1999). It has been suggested that T7 gp4 residue Arg-522, which is close to the γ-phosphate of a bound ATP in a neighboring subunit, is responsible for coupling ATP hydrolysis to subunit rotation (Sawaya et al. 1999). The importance of the residue is underscored by the fact that Arg-522 is the third residue of a [KR].[KR] motif located between strands 7 and 8 that is completely conserved in the DnaB, RecA, Sms, and KaiC families (Fig. 1). Surprisingly, the [KR].[KR] motif appears to be missing in the archaeoeukaryotic RadA/DMC1 family (Fig. 1) although RadA/DMC1 shares the strand exchange function with RecA and shares the highest overall sequence similarity with RecA within the RecA/DnaB superfamily. There is a conserved positively charged residue nearby in the predicted strand 7 of the RadA/DMC1 family proteins (Fig. 1), but whether or not this residue is functionally equivalent to Arg-522 will have to await the first structure of a member of this family.
In T7 gp4, the base of the bound nucleotide is sandwiched between Arg-504 and Tyr-535 (Sawaya et al. 1999). Arg-504, at the carboxy-terminal end of strand 6 in motif 6 (Fig. 1), is conserved as either Arg or Lys in DnaB and RecA but not in most KaiC and Sms proteins. T7 gp4 Tyr-535, at the carboxy-terminal end
of strand 8 in motif 7, seems conserved as an aromatic residue (Phe, Tyr, His) within the DnaB family although exact superposition would require a gap in the bacterial DnaB sequences (Fig. 1). In E. coli RecA, the base of the bound ADP stacks on Tyr-103 (Story and Steitz 1992), which is a residue carboxy-terminal of motif 3 that seems conserved only in RecA but not in any of the other members of the RecA/DnaB superfamily (Fig. 1). The other residues that are close to the adenine base in the E. coli RecA structure are Asp-100, Tyr-264, and Gly-265 (Story and Steitz 1992). Interestingly, E. coli RecA Tyr-264 is conserved as an aromatic residue in the RecA family and located at the carboxy-terminal end of strand 8 similar (but seemingly not identical) to the position of T7 gp4 Tyr-535. A conserved aromatic residue close to the carboxy-terminal end of strand 8 is also present in the RadA/DMC1, Sms, and KaiC families, but they do not seem to align exactly with the aromatic residues in either RecA or gp4/DnaB (Fig. 1). The lack of exact superposition could be caused by a suboptimal alignment or, alternatively, might indicate that the spatial orientation of the nucleoside with respect to the phosphate moiety differs between the various members of the RecA/DnaB superfamily.
Similarities between DnaB and RecA can also be found in the subunit interface. Hexamer formation in T7 gp4 depends on helix A that is located at the amino terminus of the helicase domain (Sawaya et al. 1999). It protrudes from the rest of the molecule and completes a three-helix bundle (helices D1, D2, and D3) on a neighboring subunit (Sawaya et al. 1999). Similarly, in the RecA polymer, large parts of the subunit interface are formed by a protruding amino-terminal helix A (Fig. 2) and strand 0 of one subunit packing against strand 3 and helix E in a neighboring subunit (Story et al. 1992). Thus, although no sequence similarity has been detected in either the protruding amino-terminal helix A or the other interface half around helix D, the structural similarities suggest that the subunit interface is homologous and was already present in the common ancestor of DnaB and RecA. In contrast, the amino terminus of KaiC is located immediately before motif 1 (Fig. 1) and a protruding helix is likely absent. It is therefore unlikely that the KaiC proteins have the ability to hexamerize and the head-to-tail fusion of two RecA-like ATPase domains in the two-domain KaiC genes suggests that they might function as dimers. Similarly, the amino-terminal region of Sms proteins is taken up by the Zn-binding module, which might be an alternative means of dimerization but also could be a DNA-binding domain.
Evolution of the KaiC Family
Among the proteins considered here, the evolution of the KaiC family is the most difficult to interpret. The gene seemingly has undergone multiple gene duplications and lateral transfers. The typical KaiC protein composed of two RecA-like domains joined head to tail is found in the Cyanobacteria and the Archaea Archaeoglobus, Pyrococcus, and Methanobacterium (Fig. 3), whereas it is absent from Methanococcus and Aeropyrum. As an additional complication, the Methanobacterium KaiC is more closely related to one of the Synechocystis KaiC paralogs than to the double-domain KaiC found in other Archaea like Archaeoglobus and Pyrococcus (Fig. 3). In addition to the double-domain KaiC proteins, there is a large number of single-domain KaiC homologs that are all archaeal with the exception of an apparent recent transfer into the hyperthermophilic bacterium Thermotoga maritima (Fig. 3). Indeed, whole-genome analysis has shown that almost a quarter of all T. maritima genes are likely acquired by lateral transfer from the Archaea (Logsdon and Fanny 1999; Nelson et al. 1999). The KaiC family as a whole seems to originate from the bacterial side of the RecA/DnaB superfamily and is identified as a sister group to the Sms family with varying statistical support in most phylogenetic analyses (results not shown). We hypothesize that the ancestral KaiC was a single-domain protein that has been laterally transferred from the Bacteria into the Archaea and that the two-domain KaiC originated by gene duplication and fusion within the Archaea. In this model, the occurrence of the double-domain KaiC in the Cyanobacteria and its lack in other Bacteria is interpreted as a secondary lateral transfer from the Archaea after the main bacterial lineages had been established.
Evolution of the Eukaryotic DnaB Proteins
There are two types of DnaB proteins in the Eukaryota. The DnaB sequences found in chloroplast genomes are highly similar to the bacterial sequences and the chloroplast DnaB of the red alga Porphyra also shares the intein position with Cyanobacteria and a few other bacteria (Pietrokovski 1996) (Fig. 1). There is therefore little doubt that these proteins are vertically inherited from the bacterial endosymbiont that gave rise to the plastids and that they are likely the functional helicases in chloroplast DNA replication. In contrast, the previously undetected nuclear eukaryotic DnaB homologs tend to group with the T-odd bacteriophage proteins (gp4) in which the DnaB helicase domain is fused to a DnaG-type primase domain, although there is no strong statistical support for this clade (Fig. 3). Also, when the nuclear eukaryotic DnaB sequences are used as queries for database searches, they typically show the greatest similarity to the bacteriophage DnaB homologs (data not shown). Furthermore, the DnaB homolog from Arabidopsis has the same domain architecture as the phage homologs, with the primase domain located upstream of the DnaB domain and containing all the diagnostic sequence motifs of the Toprim domains of the DnaG-type primases (Ilyina et al. 1992; Aravind et al. 1998) (data not shown). The DnaB homolog from the nematode C. elegans seems to contain a diverged counterpart of the DnaG domain with disrupted catalytic motifs, and no trace of the DnaG domain could be detected in the homolog from Plasmodium (the human coding sequence is incomplete and it remains unclear whether or not the protein contains a DnaG domain). This conservation of a unique domain architecture between nuclear eukaryotic and bacteriophage DnaB homologs, together with the apparent absence of DnaB homologs in Archaea, suggests that the gene coding for the DnaB homolog probably has been horizontally transferred into eukaryotes via a bacteriophage. Subsequent evolution of this gene in eukaryotes seems to have involved degradation of the primase domain, at least in some lineages, whereas the helicase domain remained intact. The unexpected tree topology for the eukaryotic DnaB homologs, namely the strongly supported grouping of the Plasmodium protein with the human one and the lack of statistically significant grouping of the plant protein with the rest of the eukaryotes, suggests a complex evolutionary history of this gene, perhaps involving additional horizontal transfer events. The functions of the nuclear eukaryotic DnaB homologs remain unclear. The plant and animal DnaB homologs contain an amino-terminal extension that is likely to function as an organellar import peptide; thus, a role in mitochondrial DNA replication or repair seems a possibility. This possible use of the phage DnaB for organellar function is reminiscent of a similar adaptation of a T-odd phage RNA polymerase in organellar transcription in plants (Hedtke et al. 1997).
Figure 3 Unrooted phylogeny of the RecA/DnaB superfamily. The analysis is based on the alignment of the RecA/DnaB core domain shown in Fig. 1. The data matrix contains 221 residues, seven of which are invariant or parsimony uninformative. Support for individual branches is indicated by bootstrap values for 1000 resamplings of PAUP maximum parsimony (first number), PHYLIP distance analysis (second number), and the reliability value computed by the PUZZLE software (third number). Bootstrap values <50% are not recorded and branches without bootstrap numbers are derived from a distance tree computed with the PHYLIP programs protdist and fitch. Branch lengths are arbitrary and do not represent evolutionary distances. The two possible positions of the root as discussed in the text are indicated by black arrows. (Red) Eukaryota; (green) Archaea; (blue) Bacteria; (pink) Bacteriophages. Names in boxes identify the individual protein families. The sequence identifiers are the same as for Fig. 1 except that the GenBank identifier was omitted.
Evolution of the RecA/DnaB Superfamily
The sequence similarity between DnaB and RecA and their shared ability to form hexameric rings or helices of similar quaternary structure (Ogawa et al. 1993; Yu and Egelman 1993, 1997; Yu et al. 1996; Seitz et al. 1998) raise the question of whether the RecA/DnaB superfamily is related to other hexameric P-loop NTPases. There is no evidence of a specific relationship with the hexameric/dodecameric branch-migration helicase RuvB (Mitchell and West 1994) or SV40 large T antigen helicase (Mastrangelo et al. 1989; Weisshart et al. 1999), both of which belong to the AAA+ class, a distinct division of P-loop NTPases (Neuwald et al. 1999; L. Aravind and E.V. Koonin, unpubl.). In contrast, there are distinct similarities between the RecA/DnaB superfamily and the family of ATPases that includes transcription termination factor Rho and F1–ATPase (Dombroski and Platt 1988; Gorbalenya and Koonin 1993; Miwa et al. 1995; Washington et al. 1996). Within the core domain of RecA and F1–ATPase (corresponding to strands 1–8 of RecA and the associated helices and loops), ~130 residues can be superimposed with an Rmsd of <2.0 Å (Abrahams et al. 1994) and secondary structure elements also are largely congruent (Washington et al. 1996). Although this leaves little doubt that the RecA/DnaB superfamily and the Rho/F1 family share a common ancestor that already had a hexameric quaternary structure, it also indicates that hexameric NTPases as a whole (including RecA/DnaB, Rho/F1, and the AAA+ class) are not a monophyletic group.
Phylogenetic analysis based on the multiple alignment of the core RecA/DnaB domain (~250 residues) strongly supports the monophyly of six major groups, namely bacterial and chloroplast DnaB, eukaryotic DnaB homologs (with the exception of the plant one), bacterial Sms, KaiC, bacterial RecA, and the archaeal/eukaryotic Rad51/DMC1/RadA (Fig. 3). The most critical factor in interpreting this tree is the placement of the root. Unambiguous rooting is possible only when a reliable tree can be produced for two paralogous families resulting from a duplication known to be present in the last common ancestor (Gogarten et al. 1989; Iwabe et al. 1989; Brown and Doolittle 1995). To that end, we have used the Rho/F1 ATPase family as the paralogous group for the entire RecA/DnaB superfamily. However, the information contained in the overall
might have been horizontally transferred into the eukaryotic lineage and is unlikely to play a critical role in eukaryotic nuclear DNA replication given its absence in yeast. Instead, the eukaryotic DnaB homologs are likely to function in organelles. These findings have consequences for our understanding of the evolution of DNA replication. Given the involvement of RecA/DMC1/RadA in recombinational processes in all domains of life, it seems likely that this particular family was already represented in the LCA of all extant cellular organisms. In contrast, DnaB, which is the principal helicase involved in bacterial DNA replication, has apparently been recruited for this function after the divergence of bacteria from the archaeal/eukaryotic lineage. Given that any replicative helicase has to be a highly processive enzyme, the ability of RecA to form hexameric rings (with the right diameter to encircle DNA) offers an explanation why a RecA derivative was a suitable candidate to be selected as the principal helicase for bacterial DNA replication. Conversely, eu-
alignmentwasinsufficienttoobtainareliablerooting karyotic replicative helicases might have been inde-
(datanotshown).Thus,thetopologyofthetreeallows pendentlyrecruitedfromotherclassesofATPases,such
for two principal, competing interpretations (Fig. 3). as the AAA+ class or the superfamily II helicases. The
PlacingtherootbetweentheRecA/Rad51/DMC1/RadA notionthatthereplicativeDNAhelicaseoftheBacteria
recombinases and the predominately bacterial assem- isnotanorthologofthecorrespondingreplicativehe-
blageofSms,DnaB,andKaiCsuggestsanevolutionary licases in Archaea and Eukaryota is compatible with
scenario in which a gene duplication in the LCA pro- the recently discussed hypothesis that the modern-
ducedtheancestorofDnaB/Sms/KaiContheonehand typesystemforthereplicationofdsDNAhasevolved
and the RecA/Rad51/RadA recombinases on the other independentlyinthebacterialandarchaeal/eukaryotic
hand,andalatergeneduplicationinthebacteriallin- lineages(Leipeetal.1999).
eage gave rise to DnaB and Sms. Consequently, the
model has to assume that the ancestor of DnaB/Sms/ Methods
KaiC has been secondarily lost from the archaeoeu-
ThenonredundantdatabaseofproteinsequencesattheNCBI
karyotic lineage. Alternatively, the root can be placed
(NR) was searched using the gapped BLASTP and PSI-BLAST
between the archaeoeukaryotic proteins (Rad51/ programs (Altschul et al. 1997). Briefly, the PSI-BLAST pro-
DMC1/RadA) and the bacterial families (RecA/Sms/ gramconstructsaposition-dependentweightmatrix(profile)
DnaB/KaiC) (Fig. 3). In this scenario, the RecA/DnaB using multiple alignments generated from the BLAST hits
superfamilyevolvedfromasinglegeneintheLCAand above a certain expectation value (E value) and carries out
iterative database searches using the information derived
the bacterial subfamilies, namely RecA, DnaB, Sms
from the profile. The statistical evaluation of the PSI-BLAST
(and possibly KaiC), are derived from successive gene
results is based on the extreme value distribution statistics
duplication events within the bacterial lineage. The
originally developed by Karlin and Altschul (1990) for local
dataavailabledonotallowustodistinguishwithcer- alignmentswithoutgapsandsubsequentlyshownbyexten-
tainty between these two scenarios, but we favor the sivecomputersimulationstoapplyalsotogappedalignments
rootingbetweenRad51/DMC1/RadAandRecAbecause and to alignments obtained by using profiles (Altschul and
it is the more parsimonious alternative that does not Gish1996;Altschuletal.1997).Ithasbeenemphasizedthat
E values reported for each retrieved sequence at the point
invokeasecondarygeneloss.
whenitsalignmentwiththequerysequencepassesthecutoff
for the first time are robust estimates of statistical signifi-
cance.Onceasequencegetsincludedintheprofile,Evalues
Conclusions
reportedforitanditsclosehomologsatsubsequentiterations
We show here that the DnaB and RecA/DMC1/RadA become inflated and do not represent the statistical signifi-
proteinsformadistinctsuperfamilyofstructurallyand cance(AltschulandKoonin1998).HereweonlyreportEval-
uesforthefirstappearanceofthegivensequenceabovethe
evolutionarily related ATPases. Additionally, we de-
cutoff. The dbEST was searched using the gapped TBLASTN
scribe previously undetected DnaB homologs from
program(Altschuletal.1997).
phylogeneticallydivergenteukaryotes.Theeukaryotic
Multiple sequence alignments were constructed using
DnaBhomologthatsharesacommondomainorgani- thePSI-BLASToutputandmodifiedmanuallyonthebasisof
zation with T-odd bacteriophage primases–helicases structuralconsiderations.Thealignmentswereformattedus-
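
The E-value reporting rule described in the Methods (keep the E value from the iteration at which a sequence first passes the cutoff, and ignore the inflated values from later iterations) can be sketched as follows. This is only a minimal illustration and not the authors' pipeline: the function name `first_pass_evalues`, the input layout (one mapping of sequence identifier to E value per iteration), the cutoff of 0.01, and the identifiers and E values in the example are all assumptions made for this sketch.

```python
from typing import Dict, List


def first_pass_evalues(
    iterations: List[Dict[str, float]],
    cutoff: float = 0.01,  # hypothetical inclusion threshold, not taken from the paper
) -> Dict[str, float]:
    """For each sequence, keep the E value from the first iteration in which
    it passes the cutoff; values from later iterations are ignored because
    they are inflated once the sequence is part of the profile."""
    reported: Dict[str, float] = {}
    for hits in iterations:  # iterations are processed in search order
        for seq_id, evalue in hits.items():
            if seq_id not in reported and evalue <= cutoff:
                reported[seq_id] = evalue
    return reported


# Invented example: three iterations with made-up identifiers and E values.
example_iterations = [
    {"seqA": 1e-40, "seqB": 0.5},
    {"seqA": 1e-90, "seqB": 1e-4, "seqC": 3.0},
    {"seqA": 1e-120, "seqB": 1e-30, "seqC": 2e-3},
]
example_report = first_pass_evalues(example_iterations)
```

In this invented example, `seqA` is reported with its first-iteration value even though later iterations assign it much smaller E values, which mirrors the reasoning given above for why only first-appearance values are treated as estimates of statistical significance.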