GBE

Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

Peter Civáň1, Peter G. Foster2, Martin T. Embley3, Ana Séneca4,5, and Cymon J. Cox1,*
1Centro de Ciências do Mar, Universidade do Algarve, Faro, Portugal
2Department of Life Sciences, Natural History Museum, London, United Kingdom
3Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, United Kingdom
4Department of Biology, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
5Department of Biology, Norges Teknisk-Naturvitenskapelige Universitet, Trondheim, Norway
*Corresponding author: E-mail: [email protected].
Accepted: March 23, 2014
Data deposition: The chloroplast genome sequences reported in this article have been deposited in GenBank under the accessions Klebsormidium flaccidum KJ461680, Mesotaenium endlicherianum KJ461681, and Roya anglica KJ461682.
Abstract

Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.

Key words: charophytes, bryophytes, land plants, chloroplast genomics.
Introduction

It is now established that land plants evolved from freshwater green algal ancestors of the charophyte algae (McCourt 1995; Karol et al. 2001; Wodniok et al. 2011). The transition of plants from an aquatic to the terrestrial environment is thought to have occurred about 425–490 Ma (Sanderson 2003; Wellman et al. 2003; Gensel 2008; Rubinstein et al. 2010) and was followed by a rapid diversification of plant lineages that resulted in dramatic changes to the Earth's biosphere (Kenrick and Crane 1997; Lenton et al. 2012). Given the great evolutionary significance of the colonization of land by plants and the fundamental role of plants in Earth's ecosystems, the characterization of the ancestor of embryophytes has long been of special interest to evolutionary biologists. From the cytological, physiological, and biochemical perspective, it is evident that some of the features typically associated with land plants have their molecular origins in the preterrestrial era. Such features include multicellularity and three-dimensional growth, cellulosic cell walls, phragmoplast formation during cell division, or intercellular communication mediated by plasmodesmata, and plant hormones (Leliaert et al. 2012). Although these features are indeed fundamental
© The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, [email protected]
Genome Biol. Evol. 6(4):897–911. doi:10.1093/gbe/evu061 Advance Access publication March 28, 2014
to land plants, some of them involve genes that appear to have orthologs in charophyte algae (Timme and Delwiche 2010; De Smet et al. 2011). A better understanding of the evolution of embryophyte design is therefore dependent upon an improved understanding of streptophyte relationships but is currently hindered by the paucity of charophyte nuclear and organellar genomic data available for study.

Phylogenetically, extant charophyte (Charophyta) lineages form a paraphyletic assemblage with the land plants (Embryophyta) and are together classified as the Streptophyta. However, elucidation of phylogenetic relationships among the charophyte groups, namely, Chlorokybophyceae, Mesostigmatophyceae, Klebsormidiophyceae, Zygnematophyceae, Charophyceae, and Coleochaetophyceae, with respect to the land plant clade, has been controversial. Early phylogenetic studies appeared to provide evidence for an intuitively elegant progression of increasing morphological complexity from single-cell organisms of the Chlorokybophyceae, Mesostigmatophyceae, and Klebsormidiophyceae, through the multicellular, filamentous, and thallose structured algae of the Zygnematophyceae (conjugating algae) and Coleochaetophyceae, with the most complex, and most land plant-like, species of the Charophyceae being most-closely related to the land plants (Karol et al. 2001; McCourt et al. 2004). This same tree topology where Charophyceae are the sister group to land plants was again obtained in a six-gene phylogenetic analysis by Qiu et al. (2006). However, in the same study, gene matrices derived from complete chloroplast genomes yielded a highly supported monophyletic Zygnematophyceae clade as the sister group to land plants. More recent analyses based on chloroplast (Turmel et al. 2006, 2007) and nuclear phylogenomic data (Wodniok et al. 2011; Laurin-Lemay et al. 2012; Timme et al. 2012) place Zygnematophyceae, or a clade uniting Zygnematophyceae and Coleochaetophyceae, as closest group to the land plants, whereas mitochondrial gene data sets remain inconclusive (Turmel et al. 2013). Currently, the best-supported hypothesis of charophyte branching order has a clade uniting Chlorokybus and Mesostigma at the base of the streptophyte tree with Klebsormidiophyceae, then Charophyceae, the next two diverging lineages, respectively, with the closest relatives of land plants, either the Zygnematophyceae alone or a clade consisting of both Zygnematophyceae and Coleochaetophyceae (Turmel et al. 2006; Wodniok et al. 2011; Laurin-Lemay et al. 2012; Timme et al. 2012).

Photosynthetic organelles have a clear functional continuity spanning the transition period between aquatic algal and terrestrial embryophytic lifestyles. With a typical genome size of between 115 and 170 kb and a gene complement of 100–120 unique genes (Green 2011; Wicke et al. 2011), the streptophyte plastid gene repertoire is relatively stable because retention of the core set of chloroplast genes is likely under strong selection, and gene gains are exceptional (Wicke et al. 2011). Although many of the genes necessary for chloroplast-specific functions have been transferred to the nucleus and have their products imported into chloroplasts from the cytoplasm, the genes encoding transmembrane polypeptides (subunits of atp, ndh, pet, psa, and psb complexes) tend to be retained by the chloroplast genome (cpDNA), presumably because importing the protein products of these genes would be difficult (Wicke et al. 2011). Other plastid genes exhibit high expression levels at early developmental stages (e.g., genes for structural RNAs, ribosomal proteins, and RNA polymerase), which likely favor their localization in the chloroplast rather than the nucleus (Wicke et al. 2011). A stable gene content of the chloroplast genome is accompanied by a conserved structural organization of its circular map whereby two inverted repeats (IRs) are separated by a large single-copy (LSC) region and a small single-copy (SSC) region. As this quadripartite architecture likely confers physical resistance to recombinational losses (Palmer and Thompson 1981), structural changes to chloroplast genomes are infrequent, and their identification and distribution can be used to supplement sequence data in the evaluation of phylogenetic hypotheses (Qiu et al. 2006; Turmel et al. 2006, 2007; Jansen et al. 2008; Grewe et al. 2013). Although gene losses are often homoplastic (Martin et al. 1998), other rarer genomic changes such as large inversions, insertion, and deletion events (indels), intron gain and loss, or gene order rearrangements may provide reliable phylogenetic information (Rokas and Holland 2000).

The gene complements of land plant chloroplasts do not differ substantially from those of charophyte algae (Turmel et al. 2006; Green 2011; Wicke et al. 2011). Moreover, most introns found in embryophyte chloroplast genes are also present in charophyte chloroplasts and had been acquired before the transition to land (Turmel et al. 2006). However, although the chloroplast gene order among land plant groups is fairly stable (fig. 2, Wicke et al. 2011), dozens of sequence inversions separate the known charophyte chloroplast genomes from one another and from the conserved gene order found in bryophytes (Turmel et al. 2005, 2006). Chloroplast genome rearrangements are especially abundant in Zygnematophyceae, and it has been suggested that their high occurrence is causally related to the loss of quadripartite structure in this class (Turmel et al. 2005). However, a satisfactory mechanistic explanation of such causality is lacking and a broader examination of the zygnematophycean cpDNA architecture has yet to be conducted.

Here, we report newly sequenced chloroplast genomes of three charophyte algae, namely, Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). Klebsormidium flaccidum is a species from the last taxonomic class of charophyte algae to lack a completely sequenced chloroplast genome. The two zygnematophycean taxa are both saccoderm desmids of the previously unsampled family Mesotaeniaceae and thought to be
early diverging or transitional forms of conjugating algae. The three genomes aid our understanding of the structural changes that occurred in chloroplasts during the evolution of early-diverging streptophyte clades. Moreover, comparisons of the genetic composition of chloroplast genomes ancestral to embryophytes with those of the Zygnematophyceae reveal several uniquely shared features that corroborate the close phylogenetic relationship of these plant groups.

Materials and Methods

Algal Cultures and Chloroplast Genome Sequencing

Cultures of K. flaccidum ([Kützing] P.C. Silva, K.R. Mattox, and W.H. Blackwell, 1972) and M. endlicherianum (Nägeli, 1849) were obtained from the SAG Culture Collection of Algae (accession numbers SAG 121.80 and SAG 12.97, respectively) and R. anglica (G.S. West in W.J. Hodgetts, 1920) (accession number ACOI 799) from Algoteca de Coimbra (hereafter we refer to the samples as "Klebsormidium," "Mesotaenium," and "Roya," for brevity). Klebsormidium and Mesotaenium cells were inoculated on Petri dishes with 1.5× Bold's basal medium (Andersen et al. 2005) supplemented with agar (1.5%, w/v) and cultivated for 10–14 days in a growth chamber under 14 h:10 h light:dark regime (100–120 µmol s−1 m−2 irradiation). Roya was grown in a liquid mixture of LC (Algoteca de Coimbra, Portugal) and Bold's basal medium (1:1) under the same light conditions as above. After approximately 1 month, the culture of Roya was passed through a 20–25 µm filter paper, the cells collected on the filter were rinsed with sterile 0.5× medium, and used for DNA extraction.

Approximately 1 g of cells was harvested for each taxon. The samples were briefly deep frozen in liquid nitrogen and used for DNA extraction without any further mechanical cell breaking. The frozen cells were resuspended in 5–10 ml of extraction buffer (0.1 M Tris; 20 mM Na2EDTA; 1.4 M NaCl; 2% CTAB [w/v]; 0.3% 2-ME; 0.1 mg ml−1 RNase A; pH ~8.5) and incubated for 1 h at 65 °C with occasional vortexing. Subsequently, the tubes were chilled on ice, the DNA was extracted with equal volume of chloroform:isoamylalcohol (24:1) and precipitated with isopropanol for 1 h at −20 °C. The precipitate was collected and rinsed with wash buffer (70% ethanol; 0.12 M sodium acetate) and 70% ethanol. The pellet was dissolved in TE overnight, and the DNA was purified with High Pure PCR Product Purification Kit (Roche) according to the manufacturer's instructions. Quality of the DNA was checked on an agarose gel, and DNA quantity and purity were determined by nanodrop. Mesotaenium was sequenced on ½ picotiter plate with GS FLX Titanium (IGSP Genome Sequencing & Analysis Core Resource, Duke University), whereas Klebsormidium and Roya were sequenced on a single lane of Illumina HiSeq2000 (BGI Tech Solutions Co. Ltd, Hong Kong, China) along with four other samples not reported here. The library type for Illumina sequencing was 91 paired end, with approximately 500 bp fragment size.

Data Processing and Assembly

Roche 454 pyrosequencing and Illumina short-read data were imported into Geneious 5.6.3 (Biomatters, http://www.geneious.com, last accessed April 8, 2014) in sff and fastq formats, respectively. After the removal of oligonucleotide adapters, sequences were trimmed from both sides, discarding regions with >4% (sff) or >5% (fastq) chance of an error per base. As the data were from a whole-genome shotgun collection of sequences but only the chloroplast fraction was of interest, the assembly of the chloroplast genomes was undertaken in three stages. 1) For each taxon, a reference was chosen from the set of known chloroplast genomes of streptophytic algae. From each reference genome, protein-coding genes were extracted and used as templates for mapping of the sequence reads in Geneious. This reference-guided reconstruction typically yielded a set of short (0.1–1 kb), high-confidence chloroplast contigs representing <10% of the genome. 2) The full paired-read data sets were used for focused assembly by PRICE (Paired-Read Iterative Contig Extension, version 0.18; Ruby et al. 2013), utilizing the short chloroplast contigs as initial seeds. In PRICE assemblies, the minimal overlap was set to 30, and the minimal percent identity to 95 and 85 for Illumina and 454 data sets, respectively. For 454 sequence reads, the -spf argument was used to create false paired-end data file. Variable trimming and filtering options were applied. 3) Resulting contigs usually representing the whole reconstructed genome were imported back to Geneious, where the sequence reads were remapped onto the contigs. This third stage enabled the sequence coverage and base-assignment confidence to be evaluated, and the identification and adjustment of ambiguous sites and repeated regions.

Special attention was paid to the reconstruction of the IR regions of the chloroplast genomes. In a standard PRICE assembly of a quadripartite-structure chloroplast genome, one of the following problems may occur: two IRs are collapsed into a single contig; extension of the second IR stops due to reads mapping to an existing contig; and IRs and single-copy regions are joined incorrectly. To overcome these issues, a simple strategy was applied. After an IR was identified in preliminary runs of 2)–3) assembly steps, a "dead" IR contig was prepared and added to the initial seeds for another 2)–3) assembly run. The "dead" IR consisted of an IR region extended for approximately 500 cytosines on both ends, which effectively excludes this contig, as well as all the IR-mapping reads, from the PRICE assembly process. The remaining seeds are extended until the completion of SC regions, which contain short overlaps with the IRs, enabling the four regions to be joined correctly into a circle.
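The "dead" IR seed described above can be illustrated with a short Python sketch. It is not the authors' script: the function names `make_dead_ir` and `to_fasta`, the FASTA header, and the toy IR sequence are hypothetical, and the flank length is a parameter because the text gives it only as approximately 500 cytosines. How the resulting record is passed to PRICE as a seed is not shown.

```python
def make_dead_ir(ir_seq: str, flank_len: int = 500, flank_base: str = "C") -> str:
    """Return the IR sequence flanked on both ends by a homopolymer run.

    The long cytosine flanks are meant to leave the seed with no extendable
    ends, so the contig (and the reads mapping to it) stays out of extension.
    """
    if flank_len < 0:
        raise ValueError("flank_len must be non-negative")
    flank = flank_base * flank_len
    return flank + ir_seq.upper() + flank


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical toy IR; a real IR would be the contig found in preliminary runs.
toy_ir = "ATGGCTAGCTAGGATCCGATTACA"
dead_ir = make_dead_ir(toy_ir)
record = to_fasta("dead_IR_seed", dead_ir)
```

The padded record would then be added to the seed file alongside the short chloroplast contigs before the next assembly run.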
Annotation and Analyses of the Chloroplast Genome Content

The software DOGMA (Wyman et al. 2004) was used for initial gene annotations. Thereafter, a thorough examination of protein-coding gene content was performed as follows. Open reading frames (ORFs) in the assembled genomes were identified by getorf (part of the EMBOSS suite: minimal length 30 nucleotides, translations from start to stop codon retrieved), and BLASTp (Altschul et al. 1990) was used to detect similarities with a National Center for Biotechnology Information (NCBI) Reference Sequence (refseq) library of all known chloroplast proteins (downloaded in October 2012). After the annotation of known proteins, we further examined longer ORFs from conspicuously "empty" regions to determine whether these regions had unreported homologs. To facilitate these analyses, we built a custom BLAST database of plastid ORFs (determined by getorf: minimal length 100 nucleotides; translations from stop to stop codon retrieved) from all available Viridiplantae (chlorophytes and streptophytes) chloroplast genomes (downloaded from NCBI GenBank October 2012). This library consisted of 1.17 million ORFs and was used in BLASTp analyses to identify sequences with similarity (E value < 1e-4) to the ORFs identified from the "empty" regions of the newly assembled genomes. Introns were identified by comparison to alignments of other algae and representative bryophyte genes. Exon–intron borders were inferred with the aid of protein alignments and intron border consensus sequences (Sugita and Sugiura 1996). To determine the frequency of short repeats, one IR was removed from the quadripartite genomes, and direct and inverted repeats >20 bp were searched with a 1e-03 threshold using REPuter (Kurtz et al. 2001), and an average number of repeats per kb was calculated. The newly constructed genomes were visualized using circular genome maps created by OGDraw (Lohse et al. 2007).

Gene contents of the newly reconstructed chloroplast genomes were compared with the genomes of other streptophyte algae and a "hypothetical land plant ancestor" (HLPA). The gene content of the HLPA unit was inferred from a selection of taxa representing all major lineages of land plants (the same taxon set as used in the phylogenetic analyses below), assuming monophyly of land plants and only vertical transfer of genes. Genome rearrangements between charophytes and two land plants, namely, Pellia endiviifolia (a liverwort; NC_019628), and Isoetes flaccida (a lycophyte; NC_014675), were determined using multiple genome rearrangements (MGR: Bourque and Pevzner 2002) using analysis that ignored the transfer RNA (tRNA) genes and one of the IR in quadripartite genomes. Because the choice between the two IR copies is relevant for the gene order analyses, both alternative "single-IR" gene orders were considered for quadripartite genomes, and the arrangement leading to the most parsimonious result was chosen for pairwise genome comparisons in MGR.

Phylogenetic Analyses

Phylogenetic analyses of 83 protein-coding chloroplast genes from the newly assembled genomes of Roya, Mesotaenium, and Klebsormidium, plus 23 streptophytes and four chlorophyte outgroup taxa, were conducted. Maximum likelihood and Bayesian Markov chain Monte Carlo (MCMC) analyses were conducted using among-site (PhyloBayes CAT model; Lartillot and Philippe 2004) and among-lineage (P4 NDCH model; Foster 2004) composition models to determine the best-fitting models and the best-supported trees. Details of these analyses are presented elsewhere (Civáň et al. unpublished); the best-fitting PhyloBayes CAT-model using the gcpREV exchange rate model (Cox and Foster 2013) analysis of amino acid data is presented here as a reference tree for the best-supported hypothesis of relationships based on these data.

Analyses of chloroplast genome structural features were based on 66 parsimony informative characters: The presence or absence of 30 monocistronic genes and 19 group II introns, plus the gene complement and gene order in 17 operons. tRNA genes and their introns were not considered, except for those tRNA genes located within polycistronic units. Introns were scored as Dollo characters with "absence" assumed to be the ancestral condition. Dollo character coding corresponds to a model in which each derived state is allowed to originate only once during evolution, and all homoplasy takes the form of reversals to the ancestral condition (Swofford and Begle 1993). The ancestral state of 28 monocistronic protein-coding genes was assumed to be "presence" with the characters treated as irreversible, therefore allowing multiple losses but no secondary gain of genes. The "presence" or "absence" of 77 additional genes within 17 operons was also evaluated (Sugita and Sugiura 1996; Wicke et al. 2013; information regarding the operonic organization is derived from model angiosperms but was adapted for the gene set observed here). Operons were coded as multistate characters defined by step matrices, with unspecified ancestral states. In the step matrices, every change in operon organization was of equal distance except irreversibility of genes lost from the genome (i.e., gene loss from an operon equals distance 1; gene gain in an operon from another cpDNA location equals distance 1; and gene gain in an operon from outside of cpDNA equals infinity). Structural characters were subjected to parsimony analysis in PAUP 4.0 (Swofford 2003), with optimal trees obtained using the branch-and-bound algorithm. Bootstrap analyses with 1,000 replicates were performed heuristically with default parameters. (A NEXUS formatted character matrix used for the structural data analyses is included in the supplementary material, Supplementary Material online.)

Results

Chloroplast Genome Assembly

For each of Klebsormidium, Mesotaenium, and Roya, assembly of the short-read data yielded a single large contiguous
Table 1
Summary Statistics of the Genome Assembly Data

Taxon                       Platform            Number of Reads Obtained (Total)  Mean Read Length (After Trimming) (bp)  Proportion of Reads Mapping to the cp Genome (%)  Length of the cp Genome (bp)  Coverage (×) Mean  Min  Max  Standard Deviation
Klebsormidium flaccidum     Illumina HiSeq2000  46,124,918                        86.5                                    0.66                                              176,832                       152.7              6    235  23.6
Mesotaenium endlicherianum  Roche 454           689,398                           357.1                                   23.2                                              142,017                       378.9              85   589  72.2
Roya anglica                Illumina HiSeq2000  54,070,476                        86.2                                    0.78                                              138,275                       273.3              1a   518  105.1

a The 1× coverage was 10-bp long and located within an AT-rich intergenic region.
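As a rough cross-check on Table 1, mean coverage can be approximated as (number of reads × mean read length × fraction of reads mapping to the chloroplast genome) / genome length. The Python sketch below applies that approximation to the tabulated values. It is an illustrative assumption rather than the procedure behind the reported figures, which came from remapping reads onto the assembled contigs, so the estimates are expected to differ somewhat from the reported means (rounded percentages, partially mapped reads).

```python
# Values copied from Table 1; the estimator below is an illustrative assumption,
# not the procedure used to produce the reported coverage figures.
TABLE1 = {
    "Klebsormidium flaccidum": {
        "reads": 46_124_918, "mean_len_bp": 86.5, "pct_cp": 0.66,
        "genome_bp": 176_832, "reported_mean": 152.7,
    },
    "Mesotaenium endlicherianum": {
        "reads": 689_398, "mean_len_bp": 357.1, "pct_cp": 23.2,
        "genome_bp": 142_017, "reported_mean": 378.9,
    },
    "Roya anglica": {
        "reads": 54_070_476, "mean_len_bp": 86.2, "pct_cp": 0.78,
        "genome_bp": 138_275, "reported_mean": 273.3,
    },
}


def expected_coverage(reads, mean_len_bp, pct_cp, genome_bp):
    """Rough mean depth: bases in chloroplast-mapping reads / genome length."""
    return reads * mean_len_bp * (pct_cp / 100.0) / genome_bp


estimates = {
    taxon: expected_coverage(row["reads"], row["mean_len_bp"],
                             row["pct_cp"], row["genome_bp"])
    for taxon, row in TABLE1.items()
}

for taxon, est in estimates.items():
    reported = TABLE1[taxon]["reported_mean"]
    print(f"{taxon}: estimated {est:.0f}x, reported {reported}x")
```

For all three taxa the estimate falls within roughly 10% of the reported mean coverage, which suggests the columns of Table 1 are mutually consistent.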
sequence for which it was possible to close into a circle. Because of high sequence read coverages, 153, 379, and 273 mean reads per site for Klebsormidium, Mesotaenium, and Roya, respectively, no gaps or ambiguous regions were present (supplementary fig. S1, Supplementary Material online). Summary statistics of the data and the genome assemblies are presented in table 1.

Klebsormidium flaccidum

The chloroplast genome of Klebsormidium was assembled into a circular map of 176,832 bp (fig. 1A; NCBI GenBank accession number KJ461680); the third largest among currently sequenced streptophyte chloroplast genomes, smaller only than Pelargonium (Geraniaceae, Spermatophyta) and Chara (Charales, Charophyceae). The genome has a quadripartite organization, which differs from the typical embryophytic architecture by having exceptionally large IRs (51,118 bp each), a greatly reduced SSC region (1,817 bp), and a relatively shorter LSC region (72,779 bp). The expanded IR regions contain both small (rrn16) and large (rrn23) ribosomal RNA (rRNA) genes, seven tRNA genes typically found in streptophyte IRs, plus 23 additional protein-coding genes typically located in single-copy regions (fig. 2). Most remarkably, the rrn5 gene (5S rRNA) and the region homologous to the rrn4.5 gene in embryophytes (4.5S rRNA; in nonembryophyte streptophytes, the rrn4.5 gene-coding region forms an integral part of the 3′-end of the 23S ribosomal subunit) are absent from the genome (supplementary fig. S2, Supplementary Material online). The SSC region contains only a single gene (ccsA), whereas 59 protein-coding genes, and 21 tRNA genes, reside in the LSC region. Six ribosomal protein genes (rpl14, rpl16, rpl23, rps3, rps15, and rps16) usually present in streptophyte chloroplast genomes are missing, as are several other protein-coding genes (fig. 3). Two genes in the Klebsormidium genome, rps12 and psbA, require trans-splicing for correct protein translation. In total, genes coding for two rRNA, 28 tRNA, and 82 proteins were identified in the Klebsormidium chloroplast genome. The GC content of the genome is relatively high (42%) compared among streptophytes (average 37%) but differs substantially between the IR and single-copy regions (46.0% and 36.5%, respectively). Mean intergenic spacer length was 358 bp (52,071 bp in total), with two conspicuous exceptions (6,340 and 4,231 bp). These two extended intergenic regions contain three unidentified ORFs (6,063, 1,785, and 1,425 bp), which had no strong matches (E value < 1e-4) among BLASTp searches of the refseq database or the custom ORF library. Group II introns were found in seven genes (table 2) and account for 3.7% of the total genome length. By comparison to the genome of Chara (Charophyceae) which has a larger overall size, the proportion of intergenic spacers and introns is several times lower, indicating that the large genome size of Klebsormidium can be attributed mainly to large IR regions.

Mesotaenium endlicherianum

The chloroplast genome of Mesotaenium was assembled as a circular sequence comprising 142,017 bp (fig. 1B, NCBI GenBank accession number KJ461681) and lacks a quadripartite structure, as do the two previously published Zygnematophyceae chloroplast genomes (namely Zygnema and Staurastrum; Turmel et al. 2005). The Mesotaenium genome contains 88 protein-coding, 4 rRNA, and 34 tRNA genes, and although it is 23 and 15 kb shorter than Zygnema and Staurastrum, respectively, it does not contain fewer genes (fig. 3). Intergenic spacers occupy almost one-third of the genome length (46,765 bp), with a mean intergenic distance of 357 bp. Group II introns were found in 12 genes, with clpP and ycf3 having two introns each (table 2), and the group I intron typically found in the streptophyte trnL-UAA gene is present. With an average size of 669 bp, introns of Mesotaenium are similar in length to those of bryophytes (713 bp) rather than the longer introns in the other two zygnematophycean chloroplast genomes (966 bp). The overall genome GC content (42%) is notably higher than in the other chloroplast genomes of Zygnematophyceae (32%) or land plants (37%).
FIG. 1. Chloroplast genome maps of Klebsormidium flaccidum (A), Mesotaenium endlicherianum (B), and Roya anglica (C).
FIG. 2. IR regions of Klebsormidium and Roya, in comparison to Chaetosphaeridium and Chara (charophytes), and Pellia (a bryophyte).
FIG. 3. Chloroplast gene content among charophytes and an inferred HLPA. All rRNA and protein-coding genes found within the sample set of the phylogenetic analyses are included. Gene presence and absence are indicated by blue and orange shading, respectively. Novel absences of genes with respect to other charophyte genomes are highlighted in red. (Note that the disambiguation of ycf2/ftsH has been newly interpreted, see supplementary table S1, Supplementary Material online.)
Roya anglica

The chloroplast genome of Roya was reconstructed as a circular sequence of 138,275 bp in length (fig. 1C, NCBI GenBank accession number KJ461682), making it the shortest of the four zygnematophycean chloroplast genomes sequenced so far (including Mesotaenium here). (One 10-bp region in the Roya genome had only 1× coverage; however, as this small stretch was in an AT-rich intergenic spacer and surrounded by well-supported paired reads, we did not verify the region via Sanger sequencing.) Unlike other zygnematophyceans, the Roya genome has a quadripartite architecture. The genome sequence consists of SSC and LSC regions (20,213 bp and 92,926 bp, respectively) and a pair of IRs (12,568 bp each). The IRs of Roya bear some resemblance to a typical chloroplast IR in terms of gene content, as all genes of the rRNA operon (rrn16–trnI-GAU–trnA-UGC–rrn23–rrn4.5–rrn5) are present, although the integrity of this operon has been disrupted and the genes are merely neighboring units with jumbled order and orientation (fig. 2 and supplementary fig. S3, Supplementary Material online). At least two rearrangements would be necessary to restore the standard order of the rRNA operon. The IRs of Roya also contain three additional tRNA genes (trnR-ACG, trnP-GGG, and trnL-UAG) and two longer ORFs (orf268 and orf230). The protein translation of orf268 (807 bp) has high similarity (E value: 5e-14) to an IR-located int gene from the chloroplast genome of a chlorophycean alga Oedogonium cardiacum (Brouard et al. 2008). Because int encodes a protein belonging to the family of tyrosine recombinases (Brouard et al. 2008), the product of orf268 was labeled as a putative recombinase/integrase protein. In addition, orf268 has high similarity to ORF (46,439–46,717) in the Anthoceros (a hornwort) chloroplast genome (E value: 1e-13) although the latter is not located within the IR and is significantly shorter, suggesting that it may not be homologous by descent with orf268. The second unidentified reading frame orf230 (693 bp) shows high similarity to chloroplast ORFs present in two ferns of the Ophioglossaceae, namely Mankyua chejuensis and Ophioglossum californicum (E values: 3e-18 and 5e-06, respectively).

Table 2
Intron (Group II) Distribution among Charophytes and HLPA

Intron        Klebsormidium  Mesotaenium  Roya  Zygnema  Staurastrum  Chara  Chaetosphaeridium  HLPA
ycf66i1       –              +            –     +        –            +      +                  +
ycf3i2        –              +            –     +        –            +      +                  +
ycf3i1        –              +            +     +        +            +      –                  +
trnV(UAC)i1   –              –            –     –        –            +      +                  +
trnK(UUU)i1   +              –            +     –        +            +      +                  +
trnI(GAU)i1   +              –            –     –        –            +      +                  +
trnG(UCC)i1   –              +            +     +        +            +      +                  +
trnA(UGC)i1   +              –            +     –        +            +      +                  +
rps16i1       –              –            –     +        –            +      +                  +
rps12i2       –              +            –     +        –            –      –                  +
rps12i1       +              +            +     +        +            +      +                  +
rpoC1i1       –              +            –     +        –            –      +                  +
rpl16i1       –              +            –     +        +            +      +                  +
rpl2i1        –              +            –     +        –            +      +                  +
psbAi1        +              –            –     –        –            –      –                  –
petDi1        –              +            +     +        –            –      +                  +
petBi1        +              +            +     –        –            +      +                  +
ndhDi1        +              –            –     –        –            –      –                  –
ndhBi1        –              +            –     +        –            +      +                  +
ndhAi1        –              +            –     –        –            +      +                  +
clpPi3        +              –            –     –        –            –      –                  –
phytesa 2iPplc – + – – – – – + egenear conTthaeinsi2n8glteR-cNoApygerengeiso,n8s7ofprtohteeinR-ocyoadicnhglogroepnleass,tagnednotmwoe ril 2019
Charo 1iPplc – + – – – + + + hesam aredpdoitritoendalfoOrRoFtshweritZhyghnigehmsaimtoiplahriytcieesateo.Thhyepofitrhsettoicfatlhpersoeteainds-
Distributionamong 1i1iAFmpetac +– +– +– –– +– –– m–– +– intronsoccurringint dSZolotyrictPfCiu3ouis1CnC0ap(pS,l00th6p3Oa6u9RsCFospsoifm,0f5Zioly4aSrg)rtfian2touye4frm5taSo,sattarahuu(Earmpasuvsttasar(liuEutgimevne:ivfi(a2rEcleeuavv-nee0a:tr9lus)es1e,i:emta-r4n1aileadn2-rs)1ictt6yhrai)ep.nttdRoaseTsecslloooa(nccRnuudTdss,)
Table2Intron(GroupII) Klebsormidium Mesotaenium Roya Zygnema Staurastrum Chara Chaetosphaeridiu HLPA N.—MultipleOTE icogtnooertftnne2asgil6c)er,8qaswsupieneiastnhcRteaolyarryseamtonhmecoeacatunypnprisoyneurstgmee3grn0agec%lelseytnirfocoeofftudrnoiRtshdetTael-enliinmkcgeeeecnnhootolformafr23coe91tpi70vl(ai4tbsy1apt.n,:7gTdA0he4neisnobiimtnmp-tliieelkaisrner-,
904 GenomeBiol.Evol.6(4):897–911. doi:10.1093/gbe/evu061 AdvanceAccesspublicationMarch28,2014
GBE
CharophyteChloroplastGenomes
D
o
w
n
lo
a
d
e
d
fro
m
h
ttp
s
://a
c
a
d
e
m
ic
.o
u
p
.c
o
m
/g
b
e
/a
rtic
le
FIG. 4.—Phylogenetic analyses. (A) Bayesian MCMC phylogenetic analyses of 83 protein-coding chloroplast genes: PhyloBayes CAT+gcpREV+Γ, marginal likelihood: −Lh = 244,645.3855. (B) Strict consensus tree of six most parsimonious trees (length 239, consistency index = 0.243, retention index = 0.786) resulting from analysis of the structural data (gene and intron content, operon structure). Numbers at nodes are posterior probabilities and nonparametric bootstrap values for (A) and (B), respectively. The nodes representing the HLPA are highlighted.
gene density to the economically packed chloroplast genome of Mesotaenium. However, the overall GC content of the Roya genome (33%) more closely resembles the base composition in Zygnema (31%) and Staurastrum (33%) than Mesotaenium (42%).

Phylogenetic Analyses

In figure 4A, a Bayesian MCMC analysis of the best-fitting model (PhyloBayes CAT+gcpREV+Γ4; −Lh = 244,645.3855) of amino acid data of the 83 proteins is presented. The tree shows strong support (>0.95 posterior probability) for the paraphyly of charophytes, with Klebsormidium branching early in the phylogenetic grade before Chara and Chaetosphaeridium, and with Zygnematophyceae as the sister group to land plants. Within the Zygnematophyceae, all relationships are strongly supported, with Mesotaenium forming the earliest-branching lineage, and Zygnema sister to a clade formed by Roya and Staurastrum. This finding is in conflict with the traditional placing of Roya in the family Mesotaeniaceae but is in agreement with other phylogenetic reconstructions of conjugating algae (Gontcharov et al. 2003; Gontcharov and Melkonian 2010). Parsimony analysis of the structural data (gene and intron content, and operon structure) identified six optimal trees (tree length 239 steps, consistency index 0.243, retention index 0.786): The strict consensus tree is presented in figure 4B. Nonparametric parsimony bootstrap analysis of the structural data poorly supports a monophyletic Zygnematophyceae (54% bootstrap proportion [BP]) with strong support for the sister-group relationship between Roya and Staurastrum (97% BP). The streptophytes as a whole are well supported (98% BP), with Mesostigma and Chlorokybus forming the earliest-branching lineage. The remaining streptophytes form a well supported (100% BP) clade within which Klebsormidium is the first diverging lineage (83%). Relationships among Chara, Chaetosphaeridium, Zygnematophyceae, and the land plant clade (itself 87% BP) are unsupported (or negligibly supported <70%), but the topology is nevertheless congruent with that of the protein tree.

Comparisons between Charophyte and Land Plant Chloroplast Genomes

The protein-coding gene complements of the Klebsormidium, Roya, and Mesotaenium chloroplast genomes are summarized
in figure 3. The chloroplast genome of Klebsormidium, with a repertoire of only 82 unique protein-coding genes, has the lowest protein-coding gene content of any charophyte plastid genome reported to date. Hence, the Klebsormidium chloroplast gene set is more dissimilar to the estimated content of the HLPA (22 presence/absence differences) than are the genomes of Mesostigma or Chlorokybus (18 and 16 presence/absence differences, respectively). In contrast, the taxa with gene complements most closely resembling the HLPA are Roya and Chaetosphaeridium with eight presence/absence differences each: the other three Zygnematophyceae each have one additional difference. Comparisons of the presence and absence of group II chloroplast introns show that Chaetosphaeridium is the most similar to the HLPA with 17 introns at congruent positions (table 2). However, Mesotaenium is the next-most similar with 16 introns at common positions and also has the clpP-intron 2 that is common in land plant chloroplast genomes but has not previously been found in charophyte algae. When the operons (polycistronic units) of charophyte chloroplast genomes are compared with those of land plants, the operonic complements of Chara and Chaetosphaeridium show greater similarity to the HLPA (12 and 13 identical operons, respectively) than do Zygnematophyceae (11 or fewer identical operons). The operonic organization of Roya is the next-most similar to the HLPA (11 concordant operons), whereas the other three Zygnematophyceae retain as few of the operons of early land plants as do more distantly related streptophyte algae, such as Klebsormidium. This lack of maintenance of operonic integrity among Zygnematophyceae (excepting Roya) is consistent with the high number of implied genome rearrangements identified by MGR analysis (supplementary fig. S4, Supplementary Material online).

The syntenic structure of the Staurastrum chloroplast genome implies 20 and 23 rearrangements to match the gene order in Pellia (liverwort) and Isoetes (lycopod), respectively. Roya appears to have the fewest rearrangements among the known zygnematophycean chloroplast genomes with a minimum of 18 and 21 changes implied by comparison to the Pellia and Isoetes genomes, respectively. However, Roya and Staurastrum are also highly rearranged with respect to each other, with 18 implied rearrangements. An even greater number of rearrangements separate the chloroplast genome of Pellia from Mesotaenium and Zygnema (at least 25 and 32 changes, respectively). By comparison, the gene order of Chaetosphaeridium is more similar to land plants than those of other charophytes and requires as few as 10 changes to match the operonic organization of Pellia and Isoetes.

Although the abundance of short sequence repeats has previously been implicated as a possible mediator of genome rearrangements, numbers of short repeats are not exceptionally high in the two zygnematophycean genomes reported here. In Klebsormidium and Roya, short sequence repeats were relatively rare (0.24 and 0.38 repeats/kb, respectively) and similar to the numbers found in Chaetosphaeridium, Staurastrum, Mesostigma, and Chlorokybus (all <1 repeat/kb). A greater number (1.68 repeats/kb) were recorded in Mesotaenium; however, the density is still lower than in Chara and Zygnema (3.16 and 25.73 repeats/kb, respectively).

Discussion

New Insights into the Chloroplast Genomics of Charophytes

The absence from the Klebsormidium chloroplast genome of the 5S rRNA gene, and of the region at the 3′-end of the 23S rRNA gene that is homologous to the 4.5S rRNA gene of embryophytes, is the first report of an incomplete set of rRNA genes in either chloroplast or mitochondrial genomes; even within the greatly reduced chloroplast genomes of parasitic plants, the rRNA operon remains intact (Krause 2008). Because the usual complement of rRNA subunit genes is assumed vital to the assembly and function of ribosomes, it seems likely that the 4.5S homologous region and 5S rRNA genes have been translocated to the nuclear genome of Klebsormidium and that their products are imported into the chloroplast stroma. Nevertheless, multiple losses among eukaryotes of the 5S gene in the mitochondrion (Adams and Palmer 2003) suggest that complete loss of these ribosomal subunits cannot be entirely ruled out. If the assumption of rRNA translocation from the nucleus is correct, chloroplast-directed rRNA import renders plastid protein synthesis in Klebsormidium ultimately dependent on the nucleus and raises questions concerning the mechanisms of inter-compartmental RNA trafficking. Additionally, if the 4.5S rRNA is being imported into the chloroplast, then it is also acting as a separate 4.5S rRNA species as in the embryophytes and is not an integral part of the 23S rRNA as is implied by its annotation in nonembryophyte streptophyte chloroplast genomes. The transport of nuclear mRNA into the chloroplast is known to occur (Nicolaï et al. 2007), and indirect evidence suggests that tRNAs are imported from outside the plastid in some parasitic plants (Bungard 2004). However, to date, the import of rRNA into the chloroplast has not been demonstrated. Although the mechanism(s) of chloroplast-directed RNA import remain uncharacterized, two candidate pathways are currently considered plausible. First, the import of rRNA into the chloroplast could be facilitated by a protein precursor utilizing the protein import pathway, as is the case of tRNA transport into mitochondria (Schneider 2011). Alternatively, short noncoding RNA sequences may be responsible for chloroplast localization of nuclear transcripts (Gómez and Pallás 2010). In either case, the chloroplast genome of Klebsormidium is unusual in lacking the 5S rRNA gene, 4.5S-homologous region, and six ribosomal protein genes typically present in streptophyte chloroplast genomes and displays a unique dependency on the nucleus for chloroplast protein synthesis.