RESOURCE
Worm 1:1, 15–21; January/February/March 2012; © 2012 Landes Bioscience
WormBase
Annotating many nematode genomes
Kevin Howe,1,* Paul Davis,1 Michael Paulini,1 Mary Ann Tuli,1 Gary Williams,1 Karen Yook,2 Richard Durbin,3
Paul Kersey1 and Paul W. Sternberg2,*
1European Bioinformatics Institute; Wellcome Trust Genome Campus; Hinxton, Cambridge UK; 2California Institute of Technology; Division of Biology; Pasadena, CA USA;
3Wellcome Trust Sanger Institute; Wellcome Trust Genome Campus; Hinxton, Cambridge UK
Keywords: Caenorhabditis elegans, nematode, genome, annotation, model organism database, community resource, sequence curation, parasitic nematode
Abbreviations: modENCODE, Model Organism ENCyclopedia Of DNA Elements; EST, Expressed Sequence Tag;
cDNA, complementary DNA; RNASeq, RNA sequencing by 2nd generation technologies; C., Caenorhabditis;
INSDC, International Nucleotide Sequence Database Collaboration
WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for
genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its
beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now
represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for
around 20 nematodes. In this article, we focus on WormBase’s role of genome sequence annotation, describing how we
annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches
to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as
modENCODE.
Do not distribute.
Introduction

WormBase seeks to present an integrative view of nematode biology by in-depth curation of the research on C. elegans and other members of this animal family. To this end we integrate genomic sequences and annotations with curated data from genetic, developmental, physiological, behavioral and evolutionary studies. We provide multiple streams of access to the data, including the main website portal (www.wormbase.org), genome browsers, sequence search services, and application programming interfaces. WormBase aims to be the central repository and portal for nematode genomic data.

The activities of the WormBase consortium can be broadly classified into three groups: (1) curation of C. elegans literature and associated research and development; (2) user interface design, development and maintenance and (3) genome sequence annotation, analysis and comparative genomics. The volume of nematode data has exploded in recent years, and WormBase has had to respond accordingly in all three of these areas.1,2 For example, as the volume and variety of information has increased, its presentation to the community in a clear and accessible way requires new forms of display. We have responded to this challenge by completely redesigning the WormBase web-interface (Harris et al., manuscript in preparation). In this article, we focus on our remit to provide integrated, coherent genome annotation for a large (and growing) collection of nematode genome sequences and strains. We also summarize our release production cycle and analysis pipelines, and describe how they affect the timeline between data submission and its subsequent public release.

Integrating and Annotating Multiple Nematode Genomes

WormBase now hosts genomic data for nearly 20 nematodes (see Table 1, and refs. 3–14), representing species of evolutionary, biomedical and agricultural interest. Recent additions include the parasitic nematodes Trichinella spiralis,3 Ascaris suum4 and Bursaphelenchus xylophilus.5 The maturity of genome sequence and annotation in WormBase varies widely between species. At one end of the spectrum is the C. elegans genome, which was completed over a number of years using traditional physical mapping and clone-by-clone sequencing and finishing,6 and which has highly curated annotation. More recently we have seen a number of genome sequences generated by new high-throughput low-cost technologies and many of these genomes are inevitably fragmented and incomplete; additionally, there is relatively little published functional information about many of these species.
*Correspondence to: Paul W. Sternberg and Kevin Howe; Email: [email protected]@wormbase.org
Submitted: 11/02/11; Revised: 02/02/12; Accepted: 02/02/12
http://dx.doi.org/10.4161/worm.19574
www.landesbioscience.com Worm 15
Table 1. Nematode genomes in WormBase
Species | Clade (a) | Mode of reproduction (b) | Reference strain sequenced | Integrated into WormBase | Assembly version (e) | Sequenced genome size (Mb) | CDS models (distinct loci) | Gene set status
C. elegans | V | androdioecious | Bristol N2 | WS1 | WS230 | 100.3 | 25634 (20517) | Curated
C. briggsae | V | androdioecious | AF16 | WS132 | CAAC03000000 | 108.4 | 21961 (21936) | Curated
C. remanei | V | gonochoristic | PB4641 | WS185 | AAGD02000000 | 145.5 | 31476 (31471) | Curated
Brugia malayi | III | gonochoristic | TRS | WS185 | AAQA01000000 | 95.8 | 21332 (18348) | External (g)
Pristionchus pacificus | V | hermaphroditic | PS312 | WS194 | v2 (Sep. 2010) | 172.5 | 24217 (24216) | External
C. japonica | V | gonochoristic | DF5080 | WS195 | ABLE03000000 | 166.3 | 36105 (29962) | Curated
C. brenneri | V | gonochoristic | PB2801 | WS196 | ABEG02000000 | 190.4 | 30670 (30667) | Curated
Meloidogyne hapla | IV | gonochoristic | VW9 | WS204 | ABLG01000000 | 53.0 | 13072 (13072) | External
Meloidogyne incognita | IV | gonochoristic | Morelos | WS205 | CABB01000000 | 82.1 | - | External (h)
Haemonchus contortus | V | gonochoristic | MHco3(ISE) | WS208 | v1 (Aug. 2008) | 298.0 | 6201 (6201) | WormBase predicted
C. angaria | V | gonochoristic | PS1010 | WS218 | AEHI01000000 (i) | 79.8 | 26265 (22622) | External
Trichinella spiralis | I | gonochoristic | ISS195 | WS225 | ABIR02000000 | 63.5 | 16380 (16380) | External
C. sp9 | V | gonochoristic | JU1422 | WS226 | v1 (June 2011) | 204.3 | 45167 (45167) | WormBase predicted
C. sp11 | V | androdioecious | JU1373 | WS226 | AEKS01000000 | 79.3 | 27721 (22326) | External
Strongyloides ratti | IV | gonochoristic and parthenogenetic (c) | ED321 Heterogonic | WS226 | CACX01000000 | 52.6 | 8188 (8077) | WormBase predicted
Ascaris suum | III | gonochoristic | Natural isolate | WS229 | v1 (Aug. 2011) | 272.8 | 18449 (18449) | External
Bursaphelenchus xylophilus | IV | gonochoristic | Ka4C1 | WS229 | CADV01000000 | 74.6 | 18074 (18074) | External
Heterorhabditis bacteriophora | V | gonochoristic and hermaphroditic (c,d) | M31e | WS229 | ACKM01000000 | 77.0 | - | External (h)
C. sp5 | V | gonochoristic | DRD-2008 JU800 | WS230 | v1 (Jan. 2012) | 131.8 | 46280 (34696) | External
WS230 status (top-level fragments, then scaffold N50 (f), digits unseparated; species in the order above): 617493793; 36717485439; 3670461060; 2721037841; 180831244534; 1881794149; 3305368319; 345284000; 953883000; 5970713338; 335599453; 68636373445; 7636196652; 66520921866; 2184359029; 29831407899; 55271158000; 1240312328; 1526125228
Notes: (a) ref. 15; (b) refs. 16–25; (c) heterogonic; (d) sex also determined by the environment; (e) INSDC assembly accession where available; (f) http://www.broadinstitute.org/crd/wiki/index.php/N50; (g) author gene-set extended by additional isoform predictions from WormBase; (h) awaiting submission of gene set; (i) an improved C. angaria assembly will be available in WS231.
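Table 1 reports a scaffold N50 for each assembly. As a minimal sketch of how that statistic is conventionally computed: N50 is the largest length L such that sequences of length L or longer together contain at least half of the assembled bases. The function name n50 and the toy scaffold lengths below are illustrative assumptions; they are not WormBase code or WormBase data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for a non-empty list")


# Hypothetical scaffold lengths, chosen only to illustrate the calculation.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70: the two longest scaffolds hold 150 of 300 bp
```

A fragmented assembly made of many short scaffolds yields a small N50 even when its total size is large, which is why the statistic is reported alongside the number of top-level fragments.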
16 Worm Volume 1 Issue 1
WormBase undertakes different responsibilities for each of these species, which can include (1) administration of the genome sequence; (2) curation of gene models and other sequence features; (3) curation of non-sequence-based data from the literature and (4) tracking of identifiers forward through different versions of the genome sequence and annotation. The specific way in which we manage the data for a species depends (primarily) on whether we curate gene models and other features for it. It is therefore useful for the sake of discussion to classify the species into two groups: core (WormBase curated gene models) and non-core. As of release WS230, the core species are C. elegans, C. briggsae, C. remanei, C. brenneri and C. japonica.

Analyzing and presenting data for an ever-increasing number of nematode genomes requires methods that scale well. We deploy a standard automatic analysis pipeline to annotate all the species we house (core and non-core), including repeat prediction, cDNA alignments, the determination of homology relationships, and protein domain identification. If a genome sequence for a non-core species is submitted without a gene-set, we also run an in-house gene prediction pipeline that uses CEGMA26 to accurately identify a small, universally conserved set of gene models. These are then used to train parameters for AUGUSTUS,27 which we then apply using protein homologies and any available RNASeq and other transcript data as supporting evidence. In some cases, these internally-produced gene predictions are later replaced by a canonical set of models provided by the submitters.

Updating an existing species in WormBase with a new assembly and/or gene-set presents additional challenges, because users rely on stable identifiers to track their entities of interest, which must be propagated forward to corresponding features in subsequent releases. For core species, identifiers are actively managed and tracked using our own curation software infrastructure. For non-core species, we use the Ensembl28 stable-identifier mapping software for this task.

The principal way in which we draw information from multiple species together is by connecting genes via orthology and paralogy relationships to genes in other species (both nematode and other model organisms such as human, mouse and fly). As of WS230, we include relationships published by the following projects and resources: InParanoid29 (version 7); TreeFam30 (version 7); the Orthologous Matrix Project31 (OMA, August 2009 version); OrthoMCL;32 PantherDB33,34 (version 7); and Ensembl28,35 (version 65). In addition, we curate orthology calls from the literature (e.g., Hillier et al., ref. 8) and direct submissions. We also use data in eggNOG36 (version 3.0) to cluster genes into functionally characterized homologous groups.

These resources are inevitably based on snapshots of the gene models, taken at various times. For our core species, however, particularly C. elegans, the gene models are in a state of flux, being revised and improved on the basis of the latest evidence. In order to infer up-to-date nematode homology relationships for the latest gene models, we run the Ensembl Compara GeneTree pipeline35 as part of the preparation for every WormBase release. The resulting gene trees are used to infer current orthology relationships in addition to those obtained by import from the third-party resources and direct submission.

One way in which we use the orthology relationships internally is to project WormBase-approved gene names37 onto orthologous gene(s) of other nematode species. For this a conservative approach is adopted: each proposed gene name is required to be supported by an unambiguous one-to-one orthology connection according to the majority of available source analyses.

We also use the Ensembl Compara DNA pipeline38 to produce whole-genome multiple alignments of all genomes in WormBase and derived genome conservation tracks (using GERP39). However, as the genetic diversity of the species collection in WormBase continues to increase, a single multiple alignment for all nematodes becomes less appropriate. We therefore propose to replace it with a series of pairwise alignments, providing multiple alignments only for selected subsets of species.

Sequence Curation

WormBase adopts an anomaly-driven approach to curation, whereby discrepancies between current gene models and alignment data are identified and flagged as curation targets. We have implemented a software application (CurationTool) that identifies these discrepancies and scores them according to their degree of discordance, presenting the results to the curator using a graphical user interface. An in-depth discussion of CurationTool and our anomaly-driven curation is presented elsewhere.40

For protein-coding genes, WormBase curates only the protein-coding portion (CDS) of the full transcript. For our core species, we use the high-confidence subset of cDNA alignments overlaying the curated CDS models to infer a set of full-length transcripts (including 5' and 3' untranslated regions), using a custom algorithm (unpublished). In the past, the accuracy of this process has been sensitive to artifacts such as alignment errors or chimeric cDNAs, but we have recently improved the algorithm to take these factors into account.

The primary line of evidence for gene model curation is transcript data. In addition to cDNAs deposited in the nucleotide archives, we draw data from numerous resources, publications and direct submissions. We also align all RNASeq data deposited in the Short Read Archive (SRA) to our core species using TopHat,41 and infer gene expression estimates for a variety of life stages and environmental conditions using Cufflinks.42

WormBase is committed to acting as the ultimate repository for data coming from the nematode half of the modENCODE43,44 project. Most data sets have been accessible via the genome browser since the summer of 2010. To extract the maximum utility from the data, we integrate it fully into our database, extending the data models where necessary and adding full cross-referencing and connectivity with existing WormBase objects. To date, the focus for full integration has been on data sets with high impact on gene model and other sequence feature curation, namely: trans-splice sites;45 poly-A cleavage sites and untranslated regions;44,46 large-scale EST sets (P. Green; data retrieved from nucleotide archives); mass-spectrometry peptide sequences;44 and RNASeq transcripts, and derived gene-predictions.44

The data of highest impact for curation has been the RNASeq transcriptome, and this has been used in a number of different
ways. First, the modENCODE “genelets” (fragmentary gene models constructed using RNASeq data from 14 life stages) have been used to produce a new anomaly type for CurationTool that highlights potential cases where adjacent genes could be merged. To date, over three hundred cases displaying this anomaly have been scrutinized, of which approximately 35% resulted in a merge, and a further 10% in some other change (for example the movement of an exon from one gene to another). Second, we have re-visited the source RNASeq data and analyzed it using the TopHat/Cufflinks pipeline41,42 to identify candidate “RNASeq-splice” features. These can be used both to confirm introns already part of curated gene models, and also to suggest changes to existing gene models or new isoforms. Third, the strand bias characteristic of the modENCODE RNASeq alignments47 has been extremely useful for curators to resolve ambiguities in the definition of the 5' and 3' ends of genes. Finally, the modENCODE RNASeq data has allowed us to make corrections to the C. elegans reference genome itself. By taking proposed errors and verifying them using data from a private submission of high-throughput-sequencing (J. Ahringer and M. Berriman, pers. comm.), we have been able to make 156 genome sequence corrections (110 insertions, 44 deletions and 2 substitutions), resulting in the correction of 100 gene models.

Additionally, since the data from modENCODE began to become available from the project Data Co-ordination Centre, the following data sets have been subjected to rigorous internal quality control and fully integrated into the database: ~300 Highly Occupied Target (HOT) regions;44 ~7,000 non-coding RNA genes;44 the probable parent for ~1,000 pseudogenes;44 and ~21,000 three-prime UTRs from the UTRome project.46 We will prioritise the incorporation of the transcription-factor binding site and chromatin accessibility data as soon as the final versions of these data sets are made available.

We have also worked with groups performing their own analysis of the modENCODE data. For example, a study of the modENCODE RNASeq reads (T. Blumenthal, pers. comm.) has resulted in significant improvements to the operon data set. This has involved identifying cases where fewer than 5% of the trans-splice leader reads for “internal” genes (i.e., genes other than the first) were SL2 type, and modifying the gene content of the operons accordingly.

In addition to modENCODE, we continue to draw in data from the scientific literature and direct submissions, often combining different data sources to assist in making correct predictions. The modENCODE poly-A site data has been supplemented with a corresponding data set from an independent study.48 These two data sets have only 25% redundancy, and over 80% of coding genes now have an annotated polyA site in WormBase. Gene predictions by genBlastG49 based on BLAST homologies to C. elegans proteins have also proved valuable for the curation of C. briggsae, C. brenneri, and C. remanei.

We can assess gene-model accuracy in the presence of fragmentary transcript evidence by measuring the proportion of curated introns that are confirmed by spliced cDNA evidence. For WS230, the proportion of C. elegans curated CDS introns confirmed by traditional cDNA, modENCODE RNASeq and mass-spectrometry evidence is 83%, 88% and 14%, respectively. Overall, 93% of curated introns are confirmed and 82% of CDS models have all of their introns confirmed by at least one of these three lines of evidence; the corresponding measurements for the final release prior to modENCODE (WS200, February 2009) were 74% and 56%, demonstrating the value of the project in increasing the accuracy and confidence of C. elegans gene models.

Intraspecies Variation

Similar to many other resources, WormBase captures within-species variation as differences (insertions, deletions and substitutions) with respect to the genome sequence of the reference strain. We expect variation data for many nematode species in the future, but at present almost all the data we house is for C. elegans.

Historically, the majority of variation data we have processed has been from laboratory-manipulated strains. We maintain close working relationships and established data exchange protocols with the Caenorhabditis Genetics Center (CGC; www.cbs.umn.edu/CGC), the C. elegans Gene Knockout Consortium (GKC; www.celeganskoconsortium.omrf.org), and the National BioResource Project of Japan (NBRP; www.shigen.nig.ac.jp/c.elegans/index.jsp). We also curate variation data from individual user submissions, which, although time-consuming, are often biologically important.

There has recently been a rapid growth of C. elegans variation data generated by whole genome sequencing projects (refs. 50–54; Andersen et al., manuscript in preparation; Moerman and Waterston, manuscript in preparation). These data sets include an increasing number of variations from naturally-occurring wild-isolate strains. Motivated by community feedback, we have increased the clarity of our representation and display of this information. Every variation object processed by WormBase is assigned a unique, stable identifier with prefix “WBVar.” For laboratory-induced variations, we also assign a more directly informative public name composed of a project/laboratory prefix (supplied by J. Hodgkin, pers. comm.) and a numerical suffix. For naturally occurring variations, the public name defaults to the WBVar identifier, making the distinction between these objects and the laboratory-induced variations obvious and immediate.

We now also collect non-sequence-based information for wild isolate strains (http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/wild_isolate.cgi). Compared with laboratory-manipulated strains, there is additional information to capture about the wild isolates, such as the isolation location, the conditions in which each strain was found, and details of how it was isolated. Many wild isolates are not stocked at the CGC, and WormBase acts as the central data repository for these strains.

WormBase does not have a mandate to act as a permanent repository for variation data, and as the volume of these data sets continues to increase rapidly, we become less adequately resourced to perform this function. Projects are therefore encouraged to submit their data to the NCBI’s Database of Short Genetic
Variations (dbSNP),55 an established archive for variation data. We act as a submission broker in cases where a laboratory lacks the technical resources to conform to the dbSNP submission protocols. To date, data from six projects have been integrated into WormBase and submitted to dbSNP. WormBase adds value to these data sets by performing additional analysis and placing them into context with other data types (e.g., Gene).

Variations are most often submitted to WormBase as a molecular change at a given location in a specific version of the reference genome sequence. As part of the curation, we capture and record a short flanking sequence either side of the variation feature, disassociating it from a specific version of the reference genome. Each release, we re-map all variations and re-calculate potential consequences of the molecular changes (e.g., non-sense, mis-sense or silent protein-coding mutation) on the latest gene models.

Release Cycle and Database Build

WormBase is released every two months, with the preparation for a release beginning three months in advance. This release cycle can give rise to variability in the time between a curator transaction (e.g., the update of a gene name, correction of an error, or the import of a new data set) and its availability on the WormBase website. The delay can be as short as three months (if the change is made immediately before we start building the release) and as long as five months (if made immediately after, in which case it will not be public until the following release).

Building a WormBase database release is a complicated process, the broad stages of which can be described as: (1) data freeze, where each contributing consortium partner takes a snap-shot of the database(s) in which their curation data are stored; (2) data collation, where the curation database snap-shots are brought together into a single database; (3) submission of updated annotation on core species to the International Nucleotide Sequence Database Collaboration,56 to ensure that the representation of core nematode data in the nucleotide and protein archives is up-to-date; (4) mapping of sequence data (e.g., cDNAs, microarray probes, sequence features, variations) to the genome; (5) establishing connections between objects of different types (e.g., RNAi to Gene), usually via genomic location; (6) the large-scale computational analyses discussed earlier, such as homology detection and whole-genome alignment; and (7) quality control and assurance.

For the more complicated parts of the build process, we deploy two components of the Ensembl system for the management and tracking of computational pipelines: ensembl-pipeline57 for homology analysis and eHive58 for comparative analysis. The key features of these systems are (1) automatic re-run of tasks that have failed; and (2) user-definition of a sub-task dependency graph for a process, allowing complex pipelines to be run with minimal user intervention. These systems are critical in enabling us to produce the database in a regular and timely manner.

Each stage of the database production is subject to a suite of integrity checks to ensure that it has completed cleanly and without error. For example, we compare the number of objects in each data class with the count at the corresponding stage in the previous release. Major discrepancies are flagged for investigation. This mechanism has proved to be extremely effective in catching errors and process failures as soon as they occur.

Summary

WormBase is facing a deluge of data from many nematode genome sequencing projects, and we have prepared for this by putting into place annotation and integration pipelines and workflows that will allow the data to be analyzed and presented in a timely and consistent manner. As ever, we welcome feedback and ideas from our user-base as part of the continued development of the resource. We are currently particularly interested in suggestions on how we can maximise the utility of housing a broad representation of the nematode phylum, and what comparative genomics services and views users would find most useful. Users can contact the developers at [email protected] with their suggestions.

Acknowledgments

This work is supported by the US National Institutes of Health (Grant no. P41 HG02223); US National Human Genome Research Institute (Grant no. P41-HG02223); and British Medical Research Council (Grant no. G070119); P.W.S. is an investigator with the Howard Hughes Medical Institute. Funding for open access charge: US National Human Genome Research Institute (Grant no. P41-HG02223).
References

1. Yook K, Harris TW, Bieri T, Cabunoc A, Chan J, Chen WJ, et al. WormBase 2012: more genomes, more data, new website. Nucleic Acids Res 2012; 40(Database issue):D735-41; PMID:22067452; http://dx.doi.org/10.1093/nar/gkr954
2. Harris TW, Antoshechkin I, Bieri T, Blasiar D, Chan J, Chen WJ, et al. WormBase: a comprehensive resource for nematode research. Nucleic Acids Res 2010; 38(Database issue):D463-7; PMID:19910365; http://dx.doi.org/10.1093/nar/gkp952
3. Mitreva M, Jasmer DP, Zarlenga DS, Wang Z, Abubucker S, Martin J, et al. The draft genome of the parasitic nematode Trichinella spiralis. Nat Genet 2011; 43:228-35; PMID:21336279; http://dx.doi.org/10.1038/ng.769
4. Jex AR, Liu S, Li B, Young ND, Hall RS, Li Y, et al. Ascaris suum draft genome. Nature 2011; 479:529-33; PMID:22031327; http://dx.doi.org/10.1038/nature10553
5. Kikuchi T, Cotton JA, Dalzell JJ, Hasegawa K, Kanzaki N, McVeigh P, et al. Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus. PLoS Pathog 2011; 7:e1002219; PMID:21909270; http://dx.doi.org/10.1371/journal.ppat.1002219
6. C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 1998; 282:2012-8; PMID:9851916; http://dx.doi.org/10.1126/science.282.5396.2012
7. Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, et al. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol 2003; 1:E45; PMID:14624247; http://dx.doi.org/10.1371/journal.pbio.0000045
8. Hillier LW, Miller RD, Baird SE, Chinwalla A, Fulton LA, Koboldt DC, et al. Comparison of C. elegans and C. briggsae genome sequences reveals extensive conservation of chromosome organization and synteny. PLoS Biol 2007; 5:e167; PMID:17608563; http://dx.doi.org/10.1371/journal.pbio.0050167
9. Ross JA, Koboldt DC, Staisch JE, Chamberlin HM, Gupta BP, Miller RD, et al. Caenorhabditis briggsae recombinant inbred line genotypes reveal inter-strain incompatibility and the evolution of recombination. PLoS Genet 2011; 7:e1002174; PMID:21779179; http://dx.doi.org/10.1371/journal.pgen.1002174
10. Ghedin E, Wang S, Spiro D, Caler E, Zhao Q, Crabtree J, et al. Draft genome of the filarial nematode parasite Brugia malayi. Science 2007; 317:1756-60; PMID:17885136; http://dx.doi.org/10.1126/science.1145406
11. Dieterich C, Clifton SW, Schuster LN, Chinwalla A, Delehaunty K, Dinkelacker I, et al. The Pristionchus pacificus genome provides a unique perspective on nematode lifestyle and parasitism. Nat Genet 2008; 40:1193-8; PMID:18806794; http://dx.doi.org/10.1038/ng.227
12. Abad P, Gouzy J, Aury JM, Castagnone-Sereno P, Danchin EG, Deleury E, et al. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat Biotechnol 2008; 26:909-15; PMID:18660804; http://dx.doi.org/10.1038/nbt.1482
13. Opperman CH, Bird DM, Williamson VM, Rokhsar DS, Burke M, Cohn J, et al. Sequence and genetic map of Meloidogyne hapla: A compact nematode genome for plant parasitism. Proc Natl Acad Sci USA 2008; 105:14802-7; PMID:18809916; http://dx.doi.org/10.1073/pnas.0805946105
14. Mortazavi A, Schwarz EM, Williams B, Schaeffer L, Antoshechkin I, Wold BJ, et al. Scaffolding a Caenorhabditis nematode genome with RNA-seq. Genome Res 2010; 20:1740-7; PMID:20980554; http://dx.doi.org/10.1101/gr.111021.110
15. Blaxter ML, De Ley P, Garey JR, Liu LX, Scheldeman P, Vierstraete A, et al. A molecular evolutionary framework for the phylum Nematoda. Nature 1998; 392:71-5; PMID:9510248; http://dx.doi.org/10.1038/32160
16. Haag S. The evolution of nematode sex determination: C. elegans as a reference point for comparative biology (December 29 2005). In: The C. elegans Research Community ed. WormBook. http://www.wormbook.org
17. Kiontke KC, Félix MA, Ailion M, Rockman MV, Braendle C, Pénigault JB, et al. A phylogeny and molecular barcodes for Caenorhabditis, with numerous new species from rotting fruits. BMC Evol Biol 2011; 11:339; PMID:22103856; http://dx.doi.org/10.1186/1471-2148-11-339
18. Mayer WE, Herrmann M, Sommer RJ. Phylogeny of the nematode genus Pristionchus and implications for biodiversity, biogeography and the evolution of hermaphroditism. BMC Evol Biol 2007; 7:104; PMID:17605767; http://dx.doi.org/10.1186/1471-2148-7-104
19. Redman E, Grillo V, Saunders G, Packard E, Jackson F, Berriman M, et al. Genetics of mating and sex determination in the parasitic nematode Haemonchus contortus. Genetics 2008; 180:1877-87; PMID:18854587; http://dx.doi.org/10.1534/genetics.
24. Boag PR, Newton SE, Gasser RB. Molecular aspects of sexual development and reproduction in nematodes and schistosomes. Adv Parasitol 2001; 50:153-98; PMID:11757331; http://dx.doi.org/10.1016/S0065-308X(01)50031-7
25. Hasegawa K, Mota MM, Futai K, Miwa J. Chromosome structure and behaviour in Bursaphelenchus xylophilus (Nematoda: Parasitaphelenchidae) germ cells and early embryo. Nematology 2006; 8:425-34; http://dx.doi.org/10.1163/156854106778493475
26. Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 2007; 23:1061-7; PMID:17332020; http://dx.doi.org/10.1093/bioinformatics/btm071
27. Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 2006; 7:62; PMID:16469098; http://dx.doi.org/10.1186/1471-2105-7-62
28. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, et al. Ensembl 2011. Nucleic Acids Res 2011; 39(Database issue):D800-6; PMID:21045057; http://dx.doi.org/10.1093/nar/gkq1064
29. Ostlund G, Schmitt T, Forslund K, Köstler T, Messina DN, Roopra S, et al. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res 2010; 38(Database issue):D196-203; PMID:19892828; http://dx.doi.org/10.1093/nar/gkp931
30. Li H, Coghlan A, Ruan J, Coin LJ, Hériché JK, Osmotherly L, et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res 2006; 34(Database issue):D572-80; PMID:16381935; http://dx.doi.org/10.1093/nar/gkj118
31. Altenhoff AM, Schneider A, Gonnet GH, Dessimoz C. OMA 2011: orthology inference among 1000 complete genomes. Nucleic Acids Res 2011; 39(Database issue):D289-94; PMID:21113020; http://dx.doi.org/10.1093/nar/gkq1238
32. Chen F, Mackey AJ, Stoeckert CJ, Jr., Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 2006; 34(Database issue):D363-8; PMID:16381887; http://dx.doi.org/10.1093/nar/gkj123
33. Thomas PD, Kejariwal A, Campbell MJ, Mi H, Diemer K, Guo N, et al. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification. Nucleic Acids Res 2003; 31:334-41; PMID:12520017; http://dx.doi.org/10.1093/nar/gkg115
34. Mi H, Dong Q, Muruganujan A, Gaudet P, Lewis S, Thomas PD. PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res 2010; 38(Database issue):D204-10; PMID:20015972; http://dx.doi.org/10.1093/nar/gkp1019
35. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates.
39. Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A, et al.; NISC Comparative Sequencing Program. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005; 15:901-13; PMID:15965027; http://dx.doi.org/10.1101/gr.3577405
40. Williams GW, Davis PA, Rogers AS, Bieri T, Ozersky P, Spieth J. Methods and strategies for gene structure curation in WormBase. Database (Oxford) 2011; 2011:baq039; PMID:21543339; http://dx.doi.org/10.1093/database/baq039
41. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009; 25:1105-11; PMID:19289445; http://dx.doi.org/10.1093/bioinformatics/btp120
42. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010; 28:511-5; PMID:20436464; http://dx.doi.org/10.1038/nbt.1621
43. Celniker SE, Dillon LA, Gerstein MB, Gunsalus KC, Henikoff S, Karpen GH, et al.; modENCODE Consortium. Unlocking the secrets of the genome. Nature 2009; 459:927-30; PMID:19536255
44. Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, et al.; modENCODE Consortium. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 2010; 330:1775-87; PMID:21177976; http://dx.doi.org/10.1126/science.1196914
45. Allen MA, Hillier LW, Waterston RH, Blumenthal T. A global analysis of C. elegans trans-splicing. Genome Res 2011; 21:255-64; PMID:21177958; http://dx.doi.org/10.1101/gr.113811.110
46. Mangone M, Manoharan AP, Thierry-Mieg D, Thierry-Mieg J, Han T, Mackowiak SD, et al. The landscape of C. elegans 3’UTRs. Science 2010; 329:432-5; PMID:20522740; http://dx.doi.org/10.1126/science.1191244
47. Hillier LW, Reinke V, Green P, Hirst M, Marra MA, Waterston RH. Massively parallel sequencing of the polyadenylated transcriptome of C. elegans. Genome Res 2009; 19:657-66; PMID:19181841; http://dx.doi.org/10.1101/gr.088112.108
48. Jan CH, Friedman RC, Ruby JG, Bartel DP. Formation, regulation and evolution of Caenorhabditis elegans 3’UTRs. Nature 2011; 469:97-101; PMID:21085120; http://dx.doi.org/10.1038/nature09616
49. She R, Chu JS, Uyar B, Wang J, Wang K, Chen N. genBlastG: using BLAST searches to build homologous gene models. Bioinformatics 2011; 27:2141-3; PMID:21653517; http://dx.doi.org/10.1093/bioinformatics/btr342
50. Zuryn S, Le Gras S, Jamet K, Jarriault S. A strategy for direct mapping and identification of mutations by whole-genome sequencing. Genetics 2010; 186:427-30; PMID:20610404; http://dx.doi.org/10.1534/genetics.110.119230
108.094623 Genome Res 2009; 19:327-35; PMID:19029536; 51. Sarin S, Bertrand V, Bigelow H, Boyanov A,
20. Bird DM, Williamson VM, Abad P, McCarter J, http://dx.doi.org/10.1101/gr.073585.107 Doitsidou M, Poole RJ, et al. Analysis of multiple
DanchinEG,Castagnone-SerenoP,etal.Thegenomes 36. MullerJ,SzklarczykD,JulienP,LetunicI,RothA,Kuhn ethyl methanesulfonate-mutagenized Caenorhabditis
ofroot-knotnematodes.AnnuRevPhytopathol2009; M, et al. eggNOG v2.0: extending the evolutionary elegansstrainsbywhole-genomesequencing.Genetics
47:333-51; PMID:19400640; http://dx.doi.org/10. genealogy of genes with enhanced non-supervised 2010; 185:417-30; PMID:20439776; http://dx.doi.
1146/annurev-phyto-080508-081839 orthologousgroups,speciesandfunctionalannotations. org/10.1534/genetics.110.116319
21. CicheT.ThebiologyandgenomeofHeterorhabditis Nucleic Acids Res 2010; 38(Database issue):D190-5; 52. FlibotteS,EdgleyML,ChaudhryI,TaylorJ,NeilSE,
bacteriophora (February 20 2007). In: The C. elegans PMID:19900971;http://dx.doi.org/10.1093/nar/gkp951 Rogula A, et al. Whole-genome profiling of muta-
Research Community ed. WormBook. http://www. 37. Horvitz HR, Brenner S, Hodgkin J, Herman RK. A genesisinCaenorhabditiselegans.Genetics2010;185:
wormbook.org uniform genetic nomenclature for the nematode 431-41; PMID:20439774; http://dx.doi.org/10.1534/
22. Viney ME. A genetic analysis of reproduction Caenorhabditis elegans. Mol Gen Genet 1979; 175: genetics.110.116616
in Strongyloides ratti. Parasitology 1994; 109: 129-33; PMID:292825; http://dx.doi.org/10.1007/ 53. SarinS,PrabhuS,O’MearaMM,Pe’erI,HobertO.
511-5; PMID:7800419; http://dx.doi.org/10.1017/ BF00425528 Caenorhabditis elegans mutant allele identification by
S0031182000080768 38. Paten B, Herrero J, Beal K, Fitzgerald S, Birney E. whole-genome sequencing. Nat Methods 2008; 5:
23. Pires-daSilva A. Evolution of the control of sexual EnredoandPecan:genome-widemammalianconsistency- 865-7; PMID:18677319; http://dx.doi.org/10.1038/
identity in nematodes. Semin Cell Dev Biol 2007; based multiple alignment with paralogs. Genome Res nmeth.1249
18:362-70; PMID:17306573; http://dx.doi.org/10. 2008; 18:1814-28; PMID:18849524; http://dx.doi.org/
1016/j.semcdb.2006.11.014 10.1101/gr.076554.108
54. Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, et al. Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 2008; 5:183-8; PMID:18204455; http://dx.doi.org/10.1038/nmeth.1179
55. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001; 29:308-11; PMID:11125122; http://dx.doi.org/10.1093/nar/29.1.308
56. Karsch-Mizrachi I, Nakamura Y, Cochrane G; International Nucleotide Sequence Database Collaboration. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res 2012; 40(Database issue):D33-7; PMID:22080546; http://dx.doi.org/10.1093/nar/gkr1006
57. Potter SC, Clarke L, Curwen V, Keenan S, Mongin E, Searle SM, et al. The Ensembl analysis pipeline. Genome Res 2004; 14:934-41; PMID:15123589; http://dx.doi.org/10.1101/gr.1859804
58. Severin J, Beal K, Vilella AJ, Fitzgerald S, Schuster M, Gordon L, et al. eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinformatics 2010; 11:240; PMID:20459813; http://dx.doi.org/10.1186/1471-2105-11-240
© 2012 Landes Bioscience.