Table Of Content

Kumaretal.BMCBioinformatics2012,13:6 http://www.biomedcentral.com/1471-2105/13/6 DATABASE Open Access MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases Akhil Kumar1, Patrick F Suthers2 and Costas D Maranas2* Abstract Background: Increasingly, metabolite and reaction information is organized in the form of genome-scale metabolic reconstructions that describe the reaction stoichiometry, directionality, and gene to protein to reaction associations. A key bottleneck in the pace of reconstruction of new, high-quality metabolic models is the inability to directly make use of metabolite/reaction information from biological databases or other models due to incompatibilities in content representation (i.e., metabolites with multiple names across databases and models), stoichiometric errors such as elemental or charge imbalances, and incomplete atomistic detail (e.g., use of generic R-group or non-explicit specification of stereo-specificity). Description: MetRxn is a knowledgebase that includes standardized metabolite and reaction descriptions by integrating information from BRENDA, KEGG, MetaCyc, Reactome.org and 44 metabolic models into a single unified data set. All metabolite entries have matched synonyms, resolved protonation states, and are linked to unique structures. All reaction entries are elementally and charge balanced. This is accomplished through the use of a workflow of lexicographic, phonetic, and structural comparison algorithms. MetRxn allows for the download of standardized versions of existing genome-scale metabolic models and the use of metabolic information for the rapid reconstruction of new ones. Conclusions: The standardization in description allows for the direct comparison of the metabolite and reaction content between metabolic models and databases and the exhaustive prospecting of pathways for biotechnological production. This ever-growing dataset currently consists of over 76,000 metabolites participating in more than 72,000 reactions (including unresolved entries). MetRxn is hosted on a web-based platform that uses relational database models (MySQL). Background and biomass composition. Already over 75 genome-scale The ever accelerating pace of DNA sequencing and models are in place for eukaryotic, prokaryotic and annotation information generation [1] is spearheading archaeal species [3] and are becoming indispensable for the global inventorying of metabolic functions across all computationally driving engineering interventions in kingdoms of life. Increasingly, metabolite and reaction microbial strains for targeted overproductions [4-7], elu- information is organized in the form of community [2], cidating the organizing principles of metabolism [8-11] organism, or even tissue-specific genome-scale meta- and even pinpointing drug targets [12,13]. A key bottle- bolic reconstructions. These reconstructions account for neck in the pace of reconstruction of new high quality reaction stoichiometry and directionality, gene to pro- metabolic models is our inability to directly make use of tein to reaction associations, organelle reaction localiza- metabolite/reaction information from biological data- tion, transporter information, transcriptional regulation bases [14] (e.g., BRENDA [15], KEGG [16], MetaCyc, EcoCyc, BioCyc [17], BKM-react [18], UM-BBD [19], Reactome.org, Rhea, PubChem, ChEBI etc.) or other *Correspondence:[email protected] 2DepartmentofChemicalEngineering,ThePennsylvaniaStateUniversity, models [20] due to incompatibilities of representation, UniversityPark,PA16802,USA duplications and errors, as illustrated in Figure 1. Fulllistofauthorinformationisavailableattheendofthearticle ©2012Kumaretal;licenseeBioMedCentralLtd.ThisisanOpenAccessarticledistributedunderthetermsoftheCreativeCommons AttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,andreproductionin anymedium,providedtheoriginalworkisproperlycited. Kumaretal.BMCBioinformatics2012,13:6 Page2of13 http://www.biomedcentral.com/1471-2105/13/6 1) Naming Inconsistencies 2-Oxoglutarate + L-Alanine <=> Pyruvate + L-Glutamate KEGG C00026 + C00041<=>C00022+C00025 BRENDA alpha-ketoglutarate + L-alanine<=>L-glutamate + pyruvate Escherichia coli iAF1260 [c] : akg + ala-L --> glu-L + pyr Acinetobacter baylyi iAbaylyi 1 GLT + 1 PYRUVATE <-> 1 2-KETOGLUTARATE + 1 L-ALPHA-ALANINE Leishmania major iAC560 [m] : akg + ala-L -> glu-L + pyr Mannheimia succiniciproducens PYR + GLU --> AKG + ALA 2) Elemental and charge imbalances Balanced KEGG (R)-Lactate + NAD+<=>Pyruvate + NADH + H+ Escherichia coliiAF1260 [c] : lac-D + nad --> h + nadh + pyr Unbalanced Acinetobacter baylyi iAbaylyi 1 D-LACTATE + 1 NAD <=>1 NADH + 1 PYRUVATE 3) Errors/incompleteness/ambiguity in structural information models s R e olit b a et databases m # # structures Figure1Typicalincompatibilitiesandinconsistenciesingenome-scalemodelsanddatabases.Roadblockstousinggenome-scalemodels anddatabasesincludeambiguitiesanddifferencesinnamingconventions,lackofbalancedreactions,andincompletenessofstructural information. A major impediment is the presence of metabolites and may contain generic side groups (e.g., alkyl groups with multiple names across databases and models, and -R), varying degree of a repeat unit participation in oli- in some cases within the same resource, which signifi- gomers, or even just compound class identification such cantly slows down the pooling of information from mul- as “an amino acid” or “electron acceptor”. Over 3% of tiple sources. Therefore, the almost unavoidable all metabolites and 8% of all reactions in the aforemen- inclusion of multiple replicates of the same metabolite tioned databases and models exhibit one or more of can lead to missed opportunities to reveal (synthetic) these problems. lethal gene deletions, repair network gaps and quantify There have already been a number of efforts aimed at metabolic flows. Moreover, most data sources inadver- addressing some of these limitations. The Rhea database, tently include some reactions that may be stoichiometri- hosted by the European Bioinformatics Institute, aggre- cally inconsistent [21] and/or elementally/charge gates reaction data primarily from IntEnz [24] and unbalanced [22,23], which can adversely affect the pre- ENZYME [25], whereas Reactome.org is a collection of diction quality of the resulting models if used directly. reactions primarily focused on human metabolism Finally, a large number of metabolites in reactions are [26,27]. Even though they crosslink their data to one or partly specified with respect to structural information more popular databases such as KEGG, ChEBI, NCBI, Kumaretal.BMCBioinformatics2012,13:6 Page3of13 http://www.biomedcentral.com/1471-2105/13/6 Ensembl, Uniprot, etc., both retain their own representa- integration of metabolite and reaction data, 3) calculation formats. More recently, the BKM-react database is tion and reconciliation of structural information, 4) a non-redundant biochemical reaction database contain- identification of overlaps between metabolite and reac- ing known enzyme-catalyzed reactions compiled from tion information, 5) elemental and charge balancing of BRENDA, KEGG, and MetaCyc [18]. The BKM-react reactions, 6) successive resolution of remaining ambigu- database currently contains 20,358 reactions. Addition- ities in description. ally, the contents of five frequently used human meta- Step 1: Source data acquisition bolic pathway databases have been compared [28]. An Metabolite and reaction data was downloaded from important step forward for models was the BiGG data- BRENDA, KEGG, BioCyc, BKM-react and other database, which includes seven genome-scale models from bases using a variety of methods based on protocols the Palsson group in a consistent nomenclature and such as SOAP, FTP and HTTP. We preprocessed the exportable in SBML format [29-31]. Research towards data into flat files that were subsequently imported into integrating genome-scale metabolic models with large the knowledgebase. All original information pertaining databases has so far been even more limited. Notable to metabolite name, abbreviations, metabolite geometry, exceptions include the partial reconciliation of the latest related reactions, catalyzing enzyme and organism E. coli genome scale model iAF1260 with EcoCyc [32] name, gene-protein-reaction associations, and compart- and the aggregation of data from the Arabidopsis thali- mentalization was retained. For all 44 initial genome- ana database and KEGG for generating genome-scale scale models listed, the online information from the cor- models [33] in a semi-automated fashion. Additionally, responding publications was also imported. The source ReMatch integrates some metabolic models, although its codes for all parsers used in Step 1 are available on the primary focus is on carbon mappings for metabolic flux MetRxn website. analysis [34]. Also, many metabolic models retain the Step 2: Source data parsing KEGG identifiers of metabolites and reactions extracted The “raw data” from both databases and models was during their construction [35,36]. An important recent unified using standard SQL scripts on a MySQL server. development is the web resource Model SEED that can The description schema for metabolites includes source, generate draft genome-scale metabolic models drawing name, abbreviations used in the source, chemical for- from an internal database that integrates KEGG with 13 mula, and geometry. The schema for reactions accounts genome scale models (including six of the models in the for source, name, reaction string (reactants and pro- BiGG database) [37]. All of the reactions in Model ducts), organism designation, associated enzymes and SEED and BiGG are charge and elementally balanced. genes, EC number, compartment, reversibility/direction, In this paper, we describe the development and high- and pathway information. Once a source has been light applications of the web-based resource MetRxn imported into the MySQL server, a data source-specific that integrates, using internally consistent descriptions, dictionary is created to map metabolite abbreviations metabolite and reaction information from 8 databases onto names/synonyms and structures and metabolites to and 44 metabolic models. The MetRxn knowledgebase reactions. (as of October 2011) contains over 76,000 metabolites Step 3: Metabolite charge and structural analysis and 72,000 reactions (including unresolved entries) that We used Marvin (Chemaxon) to analyze all 218,122 raw are charge and elementally balanced. By conforming to metabolite entries containing structural information (out standardized metabolite and reaction descriptions, of a total of 322,936, including BRENDA entries). Incon- MetRxn enables users to efficiently perform queries and sistencies were found in 12,965 entries typically due to comparisons across models and/or databases. For exam- wrong atom connectivity, valence, bond length or stereo ple, common metabolites and/or reactions between chemical information, which were corrected using APIs models and databases can rapidly be generated along available in Marvin. A final corrected version of the with connected paths that link source to target metabo- metabolite geometries was calculated at a fixed pH of lites. MetRxn supports export of models in SBML for- 7.2 and converted into standard Isomeric SMILES format. New models are being added as they are published mat. The structure/formula used corresponded to the or made available to us. It is available as a web-based major microspecies found during the charge calculation, resource at http://metrxn.che.psu.edu. which effectively rounds the charge to an integer value in accordance with previous model construction conven- Construction and Content tions. This format includes both chiral and stereo infor- MetRxn construction mation, as it allows specification of molecular The construction of MetRxn largely followed the follow- configuration [38-40]. Metabolites were also annotated ing steps, as illustrated in Figure 2: 1) download of pri- with Canonical SMILES using the OpenBabel Interface mary sources of data from databases and models, 2) from Chemspider. The canonical representation encodes Kumaretal.BMCBioinformatics2012,13:6 Page4of13 http://www.biomedcentral.com/1471-2105/13/6 Metabolite & genome-scale databases reaction models information extraction Download / PubChem identify ChemSpider metabolite bond connectivity information Raw database Canonical / isomeric Metabolite metabolites SMILES at pH 7.2 identity analysis Full Partial No atomistic atomistic atomistic detail detail detail Elemental & charge reactions balancing Reaction identity All metabolites At least one analysis have full atomistic metabolite lacks detail full atomistic detail Disambiguation of A + B → C + D metabolites using ? structure & phonetic A + B → C + E comparisons MetRxn Figure 2 Flowchart outlining the construction of MetRxn. After download of primary sources of data from databases andmodels, we integratedmetaboliteandreactiondata,followedbycalculationandreconciliationofstructuralinformation.Byidentifyingoverlapsbetween metaboliteandreactioninformation,wegeneratedelementalandchargebalancingofreactions.TheprocedurefordevelopingMetRxnwas iterativewithsubsequentpassesmakinguseofpreviousassociationstoresolveremainingambiguities. only atom-atom connectivity while ignoring all confor- [41,42] to resolve the identity of 34,984 metabolites and mers for a metabolite. Using bond connectivity informa- 32,311 reactions. Another 6,100 metabolites and 11,401 tion from the primary sources and resources such as reactions involved, in various degrees, lack of full ato- PubChem and ChemSpider we used Canonical SMILES mistic detail in their description (e.g., use an R or × as Kumaretal.BMCBioinformatics2012,13:6 Page5of13 http://www.biomedcentral.com/1471-2105/13/6 side-chains, are generic compounds like “amino acid” or were compared against the ones without any associated “electron acceptor”). Over 25,000 duplicate metabolites structure. The algorithm suppresses keywords/tokens and 27,000 reaction entries were identified and consoli- depicting stereo information such as cis, trans, L-, D-, dated within the database. The metabolites and reac- alpha, beta, gamma, and numerical entries because they tions present in the resolved repository were further change the phonetic signature of the synonym under classified with respect to the completeness of atomistic investigation. In addition, the algorithm ignores non- detail in their description. chemistry related words (e.g., use, for, experiment) that Step 4: Metabolite synonyms and initial reaction are found in some metabolite names. Certain tokens reconciliation such as “-ic acid” and “-ate” are treated as equivalent. Raw metabolite entries were assigned to Isomeric PubChem and Chemspider sources were accessed SMILES representations whenever possible. If insuffi- through the GUI so that the curator gets as much infor- cient structural information was available for a down- mation as possible to identify the data correctly. Pho- loaded raw metabolite then it was assigned temporarily netic matches provided clues for resolving over 159 with the Canonical SMILES and revisited during the metabolites. The iterative application of string and pho- reaction reconciliation. Canonical SMILES retain atom netic comparison algorithms resolved as many as 8,879 connectivity but not stereo-specificity and are used as metabolites after 18 rounds of reconciliation. the basic metabolite topology descriptors as many meta- Upon completion of this workflow, all genome-scale bolic models lack stereo-specificity information. After models are reformatted into a computations-ready form generating the initial metabolite associations, we identi- and Flux Balance Analysis [43] is performed on both the fied reaction overlaps using the reaction synonyms and source model and the standardized model in MetRxn to reaction strings along with the metabolite SMILES ascertain the ability of the model to produce biomass representations. Directionality and cofactor usage were before and after standardization. We performed the cal- temporarily ignored. During this step, reactions were culations using GAMS version 12.6. MetRxn is accessi- flagged as single-compartment or two-compartment (i.e., ble through a web interface that indirectly generates transport reactions). MetRxn internally retains the origi- MySQL queries. In order to facilitate analysis and use of nal compartment designations, but currently only dis- the data, a number of tools are provided as part of plays these simplified compartment designations. In MetRxn. analogy to metabolites, reactions were grouped into families that shared participants but in the source data Data export and display sets occurred in different compartments or differed only MetRxn supports a number of export capabilities. In in protonation. general, any list that is displayed contains live links to Step 5: Reaction charge and elemental balancing the metabolite or reaction entities. These lists can con- Once metabolites were assigned correct elemental com- sist of an entire model, data from a comparison, or position and protonation states, reactions were charge query results. All items can be exported to SMBL for- and elementally balanced. To this end, for charge balan- mat. In addition, the public MySQL database will be cing we relied on a linear programming representation made available upon request. Because of licensing lim- that minimizes the difference in the sum of the charge itations, the BRENDA database cannot be exported and of the reactants and the sum of the charge on the pro- is not part of the public MySQL database. However, we ducts. The complete formulation is provided in the doc- plan to provide Java source code that allows for the umentation at MetRxn. integration of a local copy of the public MySQL data- Step 6: Iterative reaction reconciliation base with the BRENDA database (provided upon Reactions with one (or more) unresolved reactants and/ request). or products were string compared against the entire resolved collection of reactions. This step was succes- Source comparisons and visualization sively executed as newly resolved metabolites and reac- In addition to listing the content (number of metabo- tions could enable the resolution of previously lites, reactions, etc.) of the selected data source(s), unresolved ones. After the first pass 164 metabolites MetRxn contains tools for comparing two or more mod- were resolved, while subsequent passes (up to 18 for els and visualizing the results. These associations can be some models) helped resolved a total of 8,720 entries. for metabolites or reactions. During these comparisons Reactions with significant (but not complete) overlap- compartment information and reversibility are sup- ping sets of reactants/products are additionally sent to pressed. Comparison tables are generated by comparing the curator GUI including phonetic information. Briefly, the associations between the selected data source(s) the phonetic tokens of synonyms with known structures using the canonical structures. Kumaretal.BMCBioinformatics2012,13:6 Page6of13 http://www.biomedcentral.com/1471-2105/13/6 MetRxn Scope distribution of metabolite resolution across models and An initial repository of reaction (i.e., 154,399) and meta- databases in MetRxn. In general, metabolites without bolite (i.e., 322,936) entries were downloaded from 8 fully-specified structures tend to participate in a rela- databases and 44 genome-scale metabolic models. We tively small number of reactions. compiled a non-redundant list of 42,540 metabolites The workflow followed in the creation of the MetRxn and 35,474 reactions (after consolidating duplicate knowledgebase identified a number of inconsistencies. entries) containing full atomistic and bond connectivity For instance, the same metabolite name may map to detail. Another 6,100 metabolites and 11,401 reactions molecules with different numbers of repeat units (e.g., have partial atomistic detail typically containing generic lecithin) or completely different structures (e.g., AMP side-chains (R) and/or an unspecified number of poly- could refer to either adenosine monosphate or ampicil- mer repeat units. Finally, 5,436 metabolites in metabolic lin). Notably, even for the most well-curated metabolic models and 8,000 metabolites in databases are retained model, E. coli iAF1260 [32], we found minor errors or with no atomistic detail. In some cases lack of atomistic omissions (a total of 17) arising from inconsistencies or detail reflects complete lack of identity specificity (e.g., incompleteness of representation in the data culled from electron donor) whereas in other cases even though the other sources. For example, the metabolite abbreviation chemical species is fully defined, atomistic level descrip- arbtn-fe3 was mistakenly associated with the KEGG ID tion is not warranted (e.g., gene product of dsbC protein and structure of aerobactin instead of ferric-aerobactin. disulfide isomerase II (reduced)). Figure 3 shows the The number of inconsistencies is dramatically increased (cid:61)(cid:72)(cid:68)(cid:3)(cid:80)(cid:68)(cid:92)(cid:86)(cid:3)(cid:76)(cid:53)(cid:54)(cid:20)(cid:24)(cid:25)(cid:22) (cid:36)(cid:85)(cid:68)(cid:69)(cid:76)(cid:71)(cid:82)(cid:83)(cid:86)(cid:76)(cid:86)(cid:3)(cid:87)(cid:75)(cid:68)(cid:79)(cid:76)(cid:68)(cid:81)(cid:68)(cid:3)(cid:76)(cid:53)(cid:54)(cid:20)(cid:24)(cid:28)(cid:26) (cid:51)(cid:86)(cid:72)(cid:88)(cid:71)(cid:82)(cid:80)(cid:82)(cid:81)(cid:68)(cid:86)(cid:3)(cid:68)(cid:72)(cid:85)(cid:88)(cid:74)(cid:76)(cid:81)(cid:82)(cid:86)(cid:68) (cid:51)(cid:86)(cid:72)(cid:88)(cid:71)(cid:82)(cid:80)(cid:82)(cid:81)(cid:68)(cid:86)(cid:3)(cid:83)(cid:88)(cid:87)(cid:76)(cid:71)(cid:68)(cid:3)(cid:46)(cid:55)(cid:21)(cid:23)(cid:23)(cid:19)(cid:3)(cid:76)(cid:45)(cid:49)(cid:26)(cid:23)(cid:25) (cid:37)(cid:68)(cid:70)(cid:76)(cid:79)(cid:79)(cid:88)(cid:86)(cid:3)(cid:86)(cid:88)(cid:69)(cid:87)(cid:76)(cid:79)(cid:76)(cid:86)(cid:3)(cid:76)(cid:37)(cid:86)(cid:88)(cid:20)(cid:20)(cid:19)(cid:22)(cid:3) (cid:47)(cid:72)(cid:76)(cid:86)(cid:75)(cid:80)(cid:68)(cid:81)(cid:76)(cid:68)(cid:3)(cid:80)(cid:68)(cid:77)(cid:82)(cid:85) (cid:36)(cid:86)(cid:83)(cid:72)(cid:85)(cid:74)(cid:76)(cid:79)(cid:79)(cid:88)(cid:86)(cid:3)(cid:82)(cid:85)(cid:92)(cid:93)(cid:68)(cid:72) (cid:40)(cid:86)(cid:70)(cid:75)(cid:72)(cid:85)(cid:76)(cid:70)(cid:75)(cid:76)(cid:68)(cid:3)(cid:70)(cid:82)(cid:79)(cid:76)(cid:3)(cid:76)(cid:36)(cid:41)(cid:20)(cid:21)(cid:25)(cid:19) (cid:54)(cid:68)(cid:79)(cid:80)(cid:82)(cid:81)(cid:72)(cid:79)(cid:79)(cid:68)(cid:3)(cid:87)(cid:92)(cid:83)(cid:75)(cid:76)(cid:80)(cid:88)(cid:85)(cid:76)(cid:88)(cid:80) (cid:53)(cid:75)(cid:76)(cid:93)(cid:82)(cid:69)(cid:76)(cid:88)(cid:80)(cid:3)(cid:72)(cid:87)(cid:79)(cid:76) (cid:48)(cid:92)(cid:70)(cid:82)(cid:69)(cid:68)(cid:70)(cid:87)(cid:72)(cid:85)(cid:76)(cid:88)(cid:80)(cid:3)(cid:87)(cid:88)(cid:69)(cid:72)(cid:85)(cid:70)(cid:88)(cid:79)(cid:82)(cid:86)(cid:76)(cid:86)(cid:3)(cid:43)(cid:22)(cid:26)(cid:53)(cid:89) (cid:36)(cid:86)(cid:83)(cid:72)(cid:85)(cid:74)(cid:76)(cid:79)(cid:79)(cid:88)(cid:86)(cid:3)(cid:81)(cid:76)(cid:71)(cid:88)(cid:79)(cid:68)(cid:81)(cid:86) (cid:54)(cid:68)(cid:70)(cid:70)(cid:75)(cid:68)(cid:85)(cid:82)(cid:80)(cid:92)(cid:70)(cid:72)(cid:86)(cid:3)(cid:70)(cid:72)(cid:85)(cid:72)(cid:89)(cid:76)(cid:86)(cid:76)(cid:68)(cid:72)(cid:3)(cid:76)(cid:41)(cid:41)(cid:26)(cid:19)(cid:27) (cid:42)(cid:72)(cid:82)(cid:69)(cid:68)(cid:70)(cid:87)(cid:72)(cid:85)(cid:3)(cid:80)(cid:72)(cid:87)(cid:68)(cid:79)(cid:79)(cid:76)(cid:85)(cid:72)(cid:71)(cid:88)(cid:70)(cid:72)(cid:81)(cid:86) full (cid:36)(cid:70)(cid:76)(cid:81)(cid:72)(cid:87)(cid:82)(cid:69)(cid:68)(cid:70)(cid:87)(cid:72)(cid:85)(cid:3)(cid:69)(cid:68)(cid:92)(cid:79)(cid:92)(cid:76) partial (cid:72) (cid:54)(cid:75)(cid:72)(cid:90)(cid:68)(cid:81)(cid:72)(cid:79)(cid:79)(cid:68)(cid:3)(cid:82)(cid:81)(cid:72)(cid:76)(cid:71)(cid:72)(cid:81)(cid:86)(cid:76)(cid:86)(cid:3)(cid:76)(cid:54)(cid:50)(cid:26)(cid:27)(cid:22)(cid:3) none (cid:70) (cid:85) (cid:54)(cid:68)(cid:70)(cid:70)(cid:75)(cid:68)(cid:85)(cid:82)(cid:80)(cid:92)(cid:70)(cid:72)(cid:86)(cid:3)(cid:70)(cid:72)(cid:85)(cid:72)(cid:89)(cid:76)(cid:86)(cid:76)(cid:68)(cid:72)(cid:3)(cid:76)(cid:49)(cid:39)(cid:26)(cid:24)(cid:19) (cid:88) (cid:82) (cid:54)(cid:87)(cid:68)(cid:83)(cid:75)(cid:92)(cid:79)(cid:82)(cid:70)(cid:82)(cid:70)(cid:70)(cid:88)(cid:86)(cid:3)(cid:68)(cid:88)(cid:85)(cid:72)(cid:88)(cid:86) (cid:86) (cid:51)(cid:82)(cid:85)(cid:83)(cid:75)(cid:92)(cid:85)(cid:82)(cid:80)(cid:82)(cid:81)(cid:68)(cid:86)(cid:3)(cid:74)(cid:76)(cid:81)(cid:74)(cid:76)(cid:89)(cid:68)(cid:79)(cid:76)(cid:86) BRENDA (cid:43)(cid:68)(cid:79)(cid:82)(cid:69)(cid:68)(cid:70)(cid:87)(cid:72)(cid:85)(cid:76)(cid:88)(cid:80)(cid:3)(cid:86)(cid:68)(cid:79)(cid:76)(cid:81)(cid:68)(cid:85)(cid:88)(cid:80) BKM-react (cid:48)(cid:72)(cid:87)(cid:75)(cid:68)(cid:81)(cid:82)(cid:86)(cid:68)(cid:85)(cid:70)(cid:76)(cid:81)(cid:68)(cid:3)(cid:68)(cid:70)(cid:72)(cid:87)(cid:76)(cid:89)(cid:82)(cid:85)(cid:68)(cid:81)(cid:86) (cid:48)(cid:72)(cid:87)(cid:75)(cid:68)(cid:81)(cid:82)(cid:86)(cid:68)(cid:85)(cid:70)(cid:76)(cid:81)(cid:68)(cid:3)(cid:69)(cid:68)(cid:85)(cid:78)(cid:72)(cid:85)(cid:76) ChEBI (cid:47)(cid:68)(cid:70)(cid:87)(cid:82)(cid:69)(cid:68)(cid:70)(cid:76)(cid:79)(cid:79)(cid:88)(cid:86)(cid:3)(cid:83)(cid:79)(cid:68)(cid:81)(cid:87)(cid:68)(cid:85)(cid:88)(cid:80)(cid:3)(cid:58)(cid:38)(cid:41)(cid:54)(cid:20) (cid:48)(cid:92)(cid:70)(cid:82)(cid:83)(cid:79)(cid:68)(cid:86)(cid:80)(cid:68)(cid:3)(cid:74)(cid:72)(cid:81)(cid:76)(cid:87)(cid:68)(cid:79)(cid:76)(cid:88)(cid:80) KEGG (cid:54)(cid:68)(cid:70)(cid:70)(cid:75)(cid:68)(cid:85)(cid:82)(cid:80)(cid:92)(cid:70)(cid:72)(cid:86)(cid:3)(cid:70)(cid:72)(cid:85)(cid:72)(cid:89)(cid:76)(cid:86)(cid:76)(cid:68)(cid:72)(cid:3)(cid:76)(cid:47)(cid:47)(cid:25)(cid:26)(cid:21) MetaCyc (cid:42)(cid:72)(cid:82)(cid:69)(cid:68)(cid:70)(cid:87)(cid:72)(cid:85)(cid:3)(cid:86)(cid:88)(cid:79)(cid:73)(cid:88)(cid:85)(cid:85)(cid:72)(cid:71)(cid:88)(cid:70)(cid:72)(cid:81)(cid:86) (cid:38)(cid:79)(cid:82)(cid:86)(cid:87)(cid:85)(cid:76)(cid:71)(cid:76)(cid:88)(cid:80)(cid:3)(cid:87)(cid:75)(cid:72)(cid:85)(cid:80)(cid:82)(cid:70)(cid:72)(cid:79)(cid:79)(cid:88)(cid:80) HMDB (cid:48)(cid:68)(cid:81)(cid:81)(cid:75)(cid:72)(cid:76)(cid:80)(cid:76)(cid:68)(cid:3)(cid:86)(cid:88)(cid:70)(cid:70)(cid:76)(cid:81)(cid:76)(cid:70)(cid:76)(cid:83)(cid:85)(cid:82)(cid:71)(cid:88)(cid:70)(cid:72)(cid:81)(cid:86) (cid:54)(cid:87)(cid:85)(cid:72)(cid:83)(cid:87)(cid:82)(cid:80)(cid:92)(cid:70)(cid:72)(cid:86)(cid:3)(cid:70)(cid:82)(cid:72)(cid:79)(cid:76)(cid:70)(cid:82)(cid:79)(cid:82)(cid:85) BiGGDB (cid:38)(cid:79)(cid:82)(cid:86)(cid:87)(cid:85)(cid:76)(cid:71)(cid:76)(cid:88)(cid:80)(cid:3)(cid:68)(cid:70)(cid:72)(cid:87)(cid:82)(cid:69)(cid:88)(cid:87)(cid:92)(cid:79)(cid:76)(cid:70)(cid:88)(cid:80) ChemSpider (cid:49)(cid:72)(cid:76)(cid:86)(cid:86)(cid:72)(cid:85)(cid:76)(cid:68)(cid:3)(cid:80)(cid:72)(cid:81)(cid:76)(cid:81)(cid:74)(cid:76)(cid:87)(cid:76)(cid:71)(cid:76)(cid:86) (cid:38)(cid:82)(cid:85)(cid:92)(cid:81)(cid:72)(cid:69)(cid:68)(cid:70)(cid:87)(cid:72)(cid:85)(cid:76)(cid:88)(cid:80)(cid:3)(cid:74)(cid:79)(cid:88)(cid:87)(cid:68)(cid:80)(cid:76)(cid:70)(cid:88)(cid:80) 0 10000 20000 30000 40000 (cid:48)(cid:88)(cid:86)(cid:3)(cid:80)(cid:88)(cid:86)(cid:70)(cid:88)(cid:79)(cid:88)(cid:86)(cid:3)(cid:70)(cid:68)(cid:85)(cid:71)(cid:76)(cid:82)(cid:80)(cid:92)(cid:82)(cid:70)(cid:92)(cid:87)(cid:72) (cid:54)(cid:68)(cid:70)(cid:70)(cid:75)(cid:68)(cid:85)(cid:82)(cid:80)(cid:92)(cid:70)(cid:72)(cid:86)(cid:3)(cid:70)(cid:72)(cid:85)(cid:72)(cid:89)(cid:76)(cid:86)(cid:76)(cid:68)(cid:72)(cid:3)(cid:76)(cid:48)(cid:43)(cid:27)(cid:19)(cid:24) Figure3Variouslevelsofstructuralinformationwasavailableformodels(main)anddatabases(inset).Foreverymodel,themajorityof metaboliteshadfullatomisticdetail(blue).Thesmallernumberofmetaboliteswithpartialatomisticdetail(orange)suchasgeneticsidechains, orwithnoatomisticdetail(green)suchasgeneproducts,participatedinfewreactions. Kumaretal.BMCBioinformatics2012,13:6 Page7of13 http://www.biomedcentral.com/1471-2105/13/6 for less-curated metabolic models. We used a variety of proton had to be added to the reactants side in the reac- procedures to disambiguate the identity of metabolites tion (R, R)-Butanediol-dehydrogenase in which butane- lacking structural information ranging from reaction diol reacts with NAD to form acetoin. In addition, the matching to phonetic searches. For example, in the Cor- stoichiometric coefficient of water in GTP cyclohydro- ynebacterium glutamicum model [44], 7,8-aminopelargo- lase I was erroneously set at -2 which resulted in an nic acid (DAPA) has no associated structural imbalance in oxygen atoms. The re-balancing analysis information. Reaction matching found the same reaction changed the coefficient to -1 (as listed in BRENDA) and in the E. coli iAF1260 model. added a proton to the list of reactants (absent from BRENDA) in order to also balance charges. C.glutamicumDAPA+ ATP+ CO2⇔ DTBIOTIN+ ADP+ PI We performed flux balance analysis (FBA) on both the published and MetRxn-based rebalanced version of the Acinetobacter baylyi model using the uptake constraints iAF1260[c]: atp + co2 + dann → adp + dtbt + (3) h + pi listed in [45] to assess the effect of re-balancing reaction which implies that 7,8-aminopelargonic acid (DAPA) entries on FBA results. We found that the maximum is identical to 7,8-Diaminononanoate (dann). Examina- biomass using the glucose/ammonia uptake environ- tion of pelargonic acid and nonanoate reveals that they ment decreased by 9% primarily due to the increased were indeed known synonyms. In many cases, we were energetic costs associated with maintaining the proton also able to assign stereo-specific information to meta- gradient. This result demonstrates the significant effect bolite entries in models (e.g., stipulate the L-lysine iso- that lack of reaction balancing may cause in FBA calcu- mer for lysine). We made use of an iterative approach lations. Overall, we found that nearly two-thirds of the that allowed us to map structures from models with models had at least one unbalanced reaction, with over explicit links to structures (e.g. to KEGG or CAS num- 2,400 entities across all models that were either charge bers) to models that only provided metabolite names. or elementally imbalanced. Frequently, the same reac- Furthermore, by using a phonetic algorithm that uses tion was imbalanced in multiple models (each occur- tokens for equivalent strings in metabolite names (e.g., rence was counted separately). ‘-ic acid’ and ‘-ate’ are equivalent) we were able to resolve more than an additional 159 metabolites. For 2. Contrasting existing metabolic models example, phonetic searches flagged cis-4-coumarate and At the onset of creating MetRxn, we conducted a brief COUMARATE in the Acinetobacter baylyi model [45] preliminary study to quantify the extent/severity of nam- as potentially identical compounds. Additional checks ing inconsistencies by contrasting the reaction informa- revealed that indeed both metabolites should map to the tion contained in an initial collection of 34 of the most same structure. A more complex matching example popular genome-scale models spanning 21 bacterial, 10 involved 1-(5’-Phosphoribosyl)-4-(N-succinocarboxa- eukaryotic and three archaeal organisms. Across all mide)-5-aminoimidazole from the Bacillus subtilis branches of life, most metabolic processes are largely model [46] and 1-(5’-Phosphoribosyl)-5-amino-4-(N-suc- conserved (e.g., glycolysis, pentose phosphate pathway, cinocarboxamide)-imidazole from the Aspergillus nidu- amino acid biosynthesis, etc.) therefore we expected to lans model [47]. We note that the phonetic algorithm uncover a large core of common reactions shared by all only makes suggestions and orders the possible matches models. Surprisingly, we found that only three reactions for the curator. Next, we detail three examples that pro- (i.e., phosphoglycerate mutase, phosphoglycerate kinase, vide an insight into the type of tasks that MetRxn can and CO transport) were directly recognized as common 2 facilitate. across those 34 models using a simple string match comparison. Even when examining models for only a Utility and Discussion few bacterial organisms (Bacillus subtilis, Escherichia 1. Charge and elementally balanced metabolic models coli, Mycobacterium tuberculosis, Mycoplasma genita- The standardized description of metabolites and lium, andSalmonella Typhimurium) simple text balanced reactions afforded by MetRxn enables the searches recognized only 40 common reactions (out of a expedient repair of existing models for metabolite nam- possible 262, which is the size of the M. genitalium ing inconsistencies and reaction balancing errors. Here model). The reason for this glaring inconsistency is that we highlight one such metabolic model repair for Acine- differing metabolite naming conventions, compartment tobacter baylyi iAbaylyiv4 [45]. We identified that 189 designations, stoichiometric ratios, reversibility, and out of 880 reactions are not elementally or charge water/proton balancing issues prevents the automated balanced. Most of the reactions with charge balance recognition of genuinely shared reactions across models. errors involved a missed proton in reactions involving Using the glucose-6-phosphate dehydrogenase reaction cofactor pairs such as NAD/NADH. For example, a as a representative example, Table 1 reveals some of the Kumaretal.BMCBioinformatics2012,13:6 Page8of13 http://www.biomedcentral.com/1471-2105/13/6 Table 1Representation ofglucose-6-phosphate dehydrogenase in selected metabolic models Reaction Model/Citation [c]:g6p+nadp<==>6pgl+h+nadph Escherichiacoli[32];Lactobacillusplantarum[48];Pseudomonasaeruginosa[49];Staphylococcusaureus [50] [c]:g6p+nadp–>6pgl+h+nadph Bacillussubtillis[51];Mycobacteriumtuberculosis[13];Pseudomonasputida[52];Rhizobiumetli[53]; Saccharomycescerevisiae[54] [c]g6p+nadp<==>6pgl+h+nadph Saccharomycescerevisiae[55];Escherichiacoli[56] [c]:f420-2+g6p–>6pgl+f420-2h2 Methanosarcinabarkeri[57] G6P+NADP<->D6PGL+NADPH Escherichiacoli[58];Musmusculus[59];Saccharomycescerevisiae[60,61] G6P+NADP->D6PGL+NADPH Aspergillusnidulans[47];Mannheimiasucciniciproducens[62];Streptomycescoelicolor[63] G6P+NAD->D6PGL+NADH Helicobacterpylori[64] C01172+C00006=C01236+C00005+ GSMmouse[35] C00080 C00092+C00006<=>C01236+C00005 Halobacteriumsalinarum[36] +C00080 reasons for failing to automatically recognize common particular, in the C. thermocellum model [66] charged/ reactions across selected models [13,32,35,36,47-64]. As uncharged tRNA metabolites are explicitly tracked many as nine different representations of the same reac- whereas they are not included in the C. acetobutylicum tion exist due to incomplete elemental and charge bal- model [65]. Surprisingly, both clostridia models are ancing, alternate cofactor usage among different more similar, at the metabolite level, to the Bacillus sub- organisms, and lack of universal metabolite naming con- tilis iBsu1103 model [46] rather than to each other (see ventions. We have found that this level of discord Figure 4). Charged/uncharged tRNA metabolites between models is representative for most metabolic account for most of the increased overlap between C. reactions. This lack of consistency renders direct path- thermocellum and B. subtilis. Most of the reaction over- way comparisons across models meaningless and the laps are in the amino acids biosynthesis pathways, car- aggregation of reaction information from multiple mod- bohydrate metabolism, and nucleoside metabolism. It is els precarious. This deficiency motivated the develop- important to note that 48 reactions in C. acetobutyli- ment of MetRxn. Given standardization in metabolite cum, 67 reactions in C. thermocellum, and 120 reactions naming and elementally/charge balanced reaction entries in B. subtilis lack full atomistic information (see Figure MetRxn allows for the identification of shared reactions 3) and thus were excluded from any comparisons. It is as well as differences between any two metabolic models possible that additional shared reactions between the (assuming that all the metabolites in the compared reac- two models can be deduced by further examining com- tion entries have full atomistic information). When mak- parisons between not fully structurally specified metabo- ing the comparison of those same metabolic models, lite entries. The string/phonetic comparison algorithms MetRxn found an additional 15 reactions in common described under Step 6 along with assisted curation (for a total of 55 – a 38% increase) and that 142 reac- could be adapted for this task. tions are shared by B. subtilis, E. coli and Salmonella Typhimurium. 3. Using MetRxn to Bio-Prospect for Novel Production The Web interface of MetRxn allows for any number Routes of models to be simultaneously compared. As a demon- A “Grand Challenge” in biotechnological production is stration of this capability we selected to contrast the the identification of novel production routes that allow metabolic content of two clostridia models: Clostridium for the conversion of inexpensive resources (e.g., various acetobutylicum [65] and Clostridium thermocellum [66]. sugars) into useful products (e.g., succinate, artemisinin) Figure 4 shows the results in the form of a Venn dia- and bio-fuels (e.g., ethanol, butanol, biodiesel etc.). gram. Some of the differences between the clostridia Selected production routes must exhibit high yields, species are not surprising arising due to their differing avoid thermodynamic barriers, bypass toxic intermedi- lifestyles (C. acetobutylicum contains solventogenesis ates and circumvent existing intellectual property pathways and a CoB12 pathway, whereas C. thermocel- restrictions. Historically, the incorporation of heterolo- lum contains cellulosome reactions). However, we found gous pathways relied largely on human intuition and lit- many differences that appear to reflect different conven- erature review followed by experimentation [67,68]. tions adopted when the two models were generated Currently, rapidly expanding compilations of biotrans- rather than genuine differences in metabolism. In formations such as KEGG [69] and BRENDA [70] are Kumaretal.BMCBioinformatics2012,13:6 Page9of13 http://www.biomedcentral.com/1471-2105/13/6 metabolites reactions C. acetobutylicum C. thermocellum C. acetobutylicum C. thermocellum 29 58 79 57 173 224 266 79 61 90 37 66 642 1181 B. subtilis B. subtilis Figure4ComparisonofmetaboliteandreactionoverlapsforC.acetobutylicumandC.thermocellum,andB.subtilis.Althoughthetwo Clostridiumorganismsaresamegenus,themodelsofthesetwospecieshadsignificantnumbersofuniquemetabolites(left)andreactions (right),andcomparisonsrevealedthattherewasmoresimilarityinmetaboliteusagewithamodelofB.subtilisthanwitheachother.Inpart, theseoverlapsweredrivenbytheexplicitaccountingforchargedtRNAspeciesinC.thermocellumandB.subtilismodels,whichwasalso reflectedinthereactionoverlapsthroughreactionsinvolvingthesemetabolites. increasingly being prospected using search algorithms to PathComp [76], Pathway Tools [77,78], MetaRoute [79], identify biosynthetic routes to important product mole- PathFinder [80] and UM-BBD Pathway Prediction Sys- cules. Several optimization and graph-based methods tem [81] have been used to search databases for biocon- have been employed to computationally assemble novel version routes. biochemical routes from these sources. OptStrain [71] We recently published [82] a graph-based algorithm used a mixed-integer linear optimization representation that used reaction information from BRENDA and to identify the minimal number of reactions to be added KEGG to exhaustively identify all connected paths from (i.e. knock-ins) into a genome-scale metabolic model to a source to a target metabolite using a customized min- enable the production of the new molecule. However path algorithm [83]. We first demonstrated the min- the combinatorial nature of the problem poses a signifi- path procedure by identifying all synthesis routes for 1- cant challenge to the OptStrain methodology as the butanol from pyruvate using a database of 9,921 reac- number of reaction database entries increase from a few tions and 17,013 metabolites manually extracted from to tens of thousands. At the expense of not enforcing both BRENDA and KEGG. Here, we re-visited the same stoichiometric balances, graph-based algorithms have task using the full list of reactions and metabolites pre- inherently better-scaling properties for exhaustively sent in MetRxn to assess the discovery potential of identifying all min-path reaction entries that link a using MetRxn. Figure 5 illustrates all identified pathways source with a target metabolite. Hatzimanikatis et. al. from pyruvate to 1-butanol before MetRxn (29, shown [72] introduced a graph-based heuristic approach in blue) and the ones discovered after using MetRxn (BNICE) to identify all possible biosynthetic routes from (112, shown in green). As many as 83 new avenues for a given substrate to a target chemical by hypothesized 1-butanol production were revealed as a consequence of enzymatic reaction rules. In addition, the BNICE frame- using the expanded and standardized MetRxn resource. work was used to identify novel metabolic pathways for In addition, the search algorithm recovered known the synthesis of 3-hydroxypropionate in E. coli [73]. [84-88] synthesis routes using E. coli for the production Based on a similar approach, a new scoring algorithm of 1-butanol (shown in orange). The first pathway [74] was introduced to evaluate and compare novel involves the fermentative transformation of pyruvate pathways generated using enzyme-reaction rules. In and acetyl-CoA to 1-butanol using enzymes from C. addition, several techniques such as PathMiner [75], acetobutylicum [89]. The second pathway uses ketoacid Kumaretal.BMCBioinformatics2012,13:6 Page10of13 http://www.biomedcentral.com/1471-2105/13/6 propylamine butyl phosphate 2-oxo-methylthio oxoglutarate butanoate 3-methylthio acetate propianaldehyde acetoin L-malate acetoacetyl CoA acetaldehyde suc coA butanoyl-CoA acetyl-CoA 1-butanal 1-butanol pyruvate propanoyl-CoA butanoate crotonoyl 3-oxopropanoate CoA acetyl phosphate isobutyryl CoA 2-ketovaline hydroxy acetate butyryl CoA acetyl-CoA 3-acetoacyl-CoA 2-hydroxyendoic acid oxaloacetate citramalate b-glucose mannitol acetoacyl-CoA 2-ketoglutarate glyoxylate 4-hydroxyphenylpyruvate 2-oxoacid 2-oxoglutamate succinic semialdehyde L-lysine pyrroline-5-carboxylate indolyl pyruvate 3-mercaptopyruvate 4-methyl-2-oxopentanoate 2-oxoadipate Figure5Pathwaysfrompyruvateto1-butanol.UsingtheMetRxnknowledgebase,weidentifiedalargenumberofnewpathways(green)as wellaspreviouslyestablishedones(orange)andthoseidentifiedfoundinapreviousstudy[82](blue). precursors [84]. This example demonstrates how the Conclusions biotransformations stored in MetRxn can be used to tra- MetRxn enables the standardization, correction and uti- verse a multitude of production routes for targeted lization of rapidly growing metabolic information for bioproducts. over 76,000 metabolites participating in 72,000 reactions

Description:

mary sources of data from databases and models, 2) integration of metabolite and reaction data, 3) calculation from the primary sources and resources such as

MetRxn: a knowledgebase of metabolites and reactions spanning PDF

13 Pages·2012·1.42 MB·English

by Akhil Kumar

Checking for file health...

Save to my drive

Quick download

Download

Download MetRxn: a knowledgebase of metabolites and reactions spanning PDF Free - Full Version

by Akhil Kumar| 2012| 13 pages| 1.42| English

Download MetRxn: a knowledgebase of metabolites and reactions spanning by Akhil Kumar in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About MetRxn: a knowledgebase of metabolites and reactions spanning

mary sources of data from databases and models, 2) integration of metabolite and reaction data, 3) calculation from the primary sources and resources such as

Detailed Information

Author:	Akhil Kumar
Publication Year:	2012
Pages:	13
Language:	English
File Size:	1.42
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free MetRxn: a knowledgebase of metabolites and reactions spanning Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download MetRxn: a knowledgebase of metabolites and reactions spanning PDF?

Yes, on https://PDFdrive.to you can download MetRxn: a knowledgebase of metabolites and reactions spanning by Akhil Kumar completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read MetRxn: a knowledgebase of metabolites and reactions spanning on my mobile device?

After downloading MetRxn: a knowledgebase of metabolites and reactions spanning PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of MetRxn: a knowledgebase of metabolites and reactions spanning?

Yes, this is the complete PDF version of MetRxn: a knowledgebase of metabolites and reactions spanning by Akhil Kumar. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download MetRxn: a knowledgebase of metabolites and reactions spanning PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.