Table Of ContentWorkshoptrack-ICLR2017
TOWARDS “ALPHACHEM”: CHEMICAL SYNTHESIS
PLANNING WITH TREE SEARCH AND DEEP NEURAL
NETWORK POLICIES
MarwinSegler♣∇ MikePreuß♦ MarkP.Waller♠~
♣InstituteofOrganicChemistry ♠DepartmentofPhysics
∇CenterforMultiscaleTheoryandComputation ~InternationalCenterforQuantum
♦DepartmentofInformationSystems andMolecularStructures
7 Westfa¨lischeWilhelms-Universita¨tMu¨nster ShanghaiUniversity
1 {marwin.segler,mike.preuss}@wwu.de [email protected]
0
2
n
a ABSTRACT
J
1 Retrosynthesisisatechniquetoplanthechemicalsynthesisoforganicmolecules,
3 forexampledrugs,agro-andfinechemicals.Inretrosynthesis,asearchtreeisbuilt
by analysing molecules recursively and dissecting them into simpler molecular
] building blocks until one obtains a set of known building blocks. The search
I
A spaceisintractablylarge,anditisdifficulttodeterminethevalueofretrosynthetic
positions.Here,weproposetomodelretrosynthesisasaMarkovDecisionProcess.
.
s Incombinationwith a Deep NeuralNetworkpolicylearnedfromessentially the
c
[ complete published knowledge of chemistry, Monte Carlo Tree Search (MCTS)
can be used to evaluate positions. In exploratory studies, we demonstrate that
1 MCTS with neuralnetworkpoliciesoutperformsthe traditionallyused best-first
v
searchwithhand-codedheuristics.
0
2
0
1 INTRODUCTION
0
0
. From medicines to materials, organic molecules are ubiquitous. They are indispensable for hu-
2
manwell-being. Tomakethem,chemistsusuallyplantheirsyntheseswitha“working-backwards”
0
problem-solvingtechniquecalledretrosynthesis[Corey&Cheng(1989)]. Inretrosynthesis,thetar-
7
getmoleculeisrecursivelyanalysedanddissected,untilasetofknownbuildingblocksisobtained.
1
: Then,thechemistcangotothelabandexecutethereactionstepsstartingwiththebuildingsblocks.
v Itisaformidableintellectualchallengethathasstronganalogiestopuzzlegames: Thereisalarge
i
X numberofrules,whicharesubsequentlyappliedtothegamestatetogiverisetoalarge,intractable
search tree. While this tree is notas large as in Go, mainly due to the shallower depth of usually
r
a ≈10–20, retrosynthesis has a significant branching factor (b ≈ 200). An additionalproblem that
arisesinretrosynthesis,whichisnotverysevereinGoorChess, istodeterminetheapplicableac-
tions. Moleculesaremodelledasgraphs,andthe(re)actionsareproductionrulesongraphswhich
involvesolvingtheNP-completesubgraphisomorphismproblem.Thus,inacompleteenumeration,
forN rules(fora non-toysystem N ≈ 104−105), N ×bd NP-completestepswouldhaveto be
performedonlytodeterminethelegalactions!
Classically, automated retrosynthesis (computer-aidedsynthesis design) has been performed with
verylargeexpertsystemsbasedontensofthousandsofmanuallyencodedreactionproductionrules
andheuristics.Similartootherdomains,despiteof40yearsofactiveresearchtheseexpertsystems
havenotdeliveredconvincingresultsyet[Todd(2005);Szymkuc´etal.(2016)]. However,retrosyn-
thesis is in many aspects more similar to Go than to chess. No good heuristics to estimate state
valueshavebeenfoundyet,whichprohibitstruncatedsearch. The“games”thushavetobeplayed
out. Additionally,chemistsrelyontheirintuition,whichtheymasterduringlongyearsofworkand
study,toprioritisewhichrulestoapplywhenretroanalysingmolecules. Analogoustomastermove
predictioninGo(Maddisonetal.(2014)),weshowedrecentlythat,insteadofhand-coding,neural
networks can learn the “master chemist moves”. We extracted production rules from millions of
reactions (essentially the complete published knowledge of chemistry [Reaxys (2017)]) automati-
cally, and then trained NeuralNetworksto predictthe besttransformation, or “the best move”, to
1
Workshoptrack-ICLR2017
a) Retrosynthesis: Analyse the target in reverse (chemical representation) d) Search Tree Representation
CO H2 CO H2CO Ac2O
S7 S4
O
OH Me OH O S3 S1 = {1}
1Ibuprofen 2 3 4 S6 S2 = {2,CO}
S3 = {3,CO,H2}
b) Applied transformations (productions) applied in each analysis step S2 S5 S4 = {4,CO,H2,Ac2O}
O S8
OH O
OH Ac2O S1 Root (Ibuprofen)
e) Neural Networks determine the Actions
c) Synthesis: Go to the lab and actually conduct the reactions, following plan a)
O CO
Ac2O H2 CO O 1 OH Me2 OH
O OH apply
OH rules to 1
4 3 2 1Ibuprofen x p(rn|x)
MD(EeoCsleFccrPiup4lta)orr DeepC lNasesuirfiacla Ntioentwork mruolesst probable
Figure 1: a) The synthesis of the target (ibuprofen) is planned with retrosynthesis. The target is
analyzeduntilaset ofbuildingblocksis foundthatwe canbuyor wealreadyknowhowto make
(molecules marked red). Double arrows indicate a “reverse” retrosynthetic analysis step. b) The
transformation/productionrules(actions)toanalysethemolecules(state)c)Afterasynthesisroute
hasbeenfound,wecanrunthereactionsinthelabtomakethetarget.Singlearrowsindicatechemi-
calreactions,chemicalsoveranarrowareaddedduringthecourseofthereaction.d)Translationof
thechemicalrepresentation(a)toasearchtree. e)Neuralnetworksareusedasapolicytoselectthe
bestactionsandaretrainedonessentiallythecompletechemicalliteraturesince1771.
apply to a molecule for single step analyses [Segler&Waller (2017a)]. In this work, we seek to
transfer the other lessons learned from the last decade of computer Go, culminating in AlphaGo,
tothefullchemicalsynthesisplanningproblembycombiningourneuralnetworksaspolicieswith
MonteCarloTreeSearch(MCTS)[Silveretal.(2016);Browneetal.(2012)]. Thecontributionof
thisworkisasfollows: 1)WeprovidethefirstanalysisofretrosynthesisasaMarkovDecisionPro-
cess, 2)introduceMCTStothesynthesisplanningproblemtoapproximatethevalueofstatesand
actions, and3) showhowDeepNeuralNetworkscan beused aspolicies, to reducethe branching
factor,andtoapproximatethesubgraphisomorphismproblem.
2 THEORY
2.1 MARKOV DECISION PROCESSES
AMarkovDecisionProcess(MDP)isdefinedbyastatespaceS,andanactionspaceA(s),defining
thelegalactionsinagivenstates ∈ S. Here,astates ∈ S isasetofmoleculess = {m ,m ,...}.
i j
The initial state s0 = {m0} containsthe targetmolecule m0. Actions are productionson graphs.
When applying a legal action a to a state s = {m ,m ,m }, it will produce a new state, e.g.
l i a b c
s ={m ,m ,m }.ThetransitionfunctionT(s,a,s′)=P(s′|s,a)issettobedeterministicinthis
j d b c
workforsimplicity.Therewardfunctionr(s)returns1ifthestateisterminalandsolved(seebelow),
-1ifthestateisterminalandunsolved,and0otherwise.Apolicyπ(a|s)isaprobabilitydistribution
overallactionsa ∈ A(s),givenstates. Inthiswork,weuseadeephighwaynetwork,introduced
i
bySrivastavaetal.(2015),withtheELUnonlinearitiesbyClevertetal.(2015)andasoftmaxoutput
layertoparametrizeπ(a|s). Itpredictstheprobabilitydistributionoveralltransformationrules.
2.2 CHEMISTRY
Moleculesaretreatedasmoleculargraphs,whicharevertex-labeledandedge-labeledgraphsm =
(A,B), with atoms A as vertices and bonds B as edges [Andersenetal. (2014)]. Retrosynthetic
disconnectionrulesare productionsongraphsp : G → M, whereM is the space of multisets of
moleculesinG. Mostrulesareunaryandbinary,thatmeanstheytransformamoleculeintooneor
twonewgraphs.Eachactionisaproductionappliedtooneofthemoleculesminstates. Therules
are extracted automatically from the literature following established procedures [Segler&Waller
(2017b)]. GivenasetofbuildingblocksR, theretrosyntheticanalysisstate s issolvedif∀m ∈
k i
sk :mi ∼=rj ∈R,thatisallmoleculesmiinsk areisomorphictoabuildingblockrj ∈R.
2
Workshoptrack-ICLR2017
2.3 MONTECARLO TREESEARCH (MCTS)
MCTScombinestreesearchwithrandomsampling[Browneetal.(2012)]. Weuseourpolicynet-
workinthreeofthefourMCTSsteps:Selection,Expansion,andRollout.Withinthesearchtree,we
employEq.(1),avariationoftheUCTfunctionsimilartoAlphaGo’s,todeterminethebestchildv′
ofanodev,whereP(a)istheprobabilityassignedtothisactionbythepolicynetwork,N(v)the
numberofvisitsandQ(v)theaccumulatedrewardatnodev,andctheexplorationconstantsetto3.
Q(v′) N(v)
argmax +cP(a) (1)
v′∈childrenofv N(v′) 1p+N(v′)!
The tree policy is applied until a leaf node is found, which is expanded by means of the policy
network. Ourpolicynetworkhasatop1accuracyof0.62andatop50(of17370)accuracyof0.98.
Thisallowstokilltwobirdswithonestone: First, toreducethebranchingfactor,we onlyexpand
thetop50mostpromisingactions,andnotallpossibleones(≈200).Second,byevaluatingonly50
actions,wehavetosolvethesubgraphisomorphismproblem,whichdeterminesifthecorresponding
rulecan beappliedto a moleculeandyieldsthe nextmolecule(s),only50times insteadof17370
timesforallrules. Duringrollout,wesampleactionsfromthepolicynetworkuntiltheproblemis
solvedoramaximaldepthof20isreached.
We compareour algorithmto the state ofthe artsearch method,whichis Best-First Search (BFS)
withhand-codedheuristics,asdescribedbySzymkuc´etal.(2016). InBFS,eachbranchisaddedto
apriorityqueue,whichissortedbycost. Additionally,weperformBFSwiththecostforabranchb
calculatedwiththepolicynetworkasf(b)= s∈b(1−P(a |s )).
i=0 i i
P
3 RESULTS AND DISCUSSION
For testing, we randomlyselected 40 drug-likemoleculesgeneratedwith the neuralchemicallan-
guagemodelproposedbySegleretal.(2017). Thesetargetmoleculeswereanalyzedwiththediffer-
entalgorithms,tofindasynthesisroutetoknownbuildingblocks.Thealgorithmsreceivedabudget
of 2 h walltime to propose retrosynthetic paths for the 40 different targets. BFS was allowed to
exploreupto9000branches,MCTSperformed9000simulations. Allcalculationswereperformed
onalate2013MacBookProonCPU.
BFSwithheuristiccostisoutperformedbothintermsoftimeandaccuracybytheBFSandMCTS
with the deep neural network (DNN) policy. BFS+DNN is two times faster than MCTS+DNN,
however,MCTS+DNNisabletosolvemoreproblems.
Table1:ExperimentalResults
Entry Search Policy/Cost solved/% timedout/% meantimepermol./s
1 BFS Hand-codedHeuristic 22.5 70 600
2 BFS HighwayNetwork 87.5 0 29
3 MCTS HighwayNetwork 95 0 68
4 CONCLUSION
Inthiswork, we providedpreliminaryevidencethatcombiningdeepneuralnetworkpolicieswith
BFS and MCTS can also be used for planning in a highly challenging and important non-game
domain, namelychemicalsynthesisplanning. Our system outperformsthe currentstate of the art,
BFS with hand-codedheuristics. Largerscale experimentsare currentlyongoing. In future work,
weseektoexplorelearningvaluefunctionstoperformtruncatedsearch,learningbetterpolicies,and
exploittranspositions.Anotherinterestingavenueistousemorepowerful,butsloweralgorithmsto
calculatevaluesfortransformations,forexampleourrecentlyproposedmodelofchemicalreasoning
[Segler&Waller(2017b)].Weenvisionthatouralgorithmwillbecomeavalueabletoolforsynthetic
chemistsinindustryandacademia.
3
Workshoptrack-ICLR2017
ACKNOWLEDGMENTS
M.S. and M.P.W. thank RELX Intellectual Properties for the reaction dataset and Deutsche
Forschungsgemeinschaft(SFB858)forfunding.
REFERENCES
Jakob L Andersen, Christoph Flamm, Daniel Merkle, and Peter F Stadler. Generic strategies for
chemicalspaceexploration. Int.J.Comp.Bio.DrugDes.,7(2-3):225–258,2014.
CameronBBrowne,EdwardPowley,DanielWhitehouse,SimonMLucas,PeterICowling,Philipp
Rohlfshagen,StephenTavener,DiegoPerez,SpyridonSamothrakis,andSimonColton. Asurvey
ofmontecarlotreesearchmethods. IEEETransactionsonComputationalIntelligenceandAIin
games,4(1):1–43,2012.
Djork-Arne´ Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network
learningbyexponentiallinearunits(elus). arXivpreprintarXiv:1511.07289,2015.
EJCoreyandXMCheng. TheLogicofChemicalSynthesis. Wiley,NewYork,1989.
ChrisJMaddison,AjaHuang,IlyaSutskever,andDavidSilver. Moveevaluationingousingdeep
convolutionalneuralnetworks. arXivpreprintarXiv:1412.6564,2014.
Reaxys. ReaxysisaregisteredtrademarkofRELXIntellectualPropertiesSA,usedunderlicense,
2017. URLhttp://www.reaxys.com.
Marwin HS Segler and Mark P Waller. Neural-symbolicmachine-learningfor retrosynthesisand
reactionprediction. Chemistry–AEuropeanJournal,DOI:10.1002/chem.201605499,2017a.
MarwinHSSeglerandMarkPWaller.Modellingchemicalreasoningtopredictandinventreactions.
Chemistry–AEuropeanJournal,DOI:10.1002/chem.201604556,2017b.
Marwin HS Segler, Thierry Kogej, Christian Tyrchan, and Mark P Waller. Generating fo-
cussed molecule libraries for drug discovery with recurrent neural networks. arXiv preprint
arXiv:1701.01329,2017.
DavidSilver,AjaHuang,ChrisJMaddison,ArthurGuez,LaurentSifre,GeorgeVanDenDriessche,
JulianSchrittwieser,IoannisAntonoglou,VedaPanneershelvam,MarcLanctot,etal. Mastering
thegameofgowithdeepneuralnetworksandtreesearch. Nature,529(7587):484–489,2016.
Rupesh K Srivastava, Klaus Greff, and Ju¨rgen Schmidhuber. Training very deep networks. In
Advancesinneuralinformationprocessingsystems,pp.2377–2385,2015.
Sara Szymkuc´, Ewa P Gajewska, Tomasz Klucznik, Karol Molga, Piotr Dittwald, MichałStartek,
MichałBajczyk,andBartoszAGrzybowski. Computer-assistedsyntheticplanning: Theendof
thebeginning. AngewandteChemieInternationalEdition,55(20):5904–5937,2016.
MatthewHTodd. Computer-aidedorganicsynthesis. ChemSoc.Rev.,34(3):247–266,2005.
4