![Page 1: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/1.jpg)
Molecular De Novo Design through Deep Reinforcement Learning
Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen
Hit Discovery, Discovery Sciences, AstraZeneca R&D Gothenburg
05 September 2017
![Page 2: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/2.jpg)
Searching Chemical Space
2
Goal:
Find structures with certain properties
We need two things:
A way to generate structures
A way to rank structures – scoring function
Ideally, the scoring function should influence the way we generate structures
![Page 3: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/3.jpg)
Scoring function
3
$S: x \mapsto y \in \mathbb{R}$
x is the chemical structure that we map to a scalar value y. For example:
Structure → generate ECFP6 fingerprint → SVM classifier for target activity → $P_{active} = 0.83$
![Page 4: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/4.jpg)
But how do we generate structures?
4
Build structures atom by atom?
Combine fragments through reaction rules?
![Page 5: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/5.jpg)
Overcoming the curse of dimensionality
5
![Page 6: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/6.jpg)
But how do we generate structures?
6
Atom by atom
Learn transition probabilities
The model should represent a probability distribution over chemical space
P(C-C)?
![Page 7: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/7.jpg)
Natural language processing and SMILES
7
• How to represent molecular structure?
• Can borrow concepts from language processing and apply to SMILES
• Conditional probability distributions given context
• $P(\mathrm{green} \mid \mathrm{is}, \mathrm{grass}, \mathrm{The})$
• $P(\mathrm{O} \mid {=}, \mathrm{C}, \mathrm{C})$
The grass is ?
C C = ?
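The conditional probabilities on this slide can be illustrated with a simple count-based estimate. This is only a sketch and not the RNN used in this work: the toy corpus, the fixed context length, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of tokenized SMILES (illustration only).
corpus = [
    ["C", "C", "=", "O"],
    ["C", "C", "=", "O"],
    ["C", "C", "=", "C"],
    ["C", "C", "O"],
]

def conditional_counts(sequences, context_size=3):
    """Count how often each token follows each context of fixed length."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(context_size, len(seq)):
            context = tuple(seq[i - context_size:i])
            counts[context][seq[i]] += 1
    return counts

def next_token_probs(counts, context):
    """Normalize the counts for one context into a probability distribution."""
    total = sum(counts[context].values())
    return {tok: n / total for tok, n in counts[context].items()}

counts = conditional_counts(corpus)
# Estimate P(next token | C, C, =) from the toy corpus.
probs = next_token_probs(counts, ("C", "C", "="))
print(probs)
```

A count table like this only sees a fixed-length context; a recurrent network can, in principle, condition on the whole preceding sequence.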
![Page 8: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/8.jpg)
Natural language processing and SMILES
8
![Page 9: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/9.jpg)
Neural networks
9
A neural network is a function approximator
Powerful due to flexibility
![Page 10: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/10.jpg)
Recurrent Neural Networks
10
![Page 11: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/11.jpg)
The prior RNN
11
• Restrict structures to 10 to 50 heavy atoms
• and to the elements H, B, C, N, O, F, P, S, Cl, Br, I
• 1.5 million SMILES from ChEMBL
• Canonicalized using RDKit
ChEMBL → Filter → Select 1.5 million → Training SMILES → Train RNN → Pretrained Network
![Page 12: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/12.jpg)
The generative process
12
![Page 13: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/13.jpg)
Structures generated by the Prior
13
Randomly selected
![Page 14: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/14.jpg)
Searching Chemical Space
14
We need two things:
A way to generate structures
A way to rank structures – scoring function
Ideally, the scoring function should influence the way we generate structures
![Page 15: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/15.jpg)
Augmented Likelihood
15
$\text{Augmented Likelihood} = \text{Prior Likelihood} + \sigma \times \text{Score}$
Prior Agent
![Page 16: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/16.jpg)
Reinforcement learning
16
• Learning from doing
Action: Design molecule
Reward: Active? Good DMPK? Synthetically accessible?
Update behaviour: Make more like this? Make something else instead?
Agent
![Page 17: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/17.jpg)
The Agent
17
![Page 18: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/18.jpg)
Results
![Page 19: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/19.jpg)
Example 1 – Structure similarity
19
• Learn to generate analogues
• J is the Tanimoto similarity
• FCFP4 fingerprints
• k is the upper limit of J
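As a rough illustration of the bullets above, the sketch below computes a Tanimoto similarity J on sets of fingerprint on-bits and turns it into a score capped at k. The fingerprints here are hypothetical bit sets rather than real FCFP4 output, and the exact scoring formula (J capped at k, rescaled linearly to [-1, 1]) is an assumption, since the slide text only states that k is the upper limit of J.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def similarity_score(j, k):
    """Assumed score: J capped at k, rescaled linearly to [-1, 1]."""
    return -1.0 + 2.0 * min(j, k) / k

# Hypothetical on-bits standing in for fingerprints of a query and a candidate.
query = {1, 5, 9, 12, 20}
candidate = {1, 5, 9, 33}

j = tanimoto(query, candidate)      # 3 shared bits out of 6 distinct bits
score = similarity_score(j, k=0.7)
print(j, score)
```

With a cap below 1, any structure with J at or above k already receives the maximum score, so the query structure itself is no longer the only optimum.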
![Page 20: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/20.jpg)
Generating only Celecoxib
20
• We chose Celecoxib as a query structure
• First explored if we could recover Celecoxib itself
• 𝑘 = 1 and 𝜎=15
• After 200 training steps, the model generates only Celecoxib
• Even when everything with 𝐽 >0.5 was removed from the Prior (“Reduced Prior”), Celecoxib was recovered
![Page 21: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/21.jpg)
Generating analogues to Celecoxib
21
• But we want analogues
• 𝑘=0.7 and 𝜎=12
• With the canonical Prior, the Agent learns well
• Reduced Prior requires a higher 𝜎=15 to offset the lower prior likelihood
Reduced Prior
![Page 22: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/22.jpg)
Example 1 – Structure similarity
22
![Page 23: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/23.jpg)
23
![Page 24: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/24.jpg)
Example 2 – Biological target activity
24
• Using activity model (SVM model) as the scoring function
• Uncalibrated probability of being active $P_{active}$
$\text{Score} = -1 + 2 \times P_{active}$
• Train for 3000 steps with 𝜎=7
• Also remove all actives from ChEMBL and train a reduced Prior
• Then train reduced Agent
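The score transformation above maps a probability in [0, 1] onto [-1, 1]. A minimal sketch, where the function name and the range check are illustrative additions:

```python
def activity_score(p_active):
    """Map a predicted activity probability in [0, 1] to a score in [-1, 1]."""
    if not 0.0 <= p_active <= 1.0:
        raise ValueError("p_active must lie in [0, 1]")
    return -1.0 + 2.0 * p_active

# A structure predicted active with probability 0.83 gets a positive score.
print(activity_score(0.83))
```

A predicted probability of 0.5 therefore gives a score of 0, so structures the classifier considers more likely inactive than active are penalized.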
![Page 25: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/25.jpg)
Enrichment of predicted actives
25
• 250-fold enrichment of predicted actives for both Agents
• Withholding actives affects similarity to actives, but not fraction of predicted actives
![Page 26: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/26.jpg)
Structures generated by reduced Agent
26
![Page 27: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/27.jpg)
Conclusion
![Page 28: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/28.jpg)
Acknowledgements
28
• Thanks to Hongming, Thomas, and Ola
• Thanks to Thierry Kogej and Christian Tyrchan for valuable discussion
![Page 30: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/30.jpg)
Backup slides
![Page 31: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/31.jpg)
Tokenization of SMILES
31
• Tokenize combinations of characters like “Cl” or “[nH]”
• Represent the characters as one-hot vectors
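The two steps above might look roughly like the following. The regular expression is an assumption that covers only bracketed atoms, "Cl", and "Br" as multi-character tokens; the vocabulary is built from a single example string purely for illustration.

```python
import re

# Bracketed atoms and two-letter halogens are kept as single tokens;
# every other character is its own token.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return TOKEN_PATTERN.findall(smiles)

def one_hot(tokens, vocabulary):
    """Encode each token as a one-hot vector over the vocabulary."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vectors = []
    for tok in tokens:
        vec = [0] * len(vocabulary)
        vec[index[tok]] = 1
        vectors.append(vec)
    return vectors

tokens = tokenize("Clc1cc[nH]c1")
vocabulary = sorted(set(tokens))
encoded = one_hot(tokens, vocabulary)
print(tokens)
print(vocabulary)
```

Without the multi-character rule, "Cl" would be read as a carbon followed by an unknown "l" token.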
![Page 32: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/32.jpg)
Augmented Likelihood
32
• We do not want to forget the prior probability distribution
• Instead, modify it using a scoring function
• The scoring function rates the desirability of a molecule (e.g. bioactivity)
$\text{Augmented Likelihood} = \text{Prior Likelihood} + \sigma \times \text{Score}$
• Likelihoods are reported here as natural logarithms
• Sigma is a constant that sets the tradeoff between score and prior likelihood
![Page 33: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/33.jpg)
Augmented Likelihood
33
$\text{Augmented Likelihood} = \text{Prior Likelihood} + \text{constant} \times \text{Score}$
• For example:
– Molecule = Ethanol
– Scoring function = “−1 if aromatic else 1”
– Score = 1
– Constant σ = 3
– Prior Likelihood = −15

| Prior P | Score | Augmented P | Constant |
|---|---|---|---|
| −15 | 1 | −15 + 3 × 1 = −12 | 3 |
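The ethanol example can be reproduced in a few lines. The `augmented_likelihood` function follows the formula on this slide; the `agent_loss` function is an assumption about how the Agent could be pulled towards the augmented likelihood (a squared difference), which these slides do not state, and the Agent log-likelihood of −14 is a made-up value.

```python
def augmented_likelihood(prior_log_likelihood, score, sigma):
    """Prior log-likelihood shifted by sigma times the score."""
    return prior_log_likelihood + sigma * score

def agent_loss(agent_log_likelihood, augmented):
    """Assumed loss: squared gap between augmented and Agent log-likelihood."""
    return (augmented - agent_log_likelihood) ** 2

# Values from the ethanol example: prior = -15, score = 1, sigma = 3.
aug = augmented_likelihood(-15.0, 1.0, 3.0)

# Hypothetical Agent log-likelihood for the same sequence.
loss = agent_loss(-14.0, aug)
print(aug, loss)
```

Because the prior term stays in the target, a sequence the Prior considers very unlikely needs a larger sigma or a higher score to reach the same augmented likelihood.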
![Page 34: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/34.jpg)
The prior RNN
34
• 94% of the generated sequences are valid SMILES
• 90% are novel – not part of the 1.5 million training SMILES
• Most common error is not closing an opened ring or branch
![Page 35: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/35.jpg)
Example 1 – Structure similarity
35
![Page 36: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/36.jpg)
Example 1 – Structure similarity
36
![Page 37: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/37.jpg)
Example 2 – SVM model
37
• DRD2 activity data from ExCAPE-DB
• 7218 actives and 340 000 inactive compounds
• Actives were clustered using the Butina algorithm (ECFP6 cutoff of 0.4)
• Clusters were split into train (4/6), validation (1/6), and test (1/6) sets
• 4526, 1287, and 1405 actives, respectively
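Splitting by cluster means that every member of a cluster ends up in the same set. The sketch below shows one way this could be done; the shuffling, the seed, the round-robin assignment, and the toy clusters are all assumptions, since the slides do not say how clusters were assigned.

```python
import random

def split_clusters(clusters, seed=0):
    """Assign whole clusters to train/validation/test in a 4:1:1 ratio."""
    order = list(range(len(clusters)))
    random.Random(seed).shuffle(order)
    splits = {"train": [], "validation": [], "test": []}
    for position, cluster_idx in enumerate(order):
        slot = position % 6
        if slot < 4:
            name = "train"
        elif slot == 4:
            name = "validation"
        else:
            name = "test"
        # All members of a cluster go to the same set.
        splits[name].extend(clusters[cluster_idx])
    return splits

# Hypothetical clusters: 12 clusters of 3 molecule identifiers each.
clusters = [[f"mol{c}_{i}" for i in range(3)] for c in range(12)]
splits = split_clusters(clusters)
print({name: len(members) for name, members in splits.items()})
```

The 4:1:1 ratio here applies to clusters, so with clusters of unequal size the molecule counts per set will only approximate it.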
![Page 38: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/38.jpg)
Example 2 – SVM model
38
• Used an SVM classifier in RDKit with C = 2^7 and gamma = 2^-6
![Page 39: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/39.jpg)
Example 2 – Probability distributions
39
• Reduced Prior and Agent
• Small changes overall
• But a large change at even one step may significantly change the structures generated
![Page 40: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/40.jpg)
Example 3 – Learn to avoid sulphur
40
• Toy example – learn to generate structures that do not contain sulphur
• Train for 1000 steps with 𝜎=2 and a batch size of 128
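A scoring function for this toy task could be as simple as checking the SMILES tokens for sulphur. The sketch below is an assumption, not the function used in the talk: the score values of 1 and −1 are illustrative, and no check is made that the SMILES is valid.

```python
import re

# Same token rule as a simple SMILES tokenizer: bracket atoms, Cl, Br, else one character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")
# Element symbol at the start of a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")

def contains_sulphur(smiles):
    """Return True if any token is an aliphatic (S) or aromatic (s) sulphur."""
    for token in TOKEN_PATTERN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token)
            if match and match.group(1) in ("S", "s"):
                return True
        elif token in ("S", "s"):
            return True
    return False

def sulphur_score(smiles):
    """Assumed score: -1 if the structure contains sulphur, else 1."""
    return -1.0 if contains_sulphur(smiles) else 1.0

print(sulphur_score("CCO"), sulphur_score("c1ccsc1"))
```

Reading the element symbol from bracket atoms keeps silicon or selenium from being mistaken for sulphur.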
![Page 41: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/41.jpg)
Example 3 – Learn to avoid sulphur
41
![Page 42: Molecular De Novo Design through Deep Reinforcement …](https://reader030.vdocuments.net/reader030/viewer/2022012608/619bd2c98498e73ab934f12a/html5/thumbnails/42.jpg)
Example 3 – Learn to avoid sulphur
42