Leveraging the value of documents content – PLEXUS for agile dissemination of results
Matthias Negri, PhD
Scientific Information Center, Research Networking, Boehringer Ingelheim Pharma GmbH & Co KG
ChemAxon UGM, Budapest, 24 May 2016
-
Content
1. Data in Documents – beyond chemistry
2. Tools/technologies
3. Current (past) experiences - Patent Curation Workflow
4. What else..
5. Deployment of results – PLEXUS
6. Examples
Negri M, ChemAxon UGM 2016
-
The manifold ways of Chemistry: Names …
InChI=1S/C22H30N6O4S/c1-5-7-17-19-20(27(4)25-17)22(29)24-21(23-19)16-14-15(8-9-18(16)32-6-2)33(30,31)28-12-10-26(3)11-13-28/h8-9,14H,5-7,10-13H2,1-4H3,(H,23,24,29)
InChIKey=BNRNXUUZRGQAQC-UHFFFAOYSA-N
CCCC1=NN(C)C2=C1N=C(NC2=O)C1=CC(=CC=C1OCC)S(=O)(=O)N1CCN(C)CC1
Label, INN/UNN, TradeName, DrugCode, DrugName, CompanyCode, DBcode1, DBcode2, DBcode3, …, CDB code, Develop.Code, ProductCode, …
IDMP code
バイアグラ
偉哥
Viagra
Sildenafil
5-{2-ethoxy-5-[(4-methylpiperazin-1-yl)sulfonyl]phenyl}-1-methyl-3-propyl-1H,6H,7H-pyrazolo[4,3-d]pyrimidin-7-one
Compound 3
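The point of this slide, many labels for one structure, can be illustrated with a minimal sketch. It assumes a hand-built synonym table that maps the names shown above to the InChIKey shown above; the table and the helper names `normalize` and `resolve` are hypothetical and not part of any tool mentioned in the talk.

```python
import unicodedata

# InChIKey shown on the slide for sildenafil.
SILDENAFIL_KEY = "BNRNXUUZRGQAQC-UHFFFAOYSA-N"


def normalize(name: str) -> str:
    """Fold Unicode form, case and outer whitespace so variants compare equal."""
    return unicodedata.normalize("NFKC", name).casefold().strip()


# Hypothetical synonym table: every label points at one structure key.
SYNONYMS = {
    normalize(label): SILDENAFIL_KEY
    for label in ("Viagra", "Sildenafil", "バイアグラ", "偉哥")
}


def resolve(name: str):
    """Return the InChIKey for a known name, or None if the name is unknown."""
    return SYNONYMS.get(normalize(name))
```

With such a table, `resolve("VIAGRA ")` and `resolve("偉哥")` return the same key, while an unknown label returns `None`. In practice the table would be filled by name-to-structure conversion rather than by hand.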
-
The manifold ways of Chemistry: … appearance
text
table
image
example/cmpd nr
molec. attachments (MOL, SDF, CDX)
IUPAC: 1‐methyl‐7‐(1‐methyl‐1H‐pyrazol‐4‐yl)‐5‐[4‐(trifluoromethoxy)phenyl]‐1H,4H,5H‐imidazo[4,5‐c]pyridin‐4‐one
-
The manifold ways of Chemistry: … data
Chemistry as linking node for all … data
Toxicity
PK/PD
Bioactivity
Phys-Chem Properties: experimental, calculated; internal vs external
Other Data
and beyond Chemistry… PLENTY of DATA!
Drug indication,
Disease – condition,
Reaction types,
Mechanism of action,
Medicinal and off-targets,
Description,
contraindications,
side effects, AE,
DDI interactions,
drug group/type/classification,
sampling time per drug dose,
dosage history,
Bibliographic “novelty check”,
patent landscape,
safety,
companies
-
Content
1. Data in Documents – chemistry & beyond
2. Tools/technologies
3. Current (past) experiences - Patent Curation Workflow
4. What else.. can we do?
5. PLEXUS – use cases
-
Tools/technologies: The interplay
1. Pipelining - KNIME/XPATH
2. Chemical recognition - ChemAxon KNIME nodes + Command line tools
3. Text/data-mining – Linguamatics I2E
4. Optical Structure Recognition – Keymodule CLiDE
5. Visualization – ChemCurator and PLEXUS
-
Current (past) experiences: Patent Curation Workflow - update
[Workflow diagram; labels: INPUT, OCR, TABLE, GET]
I2E API KNIME – Batch indexing, text-mining and (relational) data retrieval
- SLOWER & memory intensive, BUT higher quality, more control & IUPAC-enriched XML
- FASTER, BUT less informative/flexible
-
Patent Curation Workflow: Visualize data-/textmining results
SDF file imported into ChemCC project + automatic mapping to existing chemistry
Tables are exported as Excel Sheets or as SDF files
-
Patent Curation Workflow - update
1. Still NO full automation, BUT: using KNIME’s flexibility - more workflows
2. Time & computational resources: 8 CPU notebook + server (needed in particular for OCR correction routines)
3. Novelty checking: compare Preferred IUPAC vs Traditional names (incl. common)
4. Improved handling of OCR
-
Content
1. Data in Documents – chemistry & beyond
2. Tools/technologies
3. Current (past) experiences - Patent Curation Workflow
4. Patents worked nicely.. What else now?
5. PLEXUS – use cases
-
What’s next.. PLEXUS: Visualization, Search & Share
1. Extraction of chemical reactions from PDFs
2. External databases – combine structured and unstructured (=TEXT) search
3. Internal Documents – make more out of Docx files
-
PLEXUS 1. Extract & Search for chemical reactions in PDFs
Extraction of chemical reactions from PDFs:
1. Easy search for chemical compounds or reactions in own PDF collections
“Where is that reaction?”
2. Share your experience - leverage in-house synthetic knowledge
“Does this reaction also work when using a different reagent?”
“Which yield was achieved in-house for that reaction?”
-
PLEXUS 1. Extract & Search for chemical reactions in PDFs
PDF collection(s) → Mrv file → PLEXUS browser
- Chemistry recognition (n2s/d2s, OSR)
- Linguistic reaction pattern recognition - annotation with BRAT
- Reaction extraction – splitting into components
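The "splitting into components" step can be sketched on a reaction written as reaction SMILES (`reactants>agents>products`). This is a swapped-in illustration, not the Mrv-based processing used in the talk; the function name `split_reaction` and the `Reaction` tuple are hypothetical.

```python
from typing import List, NamedTuple


class Reaction(NamedTuple):
    reactants: List[str]
    agents: List[str]
    products: List[str]


def split_reaction(rxn_smiles: str) -> Reaction:
    """Split 'reactants>agents>products' into lists of component SMILES."""
    fields = rxn_smiles.strip().split(">")
    if len(fields) != 3:
        raise ValueError("expected exactly two '>' separators")
    # Components within one role are separated by '.'; an empty role gives [].
    roles = [[smi for smi in field.split(".") if smi] for field in fields]
    return Reaction(*roles)


# Acid-catalysed esterification of acetic acid with ethanol.
rxn = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
```

Storing reactants, agents and products separately is what makes it possible to query each role on its own. Note that this naive split also separates the ions of a salt written with `.`, so real data would need a chemistry toolkit to regroup them.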
-
PLEXUS 1. Extract&Search for chemical reactions in PDFs
BRAT - capture the essence (role, anaphora, etc) of chemical reactions in text
-
PLEXUS 1. Extract & Search for chemical reactions in PDFs
Visualize/Design views for selected content – search & export results
-
PLEXUS 2. External databases – DrugBank
Exploit database searches beyond predefined fields:
1. “Classic” DB search (incl. chemical search) + search in “unstructured” text boxes
2. Upload of content (e.g. DrugBank) as raw XML/xls/csv or as pre-processed/enriched information
3. Map DBs via IJC and display selected one-to-one/one-to-many relations via PLEXUS
4. Custom views for the various “customers” within a company
-
PLEXUS 2. External databases – DrugBank
Substructure search
Text-based search
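A text-based search over free-text fields can be sketched as follows. The XML fragment is hand-written and heavily simplified for illustration; it is not DrugBank content, and the real export is far richer. The function name `search_text` and the choice of searched fields are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified records for illustration only.
SAMPLE = """
<drugs>
  <drug>
    <name>Sildenafil</name>
    <description>Phosphodiesterase type 5 inhibitor.</description>
    <indication>Treatment of erectile dysfunction.</indication>
  </drug>
  <drug>
    <name>Acetylsalicylic acid</name>
    <description>Analgesic and antipyretic.</description>
    <indication>Relief of pain and fever.</indication>
  </drug>
</drugs>
"""


def search_text(xml_text, term, fields=("description", "indication")):
    """Return (drug name, field) pairs whose free text contains the term."""
    needle = term.casefold()
    hits = []
    for drug in ET.fromstring(xml_text).iter("drug"):
        for field in fields:
            text = drug.findtext(field, default="")
            if needle in text.casefold():
                hits.append((drug.findtext("name"), field))
    return hits
```

For example, `search_text(SAMPLE, "erectile")` finds the term in the indication text of the first record, a hit that a search restricted to predefined fields such as the name would miss.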
-
PLEXUS 3. Internal Documents – Docx/Doc
Make more out of static Word file collections (Dir:\file1, file2..)
1. Search over 1000s of documents
2. Infer chemical meaning from Doc/Docx files
3. Extract & store bits of information: company-wide repository for phys.-chem. or experimental data
4. Outlook: combine “internal” and external data
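Searching a Word collection starts with getting the text out of each file. A minimal sketch, relying only on the fact that a .docx file is a ZIP archive whose main text lives in `word/document.xml`; the helper names `docx_paragraphs` and `find_term` are hypothetical, and legacy .doc files would need a different reader.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        "".join(run.text or "" for run in para.iter(W_NS + "t"))
        for para in root.iter(W_NS + "p")
    ]


def find_term(source, term):
    """Return the paragraphs that contain the term, ignoring case."""
    needle = term.casefold()
    return [p for p in docx_paragraphs(source) if needle in p.casefold()]
```

Calling `find_term(path, "yield")` on each file in a directory gives a simple full-text search; chemistry recognition on the returned paragraphs would be a separate step.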
-
IJC: Design Form View
Text/datamining: Indexing/annotation (free text & tables)
Chemistry recognition (text + IMG/skc files): molconvert/reaction split
join/map: Combine both outputs
PLEXUS: Visualize, Search
PLEXUS 3. Internal Documents – Docx/Doc
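The join/map step, combining the text-mining output with the chemistry-recognition output, can be sketched as a join on a shared document key. The field names `doc_id`, `yield` and `smiles` are hypothetical, and in the talk this step is done in a workflow tool rather than in hand-written code.

```python
def join_outputs(text_hits, chem_hits, key="doc_id"):
    """Left-join text-mining rows with chemistry rows on a shared key."""
    chem_by_key = {}
    for row in chem_hits:
        chem_by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in text_hits:
        # Rows without recognized chemistry are kept with text fields only.
        for chem in chem_by_key.get(row[key], [{}]):
            joined.append({**row, **chem})
    return joined


text_hits = [{"doc_id": "r1", "yield": "82 %"}, {"doc_id": "r2", "yield": "n/a"}]
chem_hits = [{"doc_id": "r1", "smiles": "CCO"}]
combined = join_outputs(text_hits, chem_hits)
```

A left join keeps documents for which no structure was recognized, so they remain searchable by text.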
-
PLEXUS 3. Internal Documents – Docx/Doc
-
PLEXUS 3. Internal Documents – Docx/Doc
Different tabs give an improved overview
-
PLEXUS - weak points, limitations
- Version incompatibilities, stability issues, Java 7 vs Java 8
- Annoying error messages when fields are empty
-
PLEXUS - weak points, limitations
- Only ONE chemical field – limited search options: it is not possible to search for product and reagent in a “chemical” way
-
PLEXUS - weak points, limitations
- Limited options - capabilities of IJC are not reflected in PLEXUS
Highlight search terms
HTML representation
IJC snapshot
-
Thank You
Acknowledgements
Lutz Weber
Anett Plüschel
Ulf Laube
Matthias Irmer
S.I.C. group
MedChem/ChemDev
H. Schmid
M. Santagostino
D. Kirberg