Matthias Negri, PhD. Scientific Information Center, Research Networking, Boehringer Ingelheim Pharma GmbH & Co KG. ChemAxon UGM, Budapest, 24 May 2016. Leveraging the value of documents content: PLEXUS for agile dissemination of results


  • Matthias Negri, PhD

    Scientific Information Center, Research Networking, Boehringer Ingelheim Pharma GmbH & Co KG

    ChemAxon UGM, Budapest, 24 May 2016

    Leveraging the value of documents content – PLEXUS for agile dissemination of results

  • Content

    1. Data in Documents – beyond chemistry

    2. Tools/technologies

    3. Current (past) experiences - Patent Curation Workflow

    4. What else..

    5. Deployment of results – PLEXUS

    6. Examples

    2Negri M, ChemAxon UGM 2016

  • The manifold ways of Chemistry: Names …

    InChI=1S/C22H30N6O4S/c1-5-7-17-19-20(27(4)25-17)22(29)24-21(23-19)16-14-15(8-9-18(16)32-6-2)33(30,31)28-12-10-26(3)11-13-28/h8-9,14H,5-7,10-13H2,1-4H3,(H,23,24,29)

    InChIKey=BNRNXUUZRGQAQC-UHFFFAOYSA-N

    CCCC1=NN(C)C2=C1N=C(NC2=O)C1=CC(=CC=C1OCC)S(=O)(=O)N1CCN(C)CC1

    Label, INN/UNN, TradeName, DrugCode, DrugName, CompanyCode, DBcode1, DBcode2, DBcode3, …, CDB code, Develop.Code, ProductCode, …

    IDMP code

    バイアグラ

    偉哥

    Viagra

    Sildenafil

    5-{2-ethoxy-5-[(4-methylpiperazin-1-yl)sulfonyl]phenyl}-1-methyl-3-propyl-1H,6H,7H-pyrazolo[4,3-d]pyrimidin-7-one

    Compound 3
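
The slide lists many names for one compound. A minimal sketch of how such synonyms could be mapped onto a single structure key is given here, using the InChIKey shown above as the canonical identifier. The `SYNONYMS` table, `normalize` and `resolve` are illustrative names, not part of any tool mentioned in the talk, and the regular expression only checks the InChIKey layout, not its validity.

```python
import re
import unicodedata
from typing import Optional

# InChIKey layout: 14 letters, 10 letters, 1 letter, joined by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

SILDENAFIL_KEY = "BNRNXUUZRGQAQC-UHFFFAOYSA-N"

# Names are taken from the slide; the table itself is an illustrative assumption.
SYNONYMS = {
    "Viagra": SILDENAFIL_KEY,
    "Sildenafil": SILDENAFIL_KEY,
    "バイアグラ": SILDENAFIL_KEY,
    "偉哥": SILDENAFIL_KEY,
}


def normalize(name: str) -> str:
    """Unicode-normalize, trim and case-fold a name before lookup."""
    return unicodedata.normalize("NFKC", name).strip().casefold()


# Lookup table keyed by the normalized name.
INDEX = {normalize(name): key for name, key in SYNONYMS.items()}


def resolve(name: str) -> Optional[str]:
    """Return the InChIKey registered for a name, or None if unknown."""
    return INDEX.get(normalize(name))
```

With this table, `resolve(" VIAGRA ")` and `resolve("偉哥")` return the same key, so records filed under either name can be linked.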

  • The manifold ways of Chemistry: … appearance

    Text, e.g. IUPAC: 1-methyl-7-(1-methyl-1H-pyrazol-4-yl)-5-[4-(trifluoromethoxy)phenyl]-1H,4H,5H-imidazo[4,5-c]pyridin-4-one

    Tables

    Images

    Molecule attachments (MOL, SDF, CDX)

    Example/compound numbers

  • The manifold ways of Chemistry: … data

    Chemistry as linking node for all … data

    Data categories: Phys-Chem Properties (experimental, calculated; internal vs external), Toxicity, PK/PD, Bioactivity, Other Data

    Data types: drug indication; disease/condition; reaction types; mechanism of action; medicinal and off-targets; description; contraindications; side effects, adverse events (AE); drug-drug interactions (DDI); drug group/type/classification; sampling time per drug dose; dosage history; bibliographic “novelty check”; patent landscape; safety; companies

    … and beyond Chemistry: PLENTY of DATA!

  • Content

    1. Data in Documents – chemistry & beyond

    2. Tools/technologies

    3. Current (past) experiences - Patent Curation Workflow

    4. What else.. can we do?

    5. PLEXUS – use cases

  • Tools/technologies: The interplay

    1. Pipelining - KNIME/XPATH

    2. Chemical recognition - ChemAxon KNIME nodes + Command line tools

    3. Text/data-mining – Linguamatics I2E

    4. Optical Structure Recognition – Keymodule CLiDE

    5. Visualization – ChemCurator and PLEXUS

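
To illustrate the XPath-based pipelining step, the following sketch pulls annotated chemical names out of an XML document with Python's standard library. The XML layout (`doc`, `p`, `chem` elements and the `type` attribute) is a hypothetical example and not the schema produced by any of the tools listed above; the two names come from the earlier slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated-document XML; tag and attribute names are assumptions.
SAMPLE_XML = """<doc id="EX-1">
  <p>Trade name context: <chem type="trivial">Sildenafil</chem> is mentioned.</p>
  <p>Name: <chem type="iupac">5-{2-ethoxy-5-[(4-methylpiperazin-1-yl)sulfonyl]phenyl}-1-methyl-3-propyl-1H,6H,7H-pyrazolo[4,3-d]pyrimidin-7-one</chem></p>
</doc>"""


def chem_names(xml_text, name_type):
    """Return the text of all <chem> elements with the given type attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    return [el.text for el in root.findall(f".//chem[@type='{name_type}']")]
```

`chem_names(SAMPLE_XML, "iupac")` returns only the systematic name, which could then be passed on to a name-to-structure step.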
  • Current (past) experiences: Patent Curation Workflow - update

    Diagram labels: INPUT, OCR, TABLE, GET

    SLOWER & memory intensive, BUT higher quality, more control & IUPAC-enriched XML

    vs

    FASTER, BUT less informative/flexible

    I2E API KNIME – batch indexing, text-mining and (relational) data retrieval

  • Patent Curation Workflow: Visualize data-/text-mining results

    SDF file imported into ChemCC project + automatic mapping to existing chemistry

    Tables are exported as Excel Sheets or as SDF files

  • Patent Curation Workflow - update

    1. Still NO full automation, BUT: using KNIME’s flexibility - more workflows

    2. Time & computational resources: 8-CPU notebook + server (needed in particular for OCR correction routines)

    3. Novelty checking: compare Preferred IUPAC vs Traditional names (incl. common)

    4. Improved handling of OCR
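
A minimal sketch of the name comparison behind novelty checking, assuming OCR and PDF extraction mainly disturb hyphens, whitespace and case. `normalize_name` and `novel_names` are hypothetical helpers; this is a plain string comparison and does not replace a structure-based check.

```python
import re
import unicodedata

# Hyphen-like code points that OCR/PDF extraction often substitutes for "-".
HYPHENS = "\u2010\u2011\u2012\u2013\u2212"
HYPHEN_MAP = {ord(h): "-" for h in HYPHENS}


def normalize_name(name):
    """Reduce a chemical name to a comparable form."""
    name = unicodedata.normalize("NFKC", name)
    name = name.translate(HYPHEN_MAP)   # unify hyphen variants
    name = re.sub(r"\s+", "", name)     # drop spaces and OCR line breaks
    return name.lower()


def novel_names(extracted, known):
    """Return extracted names whose normalized form is not in the known set."""
    known_norm = {normalize_name(n) for n in known}
    return [n for n in extracted if normalize_name(n) not in known_norm]
```

Names that differ only in hyphen encoding or line breaks are treated as the same name, so they are not reported as novel.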

  • Content

    1. Data in Documents – chemistry & beyond

    2. Tools/technologies

    3. Current (past) experiences - Patent Curation Workflow

    4. Patents worked nicely.. What else now?

    5. PLEXUS – use cases

  • PLEXUS – What’s next.. Visualization, Search & Share

    1. Extraction of chemical reactions from PDFs

    2. External databases – combine structured and unstructured (=TEXT) search

    3. Internal Documents – make more out of Docx files

  • PLEXUS 1. Extract&Search for chemical reactions in PDFs

    Extraction of chemical reactions from PDFs:

    1. Easy search for chemical compounds or reactions in own PDF collections

    “Where is that reaction?”

    2. Share your experience - leverage in-house synthetic knowledge

    “Does this reaction also work when using a different reagent?”

    “Which yield was achieved in house for that reaction?”

  • PLEXUS 1. Extract&Search for chemical reactions in PDFs

    PDF collection(s) → processing steps → Mrv file → PLEXUS Browser

    - Chemistry recognition (n2s/d2s, OSR)

    - Linguistic reaction pattern recognition

    - Annotation with BRAT

    - Reaction extraction – splitting into components
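
The "splitting into components" step can be sketched on a reaction SMILES string, which separates reactants, agents and products with `>` and individual molecules with `.`. This is an illustration only: the workflow on the slide produces Mrv files, and reaction SMILES is used here as a simpler stand-in. `split_reaction` is a hypothetical helper.

```python
from typing import List, NamedTuple


class Reaction(NamedTuple):
    reactants: List[str]
    agents: List[str]
    products: List[str]


def split_reaction(rxn_smiles: str) -> Reaction:
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected a reaction SMILES with exactly two '>' separators")

    def molecules(block: str) -> List[str]:
        # '.' separates disconnected molecules; empty blocks give an empty list.
        return [m for m in block.split(".") if m]

    return Reaction(*(molecules(p) for p in parts))


# Acid-catalysed esterification of acetic acid with ethanol.
example = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
```

Splitting on `.` also separates the ions of a salt, so a real pipeline would need extra handling for multi-fragment components.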

  • PLEXUS 1. Extract&Search for chemical reactions in PDFs

    BRAT - capture the essence (role, anaphora, etc) of chemical reactions in text

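
BRAT stores annotations in a tab-separated standoff file next to the text. The sketch below reads text-bound annotations (`T` lines) and binary relations (`R` lines) from such a file. The labels `Reactant`, `Product` and `Product_of` and the sample sentence are assumptions for illustration, not the annotation scheme used in the talk; discontinuous spans are not handled.

```python
TEXT = "Acetic acid was treated with ethanol to give ethyl acetate."

# Standoff annotations for TEXT; labels and relation type are hypothetical.
ANN = (
    "T1\tReactant 0 11\tAcetic acid\n"
    "T2\tReactant 29 36\tethanol\n"
    "T3\tProduct 45 58\tethyl acetate\n"
    "R1\tProduct_of Arg1:T3 Arg2:T1\n"
)


def parse_ann(ann_text):
    """Parse T (entity) and R (relation) lines of a BRAT .ann file."""
    entities, relations = {}, []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("T"):
            head, text = body.split("\t", 1)
            label, start, end = head.split(" ")  # continuous spans only
            entities[ann_id] = {
                "label": label, "start": int(start), "end": int(end), "text": text,
            }
        elif ann_id.startswith("R"):
            rel_type, arg1, arg2 = body.split()
            relations.append((rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


entities, relations = parse_ann(ANN)
```

Each entity keeps its character offsets, so roles found in the text can be traced back to the exact span in the source document.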
  • Visualize/Design views for selected content – search & export results

    PLEXUS 1. Extract&Search for chemical reactions in PDFs

  • PLEXUS 2. External databases – DrugBank

    Exploit database searches beyond predefined fields:

    1. “Classic” DB search (incl. chemical search) + search in “unstructured” text boxes

    2. Upload of content (e.g. DrugBank) as raw XML/xls/csv or as pre-processed/enriched information

    3. Map DBs via IJC and display selected one-to-one/one-to-many relations via PLEXUS

    4. Custom views for the various “customers” within a company
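
A minimal sketch of combining a structured field filter with a free-text search over records loaded from CSV. The column names and the two sample rows are illustrative and are not taken from DrugBank; chemical (substructure) search is out of scope here because it needs a cheminformatics toolkit.

```python
import csv
import io

# Illustrative records; column names and rows are assumptions.
SAMPLE_CSV = """name,group,description
Sildenafil,approved,Phosphodiesterase type 5 inhibitor used to treat erectile dysfunction
Ibuprofen,approved,Nonsteroidal anti-inflammatory drug used for pain and fever
"""


def search(records, text=None, text_field="description", **fields):
    """Keep records matching all exact field filters and the free-text term."""
    hits = []
    for rec in records:
        if any(rec.get(key) != value for key, value in fields.items()):
            continue  # structured filter failed
        if text and text.casefold() not in rec.get(text_field, "").casefold():
            continue  # free-text term not found
        hits.append(rec)
    return hits


records = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hits = search(records, text="inhibitor", group="approved")
```

The same function covers both modes: pass only field filters for a "classic" query, only `text` for an unstructured query, or both.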

  • PLEXUS 2. External databases – DrugBank

    Substructure search

    Text-based search

  • PLEXUS 3. Internal Documents – Docx/Doc

    Make more out of static Word file collections (Dir:\file1, file2..)

    1. Search over 1000s of documents

    2. Assign chemical meaning to Doc/Docx files

    3. Extract & store bits of information in a company-wide repository for phys.-chem. or experimental data

    4. Outlook: combine “internal” and external data
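
A Docx file is a ZIP archive whose main text sits in `word/document.xml`, so a basic full-text search over a collection can be sketched with the standard library alone. `docx_paragraphs` and `search_docx` are hypothetical helpers; legacy binary `.doc` files are not covered, and nested content such as text boxes may need extra handling.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text run).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return paragraph texts from a .docx given as a path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        "".join(run.text or "" for run in para.iter(W_NS + "t"))
        for para in root.iter(W_NS + "p")
    ]


def search_docx(sources, term):
    """sources maps a document name to a path or file-like object.

    Returns {name: [matching paragraphs]} for documents containing the term.
    """
    term = term.casefold()
    hits = {}
    for name, source in sources.items():
        matches = [p for p in docx_paragraphs(source) if term in p.casefold()]
        if matches:
            hits[name] = matches
    return hits
```

For a directory of files, `sources` could be built as `{p.name: p for p in pathlib.Path(folder).glob("*.docx")}`.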

  • PLEXUS 3. Internal Documents – Docx/Doc

    - Text/data-mining (indexing/annotation): free text & tables

    - Chemistry recognition (molconvert/reaction split): text + IMG/skc files

    - Combine both outputs (join/map)

    - IJC: design form view

    - PLEXUS: visualize, search
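
The join/map step can be pictured as a left join of text-mining rows onto recognized structures, keyed by document and compound label. The field names and the sample rows below are made up for illustration (the SMILES is the one shown on the earlier slide); the actual workflow output format is not described in the slides.

```python
def join_outputs(text_rows, chem_rows):
    """Attach a structure to each text-mining row via (doc, label)."""
    structures = {(row["doc"], row["label"]): row["smiles"] for row in chem_rows}
    joined = []
    for row in text_rows:
        key = (row["doc"], row["label"])
        # Rows without a recognized structure are kept with smiles=None.
        joined.append({**row, "smiles": structures.get(key)})
    return joined


# Hypothetical outputs of the two branches.
text_rows = [
    {"doc": "report1.docx", "label": "Compound 3", "property": "note", "value": "placeholder"},
    {"doc": "report1.docx", "label": "Compound 7", "property": "note", "value": "placeholder"},
]
chem_rows = [
    {
        "doc": "report1.docx",
        "label": "Compound 3",
        "smiles": "CCCC1=NN(C)C2=C1N=C(NC2=O)C1=CC(=CC=C1OCC)S(=O)(=O)N1CCN(C)CC1",
    },
]

joined = join_outputs(text_rows, chem_rows)
```

Keeping unmatched rows makes gaps in chemistry recognition visible instead of silently dropping the associated data.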

  • PLEXUS 3. Internal Documents – Docx/Doc

  • PLEXUS 3. Internal Documents – Docx/Doc

    Different tabs: improved overview

  • PLEXUS - weak-points, limitations

    - Version incompatibilities, stability issues, Java 7 vs Java 8

    - Empty fields trigger annoying error messages

  • PLEXUS - weak-points, limitations

    - Only ONE chemical field: search options are limited, and it is not possible to search for product and reagent in a “chemical” way

  • PLEXUS - weak-points, limitations

    - Limited options - capabilities of IJC are not reflected in PLEXUS

    Highlight search terms

    HTML representation

    IJC snapshot

  • Thank You

    Acknowledgements

    Lutz Weber

    Anett Plüschel

    Ulf Laube

    Matthias Irmer

    S.I.C. group

    MedChem/ChemDev

    H. Schmid

    M. Santagostino

    D. Kirberg