© Oellien, Ihlenfeldt, Engel, Ertl, C3 MMWS 2002

Interactive Datamining of Large-Scale Screening Datasets

Klaus Engel, Thomas Ertl
Visualization and Interactive Systems Group, University of Stuttgart

Frank Oellien, Wolf D. Ihlenfeldt
Computer-Chemie-Centrum, University of Erlangen-Nuremberg

Overview

Multi-variate and multi-dimensional datasets

• Motivation

• Information Visualization Techniques

• Examples (ChemCodes Inc., NCI)

• Demo

Chemical data

[Bar chart "Current datasets": number of entries per chemical database, axis from 0 to 18,000,000. Databases shown: Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS]

Multi-Variate and Multi-Dimensional Numeric Datasets Today

Change in chemical synthesis technology

• experiments using new technologies (HTS, combinatorial synthesis) generate terabytes of data per year

• development of data mining and visualization tools has not kept pace

• this is the most critical bottleneck in R&D today

Tools for interactive mining and information visualization are needed

Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data

Standard applications

• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets

• limited to small subsets

• platform-dependent

Our goal: applications that

• are simple to use

• allow straightforward interpretation of results

• offer generalized access to tabular numeric data

• are platform-independent

3D Tools for Interactive Information Visualization

Information visualization applications that use the 3D capabilities of modern clients

• Glyph-based InfVis approaches

• Volume-based InfVis approaches

Glyph-based InfVis Tools

• 3 orthogonal axes

• color

• shape

• size

• transparency

• surface effects

• animation

• up to ~100 Glyphs
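
Each glyph attribute listed above can carry one column of a tabular numeric dataset. The following is a minimal sketch of such a mapping, not the applet's actual code: the `Glyph` class, the `rows_to_glyphs` function, the linear normalization, and the blue-to-red color ramp are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    x: float
    y: float
    z: float
    color: tuple         # (r, g, b), each component in 0..1
    size: float
    transparency: float  # 0 = opaque, 1 = fully transparent


def normalize(value, lo, hi):
    """Linearly map value from [lo, hi] to [0, 1]; a constant column maps to 0."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def rows_to_glyphs(rows, mapping):
    """Turn tabular rows (dicts of numbers) into glyphs.

    mapping assigns a column name to each of the six glyph attributes:
    'x', 'y', 'z', 'color', 'size', 'transparency'.
    """
    # Value range of every mapped column, used for normalization.
    ranges = {
        col: (min(r[col] for r in rows), max(r[col] for r in rows))
        for col in mapping.values()
    }
    glyphs = []
    for r in rows:
        n = {attr: normalize(r[col], *ranges[col]) for attr, col in mapping.items()}
        glyphs.append(Glyph(
            x=n["x"], y=n["y"], z=n["z"],
            color=(n["color"], 0.0, 1.0 - n["color"]),  # blue (low) to red (high)
            size=0.2 + 0.8 * n["size"],                 # keep small glyphs visible
            transparency=n["transparency"],
        ))
    return glyphs
```

With six attributes mapped this way, one glyph displays six dimensions of a record at once; shape, surface effects, and animation could carry further dimensions in the same manner.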

Java/Java3D InfVis Applet

Applet layout (screenshot labels):

• Tool Panel (filters, selection tools, details)

• Java3D Canvas

• Control Panel

Java/Java3D InfVis Applet: 3D Render Panel

• 3D bar chart

• 3D glyphs

Java/Java3D InfVis Applet: 3D Tool Panel

Dynamic Filter Tools

Selection Tools

Detail Tools
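
A dynamic filter restricts the displayed records to those whose values lie inside user-chosen ranges, typically set with sliders. A minimal sketch of the underlying idea follows; the function name `apply_range_filters` and the column names in the usage lines are hypothetical, and the applet's real filter implementation is not shown in the slides.

```python
def apply_range_filters(rows, filters):
    """Return the rows that satisfy every range filter.

    rows:    list of dicts mapping column name -> numeric value
    filters: dict mapping column name -> (low, high), both bounds inclusive
    """
    return [
        row for row in rows
        if all(low <= row[col] <= high for col, (low, high) in filters.items())
    ]


# Hypothetical records: keep only those with a yield of at least 95 %
# at a temperature between 20 and 60 degrees.
records = [
    {"temperature": 25, "yield": 97.0},
    {"temperature": 80, "yield": 99.0},
    {"temperature": 40, "yield": 60.0},
]
visible = apply_range_filters(
    records, {"yield": (95.0, 100.0), "temperature": (20, 60)}
)
```

Re-running the filter whenever a range changes and redrawing only the surviving records is what makes the filtering feel interactive.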

Java/Java3D InfVis Applet: 3D Control Panel

Advantages of Volume-based InfVis Tools

Databases with millions of data points

– Glyph-based InfVis approaches

• produce millions of geometric primitives

• interactive visualization is not possible

– Volume-based InfVis approaches

• can handle a large number of data points

• interactive visualization using low-cost graphics hardware is possible
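
One common way to obtain a volume from scattered data points is to count the points that fall into each cell (voxel) of a regular 3D grid; the rendering cost then depends on the grid resolution rather than on the number of points. The sketch below illustrates that binning step under this assumption. The function name `bin_points` and the sparse dictionary representation are illustrative choices, not the authors' implementation.

```python
def bin_points(points, bounds, resolution):
    """Count 3D points per voxel of a regular grid.

    points:     iterable of (x, y, z) tuples
    bounds:     ((xmin, xmax), (ymin, ymax), (zmin, zmax))
    resolution: number of voxels along each axis
    Returns a sparse volume: dict mapping (i, j, k) -> point count.
    """
    counts = {}
    for point in points:
        index = []
        for value, (lo, hi) in zip(point, bounds):
            if not lo <= value <= hi:
                break  # point lies outside the volume: skip it
            t = (value - lo) / (hi - lo) if hi > lo else 0.0
            # Clamp so that value == hi lands in the last voxel.
            index.append(min(int(t * resolution), resolution - 1))
        else:
            key = tuple(index)
            counts[key] = counts.get(key, 0) + 1
    return counts
```

However many points are binned, a grid of resolution `n` never holds more than `n**3` voxels, which is what keeps the later rendering step bounded.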

ChemCodes Reaction Database

• 100 most important functional groups (FGs): ~75% of chemistry

• 100 standard reactions

• Limits of standard reactions

• Functional group compatibility

• Generating rules

Goal: Analysis of the reaction space

ChemCodes - Reaction Optimization I

• Goal: reaction optimization, > 95% yield

• 7 dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG compatibility

ChemCodes - Reaction Optimization II

ChemCodes - Reaction Planning

Functional Group Compatibility Check

[Figure: chemical structure diagram (atom labels N, H, O)]

Example 2: NCI Anti-tumor / Anti-viral Database

• Initiated in April 1990 (modified 1994)

• ~250,000 compounds

• ~30,000 with anti-tumor screening data

Enhanced NCI Database Browser

• > 30 different molecular properties

• up to 23 3D conformers per compound

Lead Compound Discovery II

Acknowledgment

• Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg

• Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems, University of Stuttgart

• Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc.

• Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH

• Deutsche Forschungsgemeinschaft