From Data to Foresight - Brown University… · The road from data to foresight is long...


Posted on 09-Jul-2020



1 © 2011 IBM Corporation

From Data to Foresight: Leveraging Data and Analytics for Materials Research

Laura Haas, IBM Fellow, IBM Research - Almaden


The road from data to foresight is long

• Must acquire, integrate, enhance and align
• Must deal with missing and incomplete data
• Must store, protect, and manage
• Must create models and other analytics and test them
• Must run these analyses efficiently over large data volumes
• Must understand and share results
• Requires significant (and expensive) EXPERTISE in data management, systems, analytics, and the domain
• Takes TIME
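To make the first two bullets concrete, here is a minimal Python sketch of aligning two data sources and dealing with missing values. Everything in it is hypothetical: the source names, timestamps, and values, and the helper functions `align` and `reconcile`. Averaging overlapping values is just one simple reconciliation policy, chosen for brevity.

```python
from typing import Optional

Reading = dict[str, Optional[float]]

# Hypothetical hourly readings from two independent sources.
# A missing key means the source has no record; None means it reported no value.
source_a: Reading = {"2011-05-01T00:00": 1.0, "2011-05-01T01:00": None, "2011-05-01T02:00": 0.5}
source_b: Reading = {"2011-05-01T00:00": 2.0, "2011-05-01T02:00": 1.5, "2011-05-01T03:00": 0.25}

def align(a: Reading, b: Reading) -> list[dict]:
    """Outer-join two keyed sources so every timestamp appears exactly once."""
    return [{"ts": ts, "a": a.get(ts), "b": b.get(ts)} for ts in sorted(a.keys() | b.keys())]

def reconcile(rows: list[dict]) -> list[dict]:
    """Naive gap handling: average whatever values are present (so a single
    available source is used as-is), and leave None when neither reported."""
    out = []
    for r in rows:
        present = [v for v in (r["a"], r["b"]) if v is not None]
        value = sum(present) / len(present) if present else None
        out.append({"ts": r["ts"], "value": value, "n_sources": len(present)})
    return out

for row in reconcile(align(source_a, source_b)):
    print(row)
```

Keeping `n_sources` alongside each value records how much evidence stands behind it, which matters later when judging the veracity of derived results.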

[Graphic: "How can I reduce my …?" with a Consumer Reports image]

[Figure: flowchart of a two-layer hydrological (rainfall-runoff) model. Legend: flux computations, state computations, inputs and outputs. Inputs: observed precipitation (with rainfall error) and potential evapotranspiration. Flux computations: effective precipitation, saturation and surface runoff (saturated-area surface runoff), percolation, interflow, base flow, upper-layer and lower-layer evaporation, miscellaneous fluxes, overland routing. State computations: solve state equations, update state. Outputs: instantaneous runoff, routed runoff, and total water in the upper and lower layers. Note: in addition to the dependencies shown, most flux calculations depend on the values of state variables at the previous timestep.]
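The note on this flowchart, that most flux calculations depend on state variables from the previous timestep, describes an explicit update scheme: compute every flux from the old state, then update the state. The toy two-layer "bucket" model below is a minimal Python sketch of that pattern only. It is not the model in the figure; the linear flux rules and the names `State`, `step`, `UPPER_CAP`, `K_PERC`, `K_INTER`, and `K_BASE` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class State:
    upper: float  # water stored in the upper layer (mm)
    lower: float  # water stored in the lower layer (mm)

# Hypothetical parameters, chosen only so the example runs.
UPPER_CAP = 50.0  # upper-layer capacity (mm); any excess becomes surface runoff
K_PERC = 0.10     # fraction of upper storage percolating to the lower layer per step
K_INTER = 0.05    # fraction of upper storage leaving as interflow per step
K_BASE = 0.02     # fraction of lower storage leaving as base flow per step

def step(s: State, precip: float, pet: float) -> tuple[State, dict[str, float]]:
    """Advance one timestep. Every flux reads the previous state `s`;
    the new state is assembled only after all fluxes are known."""
    # Flux computations (all based on the old state).
    surface = max(0.0, s.upper + precip - UPPER_CAP)
    perc = K_PERC * s.upper
    inter = K_INTER * s.upper
    # Cap evaporation so that, starting from upper <= UPPER_CAP,
    # the upper layer never goes negative.
    evap = min(pet, (1.0 - K_PERC - K_INTER) * s.upper)
    base = K_BASE * s.lower
    # State computations (apply all fluxes at once).
    new_upper = s.upper + precip - surface - perc - inter - evap
    new_lower = s.lower + perc - base
    fluxes = {"evap": evap, "runoff": surface + inter + base}
    return State(new_upper, new_lower), fluxes

state = State(upper=20.0, lower=100.0)
for precip, pet in [(30.0, 2.0), (0.0, 3.0), (10.0, 1.0)]:
    state, fluxes = step(state, precip, pet)
    print(state, fluxes)
```

Because each step only reads the old state, the flux computations within a step are independent of one another, which is what makes the dependency graph in the figure tractable to schedule.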


The 4 V’s of data

• Volume (Data at Rest): terabytes to exabytes of existing data to process
• Velocity (Data in Motion): streaming data, milliseconds to seconds to respond
• Variety (Data in Many Forms): structured, unstructured, text, multimedia
• Veracity* (Data in Doubt): uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations

* Truthfulness, accuracy or precision, correctness


Valuable new insights are hidden in this wealth of data!

• Identify criminals and threats from disparate video, audio, and data feeds
• Make risk decisions based on real-time transactional data
• Predict weather patterns to plan optimal wind turbine usage and optimize capital expenditure on asset placement
• Detect life-threatening conditions at hospitals in time to intervene
• Discover and optimize new materials by mining data in patents and the literature


Fortunately, new platforms can unlock the value of data

[Figure: layered architecture. Analytic Applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics) sit on the IBM Big Data Platform (Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance).]

New analytic applications drive the requirements for a big data platform:
• Integrate and manage the full variety, velocity and volume of data
• Apply advanced analytics to information in its native form
• Visualize all available data for ad hoc analysis
• Develop new analytic applications
• Optimize and control scheduling of many simultaneous analyses
• Protect data and applications from accidents, sabotage, and theft


Outcome-based medicine vision: leverage public and private content and rich analytics to improve treatment outcomes

[Figure: the drug lifecycle (Target Identification, Lead Discovery, Preclinical Development, Clinical Phases I, II and III, Launch, Patient Experience, Patient Outcome, Medical Care, with Target Selection, Candidate Selection and Development Selection) mapped to three analytics areas:
• Research & Development and Intellectual Property: target identification and validation; lead discovery and optimization; safety and efficacy; genomics, proteomics, metabolomics; chemical and biological extraction, profiling, analytics, and reasoning
• Clinical Decision Support: patient similarity and segmentation; patient cohorts for clinical support; clinical genomics analysis; comparative effectiveness research; predictive modeling of outcome; disease progression analysis; treatment cost analysis; temporal analysis
• Patient experience and social community support: patient first-hand experiences; social community development and support
Data sources: patents, pre-clinical, clinical trials, scientific literature, safety, DMPK, formulation, claims data, electronic medical records, ontologies, pathways, curated data, web, social media, high-throughput screening.]

Key Analytics Capabilities: BI, Text analytics, NLP, Network Analysis, Relationship Discovery, ML, Modeling, …


An Example: Leveraging data to accelerate life sciences R&D

The Benefits
► R&D: Find white space and gain insight into complex chemical and biological patents; gain early insight into a given target-compound match from past patents for better research target and compound selection decisions
► Legal: Detect IP infringement earlier and increase the quality of patent filings
► Corporate Strategy / Business Dev: Identify collaboration and acquisition targets for greater research value and effectiveness, and find patent in- and out-licensing candidates for efficient management and monetization of IP
► Valuable insights into the competitive landscape, white space, and IP portfolio
► High-quality chemical extractions available hours after patents are released by patent authorities
► Previously unobtainable insights at the scientists’ fingertips with the touch of a button
► Fast, easy search and analysis, drastically reducing search time from weeks and months to just minutes

The Situation
Highly volatile, increasingly complex environment
Traditional R&D is not delivering
New approaches are needed
Collaborative R&D models: the new normal, requiring open platforms, clear boundaries and protection
Agile responses: vital to drive fast adaptation to a changing competitive IP landscape, including adjustments to strategy, portfolio investments and partnerships
Effective IP portfolio management: delivering key value for out-licensing and monetizing non-core IP
Strategic ecosystem development: growth and competitive differentiation through aggressive collaboration and early identification of acquisition and recruitment targets

The Solution
IBM BAO strategic IP insight platform (SIIP): a unique and powerful data and analytics offering that
Aggregates and processes 30M+ patents and scientific literature from around the globe
Automatically extracts chemical and biological entities (200M+ chemical compound instances to date)
Generates chemical and biological entity profiles
Searches and analyzes using natural-language-based inputs for key relationship discovery and IP insights
Reasons about causality among drugs, diseases, targets, efficacy, and side effects
Integrates and enhances existing data and applications


A Smart Entity Profiling, Analytics and Reasoning Methodology

[Figure: entity profiles linking Medicine, Disease, and Patients, built from patents, literature, experimental, HTS, medical records, clinical, business, and social data. Profile facets include IP (legal status, assignee, foreign filings, expiration date, ...), Drug (activity, half-life, protein binding, ...), Physical (computational, molecular weight, MF, Bp, Mp, ...), Spectral (IR, NMR, mass spectra, ...), Toxicity (clinical trials, pre-clinical, ...), Pathways (metabolic, genetic, environmental, cellular, organism, ...), Screening (activity, ...), Genetic (...), Organisms (organism, organ, cell, tissue, ...), Life styles (...), Reactions (enzymes, ...), and Medical History (...).]

• An integrated framework leveraging a broad set of data and many types of analytics:
– Hypothesis generation
– Entity extraction and profiling
– Relationship discovery and analytics
– Summarization
– Reasoning
– Scoring and ranking
– Predictive modeling
• Key steps:
– Extract key entities
– Combine information from multiple sources
– Discover relationships among entities
– Reason about relationships
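The four key steps can be sketched end to end in a few lines of Python. This is a deliberately tiny illustration under stated assumptions, not SIIP's method: "extraction" is a lookup against a hypothetical hand-made lexicon (a real extractor would be far more sophisticated), relationships are inferred from co-occurrence in the same document, and "reasoning" is reduced to ranking entity pairs by how many distinct source types support them. The lexicon, documents, and the functions `extract`, `discover`, and `rank` are all invented.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical lexicon: surface string -> (entity type, canonical name).
LEXICON = {
    "aspirin": ("drug", "aspirin"),
    "acetylsalicylic acid": ("drug", "aspirin"),
    "cox-1": ("target", "COX-1"),
    "headache": ("disease", "headache"),
    "ulcer": ("side_effect", "ulcer"),
}

# Hypothetical documents tagged with their source type.
DOCS = [
    ("patent", "Acetylsalicylic acid inhibits COX-1 and relieves headache."),
    ("literature", "Aspirin use was associated with ulcer in some patients."),
    ("literature", "Aspirin binds COX-1."),
]

def extract(text: str) -> set[tuple[str, str]]:
    """Step 1: find lexicon terms in the text (case-insensitive substring match)
    and map synonyms to one canonical entity."""
    lowered = text.lower()
    return {LEXICON[term] for term in LEXICON if term in lowered}

def discover(docs):
    """Steps 2-3: combine extractions across sources and record, for each
    co-occurring entity pair, which source types mention it."""
    support = defaultdict(set)
    for source, text in docs:
        for a, b in combinations(sorted(extract(text)), 2):
            support[(a, b)].add(source)
    return support

def rank(support):
    """Step 4 (a stand-in for reasoning): rank pairs by the number of distinct
    source types that support them, breaking ties alphabetically."""
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))

for (a, b), sources in rank(discover(DOCS)):
    print(f"{a[1]} <-> {b[1]}: {sorted(sources)}")
```

Mapping "acetylsalicylic acid" and "aspirin" to the same canonical entity is what lets the patent and the literature corroborate each other; without that alignment step the two mentions would never be combined.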



Information and Governance for Big Data

Leverage private and public clouds to share data, or keep it proprietary, as appropriate


Summary

• There is much to be gained from leveraging available data and content
– Accelerate discovery
– Avoid repeating work
• Unlocking the value buried in that data is difficult
– 4 V’s: Volume, Velocity, Variety, Veracity
– A long process requiring many types of expertise
• There are powerful platforms and tools that can help
– Aid development of type-specific analytics
– Enable timely processing of large, diverse data sets
• Sharing, with appropriate data governance, can accelerate discovery
– Controls for the entire data lifecycle
– Many industry groups are gaining leverage from shared investments