
Oracle Life Sciences Platform and 10g Preview

Charlie Berger
Sr. Director of Product Management, Life Sciences and Data Mining

Oracle Corporation

Session id: 40263

Welcome to the Oracle Life Sciences User Group Meeting

Oracle HQ
Bldg 350 Conference Center
Redwood Shores, CA
September 10th, 2003

8:30 am-7:30 pm

Oracle Life Sciences Day & User Group Meeting Agenda

8:00-8:30 Breakfast
8:30-8:45 Welcome
8:45-9:45 Oracle's Platform for Life Sciences - New 10G Features Preview & Solicitation Process for Features in Next Release
Charlie Berger, Oracle Corporation
9:45-10:30 New In Silico Drug Discovery Integrated Demo
Joyce Peng, Oracle Corporation
10:30-10:50 Break
10:50-11:30 European Bioinformatics Institute (EBI), Peter Stoehr
Managing Scientific Literature (Medline) and XML Data Within Oracle
11:30-12:10 The Wellcome Trust Sanger Institute, Martin Widlake
Implementing a Terascale Data Store (20 TB)
12:10-1:00 Lunch & Wish List Feature Post-it Notes
1:00-1:40 Wyeth Research, Peter Smith
21 CFR PART 11 via Oracle Auditing at Wyeth
1:40-2:20 Sequence Search Capabilities in the Database, Myriad Proteomics
2:20-3:00 Johnson & Johnson, Richard Guida & Rajesh Shah
Building a Secure Infrastructure with Oracle in Life Sciences, J & J PKI and Secure Connectivity to Oracle
3:00-3:20 Break & Afternoon Refreshments
3:20-4:00 Kyoto University, Japan, Susumu Goto
Integrating Biological Information and Pathways using Oracle, KEGG at Kyoto University
4:00-4:40 BioMed Central Limited, Matthew Cockerill
Managing Scientific Images with Oracle - Multimedia Database Improves the Bottom Line
4:40-5:20 Abbott Laboratories, Shon Naeymirad
Electronic Records, 21 CFR Part 11 and Oracle 9i
5:20-5:30 Break
5:30-6:30 ISV Lightning Rounds, Life Sciences ISV Partners
6:30-7:30 ISV Reception and Demo Grounds

"My industry is going to become pretty boring soon –I don't believe you'll ever see this proliferation of informatics

companies or computer companies like you saw in the decade of the Nineties. The life sciences industry is where the horizons are wide open. There'll be lots and lots of companies born, lots of new products, lots of new science at least for the next 50 years.

Because of that...we've decided to focus heavily on the life sciences industry.”

- Larry Ellison, CEO, Oracle Corporation, Bio-IT World magazine, premier issue March 2002

Oracle’s Commitment

Life Sciences Value Chain

[Diagram labels: Discovery; Development; Manufacturing, Sales and Marketing; Wet Lab; In Silico; Sample Data; Public/Private Data; Biotech/Pharmaceutical Research Labs; Biomedical Firm; Pre-Clinical Trials; Clinical Trials; Contract Research Organization; Hospital; Pharmaceutical Company; Regulatory Agency; Pharmaceutical Mfg. Plant; Distribution; Pharmacy]

[Diagram labels: Database; Application Server; Discovery; Finance; HR; Projects; Maintenance; Manufacture/Supply Chain Management]

Manage all your data
Run all your applications

Oracle’s Solutions for Life Sciences

[Chart: R&D costs and sales revenue plotted against years. Phases: Discovery; Development & Clinical; Sales & Marketing. Stages: Identify and Validate Targets; Identify and Validate Leads; Pre-Clinical Trials; Clinical Trials; Product Launch; Patent Expiry; Competition from Generics]

Goal: Accelerate the Discovery Process

Source: Ernst & Young, Price Waterhouse

Drug Discovery Economics 101
Better Data Management Accelerates Discovery

[Figure labels: Organism; Cell Nucleus; Chromosome; Gene (DNA); Gene (mRNA); Protein]

Graphics courtesy of the National Human Genome Research Institute

Life Sciences Discovery
Genes and Proteins Run the Cell

aattggaagc aaatgacatc acagcaggtc agagaaaaag ggttgagcgg caggcacccagagtagtagg tctttggcat taggagcttg agcccagacg gccctagcag ggaccccagcgcccgagaga ccatgcagag gtcgcctctg gaaaaggcca gcgttgtctc caaactttttttcagctgga ccagaccaat tttgaggaaa ggatacagac agcgcctgga attgtcagacatataccaaa tcccttctgt tgattctgct gacaatctat ctgaaaaatt ggaaagagaatgggatagag agctggcttc aaagaaaaat cctaaactca ttaatgccct tcggcgatgttttttctgga gatttatgtt ctatggaatc tttttatatt taggggaagt caccaaagcagtacagcctc tcttactggg aagaatcata gcttcctatg acccggataa caaggaggaacgctctatcg cgatttatct aggcataggc ttatgccttc tctttattgt gaggacactgctcctacacc cagccatttt tggccttcat cacattggaa tgcagatgag aatagctatgtttagtttga tttataagaa gactttaaag ctgtcaagcc gtgttctaga taaaataagtattggacaac ttgttagtct cctttccaac aacctgaaca aatttgatga aggacttgcattggcacatt tcgtgtggat cgctcctttg caagtggcac tcctcatggg gctaatctgggagttgttac aggcgtctgc cttctgtgga cttggtttcc tgatagtcct tgccctttttcaggctgggc tagggagaat gatgatgaag tacagagatc agagagctgg gaagatcagtgaaagacttg tgattacctc agaaatgatt gaaaatatcc aatctgttaa ggcatactgctgggaagaag caatggaaaa aatgattgaa aacttaagac aaacagaact gaaactgactcggaaggcag cctatgtgag atacttcaat agctcagcct tcttcttctc agggttctttgtggtgtttt tatctgtgct tccctatgca ctaatcaaag gaatcatcct ccggaaaatattcaccacca tctcattctg cattgttctg cgcatggcgg tcactcggca atttccctgggctgtacaaa catggtatga ctctcttgga gcaataaaca aaatacagga tttcttacaaaagcaagaat ataagacatt ggaatataac ttaacgacta cagaagtagt gatggagaatgtaacagcct tctgggagga gggatttggg gaattatttg agaaagcaaa acaaaacaataacaatagaa aaacttctaa tggtgatgac agcctcttct tcagtaattt ctcacttcttggtactcctg tcctgaaaga tattaatttc aagatagaaa gaggacagtt gttggcggttgctggatcca ctggagcagg caagacttca cttctaatga tgattatggg agaactggagccttcagagg gtaaaattaa gcacagtgga agaatttcat tctgttctca gttttcctggattatgcctg gcaccattaa agaaaatatc atCTTtggtg tttcctatga tgaatatagtacagaagcg tcatcaaagc atgccaacta gaagaggaca tctccaagtt tgcagagaaagacaatatag ttcttggaga aggtggaatc acactgagtg gaggtcaacg agcaagaatt

agaatttcat

at[T/C]gtg

gaagaggac

3.2 billion letters of human DNA
~ 2 million variation points (SNPs)
SNP = Single Nucleotide Polymorphism

Life Sciences Challenge
Correlate Biological and DNA Variation
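The bracket notation at[T/C]gtg on this slide marks one position where two versions of the sequence differ. As a minimal sketch of that idea (the helper names `expand_snp` and `find_variant_positions` are hypothetical, not part of any Oracle product), the pattern can be expanded into its two alleles and the differing position located:

```python
def expand_snp(pattern):
    """Expand a pattern such as 'at[T/C]gtg' into one sequence per allele."""
    start = pattern.index("[")
    end = pattern.index("]")
    alleles = pattern[start + 1:end].split("/")
    return [pattern[:start] + allele + pattern[end + 1:] for allele in alleles]


def find_variant_positions(seq_a, seq_b):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    pairs = zip(seq_a.lower(), seq_b.lower())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


# The pattern string is taken from the slide; everything else is illustrative.
variants = expand_snp("at[T/C]gtg")
positions = find_variant_positions(variants[0], variants[1])
print(variants, positions)
```

Real SNP discovery works on aligned sequences of very different scale; this only illustrates what a single variation point is.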

Graphics courtesy of the National Human Genome Research Institute

Life Sciences Challenge
Correlate Diseases, Genes and Environment

Myocardial Infarction

Stroke

Diabetes

Breast cancer

Manic-depression

Obesity

Hyperlipidemia

Inflammatory Bowel Disease

Hypertension

Schizophrenia

Graphics courtesy of the National Human Genome Research Institute

[Chart: data storage in TB (0 to 500TB) by year, 1994 to 2006, with a "Today" marker]

Life Science Challenge
Exploding Volumes of Data

“To meet the scientific goals we believe we need to add around 80 - 100TB of storage each year for the next 5 years”

P. Butcher, The Sanger Centre
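The Sanger Centre quote implies a cumulative total that is easy to work out. A small sketch of the arithmetic, assuming a constant amount is added each year (the function name is hypothetical):

```python
def cumulative_added_tb(tb_per_year, years):
    """Cumulative storage added after each year, assuming a constant yearly addition."""
    return [tb_per_year * year for year in range(1, years + 1)]


# 80 to 100 TB per year for 5 years, as stated in the quote.
low_estimate = cumulative_added_tb(80, 5)
high_estimate = cumulative_added_tb(100, 5)
print(low_estimate[-1], high_estimate[-1])
```

Under that assumption the addition over five years is 400 to 500 TB, which is in line with the 500TB top of the chart axis.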

Life Science Challenge Many Different Kinds of Data

[Graphic labels: Genomics; Functional Genomics; Cheminformatics; Proteomics; Pharmacogenomics; Modeling; Clinical; Pathways]

Graphic modified from original courtesy of Sun Microsystems

Life Science Challenge Just A Few Biological Databases

Life Science Challenge
Typical Research Environment

Industrial Research Lab

Public Databases

Private/Service Databases

Local Copies

Partner or Collaborator

Local Databases

Access heterogeneous data
Integrate a variety of data types
Manage vast quantities of data
Find patterns and insights
Collaborate securely

[Diagram: Clients (Browser, Mobile Device); Oracle10g App Server (Run All Your Applications); Oracle10g Database Server (Manage All Your Data)]

Oracle Vision: At the core is a data management platform

Introducing Oracle 10g

Runs all your applications
Stores all your information
Highly scalable, available, reliable
Secure
Easy to manage

– Make individual systems self-managing

– Manage thousands of servers at once

[Graphic labels: Genomics; Proteomics; Pathways; Cheminformatics; Clinical]

1. Access heterogeneous data
2. Integrate a variety of data types
3. Manage vast quantities of data
4. Find patterns and insights
5. Collaborate securely

Oracle’s Platform for Life Sciences


Oracle Life Sciences Platform

Collaboration Suite: Collaborate securely
iFS/Files: Share documents
XML DB: Flexibly manage data
interMedia: Store & manage images
SQL Loader: High performance data loader
Web Services: Standard communication between applications
Merge/Upsert: Enabling update and insert in one step
Oracle Portal: Build personalized portals
Application Server: Provide scalability for the middle tier
Transparent Gateways: Fast access using Oracle OCI
Distributed Queries: Perform searches across domains
Generic Gateways: Access any data using ODBC
Transportable Tablespaces: Rapidly exchange tables
Oracle Streams: Rule-based subscription for information sharing
Data Mining: Discover patterns & insights
Statistics: Perform basic statistics
Table Functions: Implement complex algorithms
OLAP & Discoverer: Interactive query & drill-down
Security: Enforce security
Auditing: Create audit trail to facilitate FDA compliance
Workflow: Automate laboratory & business processes
Extensibility Framework (Data cartridges): Manage complex scientific data
LOBs: Manage unstructured data
Text: Index & query text, e.g. literature searches
Real Application Clusters: Linear scalability
External Tables: Ability to index and query external files
UltraSearch: Search external sites & repositories
MySQL Toolkit: Easily move MySQL data into Oracle

[Diagram labels: e.g. SwissProt SP-ML; e.g. PubMed; e.g. MySQL; GenBank]


[Diagram labels: Flat files; External Table; MySQL; Generic Connectivity; MySQL Migration Toolkit; Sybase; DB2; Transparent Gateway; External Sites; UltraSearch; DBlinks; Distributed query; Transportable Tablespaces]

1. Access Heterogeneous Data

1. Access Heterogeneous Data

Oracle Transparent Gateways
– Integrate data from disparate systems
Generic Connectivity
– ODBC/JDBC connectivity
External Tables
– Access data from flat files
Distributed Queries
– Query across multiple Oracle and heterogeneous data sources
Transportable tablespaces
– Rapidly move tablespaces between Oracle databases
SQL*Loader
– High performance data loader
Oracle Streams
– Rule-based subscription for information sharing
Dblinks
– Connectivity between databases
UltraSearch
– Query range of data repositories (web sites, files, email, databases, etc.)
Migration Toolkits
– Tools to facilitate movement of data into Oracle
Merge / Upsert
– Update and insert in one step

[Diagram labels: Flat files; MySQL]
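To make "update and insert in one step" concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. This is an assumption made for runnability: Oracle expresses the same idea with its MERGE statement, whose syntax differs, and the table and column names below are hypothetical. SQLite's ON CONFLICT upsert clause needs SQLite 3.24 or later.

```python
import sqlite3


def upsert_sample(conn, sample_id, status):
    """Insert the row, or update its status if the sample_id already exists."""
    conn.execute(
        "INSERT INTO samples (sample_id, status) VALUES (?, ?) "
        "ON CONFLICT(sample_id) DO UPDATE SET status = excluded.status",
        (sample_id, status),
    )


# Hypothetical table of lab samples, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, status TEXT)")

upsert_sample(conn, "S1", "received")   # no such row yet: behaves as an insert
upsert_sample(conn, "S2", "received")   # insert
upsert_sample(conn, "S1", "sequenced")  # row exists: behaves as an update

rows = conn.execute(
    "SELECT sample_id, status FROM samples ORDER BY sample_id"
).fetchall()
print(rows)
```

The caller never has to check whether the row exists first, which is the point of a single-statement merge when loading data repeatedly from external sources.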

[Graphic labels: Genomics; Functional Genomics; Cheminformatics; Proteomics; Pharmacogenomics; Modeling; Clinical; Pathways]

Graphic modified from original courtesy of Sun Microsystems

2. Integrate a Variety of Data Types

XML DB
– Unite XML content and relational data
– SQL & XML become one
LOBs
– Manage unstructured data
Internet File System (Oracle Files)
– Manage files and folders
Text
– Index and query of text content & documents (Word, Powerpoint, HTML, Adobe PDFs, etc.)
interMedia
– Manage audio, video and image data

2. Integrate a Variety of Data Types

European Bioinformatics Institute (EBI)

Hosts major public databases (e.g. SwissProt, EMBL Nucleotide Sequence Database, Medline) on Oracle. (Total: > 5 TB)

Uses Oracle XML DB and Oracle Text for Medline – in development.

– Size: 11 million records, 200 GB

Uses Oracle9i Database and Application Server.

Extensibility Framework (Data Cartridges) - Manage complex scientific data

[Diagram labels: Oracle8i Server; Oracle9i Server; Data Cartridge; Extensibility Interfaces; Service Interfaces; Database Extensibility Services: Type System, Query Processing, Data Indexing, Server Execution, ...]

2. Integrate a Variety of Data Types


Chemistry searching requires special techniques
– Chemical name is not unique: “Viagra®”, “sildenafil citrate”
– Chemists think graphically

The solution:
– A graphical user interface
– Specialized operators such as substructure search (“sss”) = a chemical “contains”

[Figure: chemical structure drawing; an example substructure query (atoms Cl, Cl, O) "finds" matching structures]

Chemical Searching
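The "chemical contains" idea can be sketched as a graph search: atoms are nodes, bonds are edges, and a query fragment matches if it can be laid over part of the target molecule. The following is a minimal textbook backtracking sketch, not the algorithm of any chemistry cartridge. It ignores bond orders, hydrogens and stereochemistry, and the function name and toy molecules are assumptions for illustration.

```python
def contains_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Return True if the query graph can be embedded in the target graph.

    Atoms are lists of element symbols; bonds are pairs of atom indices.
    """
    def adjacency(atom_count, bonds):
        neighbors = {i: set() for i in range(atom_count)}
        for a, b in bonds:
            neighbors[a].add(b)
            neighbors[b].add(a)
        return neighbors

    query_adj = adjacency(len(query_atoms), query_bonds)
    target_adj = adjacency(len(target_atoms), target_bonds)

    def extend(mapping):
        # mapping[i] is the target atom chosen for query atom i
        q = len(mapping)
        if q == len(query_atoms):
            return True
        for t in range(len(target_atoms)):
            if t in mapping or target_atoms[t] != query_atoms[q]:
                continue
            # every bond to an already placed query atom must exist in the target
            if all(mapping[n] in target_adj[t] for n in query_adj[q] if n < q):
                if extend(mapping + [t]):
                    return True
        return False

    return extend([])


# Toy target: heavy atoms of a Cl-C-C-O chain (illustrative only).
target_atoms = ["Cl", "C", "C", "O"]
target_bonds = [(0, 1), (1, 2), (2, 3)]

has_c_o = contains_substructure(["C", "O"], [(0, 1)], target_atoms, target_bonds)
has_cl_c_o = contains_substructure(
    ["Cl", "C", "O"], [(0, 1), (1, 2)], target_atoms, target_bonds
)
print(has_c_o, has_cl_c_o)
```

The second query fails because no single carbon in the toy chain carries both the Cl and the O, which shows why a name or text search cannot replace a structural operator.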

MDL Information Systems, Inc. MDL Discovery Framework

A multi-tier system for managing and integrating discovery data and workflows

– Domain-specific application and database services and API

– Chemistry rules, drawing, and rendering

– Single application access to multiple DBs and services

Key Advantages
– Integrate data sources across R&D
– Easily create web or client solutions
– Quickly adopt new tools and methods for development

www.mdl.com

Oracle Features
– Oracle 8i/9i Database Extensibility Option (chemical data cartridge)
– Replication support
– Oracle9iAS J2EE services

IDBS The ActivityBase Suite

– Capture, manage and use chemical and biological data in life sciences discovery

– Manage full range of disparate data types

– The leading application for drug discovery research worldwide

Key Advantages
– Integration framework for cheminformatics and bioinformatics data
– Rich data context enables data quality
– Supports manual and automated data capture & management
– Maximizes the value of discovery data

www.id-bs.com

Oracle Features
– Chemistry cartridge (ChemXtra)
– PL/SQL stored procedures
– JAVA stored procedures
– XML
– Materialized views
– Data warehousing
– 9i compatible

Grid support in Oracle 10g
Oracle Scales to Petabytes
– Largest life sciences databases run Oracle
– Oracle 80% market share - IDC
Partitioning
– Divide and conquer
Oracle 10g Application Server
– Provide scalability for middle tier
Oracle Data Guard
– Protect data from human or system failures

3. Manage Vast Quantities of Data
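The "divide and conquer" behind partitioning can be illustrated with a toy range-partitioned table: rows are routed to a partition by key, and a range query scans only the partitions that can hold matching keys. This is a conceptual sketch in plain Python, not Oracle's implementation; the class and method names are hypothetical.

```python
from bisect import bisect_right


class RangePartitionedTable:
    """Toy table split into partitions by sorted, exclusive upper bounds."""

    def __init__(self, upper_bounds):
        self.bounds = list(upper_bounds)
        # one partition per bound, plus a final partition for all larger keys
        self.partitions = [[] for _ in range(len(self.bounds) + 1)]

    def _partition_index(self, key):
        return bisect_right(self.bounds, key)

    def insert(self, key, row):
        self.partitions[self._partition_index(key)].append((key, row))

    def query_range(self, low, high):
        """Return (rows with low <= key <= high, number of partitions scanned)."""
        first = self._partition_index(low)
        last = self._partition_index(high)
        rows = [
            row
            for partition in self.partitions[first:last + 1]
            for key, row in partition
            if low <= key <= high
        ]
        return rows, last - first + 1


# Partition hypothetical records by year: before 2000, 2000-2001, 2002 onward.
table = RangePartitionedTable([2000, 2002])
for year, label in [(1999, "a"), (2000, "b"), (2001, "c"), (2003, "d")]:
    table.insert(year, label)

rows, scanned = table.query_range(2000, 2001)
print(rows, scanned)
```

The query touches one of the three partitions, so the work grows with the size of the relevant partition rather than the whole table.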

[Chart: data storage in TB (0 to 500TB) by year, 1994 to 2006, with a "Today" marker]

3. Manage Vast Quantities of Data

Support for Grid
Distributed queries, External Tables, Security, RAC
Grid Access to Oracle Utilities through Globus Resource Allocation Manager (GRAM)
– Export, Import, SQLPlus
Grid Access to Oracle 10g Database
– Invoke PL/SQL routines specified in Globus Resource Specification Language
Grid Resource Information Service (GRIS) for Oracle Database
– Discover & monitor Oracle databases

3. Manage Vast Quantities of Data

• Real Application Clusters (RAC)
– Start with one server, one database and grow as you grow
– Linear scalability out of the box
– Save on Hardware and Storage costs
– Works with ALL applications
– Fail-over transparent to users
– Easy to administer

[Diagram label: High-speed interconnect]

[Diagram labels: Data Loads; Sample/Lab; Proteomics Portal; A-Z; Oracle]

Oracle Real Application Clusters
Works for All Applications

1. Add new node
2. Start instance on new node

No Code Change

Oracle Real Application Clusters
Greater Than 85% Scalability

[Chart: scalability (0% to 100%) for 1, 2, 4, 8 and 16 nodes]

Leading biotech company
– Over 2 TBs of data in Oracle
– Oracle serves as a centralized information resource for gene searching and database cross-referencing.
– Oracle used for the entire pipeline from research to clinical data to manufacturing and sales applications.

Key Advantages of Oracle
– Improved performance
– Greater reliability
– Genentech's corporate goal is 99.999% availability in a 24x7 environment

Oracle Environment
– Oracle 9i database
– Real Application Clusters

"Oracle9i Real Application Clusters provide the foundation for the scalable and highly available database infrastructure we require to meet our growing data demands in all areas of our business."

-- Scooter Morris, Genentech, Inc.

Genentech, Inc.
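A 99.999% availability goal in a 24x7 environment translates into a very small downtime budget. A short sketch of the arithmetic, assuming a 365-day year (the function name is hypothetical):

```python
def allowed_downtime_minutes(availability, days_per_year=365):
    """Minutes of downtime per year permitted by an availability fraction."""
    minutes_per_year = days_per_year * 24 * 60  # 525,600 for a 365-day year
    return minutes_per_year * (1 - availability)


five_nines = allowed_downtime_minutes(0.99999)
print(round(five_nines, 2))
```

That is a little over five minutes per year in total, which is why transparent fail-over matters more than fast manual recovery at this target.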

The Dragon Genomics Center of Takara Bio Inc.

High-Level Project Goals
– Manage data throughout every step of a complicated process
– Create a laboratory information management system (LIMS) enabling large scale sequencing
– Provide reliable back up and recovery of vast amounts of data

Key Benefits
– Provided easy access and management for vast amounts of data
– Ensured scalability needed to accommodate future growth

Oracle Environment
– Oracle Database Enterprise Edition
– Oracle9iAS Enterprise Edition

"We trust Oracle in its ability to run terabyte-class databases in clustered environments with high availability. And we're pleased to say that Oracle has not disappointed us."

-- Toru Suzuki, Project Manager, Dragon Genomics Center, Takara Bio Inc.

The Dragon Genomics Center of Takara Bio Inc., specializing in large-scale sequencing, is among the highest speed genome-analyzing centers in Asia.

Bioinformatics Center, Institute for Chemical Research, Kyoto University

The Bioinformatics Center, Institute for Chemical Research, Kyoto University is leading biotechnology research thanks to its comprehensive studies in various areas, including the life sciences, information sciences, chemistry and physics.

“In order to manage this massive amount of genetic information and to operate efficiently, it is essential to have a platform with paramount stability. Our web site receives accesses from all over the world continuously, 24 hours a day. In order to offer the latest information under such circumstances, performance is also an issue. In this sense, the Oracle Database was the most appropriate since it can handle this enormous amount of data in a fast and stable manner, 24 hours a day.”

– Professor and Director Minoru Kanehisa, Bioinformatics Center Institute for Chemical Research Kyoto University

4. Find Patterns and Insights

Oracle Data Mining
– Find relationships and clusters associated with healthy and diseased states
– Naïve Bayes, Adaptive Bayes Networks, Attribute Importance, Association Rules, K-Means, O-Cluster, SVM, NMF algorithms
– Data Mining for Java (DM4J) GUI wizards and results browser
Oracle Discoverer & Oracle OLAP
– Interactive query & drill-down
Statistical functions
– Perform basic statistics in Oracle, e.g. summary statistics (mean, stdev, median, quantiles), hypothesis testing, distribution fitting, correlations, linear regression
Oracle Text & Text Mining
– Classify & cluster documents relevant to area of interest
Table Functions
– Implement complex algorithms within the database
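Of the algorithms named on this slide, K-Means is the simplest to show in a few lines. The following is a plain textbook sketch on one-dimensional data, not Oracle Data Mining's implementation; the measurement values and starting centers are made up for illustration.

```python
def k_means_1d(values, initial_centers, max_iterations=20):
    """Textbook k-means on numbers: assign each value to its nearest center,
    move each center to the mean of its values, repeat until stable."""
    centers = list(initial_centers)
    for _ in range(max_iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # an empty cluster keeps its previous center
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Hypothetical measurements forming two obvious groups.
measurements = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = k_means_1d(measurements, initial_centers=[0.0, 10.0])
print(centers)
```

In a life sciences setting the two groups might correspond to low and high readings of some marker; the algorithm finds the grouping without being told which sample belongs where.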

4. Find Patterns and Insights

Deductive Analysis
– Answer complex questions about the relationships in genomic, clinical and pharmacological data
Inductive Analysis
– Finding relationships for classification, class discovery and prediction

Life Sciences data: Functional Genomic Databases; Proteomics Database; Clinical Databases; Pharmacological databases