Lei Liu, Ph.D.
The Fifth China-US Roundtable on Scientific Data Cooperation October 27-28, 2011, Beijing, China
Biomedical Data Integration and Knowledgebase
Shanghai Center for Bioinformation Technology and
Shanghai Institutes for Biological Sciences, CAS
Part 1: Ontology
Knowledge Management
Data Integration and Exchange
Semantic Interoperability
Decision Support and Reasoning
Knowledge Management
Annotating Data and Resources
Accessing Biomedical Information
Mapping across Biomedical Ontologies
Ontology
Data Exchange & Semantic Interoperability
Information and Data Integration
Semantic Interoperability
Ontology
Decision Support and Reasoning
Data Selection
Data Aggregation
Decision Support
Natural Language Processing Applications
Knowledge Discovery
Ontology
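Reasoning uses such as data selection, data aggregation, and decision support typically rest on subsumption: a record coded with a specific concept also counts as an instance of every more general concept above it in the ontology. The following is a minimal sketch, assuming a tiny hypothetical is-a hierarchy; the concept names and the `IS_A`, `ancestors`, `subsumes`, and `select` names are illustrative, not from the talk. A production ontology server would load a real ontology and use a reasoner instead of a hand-written dictionary.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical is-a hierarchy: each concept maps to its direct parents.
IS_A: Dict[str, List[str]] = {
    "viral pneumonia": ["pneumonia"],
    "pneumonia": ["lung disease", "infectious disease"],
    "asthma": ["lung disease"],
    "lung disease": ["disease"],
    "infectious disease": ["disease"],
}

def ancestors(concept: str) -> Set[str]:
    """All concepts reachable by following is-a edges upward (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(IS_A.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen

def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` itself or one of its descendants."""
    return general == specific or general in ancestors(specific)

def select(records: List[Tuple[str, str]], query: str) -> List[str]:
    """Data selection: IDs of records whose coded concept falls under `query`."""
    return [rid for rid, concept in records if subsumes(query, concept)]

records = [("r1", "viral pneumonia"), ("r2", "asthma"), ("r3", "pneumonia")]
print(select(records, "infectious disease"))  # ['r1', 'r3']
print(select(records, "lung disease"))        # ['r1', 'r2', 'r3']
```

A query for a general concept thus retrieves records coded at any finer level, which is what lets ontology-coded data be aggregated without listing every specific code by hand.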
Example: Ontology Server
Example: Building Knowledge Base
Edit
Example: Building Knowledge Base
Search tool
Part 2: SNOMED CT
Systematized Nomenclature of Medicine: Reference Terminology (SNOMED RT)
Clinical Terms (CT)
SNOMED CT
CAP
NHS
CAP: College of American Pathologists
NHS: National Health Service
IHTSDO: International Health Terminology Standards Development Organization
Core contents
SNOMED CT
Applications
• Electronic Health Record Systems
• Computerized Provider Order Entry (CPOE)
• Knowledge databases used in clinical decision support systems (CDSS)
• Remote Intensive Care Unit Monitoring
• Laboratory Reporting
• Cancer Reporting
• Genetic Databases
SNOMED CT
Medical domains of the 100 Medline-indexed papers that describe a specific medical domain (BMC Medical Informatics and Decision Making 2008, 8(Suppl 1):S2)
SNOMED CT
Example: Mapping
Example: Encoding
Example: Standardization of Terminology
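Encoding data with SNOMED CT means storing numeric concept identifiers (SCTIDs) rather than free text. An SCTID is 6 to 18 digits long; its last digit is a Verhoeff check digit, and the two digits before that form the partition identifier (in the short format, 00 marks a concept, 01 a description, and 02 a relationship). The sketch below checks codes before they are stored, assuming only the short-format partitions matter; the function names are hypothetical. 22298006 (myocardial infarction) and 73211009 (diabetes mellitus) are SNOMED CT concept IDs used here as test inputs.

```python
from typing import Optional

# Verhoeff tables: D is the multiplication table of the dihedral group D5,
# P holds the position-dependent digit permutations.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# Short-format partition identifiers (second- and third-to-last digits).
PARTITIONS = {"00": "concept", "01": "description", "02": "relationship"}

def verhoeff_valid(number: str) -> bool:
    """True if the last digit of `number` is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0

def sctid_component(sctid: str) -> Optional[str]:
    """Component type of a short-format SCTID, or None if it fails validation."""
    if not (sctid.isascii() and sctid.isdigit()):
        return None
    if not 6 <= len(sctid) <= 18 or sctid[0] == "0":
        return None
    if not verhoeff_valid(sctid):
        return None
    return PARTITIONS.get(sctid[-3:-1])

print(sctid_component("22298006"))  # concept
print(sctid_component("22298007"))  # None (wrong check digit)
```

A check like this catches transcription errors in mapped or hand-entered codes, but it does not confirm that the concept exists or is active; that still requires a lookup against a SNOMED CT release.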
Part 3: OpenEHR
Objectives
Promote and publish formal specifications
Promote and publish EHR architectures and models
Interoperable health informatics systems
Maintain an open-source “reference” implementation
Implement EHR architectures in clinical use
Work closely with standards bodies
openEHR introduction
• Definition:
  – openEHR is an open standard specification in health informatics that describes the management, storage, retrieval, and exchange of health data in electronic health records (EHRs)
• Features:
  – Patient-centric
  – Lifelong
  – Vendor-independent
Architecture of OpenEHR
OpenEHR Release 1.0.2
Two-level modeling of openEHR
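In openEHR's two-level modeling, a small, stable reference model (RM) defines how any data is represented and stored, while archetypes, authored by clinicians, constrain RM structures to express specific clinical concepts. The sketch below illustrates the idea under heavy simplification: `Element`, `BLOOD_PRESSURE_ARCHETYPE`, and `validate` are hypothetical stand-ins, the constraint ranges are illustrative, and real archetypes are written in ADL against a much richer RM.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Level 1: a generic reference-model class (hypothetical, far simpler than
# the real openEHR RM). It knows nothing about blood pressure.
@dataclass
class Element:
    name: str
    magnitude: float
    units: str

# Level 2: an "archetype" expressed as constraints on Element values.
# Names and ranges are illustrative only.
Constraint = Tuple[float, float, str]  # (min, max, units)
BLOOD_PRESSURE_ARCHETYPE: Dict[str, Constraint] = {
    "systolic": (0.0, 1000.0, "mm[Hg]"),
    "diastolic": (0.0, 1000.0, "mm[Hg]"),
}

def validate(elements: List[Element], archetype: Dict[str, Constraint]) -> List[str]:
    """Return a list of constraint violations; empty means the data conforms."""
    errors: List[str] = []
    present = {e.name for e in elements}
    for required in archetype:
        if required not in present:
            errors.append(f"missing element: {required}")
    for e in elements:
        if e.name not in archetype:
            errors.append(f"element not allowed: {e.name}")
            continue
        low, high, units = archetype[e.name]
        if e.units != units:
            errors.append(f"{e.name}: expected {units}, got {e.units}")
        if not low <= e.magnitude <= high:
            errors.append(f"{e.name}: {e.magnitude} outside [{low}, {high}]")
    return errors

reading = [Element("systolic", 120, "mm[Hg]"), Element("diastolic", 80, "mm[Hg]")]
print(validate(reading, BLOOD_PRESSURE_ARCHETYPE))  # []
```

Adding a new clinical concept means adding a new constraint definition; the `Element` class and any storage built on it stay unchanged, which is the main argument for the two-level approach.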
openEHR EHR system implementation
applicability
• Applies to:
  – Storing data
  – Searching data
  – Sharing data
• Does not apply to:
  – Controlling the exchange flow
Integration of SNOMED CT into OpenEHR
HL7 v3 introduction
• Mission: provide standards for interoperability
• Features:
  – Standardized data based on the Reference Information Model (RIM)
  – CDA: standardized clinical documents for exchange
  – Support for healthcare workflows (V3 messaging)
RIM
applicability
• Applies to:
  – Exchanging information
  – Controlling the exchange flow
  – Controlling the size of exchanged data
• Does not apply to:
  – Storing data (CDA documents can be stored, but this is not best practice)
  – Searching data
Ongoing Biomedical Informatics Projects
Clinical Decision Support
Medical Natural Language Processing
Synchronous Liver Metastasis Model
Drug Knowledge Base
Data Sharing++
Decision Support
Others
Medical Channel
Common Data Element Editor
Research Data Entry System
CDA Transfer Engine
HL7 V3 Message Model
Tissue Bank Annotation
Medical Terminology Service
Clinical Data Warehouse
Clinical Guideline Computerization
Clinical Data and Samples Are at the Core of Translational Medicine
Clinical Data
Clinical Practice
Biomarker
Biospecimen
Clinical Trial
LIMS
Genotypes
Domain Workspaces
Cross-Cutting & Strategic Workspaces
Clinical Trials Management Systems (CTMS)
https://cabig.nci.nih.gov/workspaces/CTMS/
Integrative Cancer Research (ICR)
https://cabig.nci.nih.gov/workspaces/ICR
Tissue Banks & Pathology Tools (TBPT)
https://cabig.nci.nih.gov/workspaces/TBPT
In Vivo Imaging
https://cabig.nci.nih.gov/workspaces/Imaging
Vocabularies & Common Data Elements (VCDE)
https://cabig.nci.nih.gov/workspaces/VCDE
Architecture
https://cabig.nci.nih.gov/workspaces/Architecture
Data Sharing & Intellectual Capital (DSIC)
https://cabig.nci.nih.gov/working_groups/DSIC_SLWG
Documentation & Training (D&T)
https://cabig.nci.nih.gov/working_groups/Training_SLWG
caBIG® Workspaces
References and Standards
References used:
① caCORE (Cancer Common Ontologic Representation Environment)
② caDSR (Cancer Data Standards Repository)
③ NCI CBIIT (National Cancer Institute Center for Biomedical Informatics and Information Technology)
Collaboration with NCI and caBIG:
• Attended the caBIG annual meeting and visited caBIG in 2008
• Two people from our center attended the Boot Camp
Tissue Bank Information Management System
Sample Database Information Management System: A Comprehensive Solution
Biobank Information Management Platform
Use Cases
Combined Tissue Bank Annotation from Operation Summary and Pathology Report
Medical Natural Language Processing
Difficulty acquiring data, and the same data entered multiple times
Direct connection to HIS, LIS, and EMR
Automatic data transfer without manual entry by staff
Active reminder system for follow-up
Automatic Data Query and Extraction Across Systems
Molecular classification database
Diagnostic tests database
Patient treatment status database
Patient follow-up database
Sample database
Personalized treatment procedures
Clinical Information Enquiry System: The overall framework and subsystems
HIS database
LIS database
General enquiries
PACS database
D-QIS database
Clinical Information Enquiry System
Clinical Data Warehouse
RIM object processing (create, delete, update, and query RIM objects)
RDB
Persistence layer (Hibernate)
RIM processor (business layer)
TABLE
Javasig
CALL
RIM object
MIF document
XML document
RIM database structure
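The RIM processor sits between the clinical data warehouse and a relational database, handling create, delete, update, and query operations on RIM objects. The sketch below shows that business-layer role in Python, with the standard-library `sqlite3` module standing in for the Hibernate persistence layer. It models only a simplified RIM `Act` (one of the RIM backbone classes, alongside Entity, Role, and Participation); the `RimProcessor` class and the `act` table layout are hypothetical, not the schema used in the project.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import List, Optional

@dataclass
class Act:
    """Simplified HL7 v3 RIM Act with only a few of its attributes."""
    act_id: str
    class_code: str   # e.g. "OBS" (observation)
    mood_code: str    # e.g. "EVN" (event that has occurred)
    code: str         # concept code, e.g. a SNOMED CT or LOINC code
    status_code: str  # e.g. "active", "completed"

COLUMNS = "act_id, class_code, mood_code, code, status_code"

class RimProcessor:
    """Business layer: create/read/update/delete/query Acts (hypothetical schema)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS act (act_id TEXT PRIMARY KEY, "
                "class_code TEXT, mood_code TEXT, code TEXT, status_code TEXT)"
            )

    def create(self, act: Act) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(f"INSERT INTO act ({COLUMNS}) VALUES (?, ?, ?, ?, ?)", astuple(act))

    def read(self, act_id: str) -> Optional[Act]:
        row = self.conn.execute(f"SELECT {COLUMNS} FROM act WHERE act_id = ?", (act_id,)).fetchone()
        return Act(*row) if row else None

    def update_status(self, act_id: str, status_code: str) -> bool:
        with self.conn:
            cur = self.conn.execute(
                "UPDATE act SET status_code = ? WHERE act_id = ?", (status_code, act_id)
            )
        return cur.rowcount == 1

    def delete(self, act_id: str) -> bool:
        with self.conn:
            cur = self.conn.execute("DELETE FROM act WHERE act_id = ?", (act_id,))
        return cur.rowcount == 1

    def query_by_class(self, class_code: str) -> List[Act]:
        rows = self.conn.execute(
            f"SELECT {COLUMNS} FROM act WHERE class_code = ? ORDER BY act_id", (class_code,)
        ).fetchall()
        return [Act(*row) for row in rows]

rim = RimProcessor(sqlite3.connect(":memory:"))
rim.create(Act("a1", "OBS", "EVN", "22298006", "active"))
rim.update_status("a1", "completed")
print(rim.read("a1"))
```

Keeping SQL inside one class mirrors the layering on the slide: callers work with RIM objects, and only the processor knows about tables.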
R-MIM Model
Database Structure
Clinical Document (XML)
Database Records
SOA Service Bus
Clinical Data Warehouse
Clinical Document CDA Transfer Engine
Schema for Clinical Document HL7 CDA Schema
Transfer Engine
Discharge Summary
CDA File
Mapping
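The transfer engine maps fields of a source document, such as a discharge summary, onto the HL7 CDA schema. A minimal sketch with Python's `xml.etree.ElementTree`, assuming a flat input dictionary with hypothetical keys `patient_name` and `sections`. It uses the CDA namespace `urn:hl7-org:v3`, LOINC code 18842-5 (discharge summary), and the LOINC code-system OID, but it omits many required header elements, so its output is not schema-valid CDA.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

HL7_NS = "urn:hl7-org:v3"  # namespace used by HL7 v3 / CDA R2 documents
ET.register_namespace("", HL7_NS)  # serialize it as the default namespace

def q(tag: str) -> str:
    """Qualify a tag name with the HL7 v3 namespace."""
    return f"{{{HL7_NS}}}{tag}"

def discharge_summary_to_cda(summary: Dict[str, Any]) -> str:
    """Map a flat discharge-summary record (hypothetical keys) to a skeletal
    CDA-style XML string. Required header elements are omitted."""
    doc = ET.Element(q("ClinicalDocument"))
    ET.SubElement(doc, q("code"), {
        "code": "18842-5",                      # LOINC: discharge summary
        "codeSystem": "2.16.840.1.113883.6.1",  # LOINC code-system OID
    })
    ET.SubElement(doc, q("title")).text = "Discharge Summary"

    # Header: recordTarget/patientRole/patient/name
    record_target = ET.SubElement(doc, q("recordTarget"))
    patient = ET.SubElement(ET.SubElement(record_target, q("patientRole")), q("patient"))
    ET.SubElement(patient, q("name")).text = summary["patient_name"]

    # Body: one component/section per source section, in input order
    body = ET.SubElement(ET.SubElement(doc, q("component")), q("structuredBody"))
    for heading, text in summary["sections"].items():
        section = ET.SubElement(ET.SubElement(body, q("component")), q("section"))
        ET.SubElement(section, q("title")).text = heading
        ET.SubElement(section, q("text")).text = text
    return ET.tostring(doc, encoding="unicode")

cda_xml = discharge_summary_to_cda({
    "patient_name": "Test Patient",
    "sections": {"Diagnosis": "Hypothetical diagnosis text",
                 "Hospital course": "Hypothetical course text"},
})
print(cda_xml)
```

In a fuller engine the field-to-element mapping would be driven by a configuration table rather than hard-coded, so new document types need only new mappings.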
Common Medical Terminology Service
Difficulties of Extracting Data
METHODS
Model's Performance
Biomedical Data Integration and Mining
Integration
Data Mining
Personalized Medicine Databases
Personalized Medicine Decision Support System
Medical Informatics
Bioinformatics
Translational Medicine
Genomics
Disease and Gene Integration
GAD, COSMIC
Data Integration
Gene2Disease Databases
Reference information
Experimental sample information
Mutation information
Disease information
Disease classification information
Gene information
Main table
Genetic Polymorphisms 39910
Gene Mutations 150654519 Major Diseases
Structured Gene Information 31412
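The table groups listed above (reference, sample, mutation, disease, disease classification, and gene information, tied together by a main table) suggest a star-like layout in which each association row points at its supporting details. A minimal sketch of such a schema in SQLite, assuming hypothetical table and column names and placeholder data; the actual Gene2Disease schema may differ.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT UNIQUE);
CREATE TABLE disease_class (class_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE disease (disease_id INTEGER PRIMARY KEY, name TEXT,
                      class_id INTEGER REFERENCES disease_class(class_id));
CREATE TABLE mutation (mutation_id INTEGER PRIMARY KEY, notation TEXT);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE literature (ref_id INTEGER PRIMARY KEY, citation TEXT);
-- Main table: one row per gene-disease association plus its supporting details.
CREATE TABLE gene_disease_main (
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    disease_id INTEGER NOT NULL REFERENCES disease(disease_id),
    mutation_id INTEGER REFERENCES mutation(mutation_id),
    sample_id INTEGER REFERENCES sample(sample_id),
    ref_id INTEGER REFERENCES literature(ref_id),
    source TEXT  -- originating database, e.g. 'GAD' or 'COSMIC'
);
"""

def diseases_for_gene(conn: sqlite3.Connection, symbol: str) -> List[Tuple[str, str, str]]:
    """(disease, disease class, source) rows for one gene symbol."""
    return conn.execute(
        """SELECT DISTINCT d.name, c.name, m.source
           FROM gene_disease_main AS m
           JOIN gene AS g ON g.gene_id = m.gene_id
           JOIN disease AS d ON d.disease_id = m.disease_id
           LEFT JOIN disease_class AS c ON c.class_id = d.class_id
           WHERE g.symbol = ?
           ORDER BY d.name, m.source""",
        (symbol,),
    ).fetchall()

# Placeholder data only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO gene VALUES (1, 'GENE_A'), (2, 'GENE_B');
INSERT INTO disease_class VALUES (1, 'Class 1');
INSERT INTO disease VALUES (1, 'Disease X', 1), (2, 'Disease Y', 1);
INSERT INTO mutation VALUES (1, 'c.100A>G');
INSERT INTO literature VALUES (1, 'Reference 1');
INSERT INTO gene_disease_main VALUES
    (1, 1, 1, NULL, 1, 'GAD'),
    (1, 2, 1, NULL, 1, 'COSMIC'),
    (2, 2, NULL, NULL, 1, 'GAD');
""")
print(diseases_for_gene(conn, "GENE_A"))
# [('Disease X', 'Class 1', 'GAD'), ('Disease Y', 'Class 1', 'COSMIC')]
```

Recording the originating database on each main-table row keeps provenance visible after GAD and COSMIC records are merged.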
Drug and Drug Metabolism Study
• Drug-Target-SNP Integration and Databases
SNP Drug
Data Integration
Drug-Target Polymorphism Databases
dbSNP
HapMap
Query Drug-Target-SNP
Data table: number of records
Enzymes: 50
Drug Targets: 3866
Drug-Enzymes relationship: 4387
Drug Information: 4414
Enzymes-SNP: 9558
Drug-Target Information: 12051
rsSNP: 332476
GenBank_SNP: 337259
ssSNP: 1745368
Mutations from populations: 1839782
Total size: 201 MB
Drug Info
Target Info
SNP Info
Drug Metabolism
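Answering a drug-target-SNP query amounts to following two hops: from a drug to its target and metabolizing-enzyme genes, then from each gene to its known SNPs. A minimal in-memory sketch, assuming hypothetical link tables with placeholder drug, gene, and SNP identifiers (not records from dbSNP or HapMap); a database-backed version would express the same hops as joins.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical link tables; identifiers are placeholders, not real records.
DRUG_TARGETS = [("drug_1", "TARGET_A"), ("drug_1", "TARGET_B"), ("drug_2", "TARGET_B")]
DRUG_ENZYMES = [("drug_1", "ENZYME_X")]
GENE_SNPS = [
    ("TARGET_A", "rs_example_1"),
    ("TARGET_B", "rs_example_2"),
    ("ENZYME_X", "rs_example_3"),
    ("ENZYME_X", "rs_example_4"),
]

def build_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (key, value) pairs into key -> set of values."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for key, value in pairs:
        index[key].add(value)
    return index

TARGETS = build_index(DRUG_TARGETS)
ENZYMES = build_index(DRUG_ENZYMES)
SNPS = build_index(GENE_SNPS)

def snps_for_drug(drug: str) -> Dict[str, List[str]]:
    """Hop 1: drug -> target and enzyme genes. Hop 2: gene -> SNP IDs."""
    genes = TARGETS.get(drug, set()) | ENZYMES.get(drug, set())
    return {gene: sorted(SNPS.get(gene, set())) for gene in sorted(genes)}

print(snps_for_drug("drug_1"))
# {'ENZYME_X': ['rs_example_3', 'rs_example_4'], 'TARGET_A': ['rs_example_1'], 'TARGET_B': ['rs_example_2']}
```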
Mutation Information Integration
• Extraction from locus-specific databases (LSDBs)
LSDB Addresses
Using a wiki to collect LSDB addresses: 1300 LSDBs, classified by gene and linked to the OMIM database
http://129.89.44.120/twiki/bin/view
Mutation Information Extraction
Natural Language Processing: data extraction from two LSDBs
Alzheimer Disease & Frontotemporal Dementia Mutation Database
Sarcomere Protein Gene Mutation Database
1725 mutation records
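Extracting mutations from LSDB pages starts with recognizing mutation mentions in text. The sketch below swaps in a simple technique, regular expressions, for two common HGVS-style notations: coding-DNA substitutions such as `c.100A>G` and three-letter protein substitutions such as `p.Lys34Glu`. It is not the NLP pipeline used in the project; the patterns deliberately ignore deletions, insertions, ranges, and intronic positions, and the example sentence is made up.

```python
import re
from typing import List, Tuple

# Simplified patterns; full HGVS syntax is far richer than this.
CDNA_SUB = re.compile(r"\bc\.(\d+)([ACGT])>([ACGT])\b")               # c.100A>G
PROTEIN_SUB = re.compile(r"\bp\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\b")  # p.Lys34Glu

def extract_mutations(text: str) -> List[Tuple[str, str]]:
    """Return (kind, mention) pairs in their order of appearance in `text`."""
    hits = [(m.start(), "cDNA", m.group(0)) for m in CDNA_SUB.finditer(text)]
    hits += [(m.start(), "protein", m.group(0)) for m in PROTEIN_SUB.finditer(text)]
    return [(kind, mention) for _, kind, mention in sorted(hits)]

sentence = "Patient carried c.100A>G (p.Lys34Glu) and c.250C>T."
print(extract_mutations(sentence))
# [('cDNA', 'c.100A>G'), ('protein', 'p.Lys34Glu'), ('cDNA', 'c.250C>T')]
```

The capture groups (position, reference, and alternate residue) are what a later step would normalize and link to gene and disease records.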
• Mutation association with disease phenotypes
• Standards
• Gene names: HUGO
• Diseases: ICD-10
  – Mapping ICD-10 and MeSH using keyword search
  – Adopting SNOMED CT to build disease ontologies
Mapped ICD-10 disease vocabulary
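Keyword-based mapping between ICD-10 and MeSH can be approximated by comparing the word sets of two term labels. A minimal sketch using Jaccard overlap of keywords, assuming a small hypothetical stopword list, a 0.5 threshold, and MeSH-style headings with placeholder IDs (`MESH_1` and so on are not real MeSH identifiers).

```python
import re
from typing import Dict, Optional, Set, Tuple

STOPWORDS = {"of", "the", "and", "with", "unspecified"}  # hypothetical list

def keywords(term: str) -> Set[str]:
    """Lower-case word tokens of `term`, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", term.lower()) if w not in STOPWORDS}

def best_match(source: str, targets: Dict[str, str],
               threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """(target_id, Jaccard score) of the best keyword overlap, or None below threshold."""
    src = keywords(source)
    best: Optional[Tuple[str, float]] = None
    for target_id, label in targets.items():
        tgt = keywords(label)
        union = src | tgt
        score = len(src & tgt) / len(union) if union else 0.0
        if score >= threshold and (best is None or score > best[1]):
            best = (target_id, score)
    return best

MESH_HEADINGS = {
    "MESH_1": "Myocardial Infarction",
    "MESH_2": "Diabetes Mellitus, Type 2",
    "MESH_3": "Hypertension",
}
for term in ["Acute myocardial infarction", "Type 2 diabetes mellitus",
             "Essential (primary) hypertension"]:
    print(term, "->", best_match(term, MESH_HEADINGS))
```

The third term shows the weakness of pure keyword matching: "Essential (primary) hypertension" shares only one of three keywords with "Hypertension" and falls below the threshold, one motivation for moving toward a concept-based terminology such as SNOMED CT.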
Mutation Information Integration
Disease Related Unique Mutation Search Engine (DRUMS)
Query
Genes, Diseases, Mutations, Sequences
More than 170,000 mutations, 6000 genes
External Links
Document upload
By Genes
By Diseases
By Mutation Types
http://www.scbit.org/glif
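A search engine that answers queries by gene, by disease, or by mutation type can keep one inverted index per field and intersect the matching record sets. A minimal sketch, assuming hypothetical record fields and placeholder data; the slides do not describe how DRUMS is actually implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

@dataclass(frozen=True)
class MutationRecord:
    record_id: int
    gene: str
    disease: str
    mutation_type: str  # e.g. "missense", "nonsense", "frameshift"

# Field name -> normalizer applied at both index and query time.
NORMALIZE: Dict[str, Callable[[str], str]] = {
    "gene": str.upper, "disease": str.lower, "mutation_type": str.lower,
}

class MutationIndex:
    """One inverted index per field: normalized value -> set of record IDs."""

    def __init__(self, records: List[MutationRecord]) -> None:
        self.records = {r.record_id: r for r in records}
        self.by_field: Dict[str, Dict[str, Set[int]]] = {f: defaultdict(set) for f in NORMALIZE}
        for r in records:
            for field, norm in NORMALIZE.items():
                self.by_field[field][norm(getattr(r, field))].add(r.record_id)

    def query(self, gene: Optional[str] = None, disease: Optional[str] = None,
              mutation_type: Optional[str] = None) -> List[MutationRecord]:
        """AND-combine the filters that are given; no filters returns everything."""
        ids = set(self.records)
        filters = {"gene": gene, "disease": disease, "mutation_type": mutation_type}
        for field, value in filters.items():
            if value is not None:
                ids &= self.by_field[field].get(NORMALIZE[field](value), set())
        return [self.records[i] for i in sorted(ids)]

idx = MutationIndex([
    MutationRecord(1, "GENE_A", "Disease X", "missense"),
    MutationRecord(2, "GENE_A", "Disease Y", "nonsense"),
    MutationRecord(3, "GENE_B", "Disease X", "missense"),
])
print([r.record_id for r in idx.query(gene="gene_a")])                               # [1, 2]
print([r.record_id for r in idx.query(disease="Disease X", mutation_type="missense")])  # [1, 3]
```

Normalizing case on both sides means "gene_a" and "GENE_A" find the same records, which matters when queries come from free-text search boxes.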
Mutation Information Integration
• DRUMS Query Results
Mutation Information Integration
Biomedical Informatics Systems for Translational Research
BioBank
• EMR for Research
• EMR for Clinical Trials
• Follow-up Information Systems
• Omics Databases
• LIMS
• Bioinformatics Analysis Platform
Database Establishment for Translational Research
Star Server
EDW
HEO
De-identification
One-way hash
Data Parsing
Information collected during clinical care
Restructuring for research
Data export
SD Database
Access through secured online application
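The de-identification step replaces patient identifiers with a one-way hash before data are restructured for research. A minimal sketch using a keyed hash (HMAC-SHA256 from Python's standard library), assuming a hypothetical list of direct-identifier fields and a secret key held outside the research database. A plain unsalted hash of a short patient ID could be reversed by hashing every possible ID, which is why the sketch uses a key; whether the original system did so is not stated.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical secret; in practice, load a random key from secure storage.
SECRET_KEY = b"replace-with-a-securely-stored-random-key"

# Hypothetical list of fields that identify a patient directly.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}

def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Keyed one-way hash: same ID -> same pseudonym; not reversible without the key."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def deidentify(record: Dict[str, str]) -> Dict[str, str]:
    """Drop direct identifiers and replace the patient ID with its pseudonym."""
    out = {k: v for k, v in record.items()
           if k not in DIRECT_IDENTIFIERS and k != "patient_id"}
    out["research_id"] = pseudonym(record["patient_id"])
    return out

r1 = deidentify({"patient_id": "P001", "name": "Test Patient", "diagnosis": "liver metastasis"})
print(r1)
```

Because the same patient ID always yields the same pseudonym, records from different source systems can still be linked in the research database without exposing the original ID.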
Informatics in EMR-based PGx Studies
• Natural language processing (NLP)
• Machine learning & data mining
DNA Biobank
EMR
Drug Exposure
Drug Response
Informatics Approaches
Information Flow in Translational Medicine
New Therapeutic knowledge
Clinical Practice
Biospecimen
Clinical Data
High Throughput
Research
CODATA Task Group of Biomedical Ontology
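• Identify the most critical problems in biomedical data interoperability
• Propose key research directions
• Propose research approaches and possible technical roadmaps
• Discuss expected research results and possible applications
• Discuss the possibility of establishing this research as a formal project
The Interoperability of Biomedical Data
Ontology Building Principles
Data Sharing Strategies
Technical Roadmap
Expected Achievements
Plan to hold the first discussion meeting in 2011
Hold the first workshop in 2011 to propose research ideas, form a core team, and draw up a research plan.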