An Ontology for Protein-Protein Interaction Data

Karen Jantz
CIS Honors Project
December 7, 2006

Overview

Problem Statement
Objectives
Approach
Background
Methodology
Evaluation
Demonstration
Conclusion

Problem Statement

Several sources for protein-protein interaction data

  Different schemata
  Different purposes
  Different strengths/weaknesses

Objectives

Unify the data
Enable data mining
Evaluate reliability of data across data sources
Gain new information about the entire data set
Enable others to easily add other data sources to the set

Approach: ontology

o ontology, n.
  1. that which exists (philosophy)
  2. that which is represented (artificial intelligence)
o A descriptive data model
o Defines the entities and relationships within a domain
o Based upon data
o Human-readable

Approach: ontology

Data integration
  Enables simultaneous querying across multiple databases
Data transformation
  Enables interchange between database formats
Data mining
  Enables reasoning and learning over the entire data set

Background: Data Sources

DIP (Jing Xia)
  Database of Interacting Proteins
  Most reliable data set
BIND (Abhijit Erande, Aaron Schoenhofer)
  Biomolecular Interaction Network Database
  Very large data set
  Contains interactions, molecular complexes, and pathways

Background: Data Sources

MINT
  Molecular INTeractions database
  Experimentally verified protein interactions
  Evaluates confidence level
IntAct
  Not limited to binary interactions
  Allows user submissions
MIPS CYGD
  Munich Information Center for Protein Sequences: Comprehensive Yeast Genome Database
  Limited to yeast
  Focuses on sequencing

Background: Tools

Protégé
  Open-source project
  Graphical ontology editor
  Interacts with OWL reasoner
  Detailed API for modifying ontologies programmatically

Background: Tools

Prompt
  A Protégé plugin
  Enables ontology mapping
  Enables ontology comparison

Background: Related Work

PSI-MI
  Controlled vocabulary for PPI data
  Not a proposed database structure
  Decreases the strength of information
  Helpful in defining relationships and keys

Methodology: Overview

Q: What interactions have been observed with protein A?

DIP BIND MIPS MINT IntAct

Web Interface

Unified Ontology

Unified Data Set

Q: What experiments give evidence for a given interaction?
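The two example questions on this slide can be sketched against a tiny in-memory data set. This is only an illustration, not the project's implementation: the records, protein names (`P1`, `P2`, `P3`), field names, and helper functions below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-made records. Real entries would come from
# DIP, BIND, MIPS, MINT, and IntAct after unification.
RECORDS = [
    {"source": "DIP", "a": "P1", "b": "P2", "experiment": "two hybrid"},
    {"source": "MINT", "a": "P2", "b": "P1", "experiment": "coimmunoprecipitation"},
    {"source": "IntAct", "a": "P1", "b": "P3", "experiment": "two hybrid"},
]


def pair_key(a, b):
    """Order-independent key, so (P1, P2) and (P2, P1) name one interaction."""
    return tuple(sorted((a, b)))


def build_index(records):
    """Group (source, experiment) evidence by unordered protein pair."""
    index = defaultdict(list)
    for r in records:
        index[pair_key(r["a"], r["b"])].append((r["source"], r["experiment"]))
    return index


def partners_of(index, protein):
    """Q1: which proteins interact with `protein`? (Self-interactions are ignored.)"""
    return sorted({p for pair in index if protein in pair for p in pair if p != protein})


def evidence_for(index, a, b):
    """Q2: which experiments, from which sources, support the a-b interaction?"""
    return index.get(pair_key(a, b), [])


index = build_index(RECORDS)
print(partners_of(index, "P1"))
print(evidence_for(index, "P2", "P1"))
```

Keying on the unordered pair is one simple way to let evidence from different sources accumulate on the same interaction.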

Methodology: Design

Review the singular database schemata and determine strengths/weaknesses
  View data files
    Native formats
    PSI-MI formats
Create a unified schema of the data sources
Create the unified ontology in Protégé
Create each singular database as a subset of the unified ontology

Protégé Screenshot

Methodology: Data Import

DOMParser
  Load data from XML
Protégé-OWL API
  Insert entities into singular databases
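The XML loading step can be illustrated with a DOM parser. The project names DOMParser and the Protégé-OWL API; the sketch below instead uses Python's standard `xml.dom.minidom` as a stand-in and covers only the parsing half. The element and attribute names are simplified, made-up examples, not the real PSI-MI schema, and the insertion into Protégé is not shown.

```python
from xml.dom.minidom import parseString

# Hypothetical, simplified XML. Real PSI-MI files use a richer schema.
SAMPLE_XML = """<interactionList>
  <interaction id="i1">
    <participant ref="P1"/>
    <participant ref="P2"/>
  </interaction>
  <interaction id="i2">
    <participant ref="P1"/>
    <participant ref="P3"/>
  </interaction>
</interactionList>"""


def load_interactions(xml_text):
    """Parse the XML into a DOM tree and collect one dict per interaction."""
    doc = parseString(xml_text)
    result = []
    for node in doc.getElementsByTagName("interaction"):
        participants = [
            p.getAttribute("ref") for p in node.getElementsByTagName("participant")
        ]
        result.append({"id": node.getAttribute("id"), "participants": participants})
    return result


for interaction in load_interactions(SAMPLE_XML):
    print(interaction["id"], interaction["participants"])
```

Each resulting dict would then correspond to one individual to be created in the source-specific ontology.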

Methodology: Transformation

Use Prompt to create a mapping for each specific data source to the unified ontology

Use Prompt mappings to insert individuals from each singular ontology into the unified model
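Conceptually, a mapping of this kind renames each source's fields to the unified model's fields. The sketch below shows the idea in plain Python and does not use Prompt itself; every field name in it is an assumption made for illustration, not taken from the actual DIP or MINT schemata.

```python
# Hypothetical field mappings: source field name -> unified field name.
MAPPINGS = {
    "DIP": {
        "interactor_a": "protein_a",
        "interactor_b": "protein_b",
        "method": "experiment",
    },
    "MINT": {
        "protA": "protein_a",
        "protB": "protein_b",
        "detection": "experiment",
        "score": "confidence",
    },
}


def to_unified(source, record):
    """Rename a source record's fields to unified names; unmapped fields are dropped."""
    mapping = MAPPINGS[source]
    unified = {"source": source}
    for field, value in record.items():
        if field in mapping:
            unified[mapping[field]] = value
    return unified


print(to_unified("MINT", {"protA": "P1", "protB": "P2", "detection": "two hybrid", "score": 0.8}))
```

Keeping one mapping table per source means a new database can be added by writing a new table rather than changing the transformation code.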

Methodology: Transformation

Duplicate data
  Need to fill in attributes on existing records
  Write ‘Algorithm Plugin’ for Prompt to determine when individuals are the same
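The duplicate-handling idea (decide when two individuals are the same, then fill in missing attributes on the existing record) might look like the sketch below. It is plain Python, not a Prompt plugin, and the identity rule (same unordered protein pair and same experiment) is an assumption chosen for illustration; the slides do not state the actual rule.

```python
def same_individual(x, y):
    """Assumed identity rule: same unordered protein pair and same experiment."""
    return (
        {x["protein_a"], x["protein_b"]} == {y["protein_a"], y["protein_b"]}
        and x["experiment"] == y["experiment"]
    )


def merge_into(existing, incoming):
    """Fill attributes that are missing (or None) on the existing record."""
    for key, value in incoming.items():
        if existing.get(key) is None:
            existing[key] = value
    return existing


def add_record(unified, incoming):
    """Merge into a matching record if one exists, otherwise append a copy."""
    for existing in unified:
        if same_individual(existing, incoming):
            merge_into(existing, incoming)
            return
    unified.append(dict(incoming))


unified = []
add_record(unified, {"protein_a": "P1", "protein_b": "P2", "experiment": "two hybrid", "confidence": None})
add_record(unified, {"protein_a": "P2", "protein_b": "P1", "experiment": "two hybrid", "confidence": 0.8})
print(unified)
```

Existing values are never overwritten here; a real merge policy would also need to decide what to do when two sources disagree.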

Prompt Screenshot - Mapping

Methodology: Query Interface

Export Protégé data into MySQL
Web interface for collecting data
Working with domain experts to determine useful views, queries
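A relational export and one possible query can be sketched as follows. The project targets MySQL; this example substitutes Python's built-in `sqlite3` with an in-memory database so it runs without a server. The table layout, column names, and sample rows are hypothetical.

```python
import sqlite3


def export_interactions(conn, rows):
    """Create a (hypothetical) interaction table and load unified rows into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interaction "
        "(protein_a TEXT, protein_b TEXT, experiment TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def interactions_with(conn, protein):
    """Return every stored interaction that involves the given protein."""
    cur = conn.execute(
        "SELECT protein_a, protein_b, experiment, source FROM interaction "
        "WHERE protein_a = ? OR protein_b = ? ORDER BY source",
        (protein, protein),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
export_interactions(conn, [
    ("P1", "P2", "two hybrid", "DIP"),
    ("P3", "P1", "coimmunoprecipitation", "MINT"),
    ("P2", "P3", "two hybrid", "IntAct"),
])
print(interactions_with(conn, "P1"))
```

Parameterized queries (the `?` placeholders) matter for a web interface, since the protein name would come from user input.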

Evaluation

Performance
  Transformation time in Protégé
  Query time for web interface
Size
  Minimize redundancy in data model
  Minimize duplicate data

Evaluation

Correctness
  Domain experts: Dr. Brown, Dr. Wang
  Maintain proper data relationships
Utility
  Enrich data

Evaluation

Data Model Enrichment

[Bar chart: number of classes (New, Changed, Existing) for each database: IntAct, MINT, MIPS. Y-axis: Number of Classes, 0 to 30. X-axis: Database.]
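One way counts like New, Changed, and Existing could be derived is by comparing each source's classes with the unified model's classes. The sketch below assumes a class is "new" if it is absent from the source, "changed" if its property set differs, and "existing" otherwise; that interpretation, and the class and property names, are assumptions and not taken from the project.

```python
def classify_classes(source_classes, unified_classes):
    """Count unified classes as new, changed, or existing relative to one source.

    Both arguments map a class name to a set of property names.
    """
    counts = {"new": 0, "changed": 0, "existing": 0}
    for name, props in unified_classes.items():
        if name not in source_classes:
            counts["new"] += 1
        elif source_classes[name] != props:
            counts["changed"] += 1
        else:
            counts["existing"] += 1
    return counts


# Hypothetical class definitions, for illustration only.
source = {"Protein": {"name"}, "Interaction": {"participants"}}
unified = {
    "Protein": {"name", "organism"},
    "Interaction": {"participants"},
    "Experiment": {"method"},
}
print(classify_classes(source, unified))
```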

Demonstration

Future Work

Complete transformations
Import data
Evaluate ontology
Add other databases to model

Conclusions

Adequate start
Needs improvement, evolution, more data sources
As the project matures, the ontology will be ready for use in the biological domain
Will be able to more easily gain information about protein-protein interactions

References

AAAI.org - AITopics: “Ontology” http://www.aaai.org/AITopics/html/ontol.html

Protégé http://protege.stanford.edu/overview/protege-owl.html

Prompt http://protege.cim3.net/cgi-bin/wiki.pl?Prompt

PSI-MI http://psidev.sourceforge.net/mi/xml/doc/user

References

BIND http://www.bind.ca

DIP http://www.dip.doe-mbi.ucla.edu

IntAct http://www.ebi.ac.uk/intact/site/

MINT http://mint.bio.uniroma2.it/mint/Welcome.do

MIPS http://mips.gsf.de/genre/proj/yeast

Q & A
