Marco Brandizi
PhD Programme in Computer Science (Corso di Dott. in Informatica), Univ. Milano Bicocca
XIX Cycle
Progress Report
Feb 2005
Agenda
Microarrays and Gene Expression overview
A Knowledge Management System for uA data management
Motivations
What to model and where to start from
First elaborations
Ongoing work and future
Gene Expression and Microarrays
[Figure: DNA → gene → mRNA → protein; labels "Genes Machine" and "Cell/Life"]
Microarray Data / Details
Microarray Data
Microarray Data Mgmt Issues
Expression data vs. sequence data:
Context-dependent (living system, experimental conditions)
No standard unit of measure
Several normalization methods
Multiple platforms and methods
No standard for data annotation
Coherence of vocabularies and terminology
Details about: experiment, source, protocols, experimental conditions
Microarray Data Mgmt Issues / 2
Evidence about data quality
What to store?
Raw images
Computed values
Normalized values
How to find data
Systems aware of complex vocabularies (ontologies)
Data mining and exp. comparison tools
Data access control
Issues => MIAME/MAGE
MIAME Experiment Modelling
GCA DB
Need for a KMS for uA data management
The uA Experiment Cycle
“Closing the loop”
uA KMS: What to model?
Knowledge management... what?
Genes
Textual annotations, literature
Interactions, pathways
Gene collections (functional families, clusters)
Experiments and experimental conditions
Keyword/ontology-based searches
Searches on tested conditions
Expression values
Navigation
Same transcriptome/trend/correlation/pattern
Knowledge management... what?
Chips
Keyword searches
Annotations about chip quality, protocols to be used, etc.
People
“Is expert in ...”
“Works with ...”
“Is studying ...”
Their ranking is X (based on publications, user preferences, etc.)
Knowledge management... what?
“Does IL-2 regulate something, and under what conditions?”
Interactions of gene: IL2
Note added by Norman on Dec 10, 2003, deduced from text, not confirmed [see original text]:
IL2 --> UPREGULATES --> IL10, IL12
ON --> Cells --> Type: DC
UNDER CONDITION: LPS Stimulus
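The query on this slide could be answered from structured notes rather than free text. A minimal Python sketch, assuming each note is stored as a record that keeps the biological context (cell type, condition) next to its provenance (author, date, confirmation status); `Interaction` and `regulations_of` are hypothetical names, not part of GCA or MAGE:

```python
from dataclasses import dataclass

# Hypothetical record for one annotated interaction; class and field
# names are illustrative assumptions, not taken from any existing tool.
@dataclass(frozen=True)
class Interaction:
    source: str               # regulating gene, e.g. "IL2"
    relation: str             # e.g. "UPREGULATES"
    targets: tuple            # regulated genes
    cell_type: str = ""       # context: cell type where the effect holds
    condition: str = ""       # context: experimental condition
    author: str = ""          # provenance: who added the note
    date: str = ""            # provenance: when it was added
    confirmed: bool = False   # False for facts only deduced from text

def regulations_of(gene, interactions):
    """Does <gene> regulate something, and under what conditions?"""
    return [
        (i.relation, i.targets, i.cell_type, i.condition, i.confirmed)
        for i in interactions
        if i.source == gene and i.relation.endswith("REGULATES")
    ]

notes = [
    Interaction("IL2", "UPREGULATES", ("IL10", "IL12"),
                cell_type="DC", condition="LPS Stimulus",
                author="Norman", date="2003-12-10"),
]

print(regulations_of("IL2", notes))
# [('UPREGULATES', ('IL10', 'IL12'), 'DC', 'LPS Stimulus', False)]
```

Keeping `confirmed` as a field means unconfirmed, text-mined facts can still be returned but flagged, as in the note above.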
Knowledge management... what?
uA KMS: Where to start from?
What to do first?
Gene Expression Formal Model
Focused on GE measures
Oriented to the “closing the loop” goal
Several things to start from
Ontologies and inference systems
Similar models already defined
Other similar systems
Defining a GE Model
Start Point: Ontologies and Inference Systems
XML->RDF->...->OWL, and related tools (e.g.: Protégé, Racer, Jena)
Logics, particularly Description Logic
Inference systems and languages (e.g.: Prolog)
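To give a feel for what an inference system adds on top of stored facts, here is a toy forward-chaining sketch over (subject, predicate, object) triples. It is a swapped-in illustration, not Description Logic reasoning as done by Racer: only two RDFS-style rules (transitive `subClassOf`, and `type` propagation along `subClassOf`) are implemented, and the class names form a hypothetical mini-ontology:

```python
def infer(triples):
    """Forward chaining to a fixpoint with two RDFS-style rules:
       (A subClassOf B), (B subClassOf C) => (A subClassOf C)
       (x type A),       (A subClassOf B) => (x type B)"""
    facts = set(triples)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    new.add((a, "type", d))
        if new <= facts:          # nothing new derived: fixpoint reached
            return facts
        facts |= new

# Hypothetical mini-ontology, for illustration only.
kb = {
    ("Cytokine", "subClassOf", "Protein"),
    ("Interleukin", "subClassOf", "Cytokine"),
    ("IL2", "type", "Interleukin"),
}
closed = infer(kb)
print(("IL2", "type", "Protein") in closed)  # True
```

A real system would delegate this to an OWL reasoner; the point here is only that queries can then match facts that were never stated explicitly.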
Defining a GE Model
Start Point: Similar models already defined
"Modeling Gene Expression", Proceedings of NETTAB/2004, www.loa-cnr.it: a Description Logic model of GE, but without focus on microarrays and expression intensities
Defining a GE Model
Start Point: Similar models already defined / 2
Very similar to the previous work, but with tools for annotating/querying microarray chips
However, it does not seem focused on the annotation of data, assays, etc.
Defining GE Model
Start point: Other similar systems
Synapsia by Agilent: very similar, but not focused on uAs
Hybrow, www.hybrow.org: a system for computer-aided hypothesis evaluation
The Notebook Project, www.notebook.org: a bio-KMS based on SOAP and P2P
Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C. (2004). From actions to suggestions: supporting the work of biologists through laboratory notebooks.
uA KMS: Toward a GE Model
Defining GE Model
Gene Expression Formal Model
Basic elements: genes, hybridizations, experiments
Defining GE Model
Gene Expression Formal Model
Basic elements: annotated sets
Gene Expression Entities
Entities Grouping
EntityCollection ::= Cluster of DataSet | Cluster of Entity
Cluster of DataSet ::= Cluster of DataSet | GeneCluster of DataSet.GeneSet | HybCluster of DataSet.HybSet
Cluster of Entity ::= Cluster of Entity | Set of Entity
All entities in a cluster are of the same type, e.g., a cluster of genes contains hierarchically grouped sets of genes, and only genes. NOTE: the grammar used here is VERY informal!
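The "Cluster of Entity" branch of this grammar can be sketched as a small recursive type in Python, with the "same type" constraint checked at construction. This is an assumption-laden illustration: `Gene`, `Hybridization`, `EntitySet` and `Cluster` are hypothetical classes, and the "Cluster of DataSet" branch is not modeled:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    public_id: str

@dataclass(frozen=True)
class Hybridization:
    hyb_id: str

def _check_homogeneous(entities):
    # All entities in a collection must be of the same type.
    kinds = {type(e) for e in entities}
    if len(kinds) > 1:
        raise TypeError(f"mixed entity types: {sorted(k.__name__ for k in kinds)}")

@dataclass(frozen=True)
class EntitySet:
    """Set of Entity: a flat, homogeneous set."""
    members: frozenset

    def __post_init__(self):
        _check_homogeneous(self.members)

    def leaves(self):
        return iter(self.members)

@dataclass(frozen=True)
class Cluster:
    """Cluster of Entity: a tree whose children are sets or sub-clusters."""
    children: tuple

    def __post_init__(self):
        _check_homogeneous(list(self.leaves()))

    def leaves(self):
        for child in self.children:
            yield from child.leaves()

genes = EntitySet(frozenset({Gene("IL2"), Gene("IL10")}))
tree = Cluster((genes, Cluster((EntitySet(frozenset({Gene("IL12")})),))))
print(sorted(g.public_id for g in tree.leaves()))  # ['IL10', 'IL12', 'IL2']

try:
    Cluster((genes, EntitySet(frozenset({Hybridization("h1")}))))
except TypeError as err:
    print(err)  # mixed entity types: ['Gene', 'Hybridization']
```

Both `EntitySet` and `Cluster` expose `leaves()`, so a cluster can nest sets and sub-clusters without caring which is which.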
Entities Grouping
GeneSet ::= Set of Gene
HybSet ::= Set of Hybridization
Set of X ::= { x : x IS-A X }
Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }
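The two set-builder definitions translate almost literally into Python once a finite universe of objects is assumed (the formulas quantify over everything; code needs something to iterate over). IS-A is read as `isinstance`, so subclasses count; the class hierarchy is hypothetical:

```python
class Entity:
    """Root of a hypothetical entity hierarchy."""

class Gene(Entity):
    def __init__(self, public_id):
        self.public_id = public_id

class Hybridization(Entity):
    def __init__(self, hyb_id):
        self.hyb_id = hyb_id

def set_of(kind, universe):
    """Set of X ::= { x : x IS-A X }, restricted to a finite universe."""
    return frozenset(x for x in universe if isinstance(x, kind))

def singletons(kind, universe):
    """Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }:
    every one-element set whose only member IS-A C."""
    return frozenset(frozenset({x}) for x in set_of(kind, universe))

universe = [Gene("IL2"), Gene("IL10"), Hybridization("h1")]
print(len(set_of(Gene, universe)), len(set_of(Entity, universe)))  # 2 3
print(len(singletons(Gene, universe)))  # 2
```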
Annotations
Annotation ::= EntityCollection => AnnotationSet
Annotations allow Gene Expression data to be tracked together with useful information.
Annotations/Basics
Annotation ( EmptySet ) ::= EmptySet
Annotation ( Singleton ( Entity e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )
Annotations/Basics
BaseAnnotation ( Any ) ::=
To be decided; a first idea is a set of:
Name/Value/Type, and Description, like in MAGE
External Reference, with URI, or attachment
Graph attachment, "vectoring" values, e.g.: PCA with component values, scatter plots wit…
Annotation Author
Annotation Date
Security/Access references
Similar to the classes Extendable, Describable, Identifiable of MAGE-OM
An Entity annotates another Entity, e.g.: Exp author
Annotations/Basics
Attributes ( Entity e ) ::= Set of < attrib, value, type > for each declared attribute of Entity
Attributes may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of each attribute
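A Python analogue of JavaBean-style declaration is a dataclass, whose declared fields can be enumerated at runtime. A sketch, assuming the `Hybridization` fields below (hypothetical) and using field `metadata` with a `"type"` key (an assumed convention) as the optional type mapping:

```python
from dataclasses import dataclass, field, fields

@dataclass
class Hybridization:
    name: str
    channels: int
    # Optional mapping: report a domain-level type instead of the Python one.
    scan_date: str = field(default="", metadata={"type": "ISO-8601 date"})

def _type_name(t):
    # Annotations may be classes or, with postponed evaluation, strings.
    return t if isinstance(t, str) else getattr(t, "__name__", str(t))

def attributes(entity):
    """Attributes ( Entity e ): one < attrib, value, type > per declared field."""
    return {
        (f.name, getattr(entity, f.name), f.metadata.get("type", _type_name(f.type)))
        for f in fields(entity)
    }

hyb = Hybridization("hyb-01", 2, "2005-02-01")
print(sorted(attributes(hyb)))
# [('channels', 2, 'int'), ('name', 'hyb-01', 'str'), ('scan_date', '2005-02-01', 'ISO-8601 date')]
```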
Annotation ( GeneSet GS ) ::= BaseAnnotation ( GS ) U ( U Annotation ( g ) : g BELONGS GS ) U BiologicalAnnotation ( GS )
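The recursive definition can be sketched as a function that unions the three contributions, with the empty-set case from Annotations/Basics as a guard. The three annotation sources are passed in as callables, and annotations are assumed to be (name, value) pairs; the toy values are hypothetical:

```python
def annotation_of_gene_set(gene_set, base, gene_annotation, biological):
    """Annotation ( GeneSet GS ) ::= BaseAnnotation ( GS )
       U ( U Annotation ( g ) : g BELONGS GS ) U BiologicalAnnotation ( GS )"""
    if not gene_set:                  # Annotation ( EmptySet ) ::= EmptySet
        return set()
    result = set(base(gene_set))
    for g in gene_set:
        result |= gene_annotation(g)  # every gene's annotations are inherited
    result |= biological(gene_set)    # why these genes were grouped
    return result

# Toy annotation sources (hypothetical), using (name, value) pairs.
per_gene = {"IL2": {("symbol", "IL2")}, "IL10": {("symbol", "IL10")}}
ann = annotation_of_gene_set(
    frozenset({"IL2", "IL10"}),
    base=lambda s: {("author", "Norman")},
    gene_annotation=lambda g: per_gene.get(g, set()),
    biological=lambda s: {("pathway", "KEGG pathway about IL-2")},
)
print(len(ann))  # 4
```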
Annotations/Biological Ann.
BiologicalAnnotation ( GS ) ::=
Allows tagging the gene set with the biological reason why its genes have been grouped. E.g.:
belonging to the functional family of apoptosis
in the KEGG pathway about IL-2
under GO ID #10234
Annotations/Data Sets
Annotation ( Cluster of DataSet ds ) ::= BaseAnnotation ( ds ) U Annotation ( < all entities in ds > )
Meaning of clustering
Clustering method / algorithm
Algorithm annotations, e.g.: parameter values
Cluster includes the case of a flat set (not a tree), and the sub-cases:
gene/hyb filtering (genes have been filtered in from another data set)
value transformation (normalization, PCA, average over replicas)
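One way to keep these data set annotations is to record each derivation step as it happens, so the result carries its own history. A sketch under stated assumptions: `DataSetAnnotation`, `filter_genes` and `average_replicas` are hypothetical names, the expression matrix is a plain dict of gene to replica values, and the clustering step itself is not shown:

```python
from dataclasses import dataclass, field

@dataclass
class DataSetAnnotation:
    """Hypothetical record of how a derived data set was produced."""
    meaning: str                                      # meaning of the clustering
    method: str                                       # clustering method / algorithm
    parameters: dict = field(default_factory=dict)    # algorithm annotations
    derivation: list = field(default_factory=list)    # filtering / transformation steps

def filter_genes(matrix, keep, ann):
    """Gene filtering: keep only the given genes, recording the step."""
    ann.derivation.append(("filter", sorted(keep)))
    return {g: values for g, values in matrix.items() if g in keep}

def average_replicas(matrix, ann):
    """Value transformation: one value per gene, averaging its replicas."""
    ann.derivation.append(("transform", "average on replicas"))
    return {g: sum(values) / len(values) for g, values in matrix.items()}

ann = DataSetAnnotation("example grouping", "hierarchical clustering",
                        {"linkage": "average"})
raw = {"IL2": [2.0, 4.0], "IL10": [1.0, 1.0], "GAPDH": [0.5, 0.7]}
derived = average_replicas(filter_genes(raw, {"IL2", "IL10"}, ann), ann)
print(derived)         # {'IL2': 3.0, 'IL10': 1.0}
print(ann.derivation)  # [('filter', ['IL10', 'IL2']), ('transform', 'average on replicas')]
```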
Annotations/Examples
Types of annotations / searches:
Generic
<attribute> LIKE <pattern>
<value> BETWEEN ( <lo>, <hi> )
<author> IS author
Genes
public_id LIKE pattern
REGULATION ( g1, g2, ... gn )
g1 REGULATES | DOWN_REGULATES | UP_REGULATES | PROMOTE | INHIBITS ( g1, g2, ... gn )
geneX IN_PATHWAY ( p )
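Some of these search predicates could be sketched as plain Python functions combined in a filter. This assumes SQL-like semantics for `LIKE` (`%` matches any run, `_` one character) and inclusive bounds for `BETWEEN`; the gene records and pathway table are toy data:

```python
import re

def like(value, pattern):
    """<attribute> LIKE <pattern>, with SQL-style wildcards % and _."""
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def between(value, lo, hi):
    """<value> BETWEEN ( <lo>, <hi> ), bounds included."""
    return lo <= value <= hi

def in_pathway(gene, pathway, pathways):
    """geneX IN_PATHWAY ( p ), given a pathway -> genes mapping."""
    return gene in pathways.get(pathway, set())

genes = [
    {"public_id": "IL2", "expr": 3.1},
    {"public_id": "IL10", "expr": 0.4},
    {"public_id": "TNF", "expr": 2.2},
]
pathways = {"p1": {"IL2", "TNF"}}     # toy pathway table

hits = [g["public_id"] for g in genes
        if like(g["public_id"], "IL%") and between(g["expr"], 1.0, 5.0)]
print(hits)  # ['IL2']
```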
Annotations/Examples
DataSet
geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds
hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds
Not necessarily computed: may be annotated.
CORRELATION ( dSet1, dSet2 ... dSetN, value )
annotates the correlation of expression values
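When the relation is computed rather than entered by hand, a simple option is to correlate the mean profiles of two gene sets. A sketch assuming Pearson correlation and an arbitrary 0.9 threshold for `SIMILAR_PROFILE` (neither is specified on the slide); annotations are stored as tuples:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)            # undefined for flat profiles (zero variance)

def mean_profile(gene_set, data):
    """Average profile of a gene set over the hybridizations of a data set."""
    rows = [data[g] for g in gene_set]
    return [sum(col) / len(col) for col in zip(*rows)]

def correlation_annotation(name1, name2, p1, p2, threshold=0.9):
    """Always store CORRELATION; add SIMILAR_PROFILE above the threshold."""
    r = pearson(p1, p2)
    ann = {("CORRELATION", name1, name2, round(r, 3))}
    if r >= threshold:
        ann.add(("SIMILAR_PROFILE", name1, name2))
    return ann

data = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.0], "g3": [4.0, 3.0, 2.0, 1.0]}
ann = correlation_annotation("geneSet1", "geneSet2",
                             mean_profile(["g1"], data), mean_profile(["g2"], data))
```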
Annotations/Time Courses
Time profiles may be shifted relative to one another, or one gene may be up when another is down.
geneSet1 SIMILAR_TIME_PROFILE geneSet2, a particular case of SIMILAR_PROFILE
geneSet1 TIME_AFTER geneSet2 ...
geneSet1 TIME_BEFORE geneSet2 ...
geneSet1 TIME_SHIFT geneSet2 ...
geneSet1 TIME_OPPOSED geneSet2
geneSet1 TIME_OPPOSED_SHIFT geneSet2
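These time relations could be suggested automatically with lagged correlation: shift one profile by a few time points and keep the lag with the strongest correlation. This is one assumed technique among several; the 0.9 threshold, the maximum lag and the mapping of sign/lag to relation names are illustrative choices. A positive lag means the second profile follows the first:

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def best_lag(p1, p2, max_lag):
    """(lag, r) maximising |r| between p1[t] and p2[t + lag]."""
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = p1[:len(p1) - lag], p2[lag:]
        else:
            a, b = p1[-lag:], p2[:len(p2) + lag]
        r = pearson(a, b)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

def time_relation(p1, p2, max_lag=1, threshold=0.9):
    """Classify two time profiles; None if no strong (lagged) correlation."""
    lag, r = best_lag(p1, p2, max_lag)
    if abs(r) < threshold:
        return None
    if lag == 0:
        return ("SIMILAR_TIME_PROFILE" if r > 0 else "TIME_OPPOSED", 0)
    return ("TIME_SHIFT" if r > 0 else "TIME_OPPOSED_SHIFT", lag)

p1 = [0, 1, 3, 1, 0, 0]
p2 = [0, 0, 1, 3, 1, 0]   # same peak, one time point later
p3 = [3, 2, 0, 2, 3, 3]   # mirror image of p1
print(time_relation(p1, p2))  # ('TIME_SHIFT', 1)
print(time_relation(p1, p3))  # ('TIME_OPPOSED', 0)
```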
Annotations/Comparisons
“+/-” graphs: common graphs of gene interactions that emerge from comparison experiments. Modeled via the constructs shown previously.
Annotations/Set Graphs
Euler-Venn diagrams are very commonly used. Modeled via set theory operations.
Operators
Operations and relations with data
When storing the result of an operation, the result's source may be annotated and the annotations composed coherently:
gset = gset1 U gset2
save ( gset, annotation )
gset is saved with:
further annotation provided by the user
SOURCE ( UNION, gset1, gset2 )
all annotations coming from gset1 and gset2, which belong to gset too
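The save rule for a union can be sketched as a function that builds the new set, inherits every operand's annotations, adds the user's annotation and records the `SOURCE` of the result. `AnnotatedSet` and `save_union` are hypothetical names; annotations are assumed to be hashable tuples:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnotatedSet:
    name: str
    members: frozenset
    annotations: frozenset = frozenset()

def save_union(name, sets, user_annotation=()):
    """gset = gset1 U gset2 ...; save ( gset, annotation )"""
    members = frozenset().union(*(s.members for s in sets))
    inherited = frozenset().union(*(s.annotations for s in sets))  # operands' annotations
    source = ("SOURCE", "UNION") + tuple(s.name for s in sets)    # provenance
    return AnnotatedSet(name, members,
                        inherited | frozenset(user_annotation) | {source})

gset1 = AnnotatedSet("gset1", frozenset({"IL2", "IL10"}), frozenset({("author", "Norman")}))
gset2 = AnnotatedSet("gset2", frozenset({"IL12"}), frozenset({("pathway", "p1")}))
gset = save_union("gset", [gset1, gset2], [("note", "user comment")])
print(sorted(gset.members))  # ['IL10', 'IL12', 'IL2']
print(("SOURCE", "UNION", "gset1", "gset2") in gset.annotations)  # True
```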
Operators
Set theory operations:
EntityCollection U EntityCollection ... U EntityCollection = EntityCollection
EntityCollection INTERSECTION EntityCollection ... INTERSECTION EntityCollection = EntityCollection
EntityCollection - EntityCollection = EntityCollection
Compositions:
new Cluster ( geneSet1, geneSet2, geneSet3 ... geneSetN, geneSetAnnotation )
new Cluster ( cluster1, cluster2, clusterAnnotation )
Relations on single entities:
gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX
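The compositions and the authored relation above could be represented as immutable records. A sketch assuming gene sets are Python frozensets (so U, INTERSECTION and - map to `|`, `&` and `-`) and that a relation carries its author as a field; `Cluster`, `Relation` and `asserted_by` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cluster:
    """new Cluster ( part1, ..., partN, annotation ); a part is a gene set
    (frozenset) or another Cluster."""
    parts: tuple
    annotation: str

    def genes(self):
        out = frozenset()
        for p in self.parts:
            out |= p.genes() if isinstance(p, Cluster) else p
        return out

@dataclass(frozen=True)
class Relation:
    """gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX"""
    subject: str
    predicate: str
    objects: tuple
    author: str

def asserted_by(author, relations):
    """Relations on single entities, filtered by who asserted them."""
    return [r for r in relations if r.author == author]

gs1, gs2 = frozenset({"gene1", "gene2"}), frozenset({"gene2", "gene3"})
c1 = Cluster((gs1, gs2), "geneSetAnnotation")
c2 = Cluster((c1, Cluster((frozenset({"gene4"}),), "geneSetAnnotation")), "clusterAnnotation")
rel = Relation("gene1", "DOWN_REGULATES", ("gene2", "gene3"), "misterX")

print(sorted(c2.genes()))                      # ['gene1', 'gene2', 'gene3', 'gene4']
print(sorted(gs1 & gs2), sorted(gs1 - gs2))    # ['gene2'] ['gene1']
print(asserted_by("misterX", [rel]) == [rel])  # True
```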
Ongoing and future...
What's next?
Refinements of GCA, study of BASE
Study of ontology tools and ontology reasoners
Better definition of GE Model
Review with biologists
Cooperation with Ontology Groups (proposals are welcome...)
To be continued...