A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain
7th December 2010
Elma Akand*, Mike Bain, Mark Temple
*CSE, UNSW / School of Biomedical and Health Sciences, UWS
The Sixth Australasian Ontology Workshop, Adelaide, University of South Australia
Outline
Machine learning and data mining in bioinformatics
Domain Ontologies in biomedical applications
Formal Concept Analysis
MCW algorithm (Mining Closed itemsets for Web apps)
BioLattice – a web-based browser
Experimental Application: systems biology
Part-1: Concept ranking by gene interaction
Part-2: Relational learning of multiple-stress rules
Machine learning & Data mining in Bioinformatics
Bioinformatics
“Bioinformatics is the study of information content and information flow in biological systems and processes” (Michael Liebman, 1995)
Machine Learning & Data mining
-Can offer automatic knowledge acquisition
-A process for discovering knowledge by analyzing data from different perspectives; can contribute greatly to building a knowledge base
Our work: focus on knowledge-based machine learning
-Previous work: learning from ontologies
-Current work: ontology construction by learning
-Potential application areas: ontologies – central to eCommerce, eHealth
-Current application area: systems biology – predict gene function, data integration
Ontology
In philosophy - concerned with the nature and relations of being
In knowledge representation - the study of the categorization of things:
-Informal ontology: natural language
-Formal ontology: first-order logic or a variant
-Upper ontology: general
-Domain ontology: specific
Ontology – "specification of a conceptualization” (Gruber, 1993)
Conceptualization – "formalization of knowledge in declarative form” (Genesereth and Nilsson, 1987)
Gene Ontology
Missing concepts and relations
One gene is annotated with different GO terms, where one term is a specialization of the other
Example: gene x; concepts a, b; relations (i) x – a, (ii) x – b and (iii) b – a (b is a specialization of a)
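A gene that carries both a GO term and one of its specializations can be found mechanically by walking the is_a hierarchy upward. The following is a minimal sketch, not the authors' tooling. It assumes the hierarchy is a hypothetical child → parents dictionary and that annotations are a set of terms per gene. Terms a and b follow the example above, with b taken as the specialization of a.

```python
from itertools import permutations

def ancestors(term, parents):
    """Every term reachable from `term` by following is_a links upward."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def specialization_pairs(terms, parents):
    """Pairs (general, specific) among one gene's terms where specific is_a general."""
    return sorted(
        (general, specific)
        for specific, general in permutations(terms, 2)
        if general in ancestors(specific, parents)
    )

# Example from the slide: gene x annotated with a and b, b is_a a (hypothetical encoding).
parents = {"b": {"a"}}
annotations = {"x": {"a", "b"}}
for gene, terms in annotations.items():
    print(gene, specialization_pairs(terms, parents))  # x [('a', 'b')]
```

Each reported pair marks the situation on this slide, where the annotation x – a is implied by x – b together with b – a.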
Formal Concept Analysis (FCA)
Based on mathematical order theory (Rudolf Wille, early 1980s)
-Derives conceptual structures out of data
-Method for data analysis, knowledge representation and information management
Components
-Formal context, formal concept, concept lattice
Example formal context:
object    four-legged  hair-covered  intelligent  marine  thumbed
cats      x            x
dogs      x            x
dolphins                             x            x
gibbons                x             x                    x
humans                               x                    x
whales                               x            x
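For a context this small, the formal concepts can be enumerated by brute force. Close every subset of attributes under the two derivation operators and keep the distinct results. This is a minimal illustrative sketch, exponential in the number of attributes, unlike the closed itemset miners discussed later. The context is the one on this slide, and the helper names `extent`, `intent` and `all_concepts` are hypothetical.

```python
from itertools import combinations

# The example context on this slide: object -> set of attributes.
CONTEXT = {
    "cats": {"four-legged", "hair-covered"},
    "dogs": {"four-legged", "hair-covered"},
    "dolphins": {"intelligent", "marine"},
    "gibbons": {"hair-covered", "intelligent", "thumbed"},
    "humans": {"intelligent", "thumbed"},
    "whales": {"intelligent", "marine"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))

def extent(attrs):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)

def intent(objs):
    """Attributes shared by every object in `objs` (all attributes if `objs` is empty)."""
    result = set(ATTRIBUTES)
    for o in objs:
        result &= CONTEXT[o]
    return frozenset(result)

def all_concepts():
    """Map every attribute subset B to (B', B'') and collect the distinct pairs."""
    concepts = set()
    for r in range(len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, r):
            ext = extent(set(subset))
            concepts.add((ext, intent(ext)))
    return concepts

for ext, itt in sorted(all_concepts(), key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

On this context the sketch yields eight concepts, including the top concept (all objects, no attributes) and the bottom concept (no objects, all attributes).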
Formal concepts in a concept lattice
Top: ({cats, gibbons, dogs, dolphins, humans, whales}, ∅)
({gibbons, dolphins, humans, whales}, {intelligent})
({dolphins, whales}, {intelligent, marine})
({cats, gibbons, dogs}, {hair-covered})
({cats, dogs}, {hair-covered, four-legged})
({gibbons, humans}, {intelligent, thumbed})
({gibbons}, {intelligent, hair-covered, thumbed})
Bottom: (∅, {intelligent, hair-covered, thumbed, marine, four-legged})
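Formal context: an n × m Boolean matrix, with the n objects O as rows and the m attributes A as columns
Formal concept: a pair ⟨X, Y⟩ with X ⊆ O (extent) and Y ⊆ A (intent) that is closed under the Galois connection, i.e., Y is exactly the set of attributes shared by all objects in X, and X is exactly the set of objects that have all attributes in Y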
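In standard FCA notation, let I ⊆ O × A be the incidence relation of the context (the x entries of the matrix). The symbol I is introduced here and does not appear on the slide. With extent X ⊆ O and intent Y ⊆ A, matching the (objects, attributes) pairs listed above, the two derivation operators and the concept condition read:

```latex
\begin{aligned}
X' &= \{\, a \in A \mid (o, a) \in I \text{ for all } o \in X \,\}, && X \subseteq O \\
Y' &= \{\, o \in O \mid (o, a) \in I \text{ for all } a \in Y \,\}, && Y \subseteq A \\
\langle X, Y \rangle \text{ is a formal concept} &\iff X' = Y \text{ and } Y' = X
\end{aligned}
```

For example, {gibbons, humans}' = {intelligent, thumbed} and {intelligent, thumbed}' = {gibbons, humans}, so this pair is one of the concepts in the lattice.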
Concept lattice loosely interpretable in ontology terms:
-concept definitions and sub-concept relations, cf. T-box
-concept membership by objects, cf. A-box
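The sub-concept relation behind the T-box reading is inclusion of extents, or equivalently reverse inclusion of intents. The following is a minimal sketch that recovers the covering edges of the Hasse diagram from the eight concepts listed above. The list layout and the function names are hypothetical.

```python
# The eight concepts listed above, as (extent, intent) pairs; index 0 is Top, 7 is Bottom.
CONCEPTS = [
    (frozenset({"cats", "dogs", "dolphins", "gibbons", "humans", "whales"}), frozenset()),
    (frozenset({"gibbons", "dolphins", "humans", "whales"}), frozenset({"intelligent"})),
    (frozenset({"dolphins", "whales"}), frozenset({"intelligent", "marine"})),
    (frozenset({"cats", "gibbons", "dogs"}), frozenset({"hair-covered"})),
    (frozenset({"cats", "dogs"}), frozenset({"hair-covered", "four-legged"})),
    (frozenset({"gibbons", "humans"}), frozenset({"intelligent", "thumbed"})),
    (frozenset({"gibbons"}), frozenset({"intelligent", "hair-covered", "thumbed"})),
    (frozenset(), frozenset({"intelligent", "hair-covered", "thumbed", "marine", "four-legged"})),
]

def is_subconcept(c1, c2):
    """c1 <= c2 iff the extent of c1 is contained in the extent of c2."""
    return c1[0] <= c2[0]

def covering_edges(concepts):
    """Pairs (i, j) where concept i is an immediate sub-concept of concept j."""
    edges = []
    for i, lower in enumerate(concepts):
        for j, upper in enumerate(concepts):
            if i == j or not is_subconcept(lower, upper):
                continue
            # The edge is immediate only if no third concept lies between the two.
            between = any(
                k not in (i, j) and is_subconcept(lower, mid) and is_subconcept(mid, upper)
                for k, mid in enumerate(concepts)
            )
            if not between:
                edges.append((i, j))
    return edges

for i, j in covering_edges(CONCEPTS):
    print(sorted(CONCEPTS[i][1]), "->", sorted(CONCEPTS[j][1]))
```

Reading each edge from the more specific intent to the more general one gives the sub-concept relations. The ({gibbons}, ...) concept, for instance, sits directly below both the hair-covered and the intelligent/thumbed concepts.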
FCA in data mining
FCA can be seen as a clustering technique in machine learning
-Most of the work is in a propositional framework
In data mining, closed itemset mining is an efficient alternative to FCA
A frequent itemset X is closed if there exists no proper superset Y ⊃ X with support(Y) = support(X)
E.g., if X = {a,b,c,d}, Y = {a,b,c,d,e} and support(Y) = support(X), then X is not closed
Parameters to avoid building the entire lattice
-Extent size must be greater than minsup
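The closedness test and the minsup threshold can be checked by brute force on a small database. The following is a minimal sketch, not the MCW implementation. The transactions are the horizontal view of the six-transaction example used on the MCW slides, reconstructed from its vertical tidsets. minsup is an absolute count compared with a strict "greater than", following the wording above; many miners use >= instead. The function names are hypothetical.

```python
from itertools import combinations

# Horizontal view of the MCW slides' example database (transaction id -> items).
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}
ITEMS = sorted(set().union(*TRANSACTIONS.values()))

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in TRANSACTIONS.values() if itemset <= items)

def is_closed(itemset):
    """True if no proper superset has the same support.

    Checking one-item extensions suffices, because support is anti-monotone."""
    s = support(itemset)
    return all(support(itemset | {i}) < s for i in ITEMS if i not in itemset)

def frequent_closed(minsup):
    """Closed itemsets whose support is strictly greater than `minsup`."""
    result = {}
    for r in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, r):
            itemset = frozenset(combo)
            s = support(itemset)
            if s > minsup and is_closed(itemset):
                result[itemset] = s
    return result

for itemset, s in sorted(frequent_closed(2).items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
# C 6, CW 5, ACW 4, CD 4, CT 4, ACTW 3, CDW 3 (one per line)
```

{A,T} has support 3 but is not closed, because {A,C,T,W} has the same support. This matches the X/Y example above.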
Existing closed itemset mining algorithms
-Data structures to speed up closed itemset mining
-But may not build the lattice or include extents
MCW algorithm (Mining Closed itemsets for Web apps)
Vertical data format
IT-tree (itemset-tidset tree) search space
-each node is a pair X × t(X), and all its children share the prefix X
Pruning
- 4 set difference closure operators
Subsumption check
- A look-up table to record all attributes and their occurrences in closed concepts
Lattice
- concepts are added in general-to-specific order
Vertical data format (item: tidset):
D: 2, 4, 5, 6
A: 1, 3, 4, 5
C: 1, 2, 3, 4, 5, 6
T: 1, 3, 5, 6
W: 1, 2, 3, 4, 5
Look-up table (attribute: closed concepts containing it):
attribute  concept_ids
D          C1, C2
T          C3, C4
A          C4, C5
W          C2, C4, C5, C6
C          C1, C2, C3, C4, C5, C6, C7
Is {TA}{135} closed? No: i(135) = {TAWC}, so the closure of {TA} is {TAWC}.
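The closure check above can be sketched directly on the vertical data: t(X) intersects tidsets, and i(T) collects the items present in every transaction of T. The sketch also shows one plausible reading of the look-up table: intersecting the concept-id lists of X's attributes gives the stored concepts that could subsume X. This is an illustration under those assumptions, not the MCW code. The function names follow the slide's t(·)/i(·) notation, and `containing_concepts` is hypothetical.

```python
# Vertical data format from the slide: item -> tidset.
TIDSETS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}

# Look-up table from the slide: attribute -> ids of closed concepts containing it.
LOOKUP = {
    "D": {"C1", "C2"},
    "T": {"C3", "C4"},
    "A": {"C4", "C5"},
    "W": {"C2", "C4", "C5", "C6"},
    "C": {"C1", "C2", "C3", "C4", "C5", "C6", "C7"},
}

def t(itemset):
    """Tidset of a non-empty itemset: intersection of its items' tidsets."""
    return set.intersection(*(TIDSETS[item] for item in itemset))

def i(tidset):
    """Items present in every transaction of `tidset`."""
    return {item for item, tids in TIDSETS.items() if tidset <= tids}

def closure(itemset):
    """i(t(X)); X is closed exactly when closure(X) == X."""
    return i(t(itemset))

def containing_concepts(itemset):
    """Stored concepts whose intent contains every item of `itemset`."""
    return set.intersection(*(LOOKUP[a] for a in itemset))

x = {"T", "A"}
print(sorted(t(x)), sorted(closure(x)))  # [1, 3, 5] ['A', 'C', 'T', 'W']
print(containing_concepts(x))            # {'C4'}
```

According to the look-up table, C4 is the only concept containing T, A, W and C. This is consistent with i(135) = {TAWC}, so {TA}{135} adds no new concept.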
Closure operators
{TA}{135} = {TW}{135} -> {TAW}{135} (equal tidsets)
{D}{2456} ⊂ {C}{123456} -> {DC}{2456} (tidset of D contained in tidset of C)
{D}{2456} and {W}{12345} -> {DW}{245} (incomparable tidsets)
Based on CHARM (Zaki, 2005)
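The three rewrites above are instances of the tidset comparisons that CHARM uses when combining two sibling IT-pairs Xi × t(Xi) and Xj × t(Xj). The following is a minimal sketch of that case analysis only. It omits CHARM's search order, diffsets and subsumption checking, and it is not MCW's code. The `combine` function and the `ACTIONS` table are hypothetical names.

```python
# Action taken in each case when two sibling IT-pairs under the same prefix are combined.
ACTIONS = {
    "equal": "replace Xi with Xi ∪ Xj and remove Xj",
    "subset": "replace Xi with Xi ∪ Xj (Xj is kept)",
    "superset": "remove Xj and add Xi ∪ Xj as a child of Xi",
    "incomparable": "add Xi ∪ Xj as a new child of Xi",
}

def combine(xi, ti, xj, tj):
    """Classify two sibling IT-pairs by comparing their tidsets.

    Returns (case, (Xi ∪ Xj, t(Xi) ∩ t(Xj)))."""
    pair = (xi | xj, ti & tj)
    if ti == tj:
        case = "equal"
    elif ti < tj:
        case = "subset"
    elif ti > tj:
        case = "superset"
    else:
        case = "incomparable"
    return case, pair

# The three examples from the slide.
examples = [
    ({"T", "A"}, {1, 3, 5}, {"T", "W"}, {1, 3, 5}),
    ({"D"}, {2, 4, 5, 6}, {"C"}, {1, 2, 3, 4, 5, 6}),
    ({"D"}, {2, 4, 5, 6}, {"W"}, {1, 2, 3, 4, 5}),
]
for xi, ti, xj, tj in examples:
    case, (items, tids) = combine(xi, ti, xj, tj)
    print(sorted(items), sorted(tids), case, "->", ACTIONS[case])
```

Because only tidsets are compared, each case is decided without rescanning the database. This is the main reason the vertical format suits closed itemset mining.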
Visual analytics
-combination of information visualization with machine learning and data analysis (Keim et al., 2008)
Visualization of concept lattice
- provides an overview of the structure of the domain
- provides a means for further data analysis, e.g., classification, clustering, implication discovery, rule learning
Previous work
- lattice navigation since Godin et al. (1993)
-Browsable concept lattice, e.g., Kim & Compton (2004)
Our current work
- augmenting the concept lattice by integrating multiple sources of knowledge (Gene Ontology, protein interactions) for further analysis and machine learning
Concept lattice as a visual analytics approach
Case study: Yeast systems biology
Browsable concept lattice
[Figure: lattice browser view, with the direction towards more general concepts marked]
Biological validation (1) : synthetic lethality
Synthetic lethal interaction: the cell is viable when either gene A or gene B is deleted individually, but cannot grow when both are deleted.
Our results show that 72 (119) concepts in the lattice are more likely than random chance to contain synthetic lethal pairs, at p < 0.01 (p < 0.05).
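The slide does not say which statistical test produced these p-values. As an illustration only, and not as the authors' procedure, one common way to ask whether a concept holds more synthetic lethal pairs than chance would predict is a one-sided hypergeometric test over gene pairs. In this test there are N candidate pairs in total, K of them synthetic lethal, and n pairs inside the concept, k of which are synthetic lethal. The function names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(n, K)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)

def pairs(genes):
    """Number of unordered pairs among `genes` genes (e.g. a concept's extent size)."""
    return genes * (genes - 1) // 2

# Hypothetical numbers: a 10-gene concept (45 pairs) containing 5 synthetic lethal pairs.
print(hypergeom_pvalue(k=5, n=pairs(10), K=1000, N=100000))
```

Under this assumed test, a concept would be counted at a given threshold when its p-value falls below 0.01 (or 0.05). With many concepts tested, some form of multiple-testing correction would also be worth considering.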
Biological validation (2): ILP learning of concept definitions
Protein-protein interaction data
Microarray gene-expression data
Transcription factor binding data (ChIP-chip)
Ontology data
Biochemical pathway data
Inductive Logic Programming
First-order rule: concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E).
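The rule's semantics are as follows: gene A is covered if some binding of B, C, D, E and F satisfies all five body literals at once, and shared variables must take the same value. The following is a minimal sketch of that check, assuming background facts are stored as sets of tuples. The facts (g1, g2, g3, k1, tf1, tf2) are hypothetical placeholders, not data from the study. ppi/3 is matched as an ordered relation; if the original background knowledge treats interactions as symmetric, both orientations would need to be stored.

```python
# Hypothetical background facts (placeholders, not real data).
PPI = {("k1", "g1", "g2"), ("k1", "g1", "g3"), ("k1", "g2", "g3")}
TFBINDS = {("tf1", "g2"), ("tf2", "g3")}

def has_tf(gene):
    """tfbinds(_, gene): some factor binds `gene` (the first argument is existential)."""
    return any(target == gene for _, target in TFBINDS)

def concept(a):
    """Check concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E)."""
    for b, a1, c in PPI:                 # ppi(B,A,C) with A bound to `a`
        if a1 != a:
            continue
        for b2, a2, e in PPI:            # ppi(B,A,E) must reuse the same B and A
            if (b2, a2) != (b, a):
                continue
            # ppi(B,C,E) plus a binding factor for both C and E
            if (b, c, e) in PPI and has_tf(c) and has_tf(e):
                return True
    return False

for gene in ("g1", "g2", "g3"):
    print(gene, concept(gene))
```

In these toy facts only g1 satisfies the body: it interacts with g2 and g3 via the same B, g2 and g3 interact with each other, and both are bound by a transcription factor.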
Example rule:
RSM19 is required for the H2O2 response; RSM19, RSM22 and MRPS17 are in the “mitochondrial ribosomal small subunit” stable complex; and RSM22 and MRPS17 are bound by transcription factors under amino acid starvation.
[Figure label: Transcription factors]
Conclusions
Many real-world domains are data-intensive
Machine learning and data mining applications are required to generate predictive and useful outputs
We focus on knowledge-based learning for comprehensibility – use ontologies
Formal concept analysis as a framework for ontology structure
Use data mining techniques for efficient concept lattice generation
Visual analytics approach: browsable lattice, added background knowledge
Initial validation on a case study from yeast systems biology
Future work
Investigate pseudo-intents to simplify the concept lattice
Investigate variants of concept lattice structures, e.g., the concept lattice of the inverse context
Add concept definitions to background knowledge in ILP