annual review, brussels march 17, 2005 semanticmining no.507505 annual review noe no. 507505...

26
Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Upload: shona-bradley

Post on 13-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Annual Review

NoE No. 507505Semantic Interoperability and Data Mining in

Biomedicine[SemanticMining]

Page 2: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Outline

• Workshop on Natural Language Processing (D13.1)

• Multi-lingual medical dictionary (D20.1)

• Information Retrieval and Data Mining (D24.1)

Page 3: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

WP13: Workshop on Natural Language Processing

• Goals:1. expand visibility of the semanticmining workshop;

2. establish forum for outside/inside network cooperation;

3. federate the NLP community in the biomedical domain;

4. organize a shared task to stimulate research in the domain, following well established challenges such as the TREC Genomics (http://trec.nist.gov/) or BioCreative(http://www.pdg.cnb.uam.es/BioLINK/BioCreative.eval.html).

Page 4: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Workshop

• Audience– Satellite of COLING: computer scientists, linguists, logicians…– Natural Language Processing/Information Retrieval– Medical informatics and Bioinformatics– 60 registered participants

• Distribution– Table

• Paper selection– 7 regular papers out of 30 submissions– 5 posters

• Dissemination– Workshop printed proceedings– Website– Special issue under preparation (IJMI - Elsevier)

Page 5: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Shared Task I

• Background– Information access tools is increasing to support

literature survey,

– Online ‘portals’ where scientists can navigate

– Genetics and disease databases

– Ambiguous nomenclature: Gene/RNA/proteins

– Scale up methods for processing full text articles etc.

• Task– Annotate Gene and Protein Names (GPNs)

i.e. find beginning and end of GPNs

Page 6: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Shared task II

• MEDLINE Corpus

Trained on 2000 abstracts / Tested on 200

• Evaluation

IOB recall and precision-like metrics

• Participation– 12 participant team

Page 7: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Evaluation

• Criterion Q3: Valorisation and Dissemination

Satisfying but internal impact could be improved

Page 8: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Natural Language Processing Workshop

2005

Page 9: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

SMBM 2005

• Symposium on Semantic Mining in Biomedicine:EBI, Hinxton, UK, 10-13 April, 2005

– 28 submissions

– 12 accepted papers

– 4 invited speakers

– 4 Tutorials

– Up to now about 60 registrations

http://www.ebi.ac.uk/Information/events/SMBM/

Page 10: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

WP20: Multilingual Lexicon

Three lines of work:

• MorphoSaurus subword lexicon: Links minimal, semantically atomic lexical units in 6 languages (approx. 80,000 entries, 27,000 equivalence classes). Purpose: Cross-language text retrieval, semantic interface between medical dictionaries

• Semi automated lexical acquisition: generating Spanish subwords out of Portuguese subwords, and Swedish out of German and English ones.

• Common Lexicon Interchange Format

Based on the (EU-funded) MULTEXT morpho-syntactic description. Facilitates the re-use of lexical resources

Page 11: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Evaluation

• Q2. Sharing of resources and use of research software tools

Satisfying• Q6. Short and medium-term visits

To be improved• Q7. Co-authoring of research papers, PhD…

To be improved

Page 12: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Multilingual Lexicon

2005

Page 13: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

2005 : Multilingual Lexicon

• MorphoEdit lexicon editor

• MorphoSaurus segmenter & indexer

• Exchange of French and Swedish lexemes

• Catholic University of Paraná, Brazil– Lexeme acquisition

– EHR indexing and retrieval

Sharing Tools and Resources

International Cooperation Fund Raising

Dissemination and Standards activities

• Standardization of Lexicon Interchange Format

• Negotiations on Semantic Medical document indexing with private and public partners in Germany

• 1 IST call 4 proposal

Page 14: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

WP24: Information Retrieval and Data Mining

- Semantic Interoperability- Normalized vocabulary (Gene Ontology, MeSH…)- Online integration tool:

http://www.ebi.ac.uk/Rebholz-srv/whatizit/form.jsp- Information Retrieval and Extraction

- Gene and Proteins, Drugs…- Protein Functions: apoptosis-induction…- Cellular Components: membrane, mitochondria..- Biological Processes: digestion, reproduction…

- Knowledge coupling- Uni-Prot (EU), MGI, LocusLink (US)

via Sequence Retrieval System Need new Tools for Images and Full-text articles !

Page 15: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Entity Types

Page 16: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Whatizit !

Page 17: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Biomedical Text (MEDLINE Abstract)

Alterations in protein folding and the regulation of conformational states have become increasingly important to the functionality of key molecules in signaling, cell growth, and cell death. Molecular chaperones, because of their properties in protein quality control, afford conformational flexibility to proteins and serve to integrate stress-signaling events that influence aging and a range of diseases including cancer, cystic fibrosis, amyloidoses, and neurodegenerative diseases. We describe here characteristics of celastrol, a quinone methide triterpene and an active component from Chinese herbal medicine identified in a screen of bioactive small molecules that activates the human heat shock response. From a structure/function examination, the celastrol structure is remarkably specific and activates heat shock transcription factor 1 (HSF1) with kinetics similar to those of heat stress, as determined by the induction of HSF1 DNA binding, hyperphosphorylation of HSF1, and expression of chaperone genes. Celastrol can activate heat shock gene transcription synergistically with other stresses and exhibits cytoprotection against subsequent exposures to other forms of lethal cell stress. These results suggest that celastrols exhibit promise as a new class of pharmacologically active regulators of the heat shock response.

Page 18: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Ontology-driven Knowledge Coupling (GO)

Alterations in protein folding and the regulation of conformational states have become increasingly important to the functionality of key molecules in signaling, cell growth, and cell death . Molecular chaperones, because of their properties in protein quality control, afford conformational flexibility to proteins and serve to integrate stress-signaling events that influence aging and a range of diseases including cancer, cystic fibrosis, amyloidoses, and neurodegenerative diseases . We describe here characteristics of celastrol, a quinone methide triterpene and an active component from Chinese herbal medicine identified in a screen of bioactive small molecules that activates the human heat shock response . From a structure/function examination, the celastrol structure is remarkably specific and activates heat shock transcription factor 1 (HSF1) with kinetics similar to those of heat stress, as determined by the induction of HSF1 DNA binding, hyperphosphorylation of HSF1, and expression of chaperone genes . Celastrol can activate heat shock gene transcription synergistically with other stresses and exhibits cytoprotection against subsequent exposures to other forms of lethal cell stress . These results suggest that celastrols exhibit promise as a new class of pharmacologically active regulators of the heat shock response .

Page 19: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Gene Ontology Browser

Page 20: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Database-driven Knowledge Coupling (Swiss-Prot)

Alterations in protein folding and the regulation of conformational states have become increasingly important to the functionality of key molecules in signaling, cell growth, and cell death . Molecular chaperones, because of their properties in protein quality control, afford conformational flexibility to proteins and serve to integrate stress-signaling events that influence aging and a range of diseases including cancer, cystic fibrosis, amyloidoses, and neurodegenerative diseases . We describe here characteristics of celastrol, a quinone methide triterpene and an active component from Chinese herbal medicine identified in a screen of bioactive small molecules that activates the human heat shock response . From a structure/function examination, the celastrol structure is remarkably specific and activates heat shock transcription factor 1 (HSF1) with kinetics similar to those of heat stress, as determined by the induction of HSF1 DNA binding, hyperphosphorylation of HSF1, and expression of chaperone genes . Celastrol can activate heat shock gene transcription synergistically with other stresses and exhibits cytoprotection against subsequent exposures to other forms of lethal cell stress . These results suggest that celastrols exhibit promise as a new class of pharmacologically active regulators of the heat shock response .

Page 21: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Swiss-Prot Records

Page 22: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Evaluation

• Q2. Sharing of resources and use of research software tools

Good• Q6. Short and medium-term visits

To be improved• Q7. Co-authoring of research papers, PhD…

To be improved

Page 23: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Data Mining and Information Retrieval

2005

Page 24: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

2005 : Information Retrieval and Data Mining

• Whatizit!ImagesFull-textCitations

• Summer School 2005• Joint Publications• PhD student exchange

• Oregon Health Science University (NSF-funded)– Image + Text Retrieval– ImageCLEF challenge– E-Challenge conference

• SMBM workshop– 3 days incl. tutorials– 12 papers out of 28– Special Issue in Bioinformatics

• EAGL (Swiss-funded)Question-Answering

• 2 IST Call 4 proposals

Sharing Tools and Resources

International Cooperation

Fund Raising

Dissemination and Standards activities

Page 25: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

Distribution

China 2 (Hong-Kong + Beijing)

Corea 1

European Organization 1

Finland 1

France 1

Germany 1

Japan 2

United Kingdom 1

Uunited States of America 2

Spain 1

Singapour 3

Switzerland 1

Asia: 8Europe: 6N.A: 2

[]

Page 26: Annual Review, Brussels March 17, 2005 SemanticMining No.507505 Annual Review NoE No. 507505 Semantic Interoperability and Data Mining in Biomedicine [SemanticMining]

Annual Review, Brussels March 17, 2005 SemanticMining No.507505

B-X=Beginning of X, O=Non-Entity X, I-X=End of X

TAR RNA X =RNA, DNA, proteins, cell-type

independent O transactivation O by    O Tat    B-protein in    O cells    O derived    O from    O the    O CNS    B-cell_type a    O novel    O mechanism    O of    O HIV-1    B-DNA gene    I-DNA regulation    O []