pedersen naacl-2013-demo-poster-may25


Upload: university-of-minnesota-duluth

Post on 11-May-2015


Poster associated with UMLS::Similarity demo at NAACL 2013.


Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth

[email protected]
http://www.d.umn.edu/~tpederse

UMLS::Similarity is freely available open source software that allows a user to measure the semantic similarity or relatedness of biomedical terms found in the Unified Medical Language System (UMLS). It is written in Perl and can be used via a command line interface, an API, or a Web interface.

UMLS::Similarity was modeled after and inspired by WordNet::Similarity (and yes, we've even used some code), but it has evolved to the point where it is more than a clone and has its own distinctive identity. The development of UMLS::Similarity was supported in part by an R01 grant from the National Institutes of Health (USA), National Library of Medicine (#1R01LM009623-01A2).

What are we measuring, and why?

Similarity Depends on IS-A hierarchy

Acknowledgments

Using UMLS::Similarity

Abstract

Contact

UMLS::Similarity : Measuring the Relatedness and Similarity of Biomedical Concepts

Bridget T. McInnes & Ying Liu : Minnesota Supercomputing Institute
Ted Pedersen, Genevieve B. Melton & Serguei Pakhomov : University of Minnesota

http://umls-similarity.sourceforge.net

Unified Medical Language System

To be similar is to be alike: how much is X like Y? Similar concepts share ancestors in an is-a hierarchy; the deeper the shared ancestor, the more similar the concepts.
• LCS : least common subsumer
• Tetanus and strep throat are similar, since both are kinds of bacterial infections
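The idea of a least common subsumer can be sketched on a toy is-a hierarchy. The sketch below is only an illustration in Python: the hierarchy, the function names, and the node-counting convention (the root has depth 1, and path similarity is taken as the reciprocal of the number of nodes on the shortest path) are assumptions made for the example, not UMLS data and not the UMLS::Similarity API.

```python
# Toy is-a hierarchy (child -> parent). Illustrative only, not UMLS content.
PARENT = {
    "infection": "disease",
    "bacterial infection": "infection",
    "viral infection": "infection",
    "tetanus": "bacterial infection",
    "strep throat": "bacterial infection",
    "influenza": "viral infection",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (an assumed convention)."""
    return len(path_to_root(concept))


def least_common_subsumer(a, b):
    """Deepest ancestor shared by both concepts (a concept subsumes itself)."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_b:
            return node
    return None


def path_similarity(a, b):
    """Reciprocal of the number of nodes on the shortest is-a path from a to b."""
    lcs = least_common_subsumer(a, b)
    nodes_on_path = depth(a) + depth(b) - 2 * depth(lcs) + 1
    return 1.0 / nodes_on_path


if __name__ == "__main__":
    print(least_common_subsumer("tetanus", "strep throat"))
    print(path_similarity("tetanus", "strep throat"))
    print(path_similarity("tetanus", "influenza"))
```

In this toy hierarchy, tetanus and strep throat meet at "bacterial infection", while tetanus and influenza only meet higher up at "infection", so the first pair scores as more similar.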

The ability to organize concepts by their similarity or relatedness to each other is fundamental to the human mind and to many problems in Natural Language Processing and Artificial Intelligence.

Relatedness Relies on Definitions

Assign a numeric value that quantifies how similar or related two concepts (or senses) are. The measures compare concepts, not words.

For example, "cold" may be a temperature or an illness.

To be related is much more general, since there are many ways to be related: is-a, part-of, treats, symptom-of, ...
• Tetanus and deep cuts are related, but they really aren't similar (deep cuts can cause tetanus, though)
• Related words are often defined using the same or similar words, so look for overlaps
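The "look for overlaps" idea can be illustrated with a simple word-overlap count between two definitions. This Python sketch is a simplification under stated assumptions: the definitions are made up for the example (they are not UMLS definitions), the stopword list is arbitrary, and the score is a plain count of shared content words rather than the scoring used by the Lesk measures in UMLS::Similarity.

```python
import re

# Small, arbitrary stopword list chosen for this example.
STOPWORDS = {"a", "an", "the", "of", "by", "in", "or", "and", "that", "is", "to"}

# Hypothetical definitions written for illustration; not taken from the UMLS.
DEFINITIONS = {
    "tetanus": "a disease caused by bacteria that enter the body through a wound",
    "deep cut": "a wound that breaks the skin and exposes the body to bacteria",
    "fracture": "a break in a bone",
}


def content_words(definition):
    """Lowercase the definition and keep its distinct non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    return {token for token in tokens if token not in STOPWORDS}


def overlap_score(definition_a, definition_b):
    """Number of distinct content words the two definitions share."""
    return len(content_words(definition_a) & content_words(definition_b))


if __name__ == "__main__":
    print(overlap_score(DEFINITIONS["tetanus"], DEFINITIONS["deep cut"]))
    print(overlap_score(DEFINITIONS["tetanus"], DEFINITIONS["fracture"]))
```

With these made-up definitions, tetanus and deep cut share "bacteria", "body", and "wound", while tetanus and fracture share nothing, which matches the intuition that the first pair is related even though it is not similar.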

Web Interface

• Allows for all measures to be computed using a subset of possible sources

• http://atlas.ahc.umn.edu
• http://maraca.d.umn.edu

Command Line

• Supports all measures, all UMLS sources plus many additional functions (many from UMLS::Interface), examples include :

• GetChildren
• GetParents
• GetRelated
• GetSemanticGroup
• FindCuiDepth
• FindPathtoRoot
• findLeastCommonSubsumer

Semantic Similarity Measures

Path based
• Shortest Path (path, cdist)

Depth based
• Leacock & Chodorow (lch)
• Zhong et al. (zhong)
• Nguyen & Al-Mubaid (nam)

Information Content
• Resnik (res)
• Lin (lin)
• Jiang & Conrath (jcn)
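The three information content measures listed above can be sketched from their commonly cited textbook formulas, with IC(c) = -log p(c). The Python below is a minimal illustration under assumptions: the concept probabilities are invented, the least common subsumer is passed in by hand, and the function names are hypothetical. It is not the UMLS::Similarity implementation and does not reflect how that package estimates probabilities.

```python
import math

# Hypothetical concept probabilities, invented for this example.
# A concept higher in the is-a hierarchy is more probable, so it carries less information.
PROBABILITY = {
    "disease": 1.0,
    "infection": 0.5,
    "bacterial infection": 0.25,
    "strep throat": 0.125,
    "tetanus": 0.0625,
}


def information_content(concept):
    """IC(c) = -log p(c); the root (p = 1) has zero information content."""
    return -math.log(PROBABILITY[concept])


def resnik(c1, c2, lcs):
    """Resnik: the information content of the least common subsumer."""
    return information_content(lcs)


def lin(c1, c2, lcs):
    """Lin: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    denominator = information_content(c1) + information_content(c2)
    return 2 * information_content(lcs) / denominator if denominator > 0 else 0.0


def jiang_conrath(c1, c2, lcs):
    """Jiang & Conrath as a similarity: 1 / (IC(c1) + IC(c2) - 2 * IC(lcs))."""
    distance = (
        information_content(c1)
        + information_content(c2)
        - 2 * information_content(lcs)
    )
    return 1.0 / distance if distance > 0 else float("inf")


if __name__ == "__main__":
    pair = ("tetanus", "strep throat", "bacterial infection")
    print(resnik(*pair), lin(*pair), jiang_conrath(*pair))
```

Passing the least common subsumer explicitly keeps the sketch self-contained; in practice it would come from the is-a hierarchy of the chosen source.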

Relatedness Measures

Path Based
• Hirst & St-Onge (hso)

Definition Based
• Lesk (lesk)
• Adapted Lesk (lesk)

Definition + Corpus
• Gloss Vector (vector)

The UMLS is a data warehouse distributed twice a year by the National Library of Medicine.

It includes more than 100 terminologies, code sets, and ontologies encompassing many different areas of medical knowledge. A user can access individual sources (examples below) or view them as one large combined resource via the Metathesaurus.

MeSH – Medical Subject Headings, used for indexing articles in PubMed

FMA – Foundational Model of Anatomy, a very fine-grained ontology of human anatomy

OMIM – Online Mendelian Inheritance in Man, a catalog of genes and gene disorders

SNOMEDCT – Systematized Nomenclature of Medicine – Clinical Terms

Word Sense Disambiguation with UMLS::SenseRelate

We can measure senses, or we can use the measures to identify senses!

http://search.cpan.org/dist/UMLS-SenseRelate