Using the NCBO Annotator to Develop an Ontology-Based Index of Biomedical Resources


Upload: trish-whetzel

Post on 23-Jun-2015


DESCRIPTION

Poster presentation from Biocuration 2012.

TRANSCRIPT


Using the NCBO Annotator to Develop an Ontology-Based Index of Biomedical Resources

Trish Whetzel, Paea LePendu, Clement Jonquet, Adrian Coulet, Natalya F. Noy, Mark A. Musen, Nigam H. Shah

Stanford University, Stanford, CA

Acknowledgements

The National Center for Biomedical Ontology is one of the National Centers for Biomedical Computing supported by the NHGRI, the NHLBI, and the NIH Common Fund under grant U54-HG004028.

Contact

For more information on the NCBO, visit http://www.bioontology.org or email support@bioontology.org

Abstract

Biomedical researchers generate an enormous amount of data covering a variety of domains, from genomic information and pathways to drug descriptions, clinical trials, and diseases. These data are published on the Web, but with idiosyncratic schemas and access mechanisms that make them difficult to search and analyze. While the biomedical community agrees that terminologies and ontologies are essential for data integration and translational discoveries, semantic annotation of biomedical resources is still minimal and is often restricted to a few ontologies. The NCBO Resource Index addresses these problems by providing a unified ontology-based index of, and access to, multiple heterogeneous biomedical resources.

To create the Resource Index, textual metadata from selected data elements in publicly available biomedical resources is processed with a resource-specific access tool to create semantic annotations. We use ontologies from BioPortal as a source of terms, their synonyms, and the relations between terms. The NCBO Annotator tags the metadata to generate the ontology-based index, producing direct annotations and then expanded annotations that draw on available ontology knowledge, e.g. term mappings and the ontology hierarchy. Currently, our dictionary contains 5,670,771 terms from 294 ontologies and has been used to index 3,920,921 data elements from 23 resources.
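The distinction between direct and expanded annotations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term identifiers, the three-term dictionary, the is_a hierarchy, and the function names are hypothetical, and plain whole-word matching stands in for the concept recognizer that the Annotator actually uses. Term mappings are left out.

```python
import re

# Hypothetical toy dictionary: term ID -> preferred name and synonyms.
DICTIONARY = {
    "T1": ["melanoma", "malignant melanoma"],
    "T2": ["skin cancer"],
    "T3": ["cancer", "malignant neoplasm"],
}

# Hypothetical is_a hierarchy: child term ID -> parent term IDs.
PARENTS = {"T1": ["T2"], "T2": ["T3"], "T3": []}


def direct_annotations(text):
    """Return IDs of terms whose name or synonym occurs as whole words in text."""
    found = set()
    lowered = text.lower()
    for term_id, names in DICTIONARY.items():
        for name in names:
            if re.search(r"\b" + re.escape(name) + r"\b", lowered):
                found.add(term_id)
                break  # one matching name is enough for this term
    return found


def expanded_annotations(direct):
    """Return ancestor term IDs reachable from the direct annotations."""
    expanded = set()
    stack = list(direct)
    while stack:
        term_id = stack.pop()
        for parent in PARENTS.get(term_id, []):
            if parent not in expanded and parent not in direct:
                expanded.add(parent)
                stack.append(parent)
    return expanded


if __name__ == "__main__":
    text = "Expression profiling of malignant melanoma cell lines"
    direct = direct_annotations(text)
    print(sorted(direct))                        # ['T1']
    print(sorted(expanded_annotations(direct)))  # ['T2', 'T3']
```

In this sketch a record that mentions only "melanoma" also becomes findable under the broader terms, which is the effect the hierarchy-based expansion is meant to have.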

The ontology-based index is accessible from the BioPortal Web site and directly via the Resource Index Web services. The Web interface provides a search box that displays auto-complete suggestions for the search terms, along with the source ontology of each suggested term. For each resulting resource record, the details of the annotation are highlighted in the original text, and additional terms used to tag the record are displayed. The Resource Index Web services support access by either ontology term or resource record. For example, given the term “melanoma”, the services return all records across one or more resources annotated with this term; given a resource record identifier, they return all of its annotations. Search by union and intersection is also supported. Heterogeneous, independently developed resources can therefore be searched from a universal ontology-based index.
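The two access patterns described above (by term and by record), together with union and intersection search, can be sketched over an in-memory index. This is only an illustration under stated assumptions: the resource names, record identifiers, annotations, and function names are hypothetical, and the sketch does not call or reproduce the actual Resource Index Web services.

```python
from collections import defaultdict

# Hypothetical annotations: (resource, record ID) -> set of ontology terms.
ANNOTATIONS = {
    ("ResourceA", "rec1"): {"melanoma", "skin"},
    ("ResourceA", "rec2"): {"melanoma"},
    ("ResourceB", "rec7"): {"skin", "biopsy"},
}


def build_index(annotations):
    """Invert record -> terms into term -> set of records."""
    index = defaultdict(set)
    for record, terms in annotations.items():
        for term in terms:
            index[term].add(record)
    return index


def search_any(index, terms):
    """Union search: records annotated with at least one of the terms."""
    return set().union(*(index.get(term, set()) for term in terms))


def search_all(index, terms):
    """Intersection search: records annotated with every one of the terms."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def annotations_for(annotations, record):
    """Record-based access: all terms annotating one resource record."""
    return set(annotations.get(record, set()))


if __name__ == "__main__":
    index = build_index(ANNOTATIONS)
    print(sorted(search_any(index, ["melanoma"])))
    print(sorted(search_all(index, ["melanoma", "skin"])))
    print(sorted(annotations_for(ANNOTATIONS, ("ResourceB", "rec7"))))
```

Because every resource is indexed against the same set of terms, a single term lookup spans records from all resources, here both the hypothetical ResourceA and ResourceB.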

Annotator Workflow

The NCBO Annotator uses ontologies from BioPortal, which includes content from the OBO Foundry, UMLS, and other user-submitted ontologies, as the source of terms to generate a dictionary containing the preferred name and synonyms from these ontologies. The dictionary is then used with the concept recognizer Mgrep to identify these entities in text submitted to the Annotator Web service, creating direct annotations. The annotations can also be expanded to include terms based on the ontology hierarchy and term mappings.

Applications using the Annotator

GeneWiki: http://en.wikipedia.org/wiki/Portal:Gene_Wiki

Statistical Tracking of Ontological Phrases: http://www.mooneygroup.org/content/webtools

Ontological Discovery Environment: http://ontologicaldiscovery.org

SciVerse Hub - ODiSSea: http://www.hub.sciverse.com/action/home

Resource Index

The Resource Index is an ontology-based index of publicly available biomedical data generated using the Annotator. Data can be searched from BioPortal or using the Resource Index Web services. Searching the ontology-based index returns results that otherwise would not be found. Contact us to suggest a resource.

http://bioportal.bioontology.org/resources