NCI Thesaurus and
Enterprise Vocabulary Services: Resources for Cancer Research
Lawrence W. Wright, Program Manager
NCI Enterprise Vocabulary Services
http://evs.nci.nih.gov/
May 13, 2015
2
EVS Purpose and Scope
EVS provides terminology and ontology services to support NCI and cancer researchers, and has found many shared interests and strong partnerships in the broader biomedical community.
- Encode Precise, Stable Meanings:
  • Support best-practice, science-based, quick-response terminology/ontology resources to help researchers accurately collect, code, and analyze data.
- Support Semantic Infrastructures:
  • Support metadata, models, value sets, and mappings that provide broader, computable representations to structure meanings and make them interoperable.
- Build Shared Standards:
  • Partner and harmonize with other ICs, agencies, SDOs, and researchers in creating and improving shared standards for increasingly international, cross-cutting research.
- Promote Open Content and Tools:
  • Promote open access, open source content and tools to lower barriers, share burdens, and build shared resources.
3
NCI Thesaurus (NCIt): Semantic Backbone for Research Information
• Terminology: Provide best-practice coding as needed in all relevant domains.
  - Cancers and other diseases, findings and abnormalities.
  - Clinical & research trials/studies, procedures, tools, management, etc.
  - Agents, regimens, chemicals, nutrients, nanoparticles, & other substances.
  - Anatomy, tissues, subcellular structures.
  - Genes, gene products, pathways, biological processes.
  - Animal models – mouse, rat, zebrafish, other.
  - Concepts, properties, qualifiers, administrative & other misc. terminology.
• Ontology: Deep & precise representation of key research and health concepts.
  - Neoplasms
    • 8,000+ concepts defined using 200,000 description logic relationships plus text definitions.
    • Tracks latest molecular, pathological, and clinical classifications.
  - Drugs
    • 17,000+ individual agents & related substances, including nutritional.
    • 3,400+ agent combinations being extended to cover specific regimens.
  - Molecular
    • 16,000+ genes, gene products, pathways, and abnormalities.
  - Anatomy
    • 6,750+ concepts including systems, structures, tissues, and an extensive microanatomy.
    • Federal Consolidated Health Informatics (CHI) standard.
4
NCIt Example: Lymphoma (1 of 5)
Concept Code
Links to caDSR and NCIm
NCI Preferred Term
NCI Definition
5
NCIt Example: Lymphoma (2 of 5)
Term Source
Tagged C3208 stakeholders (incl. Contributing_Source): CTEP, CTRP, PDQ, TCGA, NICHD, CDISC, FDA
Term Code or Subsource
6
NCIt Example: Lymphoma (3 of 5)
Relationships 1: Parent and Child Concepts
7
NCIt Example: Lymphoma (4 of 5)
Relationships 2: Role Relationships & Subset Associations
Role Relationships
Associations
8
NCIt Example: Lymphoma (5 of 5)
Activated B-Cell-Like Diffuse Large B-Cell Lymphoma
Preferred Name: Activated B-Cell-Like Diffuse Large B-Cell Lymphoma
Code: C36081
Semantic Type: Neoplastic Process
Parent Concepts:
  Diffuse Large B-Cell Lymphoma by Gene Expression Profile
  Aggressive Non-Hodgkin Lymphoma
Definition: A biologic subset of diffuse large B-cell lymphomas with a unique molecular signature or expression profile. It represents approximately 30% of diffuse large B-cell lymphomas, and is characterized by the expression of CD44, PKCbeta1, Cyclin D2, BCL-2, and IRF4/MUM1 genes. Morphologically, these lymphomas are either centroblastic or immunoblastic (ratio 2:1). Patients with this type of diffuse large B-cell lymphoma are reported to have a less favorable outcome compared to those with a germinal center B-cell expression profile, with a 5-year survival rate of 35% and a median survival of 2 years.
* Partial List
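As a rough illustration of the record on this slide, the fields shown (code, preferred name, semantic type, parent concepts) can be held in a small data structure and the parent links walked upward. This is a minimal Python sketch, not EVS or LexEVS code: the `Concept` class and the `build_parent_index` and `ancestors` helpers are hypothetical names, and the data is limited to the one concept and the two parents named on the slide, which is itself a partial list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Concept:
    """Hypothetical container for the fields shown on the slide."""
    code: str                        # stable concept code, e.g. "C36081"
    preferred_name: str
    semantic_type: str
    parents: Tuple[str, ...] = ()    # parent preferred names (partial list)


# Values copied from the slide; nothing else is assumed about the concept.
ABC_DLBCL = Concept(
    code="C36081",
    preferred_name="Activated B-Cell-Like Diffuse Large B-Cell Lymphoma",
    semantic_type="Neoplastic Process",
    parents=(
        "Diffuse Large B-Cell Lymphoma by Gene Expression Profile",
        "Aggressive Non-Hodgkin Lymphoma",
    ),
)


def build_parent_index(concepts: Iterable[Concept]) -> Dict[str, Tuple[str, ...]]:
    """Map each preferred name to the names of its parents."""
    return {c.preferred_name: c.parents for c in concepts}


def ancestors(name: str, parent_index: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Breadth-first walk up the parent links, visiting each name once."""
    found: List[str] = []
    queue = list(parent_index.get(name, ()))
    while queue:
        current = queue.pop(0)
        if current in found:
            continue
        found.append(current)
        # Names missing from the index simply have no recorded parents here.
        queue.extend(parent_index.get(current, ()))
    return found


index = build_parent_index([ABC_DLBCL])
for parent in ancestors(ABC_DLBCL.preferred_name, index):
    print(parent)
```

With only this one record loaded, the walk stops at the two parents listed on the slide; adding more records to the index would let it continue further up the hierarchy.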
9
NCIt Drugs (1 of 2)
10
NCIt Drugs (2 of 2)
11
NCIt Biomarker Types with example concepts
• Molecular/Genetic Markers
  BRCA1 Gene; BRAF NP_004324.2:p.V600X (BRAF V600 Mutation); t(11;18)(q21;q21); BCR/ABL1 Fusion Protein p230; N-Telopeptide
• Laboratory Test Results
  Estrogen Receptor Status; Methemoglobin Reductase Deficiency; CD34-Positive Neoplastic Cells Present; HMB-45-Positive Neoplastic Cells Present
• Histology/Pathology Findings
  Positive Surgical Margin; Blast Cells Present in Peripheral Blood; Ductal Carcinoma Cell; Cervical Glandular Dysplasia; Atypical Mitotic Figures
• Antigens and Metabolic Markers
  Ganglioside GM2; CD15; 2-Methoxyestradiol; 4-Hydroxyestrone; N(6)-Carboxymethyllysine; 8-Oxoguanine
• Physiological and Pathological Processes
  DNA Methylation; Tumor Angiogenesis; Oxidative Stress; Lipid Peroxidation; S-Nitrosylation; Histone Acetylation
12
Partnered NCIt subsets
http://ncit.nci.nih.gov/ncitbrowser/pages/subset.jsf
13
Partnered NCIt subsets (1 of 550): FDA SPL Drug Route of Administration Terminology
14
Cross Map by Meaning
NCI Metathesaurus https://ncim.nci.nih.gov
Definitions
Terms & Sources
15
NCI Hosted Mappings
nciterms.nci.nih.gov/ncitbrowser/pages/mapping_search.jsf?nav_type=mappings
16
NCI Unified, Open Infrastructure
LexEVS Server & NCI Term Browser http://nciterms.nci.nih.gov/
22 Sources
Search
Linked Resources
3 Resource Types
25 / 75 Subsources
17
NCI Metadata in caDSR
Widely Used in NCI & Partner Semantic Infrastructures
18
Some Lessons (1 of 2)
• Coding and representation of biomedical information will remain diverse and dynamic.
  - Many ‘legacy’ systems will be widely used for a long time to come.
  - Their content and use can be improved in important ways.
  - There is a large and growing role for more innovative resources responsive to specific research and care needs.
• Responsiveness and partnerships are vital: Engage to analyze and address needs quickly, form strategic partnerships and communities around key needs.
• Open standards encourage participation and reuse: Harmonize and share as openly and widely as possible, have expert staff to support operations.
  - Scale of reuse can easily exceed scale of original uses.
• Open technical standards and tools such as OWL/RDF, CTS2/REST, Protégé, LexEVS, and NCI browsers increase sharing and compatibility.
19
Some Lessons (2 of 2)
• Core best practices are vital: Stable codes for precise meanings, clear terms and synonymy, human-readable text definitions, extensive quality control, expert staff and community input.
• NCIt reference terminology provides semantic backbone for most EVS-supported coding, analyzing, and sharing research data.
• NCIt embedded partner terminology combines tighter semantics, harmonization, shared coding, and partner-appropriate terms.
• NCIm and related mappings are very useful for reference, NLP, and translation, but have weaker semantics and use.
• User driven priorities create relevance but also unevenness: EVS combines broad scope and rich ontology with some gaps and simple coding.
20
EVS Resources
Web & Wiki Pages:
• EVS Web Portal: http://evs.nci.nih.gov/
• EVS Wiki: https://wiki.nci.nih.gov/display/EVS/EVS+Wiki
• EVS Bibliography: https://wiki.nci.nih.gov/display/EVS/Bibliography+on+EVS+and+Its+Use
• EVS Use & Collaborations: https://wiki.nci.nih.gov/display/EVS/EVS+Use+and+Collaborations
Browsers and Term Request:
• NCI Term Browser: http://nciterms.nci.nih.gov/
• NCI Thesaurus: http://ncit.nci.nih.gov/
• NCI Metathesaurus: http://ncim.nci.nih.gov/
• NCI EVS Term Request Page: http://ncitermform.nci.nih.gov/
EVS/NCIt Staff email: [email protected]