Building pipeline-based NLP systems for your applications
Hua Xu
School of Biomedical Informatics, University of Texas Health Science Center at Houston

Disclosure
• I receive grant funding from:
  – NIH: NLM, NIGMS, NCI
  – CPRIT (Cancer Prevention and Research Institute of Texas)
• I have been a consultant for:
  – Hebta LLC

What Is NLP?
• Broad definition: any system that manipulates text or speech. It could involve various degrees of linguistic knowledge.
• NLP systems
  – Natural language understanding
  – Natural language extraction
  – Natural language generation
  – Machine translation
  – NLP-based information retrieval
  – NLP interfaces

Study of Natural Language
• Human language (vs. formal and computer language)
• Linguistics: a description of language, used by theoretical linguists.
• Psycholinguistics: a cognitive model of how people understand and generate language.
• Computational linguistics: build computational models to understand and generate language.

Computational Linguistics
♦ An interdisciplinary field dealing with the statistical and/or rule-based modeling of natural language from a computational perspective
  – Driven by the need to process natural language: convert it to structured form for further computerized processing
  – The computational model is not necessarily the same as the human model; we don't understand much about the human language faculty

Overview of Linguistic Levels
• Phonology: units of sound combine to produce words (will not cover)
• Morphology: basic units combine to produce words
• Lexicography: syntactic (part of speech) and semantic categories of words
• Syntax: structures combine to produce sentences
• Semantics: meaning/interpretations
• Discourse: previous information affects the interpretation of the current information
• Pragmatics: context or world knowledge affects the interpretation of meaning

Morphology
• Definition: The study of how words are composed from smaller, meaning-bearing units (morphemes)
  § Inflection: Word stem + grammatical morpheme
    ○ like → likes, liked, liking
  § Derivation: Word stem + syntactic/grammatical morpheme
    ○ generalize → generalization
  § Compounding: Two base forms join to form a new word
    ○ bedtime
• Application: spelling check, stemming, POS tagging, speech recognition
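The inflection examples on this slide (like → likes, liked, liking) can be reversed with a toy suffix-stripping stemmer. This is only a sketch: the three-rule table and the `stem` function are illustrative assumptions, tuned to stems ending in a silent "e", and are not taken from the talk or from any real stemming algorithm.

```python
# Toy suffix-stripping stemmer. The rule table is a hypothetical illustration;
# real stemmers (e.g. Porter's) use many more rules and conditions.
SUFFIX_RULES = [
    ("ing", "e"),  # liking -> lik + e
    ("ed", "e"),   # liked  -> lik + e
    ("s", ""),     # likes  -> like
]


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three stem letters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for form in ["likes", "liked", "liking", "like"]:
        print(form, "->", stem(form))
```

The rules over-generate on other stems (for instance, "walked" would become "walke"), which is one reason practical stemmers need exception lists or a lexicon.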

Lexicography - Words
♦ Recognize word
  – Tokenization (determine the word boundary)
♦ Identify word
  – Lookup (map to dictionary entry)
♦ Categorize word
  – Tagging
    – Syntactic: assign part-of-speech tags
    – Semantic: assign semantic categories
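The three steps on this slide (recognize, identify, categorize) can be sketched in a few lines. The mini-lexicon, its categories, and the function names below are hypothetical and exist only for illustration; a real system would use a full dictionary and a trained tagger.

```python
import re

# Hypothetical mini-lexicon: word -> (part of speech, semantic category).
LEXICON = {
    "patient": ("noun", "person"),
    "denies": ("verb", None),
    "fever": ("noun", "symptom"),
    "abdomen": ("noun", "body location"),
}


def tokenize(text):
    """Recognize words: alphabetic words, numbers, or single punctuation marks."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*|\d+(?:\.\d+)?|[^\w\s]", text)


def tag(text):
    """Identify each token by dictionary lookup, then attach its categories."""
    tagged = []
    for token in tokenize(text):
        pos, semantic = LEXICON.get(token.lower(), ("unknown", None))
        tagged.append((token, pos, semantic))
    return tagged


if __name__ == "__main__":
    for row in tag("Patient denies fever."):
        print(row)
```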

Syntax - Sentences
♦ Definition: study of the structure of a sentence.
  – Categories combine with others to produce a well-formed structure with underlying relations
♦ Difficulties: ambiguous, nesting, omitted structures
  – pain in (hands and feet) vs. (pain in hands) and fever
♦ Parsing: determining syntax
  – Formalisms: regular expressions vs. context-free grammar
  – Partial vs. full parsing
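Partial parsing with regular expressions can be illustrated by a noun-phrase chunker that runs a regex over part-of-speech tags instead of over characters. The tag set, the single-letter encoding, and the pattern below are assumptions made for this sketch; it finds noun phrases but does not attempt full parsing decisions such as where a prepositional phrase attaches.

```python
import re

# Each POS tag is encoded as one letter so regex match offsets equal token indices.
TAG_CODES = {"det": "D", "adj": "A", "n": "N", "v": "V", "p": "P"}

# Noun phrase: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"D?A*N+")


def chunk_noun_phrases(tagged):
    """Return the noun phrases found in a list of (word, tag) pairs."""
    codes = "".join(TAG_CODES.get(tag, "X") for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NP_PATTERN.finditer(codes)
    ]


if __name__ == "__main__":
    sentence = [
        ("the", "det"), ("patient", "n"), ("had", "v"), ("pain", "n"),
        ("in", "p"), ("lower", "adj"), ("extremities", "n"),
    ]
    print(chunk_noun_phrases(sentence))
```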

Semantics
♦ Lexical level: to determine the meaning of a word
  ♦ Semantic categories of a word
    • Abdomen – body location
    • Fever – symptom
    • pt – lab test (prothrombin time assay) vs. treatment (physical therapy)
  ♦ Word sense disambiguation
♦ Grammatical level: word senses in a structure combine to form a meaning of the whole structure
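Word sense disambiguation for an abbreviation such as "pt" is often approached by looking at the surrounding words. The sketch below counts overlaps with hand-picked cue words; the cue lists and the `disambiguate_pt` function are hypothetical, and a real system would typically learn such evidence from annotated text.

```python
import re

# Hypothetical cue words for each sense of "pt" listed on the slide.
SENSE_CUES = {
    "prothrombin time assay": {"inr", "ptt", "seconds", "coagulation", "lab"},
    "physical therapy": {"therapy", "exercise", "rehab", "ambulation", "consult"},
}


def disambiguate_pt(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    print(disambiguate_pt("PT 13.2 seconds, INR 1.1"))
    print(disambiguate_pt("PT consult for ambulation and exercise"))
```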

Discourse
♦ Previous information in text affects current text
  – Correct reference for pronouns, definite noun phrases, bridging noun phrases.
    • Mass noted in left upper lobe. It was well-marginated.
  – Time of events
  – Determining topic
  – Coherence of text

Pragmatics
♦ Context affects meaning
  – Domain: A mass was observed
  – Section of report: past history vs. hospital course
  – Prior information
♦ World knowledge affects interpretation
  – He couldn't do any trading on the past Monday. (Market was closed on Presidents' Day, a Monday.)

It's all about Ambiguity!
• POS tagging: saw (noun vs. verb)
• Semantic tagging: pt (patient, physical therapy, prothrombin time assay)
• Syntactic parsing: "The patient had pain in lower extremities." vs. "The patient had pain in emergency room."

Parse of the first sentence (the prepositional phrase attaches to the noun "pain"):
(S (np (det the) (n patient)) (vp (v had) (np (n pain) (pp (p in) (np (adj lower) (n extremities))))))

Parse of the second sentence (the prepositional phrase attaches to the verb phrase):
(S (np (det the) (n patient)) (vp (v had) (n pain) (pp (p in) (np (n emergency) (n room)))))

Most current clinical NLP systems are information extraction systems
• General-purpose
  – MedLEE
  – MetaMap
  – cTAKES
  – KnowledgeMap Concept Identifier
  – …
• Specific-purpose
  – MIST: the MITRE Identification Scrubber Toolkit
  – MedEx: medication information extraction
  – …

Pipeline-based architecture
[Figure: cTAKES (clinical Text Analysis and Knowledge Extraction System) UIMA (Unstructured Information Management Architecture) annotation flow of the side effect pipeline.]
Source: Sohn et al., JAMIA, 2011
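The core idea of a pipeline-based architecture is that each component reads a shared document object, adds its own annotations, and hands the document to the next component. The sketch below shows that idea in plain Python; it is not the UIMA or cTAKES API, and the `Document` class and component names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Shared container: raw text plus annotations added by each component."""
    text: str
    annotations: dict = field(default_factory=dict)


def sentence_splitter(doc):
    # Naive rule: split on whitespace that follows sentence-final punctuation.
    parts = re.split(r"(?<=[.!?])\s+", doc.text)
    doc.annotations["sentences"] = [p.strip() for p in parts if p.strip()]
    return doc


def tokenizer(doc):
    # Depends on the annotations produced by sentence_splitter.
    doc.annotations["tokens"] = [
        re.findall(r"\w+|[^\w\s]", sentence)
        for sentence in doc.annotations["sentences"]
    ]
    return doc


def run_pipeline(doc, components):
    """Apply each component in order; later components see earlier annotations."""
    for component in components:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    note = Document("Mass noted in left upper lobe. It was well-marginated.")
    result = run_pipeline(note, [sentence_splitter, tokenizer])
    print(result.annotations["sentences"])
    print(result.annotations["tokens"])
```

Because components only communicate through the shared annotations, one of them can be swapped (for example, a rule-based tokenizer for a trained one) without touching the rest of the pipeline.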

Demo of building clinical NLP pipelines using CLAMP
• Clinical Language Annotation, Modeling, and Processing Toolkit (CLAMP)
• Demo 1: determine smoking status using rule-based approaches
• Demo 2: extract lab names using a hybrid approach that combines machine learning and rules

Introduction to CLAMP
• A general-purpose clinical NLP system built on proven methods
• An IDE (integrated development environment) for building customized clinical NLP pipelines via GUIs
  – Annotating/analyzing clinical text
  – Training of ML-based modules
  – Specifying rules

| NLP Tasks | Challenge | Ranking |
|---|---|---|
| Named entity recognition | 2009 i2b2, medication | #2 |
| | 2010 i2b2 problem, treatment, test | #2 |
| | 2013 SHARe/CLEF abbreviation | #1 |
| UMLS encoding | 2014 SemEval, disorder | #1 |
| Relation extraction | 2012 i2b2 Temporal | #1 |
| | 2015 SemEval Disease-modifier | #1 |
| | 2015 BioCREATIVE Chemical-induced disease | #1 |

What does CLAMP address?
• The transportability problem of NLP
  – From one type of clinical notes to another
  – From one institute to another
  – From one application to another
• Need a solution for non-NLP experts to efficiently build high-performance NLP modules for individual applications!

CLAMP Demo 1
• Build a rule-based system to extract smoking status from clinical text
• Input: sentences containing patient smoking information
• Output: three types of status for each smoking mention:
  – Current Smoker: She is continuing to smoke
  – Past Smoker: She has a prior history of smoking although not currently
  – Non-Smoker: She denies any tobacco use, alcohol use
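A rule-based smoking status classifier of the kind described in this demo can be approximated with a short ordered list of regular expressions. The rules below are hypothetical and written only to cover the three example sentences on the slide; they are not CLAMP's rule syntax or its actual rules.

```python
import re

# Hypothetical rules, checked in order; the first matching rule wins.
RULES = [
    ("Non-Smoker",
     re.compile(r"\b(denies|no|never)\b.*\b(tobacco|smok\w*)", re.I)),
    ("Past Smoker",
     re.compile(r"\b(prior|past|former|quit|history of)\b.*\bsmok\w*"
                r"|\bsmok\w*.*\bnot currently\b", re.I)),
    ("Current Smoker",
     re.compile(r"\b(continu\w*|current\w*|still)\b.*\bsmok\w*|\bsmokes\b", re.I)),
]


def smoking_status(sentence: str) -> str:
    """Return the label of the first rule that matches, else 'Unknown'."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "Unknown"


if __name__ == "__main__":
    for s in [
        "She is continuing to smoke",
        "She has a prior history of smoking although not currently",
        "She denies any tobacco use, alcohol use",
    ]:
        print(smoking_status(s), "<-", s)
```

Rule order matters here: negation is checked before history, and history before current use, so that a sentence mentioning both a past habit and "not currently" is not labeled as a current smoker.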

CLAMP Demo 2
• Build a hybrid (machine learning + rules) system for extracting lab test concepts from clinical text
• Input: discharge summaries
• Output: lab test concepts mentioned in the text with attributes of:
  – Offsets
  – Negation
  – UMLS CUIs
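The output described in this demo (a mention with offsets, a negation flag, and a concept identifier) can be sketched as follows. Everything here is an assumption for illustration: `recognize_mentions` is a dictionary matcher standing in for a trained machine learning tagger, the negation rule is a simple cue-word check, and the `CUI-PLACEHOLDER-*` values are made-up stand-ins, not real UMLS CUIs.

```python
import re

# Placeholder lexicon: identifiers are invented stand-ins, NOT real UMLS CUIs.
LAB_LEXICON = {
    "hemoglobin": "CUI-PLACEHOLDER-1",
    "creatinine": "CUI-PLACEHOLDER-2",
}
NEGATION_CUES = re.compile(r"\b(no|not|without|denies)\b", re.I)
MENTION_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, LAB_LEXICON)) + r")\b", re.I
)


def recognize_mentions(text):
    """Stand-in for a trained ML tagger: returns (start, end) character offsets."""
    return [(m.start(), m.end()) for m in MENTION_PATTERN.finditer(text)]


def extract_lab_tests(text):
    """Combine the recognizer with rules for negation and concept lookup."""
    results = []
    for start, end in recognize_mentions(text):
        mention = text[start:end]
        # Rule: a negation cue earlier in the same sentence negates the mention.
        sentence_start = max(text.rfind(".", 0, start), text.rfind("\n", 0, start)) + 1
        negated = bool(NEGATION_CUES.search(text[sentence_start:start]))
        results.append({
            "text": mention,
            "start": start,
            "end": end,
            "negated": negated,
            "cui": LAB_LEXICON[mention.lower()],
        })
    return results


if __name__ == "__main__":
    for item in extract_lab_tests("Hemoglobin was 10.2. No creatinine was drawn."):
        print(item)
```

Treating every period as a sentence boundary is a deliberate simplification; it works for the negation window in this example but would need a proper sentence splitter on real discharge summaries.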

CLAMP Availability
• CLAMP is available in two versions:
  – CLAMP CMD (free)
  – CLAMP GUI (depends on the license)
  https://sbmi.uth.edu/ccb/resources/clamp.htm
• It is not open-source software, but the source code is available to collaborators with appropriate licenses.
• We are looking for collaborators to co-develop the system! If interested, please contact: [email protected]