Biomedical Information Retrieval

William Hersh, MD
Professor and Chair
Department of Medical Informatics & Clinical Epidemiology
Oregon Health & Science University
Portland, OR, USA
Email: [email protected]
Web: www.billhersh.info
Blog: http://informaticsprofessor.blogspot.com
Twitter: @williamhersh

References

Alsheikh-Ali, AA, Qureshi, W, et al. (2011). Public availability of published research data in high-impact journals. PLoS ONE. 6(9): e24357. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0024357
Anonymous (2006). Fatally Flawed - Refuting the recent study on encyclopedic accuracy by the journal Nature. Chicago, IL, Encyclopedia Britannica. http://corporate.britannica.com/britannica_nature_response.pdf
Anonymous (2012). From Screen to Script: The Doctor's Digital Path to Treatment. New York, NY, Manhattan Research; Google. https://www.thinkwithgoogle.com/research-studies/the-doctors-digital-path-to-treatment.html
Anonymous (2015). The Beginner's Guide to SEO. Seattle, WA, Moz. http://moz.com/beginners-guide-to-seo
Anonymous (2016). Toward fairness in data sharing. New England Journal of Medicine. 375: 405-407.
Anonymous (2017). Database Resources of the National Center for Biotechnology Information. Nucleic Acids Research. 45: D12-D17.
Bachrach, CA and Charen, T (1978). Selection of MEDLINE contents, the development of its thesaurus, and the indexing process. Medical Informatics. 3: 237-254.
Bastian, H, Glasziou, P, et al. (2010). Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Medicine. 7(9): e1000326. http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.1000326
Brin, S and Page, L (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems. 30: 107-117. http://infolab.stanford.edu/pub/papers/google.pdf
Broder, A (2002). A taxonomy of Web search. SIGIR Forum. 36(2): 3-10. http://www.acm.org/sigir/forum/F2002/broder.pdf
Castillo, C and Davison, BD (2011). Adversarial Web Search. Delft, Netherlands, now Publishers.
Cerrato, P (2012). IBM Watson Finally Graduates Medical School. Information Week, October 23, 2012. http://www.informationweek.com/healthcare/clinical-systems/ibm-watson-finally-graduates-medical-sch/240009562
Coletti, MH and Bleich, HL (2001). Medical subject headings used to search the biomedical literature. Journal of the American Medical Informatics Association. 8: 317-323.
Davies, K (2006). Search and Deploy. Bio-IT World, October 16, 2006. http://www.bio-itworld.com/issues/2006/oct/biogen-idec/
DeAngelis, CD, Drazen, JM, et al. (2005). Is this clinical trial fully registered? A statement from the International Committee of Medical Journal Editors. Journal of the American Medical Association. 293: 2927-2929.


Ferrucci, D, Brown, E, et al. (2010). Building Watson: an overview of the DeepQA Project. AI Magazine. 31(3): 59-79. http://www.aaai.org/ojs/index.php/aimagazine/article/view/2303
Ferrucci, DA (2012). Introduction to "This is Watson". IBM Journal of Research and Development. 56(3/4): 1. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6177724
Fox, S (2011). Health Topics. Washington, DC, Pew Internet & American Life Project. http://www.pewinternet.org/Reports/2011/HealthTopics.aspx
Fox, S (2011). The Social Life of Health Information, 2011. Washington, DC, Pew Internet & American Life Project. http://www.pewinternet.org/Reports/2011/Social-Life-of-Health-Info.aspx
Fox, S and Duggan, M (2013). Health Online 2013. Washington, DC, Pew Internet & American Life Project. http://www.pewinternet.org/Reports/2013/Health-online.aspx
Funk, ME and Reid, CA (1983). Indexing consistency in MEDLINE. Bulletin of the Medical Library Association. 71: 176-183.
Giles, J (2005). Internet encyclopaedias go head to head. Nature. 438: 900-901. http://www.nature.com/nature/journal/v438/n7070/full/438900a.html
Gorman, PN (1995). Information needs of physicians. Journal of the American Society for Information Science. 46: 729-736.
Hanbury, A, Müller, H, et al. (2015). Evaluation-as-a-Service: Overview and Outlook, arXiv. http://arxiv.org/pdf/1512.07454v1
Haynes, RB, McKibbon, KA, et al. (1990). Online access to MEDLINE in clinical settings. Annals of Internal Medicine. 112: 78-84.
Heilman, J (2013). Online encyclopedia provides free health info for all. Bulletin of the World Health Organization. 91: 8-9.
Hersh, W, Müller, H, et al. (2009). The ImageCLEFmed medical image retrieval task test collection. Journal of Digital Imaging. 22: 648-655.
Hersh, W and Voorhees, E (2009). TREC genomics special issue overview. Information Retrieval. 12: 1-15.
Hersh, WR (1994). Relevance and retrieval evaluation: perspectives from medicine. Journal of the American Society for Information Science. 45: 201-206.
Hersh, WR (2009). Information Retrieval: A Health and Biomedical Perspective (3rd Edition). New York, NY, Springer.
Hersh, WR, Bhupatiraju, RT, et al. (2006). Enhancing access to the bibliome: the TREC 2004 Genomics Track. Journal of Biomedical Discovery and Collaboration. 1: 3. http://www.j-biomed-discovery.com/content/1/1/3
Hersh, WR, Crabtree, MK, et al. (2002). Factors associated with success for searching MEDLINE and applying evidence to answer clinical questions. Journal of the American Medical Informatics Association. 9: 283-293.
Hersh, WR, Crabtree, MK, et al. (2000). Factors associated with successful answering of clinical questions using an information retrieval system. Bulletin of the Medical Library Association. 88: 323-331.
Hersh, WR and Hickam, DH (1998). How well do physicians use electronic information retrieval systems? A framework for investigation and review of the literature. Journal of the American Medical Association. 280: 1347-1352.
Hersh, WR, Hickam, DH, et al. (1994). A performance and failure analysis of SAPHIRE with a MEDLINE test collection. Journal of the American Medical Informatics Association. 1: 51-60.
Hersh, WR, Müller, H, et al. (2006). Advancing biomedical image retrieval: development and analysis of a test collection. Journal of the American Medical Informatics Association. 13: 488-496.
Holan, AD (2016). 2016 Lie of the Year: Fake news. St. Petersburg, FL, Politifact. http://www.politifact.com/truth-o-meter/article/2016/dec/13/2016-lie-year-fake-news/
Huesch, MD (2013). Privacy threats when seeking online health information. JAMA Internal Medicine. 173: 1838-1839.


Insel, TR, Volkow, ND, et al. (2003). Neuroscience networks: data-sharing in an information age. PLoS Biology. 1: E17.
Kalpathy-Cramer, J, Seco de Herrera, AG, et al. (2015). Evaluating performance of biomedical image retrieval systems - an overview of the medical image retrieval task at ImageCLEF 2004–2013. Computerized Medical Imaging and Graphics. 39: 55-61.
Laine, C, Horton, R, et al. (2007). Clinical trial registration: looking back and moving ahead. Journal of the American Medical Association. 298: 93-94.
Laurent, MR and Vickers, TJ (2009). Seeking health information online: does Wikipedia matter? Journal of the American Medical Informatics Association. 16: 471-479.
Lee, JS, Lorincz, C, et al. (2011). Should Healthcare Organizations Use Social Media? Falls Church, VA, Computer Sciences Corp. http://assets1.csc.com/health_services/downloads/CSC_Should_Healthcare_Organizations_Use_Social_Media.pdf
Libert, T (2015). Privacy implications of health information seeking on the Web. Communications of the ACM. 58(3): 68-77.
Lohr, S (2012). The Future of High-Tech Health Care — and the Challenge. New York, NY. New York Times. February 13, 2012. http://bits.blogs.nytimes.com/2012/02/13/the-future-of-high-tech-health-care-and-the-challenge/
Magrabi, F, Coiera, EW, et al. (2005). General practitioners' use of online evidence during consultations. International Journal of Medical Informatics. 74: 1-12.
Marcetich, J, Rappaport, M, et al. (2004). Indexing consistency in MEDLINE. MLA 04 Abstracts, Washington, DC. Medical Library Association. 10-11.
Markoff, J (2011). Computer Wins on ‘Jeopardy!’: Trivial, It’s Not. New York, NY. New York Times. February 16, 2011. http://www.nytimes.com/2011/02/17/science/17jeopardy-watson.html
McHenry, R (2004). The Faith-Based Encyclopedia. Tech Central Station, November 15, 2004. http://www.techcentralstation.com/111504A.html
Mello, MM, Francer, JK, et al. (2013). Preparing for responsible sharing of clinical trial data. New England Journal of Medicine. 369: 1651-1658.
Metzger, J and Rhoads, J (2012). Summary of Key Provisions in Final Rule for Stage 2 HITECH Meaningful Use. Falls Church, VA, Computer Sciences Corp. http://skynetehr.com/PDFFiles/MeaningUse_Stage2.pdf
Nicholson, DT (2006). An evaluation of the quality of consumer health information on Wikipedia. Capstone, Oregon Health & Science University.
Nielsen, J and Levy, J (1994). Measuring usability: preference vs. performance. Communications of the ACM. 37: 66-75.
Perrin, A (2015). One-fifth of Americans report going online ‘almost constantly’. Washington, DC, Pew Research Center. http://www.pewresearch.org/fact-tank/2015/12/08/one-fifth-of-americans-report-going-online-almost-constantly/
Pluye, P and Grad, RM (2004). How information retrieval technology may impact on physician practice: an organizational case study in family medicine. Journal of Evaluation in Clinical Practice. 10: 413-430.
Pluye, P, Grad, RM, et al. (2005). Impact of clinical information-retrieval technology on physicians: a literature review of quantitative, qualitative and mixed methods studies. International Journal of Medical Informatics. 74: 745-768.
Purcell, K, Brenner, J, et al. (2012). Search Engine Use 2012. Washington, DC, Pew Internet & American Life Project. http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012.aspx
Rodwin, MA and Abramson, JD (2012). Clinical trial data as a public good. Journal of the American Medical Association. 308: 871-872.


Roegiest, A and Cormack, GV (2016). An architecture for privacy-preserving and replicable high-recall retrieval experiments. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, Pisa, Italy. 1085-1088.
Ross, JS and Krumholz, HM (2013). Ushering in a new era of open science through data sharing: the wall must come down. Journal of the American Medical Association. 309: 1355-1356.
Royle, JA, Blythe, J, et al. (1995). Literature search and retrieval in the workplace. Computers in Nursing. 13: 25-31.
Salton, G (1991). Developments in automatic text retrieval. Science. 253: 974-980.
Sánchez-Mendiola, M and Martínez-Franco, AI, Eds. (2014). Informática Biomédica, 2a Edición. Mexico City, MX, Elsevier.
Shortliffe, EH and Cimino, JJ, Eds. (2014). Biomedical Informatics: Computer Applications in Health Care and Biomedicine (Fourth Edition). London, England, Springer.
Smith, M (2014). Targeted: How Technology Is Revolutionizing Advertising and the Way Companies Reach Consumers. Washington, DC, AMACOM.
Stanfill, MH, Williams, M, et al. (2010). A systematic literature review of automated clinical coding and classification systems. Journal of the American Medical Informatics Association. 17: 646-651.
Strzalkowski, T and Harabagiu, S, Eds. (2006). Advances in Open-Domain Question Answering. Dordrecht, Netherlands, Springer.
Taylor, H (2010). "Cyberchondriacs" on the Rise? Those who go online for healthcare information continues to increase. Rochester, NY, Harris Interactive. http://www.harrisinteractive.com/vault/HI-Harris-Poll-Cyberchondriacs-2010-08-04.pdf
Tuason, O, Chen, L, et al. (2004). Biological nomenclatures: a source of lexical knowledge and ambiguity. Pacific Symposium on Biocomputing, Kona, Hawaii. World Scientific. 238-249.
Voorhees, E and Hersh, W (2012). Overview of the TREC 2012 Medical Records Track. The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute of Standards and Technology. http://trec.nist.gov/pubs/trec21/papers/MED12OVERVIEW.pdf
Voorhees, EM (2005). Question Answering in TREC. TREC - Experiment and Evaluation in Information Retrieval. E. Voorhees and D. Harman. Cambridge, MA, MIT Press: 233-257.
Voorhees, EM and Harman, DK, Eds. (2005). TREC: Experiment and Evaluation in Information Retrieval. Cambridge, MA, MIT Press.
Voorhees, EM and Tong, RM (2011). Overview of the TREC 2011 Medical Records Track. The Twentieth Text REtrieval Conference Proceedings (TREC 2011), Gaithersburg, MD. National Institute of Standards and Technology.
Wanke, LA and Hewison, NS (1988). Comparative usefulness of MEDLINE searches performed by a drug information pharmacist and by medical librarians. American Journal of Hospital Pharmacy. 45: 2507-2510.
Westbrook, JI, Gosling, AS, et al. (2005). The impact of an online evidence system on confidence in decision making in a controlled setting. Medical Decision Making. 25: 178-185.
Wu, S, Liu, S, et al. (2017). Intra-institutional EHR collections for patient-level information retrieval. Journal of the American Society for Information Science & Technology: in press.
Yandell, MD and Majoros, WH (2002). Genomics and natural language processing. Nature Reviews - Genetics. 3: 601-610.
Zarin, DA and Tse, T (2013). Trust but verify: trial registration and determining fidelity to the protocol. Annals of Internal Medicine. 159: 65-67.
Zarin, DA, Tse, T, et al. (2015). The proposed rule for U.S. clinical trial registration and results submission. New England Journal of Medicine. 372: 174-180.
Zarin, DA, Tse, T, et al. (2011). The ClinicalTrials.gov results database -- update and key issues. New England Journal of Medicine. 364: 852-860.


Topics to cover
• Content
• Indexing
• Evaluation


Content
• Current status and challenges in biomedical information retrieval (IR)
• Classification and examples of knowledge-based information


Challenges in biomedical IR
• We have gone from information paucity to information overload
• Many topics we want to search on have multiple ways to be expressed, e.g., diseases, genes, symptoms, etc.
• The converse is a problem too: many words and terms used to express topics have multiple meanings
• Balancing open access vs. providing for cost of production and maintenance
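
The two vocabulary problems on this slide (many terms for one topic, and one term with many meanings) can be illustrated with a small sketch. The synonym table, the documents, and the function names below are all invented for illustration; they are not drawn from MeSH or any real thesaurus, and real systems tokenize and normalize text far more carefully.

```python
# Toy illustration of synonymy and polysemy in search.
# The synonym table and documents are hypothetical, not from a real thesaurus.

SYNONYMS = {
    "myocardial infarction": {"myocardial infarction", "heart attack", "mi"},
    "hypertension": {"hypertension", "high blood pressure"},
}

DOCUMENTS = {
    1: "patient admitted after heart attack",
    2: "acute myocardial infarction treated with stents",
    3: "high blood pressure is a risk factor",
    4: "mi is the postal abbreviation for michigan",
}


def expand_query(term):
    """Return the term plus every synonym that shares a concept with it."""
    term = term.lower()
    expanded = {term}
    for terms in SYNONYMS.values():
        if term in terms:
            expanded |= terms
    return expanded


def contains_phrase(text, phrase):
    """True if the phrase occurs in the text on whole-word boundaries."""
    return f" {phrase} " in f" {text} "


def search(term, expand=True):
    """Return sorted IDs of documents matching the term (optionally expanded)."""
    forms = expand_query(term) if expand else {term.lower()}
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(contains_phrase(text, form) for form in forms)
    )
```

Searching for "heart attack" without expansion finds only document 1. With expansion it also finds document 2 (the synonymy problem addressed), but the ambiguous abbreviation "mi" pulls in document 4 as well (the polysemy problem introduced).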


IR is now “mainstream”
• Internet (and likely search engine) use is now ubiquitous
  – Not only in developed countries (Perrin, 2015) but across world – http://www.internetworldstats.com/stats.htm
• 71% of Internet users (59% of US adults) have searched for health information, with 35% using it for self-diagnosis (Fox, 2013)
• “Search engine optimization” (SEO) is a key function used by many companies and organizations (Moz, 2015)
  – https://moz.com/beginners-guide-to-seo
  – Some are lucky, e.g., last name of “Hersh”


The Web has changed the nature of search
• Three major uses (Broder, 2002)
  – Informational: seeking information (39-48%)
  – Navigational: looking for a specific page, e.g., a home page (20-24%)
  – Transactional: perform transactions, e.g., on-line purchasing (30-36%)
• We are in the era of “adversarial” search – there is content we do not want to retrieve (Castillo, 2011; Smith, 2014)
  – Some of the content we might not want to retrieve is “fake news,” which came to the fore in 2016 (Holan, 2016)
• Growing privacy concerns about tracking our searching (Huesch, 2013; Libert, 2015)


IR also a growing part of “knowledge discovery” from scientific literature
[Figure: progression from All literature to Possibly relevant literature to Definitely relevant literature to Structured knowledge. Process labels: Information retrieval; Information extraction, text mining.]
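
One way to read the figure is as a two-stage pipeline: information retrieval narrows all literature down to relevant documents, and information extraction turns those documents into structured knowledge. The sketch below follows that reading. The sentences, the keyword filter, and the single regular-expression pattern are hypothetical simplifications, not the methods used by any system discussed in the talk.

```python
import re

# Hypothetical mini "literature"; real collections hold millions of articles.
LITERATURE = [
    "BRCA1 mutations are associated with breast cancer.",
    "A survey of hospital parking policies.",
    "CFTR variants are associated with cystic fibrosis.",
]


def retrieve(docs, query_terms):
    """IR step: keep documents that mention any query term."""
    terms = [t.lower() for t in query_terms]
    return [d for d in docs if any(t in d.lower() for t in terms)]


# Extraction step: one hand-written pattern for gene-disease statements.
PATTERN = re.compile(
    r"^(?P<gene>[A-Z0-9]+) (?:mutations|variants) are associated with "
    r"(?P<disease>[a-z ]+)\.$"
)


def extract(docs):
    """Turn matching sentences into (gene, disease) pairs."""
    pairs = []
    for doc in docs:
        match = PATTERN.match(doc)
        if match:
            pairs.append((match.group("gene"), match.group("disease")))
    return pairs


relevant = retrieve(LITERATURE, ["mutations", "variants"])
knowledge = extract(relevant)
```

Here `relevant` holds the two genetics sentences and `knowledge` holds the structured pairs extracted from them; the parking survey is filtered out at the retrieval step.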

IR and online access firmly planted in health and biomedicine
• Biology is now defined as an “information science” (Insel, 2003)
• Pharmaceutical companies compete for informatics/library talent (Davies, 2006)
• Clinicians cannot keep up – average of 75 clinical trials and 11 systematic reviews published each day (Bastian, 2010)
• Search for health information by clinicians, researchers, and patients/consumers is ubiquitous (Purcell, 2012; Google/Manhattan Research, 2012)
  – It’s even part of “meaningful use”: text search over electronic health record notes (Metzger, 2012)


Use is ubiquitous among physicians (Google/Manhattan Research, 2012)
• Most have multiple devices – 99% with a desktop or laptop, 84% with a smartphone, and 54% with a tablet
• Spend twice as much time using online resources as print resources
• Even physicians aged 55+ heavy users – 80% own a smartphone, 84% use search engines daily, and 9 hours per week is spent online for professional purposes
• Search engine use a daily activity – 84%, with average of six searches done per day and 94% using Google
• When looking for clinical or treatment information, about a third click first on sponsored listings from a search
• About 93% say they take action based on searching – everything from pursuing more information to sharing with a patient or colleague to changing treatment decisions
• On smartphones, searching is preferred over mobile apps – 48% of use time with a search engine, 34% with mobile apps, and 18% going to specific Web sites in a browser or with a bookmark
• Spend about 6 hours per week watching online video, with about half of that time spent for professional purposes


What kind of health information do consumers search for? (Fox, 2011)

Health topic                                % searching
Specific disease or medical problem         66%
Certain medical treatment or procedure      56%
Doctors or other health professionals       44%
Hospitals or other medical facilities       36%
Health insurance – private or government    33%
Food safety or recalls                      29%
Environmental health hazards                22%
Pregnancy and childbirth                    19%
Medical test results                        16%


How to find more information about IR in health and biomedicine
• Hersh WR, Information Retrieval: A Health and Biomedical Perspective, Third Edition, 2009
  – Web site: www.irbook.info
• Chapters in other books, e.g., Shortliffe (2014), Sanchez-Mendiola (2014)
• Plenty of other books, journals, and other sources


Why is IR pertinent to health and biomedicine?
• Growth of knowledge has long surpassed human memory capabilities
• Clinicians have frequent and unmet information needs
• Researchers must frequently update their knowledge in new areas quickly
• Primary literature on a given topic can be scattered and hard to synthesize
• Non-primary literature sources are often neither comprehensive nor systematic
• Web is increasingly used as source of health and biomedical information


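Life-cycle of knowledge-based information
[Figure: flow diagram of the publication life cycle. Node labels: Original research; Write up results; Submit for publication; Peer review (with outcomes Accept, Revise, Reject); Publish; Relinquish copyright; Secondary publications; Public data repository.]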

Classification of knowledge-based scientific information
• Primary – original research
  – Published mainly in journals but also in conference proceedings, technical reports, books, etc.
  – Can include re-analysis, e.g., meta-analysis and systematic reviews
• Secondary – reviews, condensations, and/or synopses of primary literature
  – Textbooks and handbooks are staples of clinical practitioners, researchers, and others
  – Guidelines are important for normalizing care and measuring quality


Classification of knowledge-based content
• Bibliographic
  – By definition rich in metadata
• Full-text
  – Everything on-line
• Annotated
  – Non-text or structured text annotated with text
• Aggregations
  – Bringing together all of the above
• These categories are admittedly fuzzy, and increasing numbers of resources have more than one type


Bibliographic content
• Bibliographic databases
  – The old (e.g., MEDLINE) have been revitalized with new features
  – New ones (e.g., National Guidelines Clearinghouse) have emerged
• Web catalogs
  – Share many characteristics of traditional bibliographic databases
• Really simple syndication/Rich site summary (RSS)
  – “Feeds” provide information about new content


Bibliographic databases
• Contain metadata about (mostly) journal articles and other resources typically found in libraries
• Produced by
  – U.S. government – most produced by National Library of Medicine (NLM, www.nlm.nih.gov)
    • e.g., MEDLINE, genomics information, etc.
  – Commercial publishers, e.g.,
    • EMBASE – part of larger SciVal
    • CINAHL – Cumulative Index to Nursing and Allied Health Literature
    • ACM Guide to Computing Literature – computer science and related areas


MEDLINE
• References to biomedical journal literature
  – Original medical IR application – system for searching MEDLINE launched in 1971 with literature maintained in MEDLARS system dating back to 1966
• Name derives from MEDLARS On-Line – MEDLINE
  – Free to world since 1997 via PubMed – http://pubmed.gov
• Now with links to full text of articles and other resources
• Statistics
  – http://www.nlm.nih.gov/bsd/bsd_key.html
  – Over 23M references to peer-reviewed literature
  – Over 5600 journals, mostly English language
  – Nearly 900,000 new references added yearly
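
Beyond the PubMed Web interface, MEDLINE can also be queried programmatically. The slides do not cover this; the sketch below assumes NCBI's E-utilities ESearch endpoint and only builds the request URL, without sending it. The function name and the default result count are hypothetical choices for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint. Its use here is an assumption of this
# sketch; the slides only mention the PubMed Web site.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(query, max_results=20):
    """Build (but do not send) an ESearch request URL for a PubMed query."""
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": query,           # the query string, URL-encoded below
        "retmax": max_results,   # maximum number of record IDs to return
        "retmode": "json",       # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_search_url("asthma AND children", max_results=5)
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would return matching PubMed identifiers; anyone doing so should first check NCBI's current usage guidelines.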


National Guidelines Clearinghouse
• Produced by Agency for Healthcare Research and Quality (AHRQ)
  – www.guideline.gov
• Contains detailed information about guidelines
  – Including degree they are evidence-based
  – Interface allows comparison of elements in database for multiple guidelines
• Has links to those that are free on Web and links to producers when proprietary


Web catalogs
• Generally aim to provide quality-filtered Web sites aimed at specific audiences
  – Distinction between catalogs and sites blurry
• Some are aimed towards clinicians
  – HONSelect – http://www.hon.ch/HONselect/
  – Translating Research into Practice – www.tripdatabase.com
• Others are aimed towards patients/consumers
  – Healthfinder – www.healthfinder.gov


RSS
• RSS “feeds” provide short summaries, typically of news, journal articles, or other recent postings on Web sites
• Users receive RSS feeds by an RSS aggregator that can typically be configured for the site(s) desired and to filter based on content
  – Work as standalone, in Web browsers, in email clients, etc.
• Two versions (1.0, 2.0) but basically provide
  – Title: name of item
  – Link: URL of full page
  – Description: brief description of page
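
The title, link, and description elements, and the content filtering an aggregator performs, can be sketched with Python's standard library. The feed below is a made-up RSS 2.0 document with hypothetical URLs, and the helper names are illustrative; RSS 1.0 feeds use RDF namespaces and would need different element lookups.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; example.org URLs are placeholders.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal (hypothetical)</title>
    <link>https://example.org/</link>
    <description>Recent articles</description>
    <item>
      <title>Statin trial results</title>
      <link>https://example.org/articles/1</link>
      <description>Summary of a randomized trial of statins.</description>
    </item>
    <item>
      <title>Editorial on data sharing</title>
      <link>https://example.org/articles/2</link>
      <description>Opinion piece on open science.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return one dict per <item> with its title, link, and description."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def filter_items(items, keyword):
    """Keep items whose title or description mentions the keyword."""
    keyword = keyword.lower()
    return [
        item for item in items
        if keyword in item["title"].lower()
        or keyword in item["description"].lower()
    ]


items = parse_items(SAMPLE_FEED)
statin_items = filter_items(items, "statin")
```

An aggregator would fetch the feed over HTTP on a schedule and apply the same kind of parsing and filtering; the fetching step is left out here so the sketch runs offline.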


Full-text content
• Contains complete text as well as tables, figures, images, etc.
• If there is corresponding print version, both are usually identical
• Includes
  – Periodicals
  – Books
  – Web sites – may include either of above


Full-text primary literature
• Almost all biomedical journals available electronically
  – Many published by Highwire Press (www.highwire.org), which adds value to content of original publisher, including British Medical Journal, Journal of the American Medical Association, New England Journal of Medicine, etc.
  – Also published by leading commercial scientific publishers, e.g., Elsevier, Kluwer, Springer, etc.
  – Growing number available via open-access model, e.g., Biomed Central (BMC), Public Library of Science (PLoS)
  – Another source of full-text papers is PubMed Central (PMC; http://pubmedcentral.gov)


Books
• Textbooks
  – Most well-known clinical textbooks are now available electronically
    • e.g., Harrison’s Principles of Internal Medicine
  – Most are bundled into large collections by publishers
    • e.g., AccessMedicine (McGraw-Hill), Elsevier, Kluwer
  – NLM has developed books site as part of Entrez
    • http://www.ncbi.nlm.nih.gov/books
• Compendia of drugs, diseases, evidence, etc.
• Handbooks – very popular with clinicians
• Increasingly published on mobile devices


Value added for electronic books
• Multimedia, e.g., skin lesions, shuffling gait of Parkinson’s Disease, etc.
• Bundling of multiple books
• Can be updated in between “editions”
• Linkage to other information, e.g., to references, self-assessments, updates, other resources, etc.


Web sites
• Defined more narrowly here to refer to coherent collections of information on Web
• Usually take advantage of Web features, such as linking, multimedia
• Increasingly integrated with other resources and available on different platforms (e.g., integrated into electronic health records [EHRs], on smartphones, etc.)


Some notable full-text content on Web sites
• Government agencies
  – National Cancer Institute
    • www.cancer.gov
  – Centers for Disease Control – travel and infection information
    • http://www.cdc.gov/DiseasesConditions
    • http://www.cdc.gov/travel/
  – Other NIH institutes, e.g., National Heart, Lung, and Blood Institute (NHLBI)
    • www.nhlbi.nih.gov


Full-text Web sites (cont.)
• Physician-oriented medical news and overviews, e.g.,
  – Medscape – www.medscape.com
  – Many professional societies provide to members, e.g., http://www.acponline.org/clinical_information/
• Patient/consumer-oriented, e.g.,
  – NetWellness – www.netwellness.com
  – WebMD – www.webmd.com
• Many mobile apps provide health information, e.g.,
  – iTriage – www.itriagehealth.com
  – Epocrates – www.epocrates.com


Other interesting types of Web content
• Wikipedia – www.wikipedia.org
  – Encyclopedia with free access and distributed authorship
  – Some concerns about manipulation (McHenry, 2004) but
    • Comparable to Encyclopedia Britannica? (Giles, 2005 – rebuttal: Anonymous, 2006)
    • Health information quality is reasonably good (Nicholson, 2006)
    • Content retrieved prominently in most Web searches (Laurent, 2009)
    • Making attempt to improve quality of medical content (Heilman, 2013)
• Body of knowledge
  – Software Engineering Body of Knowledge (SWEBOK, www.swebok.org) organizes knowledge of field
• Social media/Web 2.0 and beyond (Lee, 2011)


Annotated
• Non-text or structured text annotated with text
• Includes
  – Image collections
  – Citation databases
  – Evidence-based medicine databases
  – Clinical decision support
  – Genomics databases
  – Other databases


Image collections
• Most prominent in the “visual” medical specialties, such as radiology, pathology, and dermatology
• Well-known collections include
  – Visible Human – http://www.nlm.nih.gov/research/visible/visible_human.html
  – Lieberman’s eRadiology – http://eradiology.bidmc.harvard.edu
  – WebPath – http://library.med.utah.edu/WebPath/webpath.html
  – More pathology – PEIR, www.peir.net
  – DermIS – www.dermis.net
  – More dermatology, also a decision-support system – www.visualdx.com
• Many have associated text, which assists with indexing and retrieval


Citation databases
• Science Citation Index and Social Science Citation Index
  – Database of journal articles that have been cited by other journal articles
  – Now part of a package called Web of Science, which itself is part of a larger product, Web of Knowledge (Clarivate)
    • http://clarivate.com/scientific-and-academic-research/research-discovery/web-of-science/
• SCOPUS
  – http://www.elsevier.com/online-tools/scopus
• Google Scholar
  – http://scholar.google.com
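
The core idea of a citation database, looking up which articles cite a given article, amounts to inverting each article's reference list. The sketch below shows that inversion on a tiny invented data set; the article IDs and the function name are hypothetical, and it says nothing about how Web of Science, SCOPUS, or Google Scholar are actually built.

```python
from collections import defaultdict

# Hypothetical reference lists: each article ID maps to the IDs it cites.
REFERENCES = {
    "A": ["B", "C"],
    "B": ["C"],
    "D": ["C", "B"],
}


def build_citation_index(references):
    """Invert reference lists into a 'cited by' index."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    # Sort for stable output and return a plain dict.
    return {cited: sorted(citing) for cited, citing in cited_by.items()}


index = build_citation_index(REFERENCES)
```

In this toy data, article C is cited by A, B, and D, while article A is cited by nothing and so does not appear in the index. Citation counts follow directly from the length of each list.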


Evidence-based medicine databases
• Cochrane Database of Systematic Reviews – http://www.cochrane.org
  – Collection of systematic reviews, kept updated
• Evidence “formularies”
  – Clinical Evidence (BMJ) – http://clinicalevidence.bmj.com/x/index.html
  – JAMAevidence – http://jamaevidence.com
• PubMed Health – https://www.ncbi.nlm.nih.gov/pubmedhealth/
  – Systematic reviews and summaries of systematic reviews
• Many resources part of aggregations


Clinical decision support (CDS)
• Content used in CDS systems, usually part of EHRs
  – Order sets (usually “evidence-based”)
  – CDS rules
  – Health/disease management templates
• Growing and evolving commercial market for such tools, especially as EHR adoption increases; leaders include
  – Zynx – www.zynxhealth.com
  – Thomson Reuters Cortellis – http://cortellis.thomsonreuters.com
  – EHR vendors themselves and partners


Genomics databases
• National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov; NCBI, 2017) collection links
  – Literature references – MEDLINE
  – Textbook of genetic diseases – On-Line Mendelian Inheritance in Man
  – Sequence databases – GenBank
  – Structure databases – Molecular Modeling Database
  – Genomes – Catalog of genes
  – Maps – Locations of genes on chromosomes


Other databases
• ClinicalTrials.gov – www.clinicaltrials.gov
  – Originally database of clinical trials funded by NIH
  – Now used as register for clinical trials, with results reporting for some (DeAngelis, 2005; Laine, 2007; Zarin, 2013; Zarin, 2015)
• NIH RePORTER – http://projectreporter.nih.gov/reporter.cfm
  – Database of all research grants funded by NIH
  – Replaced the CRISP database


Data publishing
• Internet makes it technologically feasible
• Many fields have long tradition of requiring depositing of data in public repository as a condition to publish, e.g., genomics, although availability incomplete (Alsheikh-Ali, 2011)
• Growing advocacy for clinical trials data
  – A “public good” (Rodwin, 2012) for new era of “open science” (Ross, 2013)
  – Calls for doing so by journal editors (Taichman, 2016) and others (Ross, 2013; Mello, 2013)
  – Pushback from trialists who want time-limited protection of those who generate data for rewards of their work and from those who aim to discredit or undermine original research (Anonymous, 2016)
• biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE)
  – Database of metadata about available biomedical data sets
  – https://datamed.org/


Aggregations – integrating many resources
• Clinical – growing tendency of publishers to aggregate resources into comprehensive products
  – Merck Medicus – www.merckmedicus.com
    • Collection of many resources available to any licensed US physician
  – UpToDate – www.uptodate.com
    • Very popular among clinicians
  – Essential Evidence Plus (includes InfoPOEMS, “Patient-oriented evidence that matters”) – www.essentialevidenceplus.com
  – Dynamed – www.dynamed.com


Other aggregations
• Biomedical research: Model organism databases, e.g., Mouse Genome Informatics
  – www.informatics.jax.org
  – Combines genomics and related data, bibliographic database, gene references, etc.
• Consumer: MEDLINEplus
  – http://medlineplus.gov
  – Integrates a variety of licensed resources and public Web sites


Indexing
• Assignment of metadata to content to facilitate retrieval
• Two major types
  – Human indexing with controlled vocabulary
  – Automated indexing of all words
• Also address
  – Indexing other “objects”
  – UMLS Metathesaurus
  – Web indexing


Human indexing
• Usually performed by professional indexer with some background in biomedicine
• Follows protocol to scan resource and select terms from a controlled vocabulary
• Most vocabularies are hierarchical and have specific definitions for when term is to be assigned


Medical Subject Headings (MeSH) vocabulary (Colletti, 2001)
• Over 26,000 terms, with many synonyms for those terms
• Over 230,000 Supplementary Concept Records, formerly mostly chemicals and drugs, now rare diseases and genes
• Hierarchical, based on 16 trees, e.g., Anatomy, Diseases, Chemicals and Drugs
• Contains 83 subheadings, which can be used to make a heading more specific, such as Diagnosis or Therapy
• MeSH browser allows exploration – http://www.nlm.nih.gov/mesh/MBrowser.html


A slice of MeSH

C Diseases
  C1 Bacterial and Fungal Diseases
  C14 Cardiovascular Diseases
    C14.240 Cardiovascular Abnormalities
    C14.280 Heart Diseases
    C14.907 Vascular Diseases
      C14.907.055 Aneurysm
      C14.907.489 Hypertension
        C14.907.489.330 Malignant Hypertension
        C14.907.489.430 Portal Hypertension
        C14.907.489.631 Renal Hypertension
      C14.907.940 Vasculitis
  C20 Immunologic Diseases
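The tree numbers in the slice of MeSH above encode the hierarchy directly: each dot-separated prefix of a tree number is a broader heading. The following is a minimal sketch of how that property might be used to walk up or down the tree. The dictionary holds only the headings shown on the slide, and the function names `ancestors` and `descendants` are illustrative, not part of any NLM tool.

```python
# Tree numbers copied from the slice of MeSH shown on the slide.
MESH_SLICE = {
    "C14": "Cardiovascular Diseases",
    "C14.240": "Cardiovascular Abnormalities",
    "C14.280": "Heart Diseases",
    "C14.907": "Vascular Diseases",
    "C14.907.055": "Aneurysm",
    "C14.907.489": "Hypertension",
    "C14.907.489.330": "Malignant Hypertension",
    "C14.907.489.430": "Portal Hypertension",
    "C14.907.489.631": "Renal Hypertension",
    "C14.907.940": "Vasculitis",
}


def ancestors(tree_number):
    """Return the broader headings, from the top of the tree down to the parent."""
    parts = tree_number.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts))]
    return [MESH_SLICE[p] for p in prefixes if p in MESH_SLICE]


def descendants(tree_number):
    """Return all narrower headings, sorted by name."""
    prefix = tree_number + "."
    return sorted(name for num, name in MESH_SLICE.items() if num.startswith(prefix))


print(ancestors("C14.907.489.631"))
print(descendants("C14.907.489"))
```

Collecting all descendants of a heading in this way is one plausible basis for searching on a heading together with its narrower terms.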


MEDLINE indexing
• Indexing done by professionals who follow protocol first devised by Bachrach (1978)
  – Read title, introduction, and conclusion and then scan methods, results, figures, tables, and, lastly, abstract
  – Ignore “keywords” of publisher
  – Assign 2-4 headings (with or without subheadings) as central concepts (or major headings) and another 5-10 as minor headings
  – Use most specific headings in hierarchy assigned
• Important additional tag is Publication Types
  – e.g., Randomized Controlled Trial, Meta-Analysis, Practice Guideline, Review
• Many modern tools have been developed to assist indexing, such as term suggestion and look-up


Other bibliographic indexing
• Other NLM databases use MeSH
• Some non-NLM resources use MeSH
  – MeSH freely available from NLM at http://www.nlm.nih.gov/mesh/filelist.html
• Other non-NLM databases have their own subject headings, e.g.,
  – CINAHL subject headings
  – EMTREE


Other metadata
• Indexing covers more than content
• Other attributes of documents to index can include
  – Author(s)
  – Source: journal name, issue, pages
  – Publication or resource type
  – Relationship to other information
    • e.g., gene identifier, grant number, etc.


Automated indexing
• Indexing of all words that occur in content items
  – In bibliographic databases, will usually include title, abstract, and often other fields, e.g., author or subject heading
  – In full-text documents, will usually include all text, including title
• Often use a stop word list to remove common words (e.g., the, and, which)
• Some systems “stem” words to root form (e.g., coughs or coughing to cough)
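The steps above (index every word, drop stop words, stem to a root form) can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular retrieval system does it: the stop word list is a short made-up sample, and `naive_stem` is a toy suffix stripper standing in for a real stemming algorithm.

```python
import re
from collections import defaultdict

# Illustrative stop word list only; real systems use longer lists.
STOP_WORDS = {"the", "and", "which", "a", "of", "in", "to", "is"}


def naive_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lowercase, split into words, remove stop words, and stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each index term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return index


index = build_inverted_index({1: "Coughing in children", 2: "The cough reflex"})
print(sorted(index["cough"]))
```

Because both "Coughing" and "cough" reduce to the same root, a search on either word form would reach both documents.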


Weighted indexing (Salton, 1991)
• Usually used with automated indexing
• Gives weight to words that are frequent but discriminating
• Most common approach is for weight to equal product TF*IDF
  – Inverse document frequency of word i
    • $\mathrm{IDF}_i = \log\left(\frac{\text{number of documents}}{\text{number of documents with word } i}\right) + 1$
  – Term frequency of word i in document j
    • $\mathrm{TF}_{ij} = \text{frequency of word } i \text{ in document } j$
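A minimal sketch of the TF*IDF weighting described above, using the same IDF formula (log of total documents over documents containing the word, plus 1). The slide does not state the base of the logarithm, so the natural logarithm is an assumption here; the function name `tf_idf` and the tiny corpus are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: dict of doc_id -> list of index terms.

    Returns dict of doc_id -> {term: TF * IDF weight}.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    # IDF_i = log(N / n_i) + 1 (natural log assumed).
    idf = {term: math.log(n_docs / df) + 1 for term, df in doc_freq.items()}
    weights = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)  # TF_ij = frequency of term i in document j
        weights[doc_id] = {term: tf[term] * idf[term] for term in tf}
    return weights


corpus = {
    "d1": ["aids", "aids", "retinopathy"],
    "d2": ["aids", "therapy"],
}
for doc_id, term_weights in tf_idf(corpus).items():
    print(doc_id, term_weights)
```

In this toy corpus "aids" occurs in every document, so its IDF is the minimum value of 1, while "retinopathy" occurs in only one and gets a higher IDF.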


Weighted indexing examples
• From a database on AIDS
  – The word AIDS will likely occur in almost every document, while retinopathy will be much more “discriminating”
• In a general medical database
  – AIDS will occur much less frequently, so is better indexing term
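To make the contrast concrete, here is a worked calculation with the IDF formula given earlier in the deck, IDF = log(number of documents / number of documents with word) + 1. All document counts are hypothetical, chosen only for illustration, and the natural logarithm is assumed.

```latex
% Hypothetical AIDS-specific database: 1000 documents,
% "AIDS" in 950 of them, "retinopathy" in 10.
\[
\mathrm{IDF}_{\text{AIDS}} = \ln\frac{1000}{950} + 1 \approx 1.05
\qquad
\mathrm{IDF}_{\text{retinopathy}} = \ln\frac{1000}{10} + 1 \approx 5.61
\]
% Hypothetical general medical database: 1,000,000 documents,
% "AIDS" in 10,000 of them.
\[
\mathrm{IDF}_{\text{AIDS}} = \ln\frac{1\,000\,000}{10\,000} + 1 \approx 5.61
\]
```

With these made-up counts, AIDS is barely discriminating in the AIDS database but as discriminating in the general database as retinopathy was in the specialized one.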


“Visual” indexing – e.g., Wordle, www.wordle.net

[Figure: Scientific publications of your instructor (from SciVal app)]


Citation indexing
• Other content items that “cite” this one, e.g., references, links, etc.
• Indexing is at content item level
• Goal is to designate related or important content items
• Citation databases list all other articles that cite a specific article in journals
  – e.g., Science Citation Index, SCOPUS, and Google Scholar
• Novel feature of Google search engine (Brin, 1998) was giving higher weight to Web pages that have more links to them
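The idea of weighting pages by the links pointing to them can be sketched with a simplified PageRank-style power iteration. This is a textbook-style illustration, not the search engine's actual implementation: the damping value of 0.85, the fixed iteration count, the handling of pages without out-links, and the three-page link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict of page -> list of pages it links to.

    Returns dict of page -> rank, with ranks summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no out-links: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page "c" has two incoming links and ends up ranked highest, while "b", which nothing links to, ends up lowest.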


Limitations of human indexing
• Inconsistency
  – When MEDLINE records indexed in duplicate, consistency varies from 63% for central concept headings to 36% for heading-subheading combination (Funk, 1983)
  – Results verified even with modern indexing tools and methods (Marcetich, 2004)
• Inadequate indexing vocabulary
  – Up to 25% of all concepts not represented in MeSH (Hersh, 1994)
  – Ambiguities and other naming problems with genes, proteins, etc. (Yandell, 2002; Tuason, 2004)


Limitations of word indexing
• Synonymy – e.g., cancer/carcinoma
• Polysemy – e.g., lead
• Context – e.g., high blood pressure
• Focus – e.g., central vs. incidental concepts
• Granularity – e.g., antibiotics vs. specific ones


Research
• Evaluation
  – How valuable are systems to users?
  – How well do systems and users perform?
• Future directions
  – Applying IR techniques to electronic health records
  – Beyond retrieval – question-answering


Evaluation
• Questions often asked
  – Is system used?
  – Are users satisfied?
  – Do they find relevant information?
  – Do they complete their desired task?
• Most studied group is physicians, with systematic reviews of results (Hersh, 1998; Pluye, 2005)
• Most IR evaluation research has focused on retrieval of relevant documents, which may not capture full spectrum of usage
  – Often consists of challenge evaluations that develop “test collections” – best known is (non-medical) Text Retrieval Conference (TREC, http://trec.nist.gov) (Voorhees, 2005)


Is system used?
• Most studies done prior to ubiquitous Internet, electronic health records, mobile devices, etc.
• Studies in various clinical settings (Hersh, 2009; Magrabi, 2005) showed average use varied from 0.3 to 8.7 accesses per person-month
• Whatever the actual number, this paled in comparison to known physician information needs (Gorman, 1995) of two questions per every three patients


Are users satisfied?
• Most studies report good user satisfaction, but some interesting studies to note
  – Nielsen (1994) meta-analysis found association (though imperfect) between user satisfaction and ability to use computer systems
  – Most Internet users believe they mostly find information they are seeking (Taylor, 2010; Fox, 2011)


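Do they find relevant information?
• Most common approach to evaluation
• Usually measured by relevance-based measures of recall and precision
  – Recall (R): $R = \frac{\text{number of relevant documents retrieved}}{\text{number of relevant documents in collection}}$
  – Precision (P): $P = \frac{\text{number of relevant documents retrieved}}{\text{number of documents retrieved}}$
• And various aggregations, e.g., F, MAP, NDCG, etc.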
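Recall and precision, and one of the aggregations named above, can be computed from a set of retrieved documents and a set of documents judged relevant. The sketch below uses the standard definitions; the F shown is the balanced version (harmonic mean of recall and precision), and the document IDs are made up for illustration.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def f_measure(r, p):
    """Balanced F: harmonic mean of recall and precision."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


retrieved = ["d1", "d2", "d3", "d4"]        # what the search returned
relevant = ["d1", "d3", "d5", "d6", "d7"]   # what was judged relevant

r = recall(retrieved, relevant)
p = precision(retrieved, relevant)
print(r, p, f_measure(r, p))
```

Here two of the five relevant documents were found (recall 0.4) and two of the four retrieved documents were relevant (precision 0.5).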


Comments about recall and precision
• There tends to be a trade-off between the two
• “Relevance” can be an ambiguous notion (Hersh, 1994)
• It is unclear whether they correlate with a user’s success in using an IR system
• The proliferation of standard test collections leads to a great deal of research that excludes real users


How well do clinicians search? Early results from Haynes (1990)

Searcher type       Recall   Precision
Novice clinicians   27%      38%
Expert clinicians   48%      48%
Librarians          49%      57%


Other findings
• Little overlap among retrieval sets
• Searchers tended to find similar quantities of disparate relevant documents
• Novice searchers satisfied with results
• Adequate information or ignorant bliss?


Extending evaluation beyond physicians and documents
• Other clinicians
  – Nurses – Rolye, 1995
  – Pharmacists – Wanke, 1988
  – Nurse practitioners – Hersh, 2000; Hersh, 2002
• Biomedical researchers
  – Very little study of their use of IR systems
  – Investigated by TREC Genomics Track (Hersh, 2006; Hersh, 2009)
  – http://ir.ohsu.edu/genomics/
• Image retrieval – ImageCLEFmed (Hersh, 2006; Hersh, 2009; Kalpathy-Cramer, 2015)
  – Retrieval performance related to query type, measure selection
  – http://ir.ohsu.edu/image/


Recall and precision studies yield useful results, but
• Are searchers able to solve their information problems by using system?
  – Some research has used a “task-oriented approach” to measure question-answering
  – Hersh (2002) – use of MEDLINE to answer clinical questions
    • Medical students answered 34% of questions before system, 51% afterwards
    • Nurse practitioner students answered 34% of questions before system but did not change with system
    • Time to answer a question was ~30 minutes
    • No association of recall or precision with correct answering


Another task-oriented study
• Westbrook (2005) – use of online evidence system
  – Physicians answered 37% of questions before system, 50% afterwards
  – Nurse specialists answered 18% of questions before system, 50% afterwards
  – Those who had correct answers had higher confidence in their answers, but those not knowing answer initially had no difference in confidence whether answer right or wrong


How do IR systems impact physician practice? (Pluye, 2004)
• Qualitative study found four themes mentioned by physicians
  – Recall – of forgotten knowledge
  – Learning – new knowledge
  – Confirmation – of existing knowledge
  – Frustration – that system use not successful
• Researchers also noted two additional themes
  – Reassurance – that system is available
  – Practice improvement – of patient-physician relationship


Challenges for IR evaluation moving forward
• Must understand tasks of user and focus evaluation accordingly
• Ultimate measure, like any other informatics application, might be health outcome
  – This may be difficult with IR systems since usage may not directly impact outcomes of patient care or research activity


Research directions – applying IR to medical records
• Most medical records still in narrative documents, where natural language processing (NLP) techniques are improving but still imperfect (Stanfill, 2010)
• For some tasks, can we take an IR approach?
  – TREC Medical Records Track used de-identified corpus of medical records in initial task of identifying patients as candidates for clinical research studies (Voorhees, 2011; Voorhees, 2012)


TREC Medical Records Track test collection
(Courtesy, Ellen Voorhees, NIST)

[Figure: a report extract shown alongside a visit list and a record-visit map of report identifiers, e.g., 3EKrCWvnwcbU and 20071026ER-9qWiuGEk8Xkz-488-541231171]

Report Extract:
DISCHARGE SUMMARY ... PRINCIPAL DIAGNOSES: 1. Urinary tract infection. 2. Gastroenteritis. 3. Dehydration. 4. Hyperglycemia. 5. Diabetes mellitus. 6. Osteoarthritis. 7. History of anemia. 8. History of tobacco use.
HOSPITAL COURSE: The patient is a **AGE[in 40s]-year-old insulin-dependent diabetic who presented with nausea,...

17,198 visits; 101,712 reports (93,552 mapped to visits)

TREC Medical Records Track results (Voorhees, 2011; Voorhees, 2012)
• Highly variable across different topics
  – Easiest – consistently best results
    • 105: Patients with dementia
  – Hardest – consistently worst results
    • 108: Patients treated for vascular claudication surgically
  – Large differences between best and worst results
    • 125: Patients co-infected with Hepatitis C and HIV
• Overall results show substantial room for improvement
  – Best results involve manual modification of queries


Subsequent work in medical records search
• Public test collections of medical records stymied by privacy concerns
• Funded for project using parallel corpora with common topics at OHSU and Mayo Clinic (Wu, 2017)
• Exploring options for Evaluation as a Service (EaaS) to allow others to use data without seeing it (Hanbury, 2015)
  – Similar situation to TREC Total Recall Track searching over email and corporate repositories (Roegeist, 2016)


More recent TREC tracks
• Clinical Decision Support, 2014-2016 (Roberts, 2016)
  – Given patient case, find relevant full-text articles (from PMC snapshot) about diagnosis, tests, or treatments
• Precision Medicine (2017)


Research directions – question-answering
• Users may retrieve documents, but usually want answers to questions
• Subarea of IR research has focused on question-answering systems (Strzalkowski, 2006)
• Highest-profile system is IBM Watson
  – Developed out of TREC Question-Answering Track (Voorhees, 2005; Ferrucci, 2010)
  – Additional (exhaustive) details in special issue of IBM Journal of Research and Development (Ferrucci, 2012)
  – Beat humans at Jeopardy! (Markoff, 2011)
  – Now being applied to healthcare (Lohr, 2012); has “graduated” medical school (Cerrato, 2012)


How does Watson work (Ferrucci, 2010)?
• Built around a system called DeepQA, which uses massively parallel computing to acquire knowledge from resources of a given domain
• Learning process builds around sample questions from the domain
  – A key step is to identify lexical answer types (LATs) in the domain
  – Among general questions, some common LATs include he, country, city, man, film, state, she, author, group, here, company, etc.
  – NLP then applied to text and knowledge representation and reasoning (KRR) applied to structured knowledge
  – Machine learning then applied to questions and their answers


Watson architecture (Ferrucci, 2010)


Applying Watson to medicine (Ferrucci, 2012)
• Trained using several resources from internal medicine: ACP Medicine, PIER, Merck Manual, and MKSAP
• Concept adaptation process required
  – Named entity detection – e.g., disambiguation of terms and their senses
  – Measure recognition and interpretation – e.g., age or blood test value
  – Recognition of unary relations – e.g., elevated <test result>
• Trained with 5000 questions from Doctor's Dilemma, a competition like Jeopardy! that is run by the ACP each year and in which medical trainees participate
  – Sample question is, Familial adenomatous polyposis is caused by mutations of this gene, with the answer being, APC Gene
• Googling the question gives the correct answer at the top of its ranking for this and two other sample questions listed


Evaluation of Watson on internal medicine questions (Ferrucci, 2012)
• Evaluated on an additional 188 unseen questions
• Primary outcome measure was recall at 10 answers
  – How would Watson compare against other systems, such as Google or PubMed, or using other measures, such as MRR?
• Future use case for Watson is applying system to data in EHR, ultimately aiming to serve as a clinical decision support system (Cerrato, 2012)
• Not much peer reviewed literature since then…
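The two measures mentioned above, recall at 10 answers and mean reciprocal rank (MRR), can be sketched as follows. These are the general textbook definitions, not necessarily the exact scoring used in the cited evaluation; the ranked answer lists are made up, with only the APC gene question taken from the slides.

```python
def recall_at_k(ranked_answers, correct, k=10):
    """Fraction of the correct answers that appear in the top k.

    For a question with one correct answer this is 1.0 or 0.0.
    """
    correct = set(correct)
    if not correct:
        return 0.0
    return len(set(ranked_answers[:k]) & correct) / len(correct)


def reciprocal_rank(ranked_answers, correct):
    """1 / position of the first correct answer, or 0.0 if none is found."""
    correct = set(correct)
    for position, answer in enumerate(ranked_answers, start=1):
        if answer in correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_answers, correct_answers) pairs, one per question."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)


runs = [
    (["APC gene", "TP53"], {"APC gene"}),   # correct answer ranked first
    (["x", "y", "z"], {"z"}),               # correct answer ranked third
    (["x"], {"q"}),                         # correct answer not returned
]
print(mean_reciprocal_rank(runs))
print([recall_at_k(r, c) for r, c in runs])
```

Recall at 10 only asks whether the correct answer is somewhere in the top ten, whereas MRR also rewards placing it near the top, which is why the two measures can rank systems differently.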
