John S. Brownstein, PhD
Harvard Medical School Children’s Hospital Informatics Program
Harvard‐MIT Division of Health Sciences and Technology
Global infectious disease surveillance through automated multi-lingual georeferencing of Internet media reports
Healthcare Surveillance (figure panels: AEGIS, AEGIS INFLUENZA)
Unstructured Web Surveillance
Abundant cheap/free resource
Detailed local information
Near real‐time reporting
Less susceptible to political pressure
Structured Clinical Surveillance
Lack of infrastructure
Low levels of training
Gaps in coverage
Poor information flow
Early reporting of SARS (timeline, Nov 2002 to Mar 2003)
Progression of outbreak:
  Nov 16, 2002: cases of atypical pneumonia in Foshan
  Feb 11, 2003: 305 cases of acute respiratory syndrome in Guangdong Province
  Feb 21, 2003: infected Chinese doctor at a Hong Kong hotel
Electronic surveillance:
  Nov 27, 2002: pharma report, Guangdong Province
  Feb 10, 2003: media reports, Guangdong Province
  Feb 10, 2003: astute physician posts on ProMED
WHO reports:
  Feb 25, 2003: initial WHO report
  Mar 10, 2003: official WHO report
Source of outbreak news verified by WHO
Adapted from Heymann 2001
Limitations of Internet-based surveillance
Abundance of disparate electronic resources, but none is comprehensive
Information is unstructured: free text
Difficulty in georeferencing information sources
No synthesized view of the current state of global health
Brownstein et al. Institute of Medicine. 2007.
www.healthmap.org
HealthMap
Objectives
Enhance surveillance of infectious diseases through integration
  Multi-stream, real-time, web-based surveillance system
  Alert aggregator: news wires, web sites, RSS feeds, mailing lists
  Automated multi-lingual searching, categorization, filtering, georeferencing
Achieve a unified and comprehensive view of global health
  Space and time mapping
  Easy information access
  Limit information overload: filtering and duplicate removal
Free and open multi-lingual mapping resource
  Open source technologies combined with open API systems
  Linux, Apache, MySQL, PHP, Google Maps
Brownstein et al. Institute of Medicine. 2007.
Public Health Resource
Tool for the general population
HealthMap Article Processing
ACQUIRING: more than 20,000 sites, every hour, 24/7
FILTERING: more than 3 million keywords, 94% accuracy
CLUSTERING: text-matching similarity score
CATEGORIZING: 1,500 disease patterns, 5,000 location patterns
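The clustering stage can be illustrated with a minimal sketch. It groups near-duplicate articles by a text-matching similarity score. The slide does not name the score, so this sketch assumes Jaccard similarity over lowercased word sets and a hypothetical threshold of 0.5. The function names and the sample headlines are also illustrative; this is not HealthMap's actual implementation.

```python
import re


def word_set(text):
    """Lowercased set of alphanumeric words: a crude bag-of-words view of an article."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two word sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_articles(articles, threshold=0.5):
    """Greedy single-pass clustering (assumed measure and hypothetical threshold).

    Each article joins the first existing cluster whose representative (its first
    article) is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative word set, member articles)
    for text in articles:
        words = word_set(text)
        for representative, members in clusters:
            if jaccard(words, representative) >= threshold:
                members.append(text)
                break
        else:  # no existing cluster was similar enough
            clusters.append((words, [text]))
    return [members for _, members in clusters]


# Invented example headlines, for illustration only.
articles = [
    "Dengue fever cases rise in New Caledonia",
    "New Caledonia dengue fever cases rise",
    "Cholera outbreak reported in Harare",
]
for group in cluster_articles(articles):
    print(group)
```

The two dengue headlines share 6 of 7 distinct words, giving a similarity of about 0.86, so they fall into one cluster. The cholera headline shares only "in" with the first one and starts its own cluster. A production system would likely use a weighted score such as TF-IDF cosine rather than plain Jaccard.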
Georeferencing
Dictionary tree (figure), mapping location patterns to IDs:
  UK: 12
  united kingdom: 12
  united states: 13
  united arab emirates: 14
Each node is a hashtable. Each key maps an input token either to an ID or to another node. As we process the input token by token, we traverse the tree accordingly.
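The token tree described on this slide can be sketched in Python using nested dicts as the hashtables. The slide's wording could be read as having a key map directly to an ID. To let one pattern be a prefix of another, this sketch instead stores the ID under a reserved `None` key. It also assumes a longest-match rule when several patterns start at the same token. Both choices, and all function names, are illustrative assumptions.

```python
import re


def build_token_tree(patterns):
    """Build the token tree: every node is a dict (hashtable) keyed by lowercased tokens.

    A string key leads to a child node; the reserved key None stores the location ID
    of a pattern that ends at this node (None cannot collide with a word token).
    """
    root = {}
    for phrase, location_id in patterns.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[None] = location_id
    return root


def find_locations(text, tree):
    """Scan the input token by token, walking the tree as far as it matches and
    keeping the longest complete pattern found from each start position (assumed rule)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        node, j, best = tree, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if None in node:  # a full pattern ends here
                best = (j, node[None])
        if best is not None:
            end, location_id = best
            matches.append((" ".join(tokens[i:end]), location_id))
            i = end  # resume after the matched phrase
        else:
            i += 1
    return matches


PATTERNS = {"UK": 12, "united kingdom": 12, "united states": 13, "united arab emirates": 14}
tree = build_token_tree(PATTERNS)
print(find_locations(
    "Cases in the United Kingdom and the United Arab Emirates; the UK confirmed them.", tree))
# → [('united kingdom', 12), ('united arab emirates', 14), ('uk', 12)]
```

Each lookup is one hashtable access per token. Matching therefore costs time proportional to the input length times the longest pattern, independent of how many thousands of patterns the dictionary holds.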
Early Stats
More than 200 alerts per day
60,000 alerts so far
Alerts in 201 countries
169 pathogens
4 languages: English (60%), Spanish (20%), French (11%), Russian (9%)
Geographic representation: 201 countries with alarms
Top countries by alarm count:
  1. USA: 4,351
  2. UK: 1,018
  3. Canada: 880
  4. China: 737
Multi‐lingual Surveillance
Coverage Comparison: Argentina
English news: bovine anthrax, citrus canker
Spanish news: trichinosis, bronchiolitis, rotavirus, influenza
Georeferencing errors can occur:
  No known pattern given in input
  Location given, wrong match: London, Georgia
  Non-location pattern matched: "Antarctica" the horse
  Correct location, but too general: Russia vs. Stavropol Krai
Corrections by type (figure): updates, inserts, deletes (47%, 29%, 24%)
Georeferencing accuracy
1,774 alerts processed
134 locations edited (7.6%)
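The edit rate on this slide can be checked directly. The second figure assumes that every edited alert counts as a georeferencing error. That is an assumption, since an edit could also refine a location that was already acceptable. Under it, the implied share of alerts left unchanged is:

```latex
\frac{134}{1774} \approx 0.0755 \approx 7.6\%
\qquad\text{and}\qquad
1 - 0.0755 = 0.9245 \approx 92.4\%
```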
Dictionary Approach Expansion
The dictionary approach provides us with a labeled corpus (dictionary matches in brackets, part-of-speech tags after each slash):
Health/NNP authorities/NNS in/IN [New Caledonia]/NNP are/VBP closely/RB monitoring/VBG an/DT upsurge/NN of/IN [dengue fever]/NN cases/NNS
Expansion: learn the syntactic/lexical context in which locations and diseases occur
Predict new locations and diseases:
Health authorities in [California] are closely monitoring an upsurge of [salmonella] cases
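The expansion idea can be illustrated with a deliberately simple sketch. It learns the words surrounding each dictionary-labeled span, then proposes new spans that appear in exactly the same context. The slide does not say which learner is used. This exact-context pattern induction is a swapped-in technique, and it uses only lexical context (neighboring words), not the POS tags. The window of 2, the maximum span length of 3, the hand-written spans (standing in for dictionary output), and all function names are hypothetical.

```python
from collections import defaultdict


def learn_contexts(tokens, spans, window=2):
    """Record the `window` tokens on each side of every labeled span.

    spans: list of (start, end, label), end exclusive; here supplied by hand in place
    of real dictionary matches (hypothetical input format).
    """
    contexts = defaultdict(set)
    for start, end, label in spans:
        left = tuple(tokens[max(0, start - window):start])
        right = tuple(tokens[end:end + window])
        contexts[label].add((left, right))
    return contexts


def predict(tokens, contexts, max_len=3):
    """Propose (label, phrase) for every span of up to `max_len` tokens whose
    neighbors exactly match a learned (left, right) context."""
    found = []
    for label, patterns in contexts.items():
        for left, right in patterns:
            for start in range(len(tokens)):
                if tuple(tokens[max(0, start - len(left)):start]) != left:
                    continue
                for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
                    if tuple(tokens[end:end + len(right)]) == right:
                        found.append((label, " ".join(tokens[start:end])))
    return found


train = ("Health authorities in New Caledonia are closely monitoring "
         "an upsurge of dengue fever cases").split()
# "New Caledonia" is tokens 3-4 and "dengue fever" is tokens 11-12.
contexts = learn_contexts(train, [(3, 5, "LOCATION"), (11, 13, "DISEASE")])

new = ("Health authorities in California are closely monitoring "
       "an upsurge of salmonella cases").split()
print(predict(new, contexts))
# → [('LOCATION', 'California'), ('DISEASE', 'salmonella')]
```

Exact matching is brittle: one changed neighbor word defeats it. That brittleness is why a practical system would generalize over features such as the POS tags shown on the slide, for example with a trained sequence tagger.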
Collaborative Georeferencing Networks
ProMED of the International Society for Infectious Diseases (specialty moderators; full membership of 40,000)
Emerging Infections Network (EIN) of the Infectious Diseases Society of America (982 ID consultants)
US Naval Medical Research Center Detachment of DOD-GEIS in Peru (Spanish and Portuguese moderation)
Conclusions
Internet‐based disease mapping offers a promising multi‐use tool
Value in visualization of distributed electronic resources
Georeferencing still presents formidable challenges:
  Higher resolution mapping
  Limiting misclassifications
  Multi-lingual location identification
Acknowledgments
Children’s Hospital Informatics Program @ Harvard-MIT HST
Clark Freifeld
Mikaela Keller, PhD
Ken Mandl, MD MPH
Ben Reis, PhD
Isaac Kohane, MD PhD
Larry Madoff (ProMED)
David Blazes (Peru NMRCD)
Aranka Anema (UBC)
Funding
Google Foundation
National Library of Medicine (NLM)
Centers for Disease Control and Prevention
Canadian Institutes of Health Research (CIHR)