A Web Based Tool For the Detection and Analysis of Avian Influenza Outbreaks From Internet News Sources

Ian Turton and Andrew Murdoch
GeoVISTA Center, Penn State University

Uploaded by ian-turton on 22-Nov-2014


DESCRIPTION

Paper presented at AutoCarto 2008 - Shepherdstown WV

TRANSCRIPT

Page 1: A Web Based Tool For the Detection and Analysis of Avian Influenza Outbreaks From Internet News Sources

A Web Based Tool For the Detection and Analysis of Avian Influenza Outbreaks From Internet News Sources

Ian Turton and Andrew Murdoch
GeoVISTA Center
Penn State University

Flight?

Page 3: A Web Based Tool For the Detection and Analysis of AvianInfluenza Outbreaks From Internet News Sources

Summary

• Who we are?
• Why we did it?
• What is Avian Flu?
• What we did?
• How we did it?
• Did it work?
• What will we do next?

Who we are?

• Ian
  – Senior Research Associate in the GeoVISTA Center
  – E-Education Fellow in the Dutton E-Education Institute

• Andrew
  – MGIS student (graduated in summer 2008)
  – GIS Developer at ArcBridge Consulting and Training

Why we did it?

• Andrew needed a project for Ian’s course on web mapping, and later for his capstone project (like a dissertation).

• Ian had an interest in extracting geographic information from unstructured text.

• Picked the spread of Avian Influenza and how to map it automatically from news reports.

What is Avian Flu?

• Avian flu, or bird flu, is a disease caused by a virus.

• The most feared strain is H5N1, but there are many others.

• ~60% death rate in reported human cases.

• Currently no (or very limited) human-to-human transmission.

Picture by Quiplash! (CC BY)

What we did?

• Designed and built a system to automatically read internet news articles and map them for us, so we could gain a better understanding of how avian flu is spreading on a day-to-day basis.

• Set it running to see how it did
• Tweaked it a bit as we saw how it worked

How we did it?

• Data sources
• Data processing tools
• GeoCoding tools
• Web Mapping tools
  – Server
  – Client

Data Sources

• Official Avian Flu sites
  – WHO
  – PROMED

• Internet News sites
  – Google News
  – Feedburner

• Collected as RSS feeds

Why does this work?

• Media panic/interest leads to widespread reporting of any avian flu story.

• Use of medical blogs like PROMED also helps overcome government restrictions on reporting.

Pictures: ianstacey, quiplash, Incessantflux (CC BY)

What is RSS?

• Really Simple Syndication

• RDF Site Summary

• A standardized XML file format for passing information about web log (blog) updates.

• You normally view RSS feeds in a feed reader

• We wrote programs to read them for us.
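
The slides do not show the feed-reading programs themselves. As a rough illustration only, here is a minimal Python sketch of pulling items out of an RSS 2.0 feed with the standard library. The `FeedItem` and `parse_rss2` names and the sample feed are hypothetical; a real reader would also fetch feeds over HTTP and cope with RSS 1.0 and Atom.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FeedItem:
    title: str
    link: str
    description: str
    pub_date: str


def parse_rss2(xml_text: str) -> list[FeedItem]:
    """Return the <item> entries of an RSS 2.0 document.

    Only the RSS 2.0 layout (<rss><channel><item>) is handled; RSS 1.0
    (RDF Site Summary) puts items in XML namespaces and would need
    namespace-aware lookups.
    """
    root = ET.fromstring(xml_text)

    def text_of(item: ET.Element, tag: str) -> str:
        # findtext returns None when the child element is missing.
        return (item.findtext(tag) or "").strip()

    return [
        FeedItem(
            title=text_of(item, "title"),
            link=text_of(item, "link"),
            description=text_of(item, "description"),
            pub_date=text_of(item, "pubDate"),
        )
        for item in root.iter("item")
    ]


# Hypothetical sample feed (example.org is a placeholder domain).
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example avian flu feed</title>
    <item>
      <title>H5N1 found in poultry</title>
      <link>http://example.org/story1</link>
      <description>Officials confirmed an outbreak.</description>
      <pubDate>Mon, 02 Jun 2008 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for entry in parse_rss2(SAMPLE):
        print(entry.pub_date, entry.title, entry.link)
```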

Finding the geography

• Step one: extract the place names (named entity extraction)
  – Custom tools
  – Reuters’ Calais system
  – MetaCarta
  – GeoNames.org

• Step two: geocode the places, disambiguating names such as London or Washington
  – Custom tools
  – MetaCarta
  – GeoNames.org
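
A toy version of these two steps is sketched below, assuming a tiny hand-made gazetteer in place of Calais, MetaCarta or GeoNames.org. Here "extraction" is just whole-word matching of capitalized words against the gazetteer, and the disambiguation rule (prefer a candidate whose country is named in the same story, otherwise the most populous one) is an illustrative assumption, not the method used by the authors or by those services. All names, coordinates and populations are hypothetical, rounded values.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str
    lat: float
    lon: float
    population: int


# Hypothetical mini-gazetteer with rounded, illustrative values.
GAZETTEER = [
    Place("London", "United Kingdom", 51.51, -0.13, 7_500_000),
    Place("London", "Canada", 42.98, -81.25, 350_000),
    Place("Washington", "United States", 38.90, -77.04, 600_000),
    Place("Jakarta", "Indonesia", -6.21, 106.85, 9_000_000),
]
KNOWN_NAMES = {p.name for p in GAZETTEER}


def find_place_names(text: str) -> list[str]:
    """Naive stand-in for named entity extraction: capitalized words
    that appear in the gazetteer, in order of first appearance."""
    found: list[str] = []
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        if word in KNOWN_NAMES and word not in found:
            found.append(word)
    return found


def disambiguate(name: str, text: str) -> Place:
    """Pick one gazetteer entry for a possibly ambiguous name.

    Assumed heuristic: keep candidates whose country is also mentioned
    in the text (or all candidates if none are), then take the most
    populous one.
    """
    candidates = [p for p in GAZETTEER if p.name == name]
    in_context = [p for p in candidates if p.country in text]
    return max(in_context or candidates, key=lambda p: p.population)


def geocode(text: str) -> list[Place]:
    return [disambiguate(name, text) for name in find_place_names(text)]


if __name__ == "__main__":
    story = "Poultry farms near London, Canada reported sick birds."
    for place in geocode(story):
        print(place.name, place.country, place.lat, place.lon)
```

Even this toy shows why the problem is harder than it looks: a story about "London" with no country mentioned falls back to the population rule, which is wrong whenever the smaller place is meant.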

Well, that can’t be too hard?

Web Mapping Server

• Open web mapping standards from the OGC (Open Geospatial Consortium), which allow others to use our data.

• Open Source tools (we’re a poor university).

• Store the data points and news text in PostGIS (free spatial database).

• GeoServer to serve maps from the DB to web (and desktop) clients.
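
Because GeoServer speaks the OGC WMS standard, any client can ask it for a rendered map with a plain HTTP GetMap request. The sketch below builds such a request URL. The endpoint `http://localhost:8080/geoserver/wms` and the layer name `avian:news_items` are hypothetical placeholders, not the project's real server or layer.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=400,
                   srs="EPSG:4326", image_format="image/png"):
    """Build an OGC WMS 1.1.1 GetMap request URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); in WMS 1.1.1 with
    EPSG:4326 the axis order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty means the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Whole-world extent; server URL and layer name are hypothetical.
    print(wms_getmap_url("http://localhost:8080/geoserver/wms",
                         "avian:news_items", (-180, -90, 180, 90)))
```

In WMS 1.1.1 the coordinate system parameter is `SRS` and an EPSG:4326 bounding box is given as longitude, latitude. WMS 1.3.0 renames the parameter `CRS` and, for EPSG:4326, uses latitude, longitude order, a common cause of maps with swapped axes.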

Mapping Client

• Remember that our end users are epidemiologists, not GIS users, so we stuck with a web browser as the client.

• OpenLayers (www.openlayers.org)
  – JavaScript library that implements the OGC WMS and WFS standards our server uses.
  – Allows rapid construction of an interactive web map by relatively novice developers.
  – The finished map looks a lot like a Google map, so users can use it easily.
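
Besides server-rendered WMS images, OpenLayers can draw the news points as vector features. One common format for handing points to a browser client is GeoJSON, which GeoServer can return from a WFS request. The sketch below converts geocoded items into a GeoJSON FeatureCollection; the input keys (`title`, `link`, `date`, `lat`, `lon`) are hypothetical field names, not the project's actual schema.

```python
import json


def to_geojson(items):
    """Convert geocoded news items into a GeoJSON FeatureCollection.

    items: iterable of dicts with hypothetical keys
    'title', 'link', 'date', 'lat' and 'lon'.
    """
    features = []
    for item in items:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude] (RFC 7946).
            "geometry": {"type": "Point",
                         "coordinates": [item["lon"], item["lat"]]},
            # Attributes the client can show in a popup or query result.
            "properties": {k: item[k] for k in ("title", "link", "date")},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [{"title": "H5N1 found in poultry",
               "link": "http://example.org/story1",
               "date": "2008-06-02", "lat": -6.21, "lon": 106.85}]
    print(json.dumps(to_geojson(sample), indent=2))
```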

The Map

Choice of background layers
Choice of feeds

http://www.experimental.geovista.psu.edu/andrew/html/avian_influenza_map.html

Zoom and Pan

Time Line

• We are also interested in change over time.

• Added SIMILE Timeline from MIT
  – JavaScript tool that allows the user to scroll through time- or date-stamped information
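
SIMILE Timeline displays individual date-stamped items. To summarize change over time, one could also aggregate reports per day and place. The sketch below does that with the standard library, assuming each report has already been reduced to a (date, place name) pair; the function names and input shape are hypothetical.

```python
from collections import Counter
from datetime import date


def daily_counts(reports):
    """Count reports per (ISO day, place name).

    reports: iterable of (datetime.date, place_name) pairs, a
    hypothetical simplification of rows pulled from the database.
    """
    return Counter((day.isoformat(), place) for day, place in reports)


def series_for(place, counts):
    """Chronological (day, count) pairs for one place; days with no
    reports are simply absent rather than zero-filled."""
    return sorted((day, n) for (day, p), n in counts.items() if p == place)


if __name__ == "__main__":
    reports = [
        (date(2008, 6, 1), "Jakarta"),
        (date(2008, 6, 2), "Jakarta"),
        (date(2008, 6, 2), "Jakarta"),
        (date(2008, 6, 2), "Hanoi"),
    ]
    counts = daily_counts(reports)
    print(series_for("Jakarta", counts))  # [('2008-06-01', 1), ('2008-06-02', 2)]
```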

Link to external pages

Query the map

Did it work?

• Yes,
• Well, mostly,
• Well, some of the time!
• We can take news feeds, geocode them and draw maps in a web browser.

What didn’t work?

• News sources and even medical feeds contain too many items that are about avian flu in a general sense but not actually about an outbreak:
  – Conferences about avian flu
  – Vaccine news
  – Reports of other influenza outbreaks
  – Reports of other infectious diseases (“unlike avian flu…”)

What will we do next?

• Improved selection of RSS items

• Bayesian classifier
  – Train on a selection of “good” and “bad” items
  – Allow users to rate articles

• Non-negative matrix factorization
  – Clusters similar items based on word usage
  – Helps overcome repeated reports
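
The slide does not say which Bayesian classifier was planned. A common choice for sorting short texts into two classes is multinomial naive Bayes with add-one (Laplace) smoothing, sketched below in plain Python. The class name, tokenizer and training sentences are illustrative assumptions. A user rating of an article could simply be fed back as another `train` call.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; deliberately simple."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, for labelling
    feed items as "good" (outbreak report) or "bad" (everything else)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training items
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(label) + sum over words of log P(word | label),
            # working in logs to avoid floating-point underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = NaiveBayes()
    clf.train("poultry deaths confirmed as H5N1 outbreak in village", "good")
    clf.train("officials cull chickens after bird flu outbreak", "good")
    clf.train("scientists gather for avian flu vaccine conference", "bad")
    clf.train("new vaccine trial announced at conference", "bad")
    print(clf.classify("bird flu outbreak kills poultry"))
    print(clf.classify("vaccine conference on avian flu"))
```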

What will we do next?

• Continue to improve the GeoCoder
  – Better disambiguation algorithms
  – Allow users to rate the accuracy of locations found in reports

• Improve the user interface
  – Better selection of points of interest using the timeline
  – Replace SIMILE with a custom time bar

Conclusions

• It is possible to construct an online automated system that can read news articles from professional and general news feeds and map them in a way that allows experts and members of the public to track the spread of avian flu outbreaks.

• There is still much that can be done to improve the system.