making the gov data more open
DESCRIPTION
http://spring2011.drupalcamp.se/schedule/making-government-data-open-drupal-and-other-tools
TRANSCRIPT
MAKING THE GOV DATA OPEN
MAREK SOTAK | ATOMIC ANT
www.atomicant.co.uk
May 2011
OH HAI!
ABOUT ME & ATOMIC ANT
Marek Sotak
• Web designer, developer
• From Prague, Czech Republic
• Over 5 years with Drupal - since v4.6
• Rootcandy admin theme
• Organising events - Drupal Design Camp, Local Meet-ups
• @sotak on twitter
• http://sotak.co.uk - personal blog/experiments
6 : 0 2 : 1
#justsaying ;)
OH HAI!
ABOUT ME & ATOMIC ANT
• Based in London & Prague
• Human interface design, training, branding, development
• Clients all over the world
• http://atomicant.co.uk
OPEN DATA?
HUH?
Wikileaks Iraq war logs: every death mapped http://bit.ly/iraqwarlogs
OPEN DATA?
HUH?
Don't eat at ____ http://donteat.at
DATA MINING - SCRAPING
LET'S GET DIRTY
BigClean.org – Prague
DATA MINING - SCRAPING
LET'S GET DIRTY
There's a lot of data lying around on the internet that can be useful → crime reports, government reports, statistics, missing pets registers, current affairs
However, it is sometimes published in a format such as PDF or HTML: something you can't simply take and run calculations, visualizations or filtering on.
Is it really that hard to publish something as CSV or XML?
DATA MINING - SCRAPING
LET'S GET DIRTY
Ministry of the Interior – Czech Republic
Public Collections - open what?
DATA MINING - SCRAPING
LET'S GET DIRTY
Request a site/content
Run through the html – DOM - selectors
Do whatever you want with the data
Save the data
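The four steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the code used in the talk: the helper names (`fetch`, `extract_rows`, `clean`, `to_csv`) and the sample HTML table are made up for the example, and it assumes the data of interest sits in plain `<table>` rows.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def fetch(url):
    """Step 1: request the page and decode it to text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_rows(html):
    """Step 2: run through the HTML and pick out the table rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean(rows):
    """Step 3: do something with the data (here: collapse stray whitespace)."""
    return [[" ".join(cell.split()) for cell in row] for row in rows]


def to_csv(rows):
    """Step 4: save the data in a reusable format (CSV text)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample input, used instead of a live request.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Collection</th><th>Amount</th></tr>"
    "<tr><td> Dog   shelter </td><td>1200</td></tr>"
    "</table>"
)

print(to_csv(clean(extract_rows(SAMPLE_HTML))))
```

With a real source, `fetch(url)` would supply the HTML in place of `SAMPLE_HTML`. Selector-based libraries make step 2 shorter; the standard-library parser is used here only to keep the sketch self-contained.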
SCRAPERWIKI
WHAT IS IT? HOW TO USE IT
Scrape and link data using Ruby, Python and PHP scripts that run maintenance-free in the cloud. Request data for scoops and better decisions.
SCRAPERWIKI
WHAT IS IT? HOW TO USE IT
Why would you want to use SCRAPERWIKI rather than other scraping tools or custom code?
SCRAPERWIKI
WHAT IS IT? HOW TO USE IT
• The dataset is available to everyone
• Anyone can access the data through the API
• If the source changes and the scraper breaks, anyone can fix the scraper
• Anyone can fork the scraper
GOOGLE REFINE
WHAT IS IT? HOW TO USE IT
Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services,...
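To give a feel for what "cleaning up messy data" means in practice, here is a small Python sketch of one common clean-up technique: grouping spellings that differ only in case, punctuation, accents, spacing or word order by reducing each value to a normalised key. It is an illustration of the general idea, not Google Refine's own code; the function names `fingerprint` and `cluster` and the sample values are assumptions made for the example.

```python
import string
import unicodedata
from collections import defaultdict

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def fingerprint(value):
    """Reduce a value to a key that ignores case, accents,
    punctuation, extra whitespace and word order."""
    # Split accented letters into base letter + combining mark, drop the marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_accents.lower().translate(_STRIP_PUNCTUATION)
    # Unique tokens in sorted order, so word order no longer matters.
    return " ".join(sorted(set(lowered.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups with
    more than one member, i.e. likely variants of the same thing."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample values for illustration.
messy = [
    "Ministry of the Interior",
    "ministry of the interior ",
    "Interior, Ministry of the",
    "Ministry of Finance",
]

for group in cluster(messy):
    print(group)
```

Each group is then a candidate for merging into one canonical spelling. A human should still review the groups, because a key this aggressive can also merge values that are genuinely different.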
VISUALISE
TELL THE STORY
There is more to it than that
It's not just data with values in a spreadsheet or database
Data can tell the story!
GOOGLE FUSION TABLES
WHAT IS IT? HOW TO USE IT
Easy visualisation http://tables.googlelabs.com/
SCRAPING WITH DRUPAL
AND NOW FOR SOMETHING COMPLETELY DIFFERENT
Feeds – http://drupal.org/project/feeds
Scraping
Feeds QueryPath parser - project/feeds_querypath_parser
Feeds XPath parser – project/feeds_xpathparser
Cleaning up data
Feeds Tamper - http://drupal.org/project/feeds_tamper
VISUALISE WITH DRUPAL
AND NOW FOR SOMETHING COMPLETELY DIFFERENT
Mapping
- Location – http://drupal.org/project/location
- OpenLayers – http://drupal.org/project/openlayers
- Gmap – http://drupal.org/project/gmap
Graphs/Charts
- Graphs
- Graphs Charts
- Open Flash Chart
- Views
GO! SCRAPE IT!
CHALLENGE
EU Open Data Challenge - €20,000 to win - 28 days left to enter
http://opendatachallenge.org/
TOOLS
SCRAPING DATA
ScraperWiki – http://scraperwiki.com
PHP Simple HTML DOM – http://bit.ly/phphtmldom
PHPQuery - http://code.google.com/p/phpquery/
Open Data Kit - http://opendatakit.org/
TOOLS
CLEANING DATA
Google Refine - http://code.google.com/p/google-refine/
TOOLS
VISUALIZING DATA
Google fusion tables - http://tables.googlelabs.com/
The Best Tools for Visualization - http://rww.to/toolsforvis
TOOLS
VISUALIZING DATA
OpenHeatmap http://bit.ly/openheatmap
THANK YOU
Q & A | LET'S CONNECT
QUESTIONS?
@sotak - twitter
http://sotak.co.uk - personal blog
http://atomicant.co.uk - company website