new information for new journalists pt2: data

Introduction Paul Bradshaw Data journalism

Upload: paul-bradshaw

Post on 07-May-2015


Category:

Education



DESCRIPTION

Presentation to ESCACC, Barcelona, 2010

TRANSCRIPT

Page 1: New information for new journalists pt2: data

Introduction

Paul Bradshaw

Data journalism

Page 2: New information for new journalists pt2: data

Ivy Lee

Page 3: New information for new journalists pt2: data

“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”

Adrian Holovaty

Page 5: New information for new journalists pt2: data

Why?

• Great stories
• Engagement
• Targeting/relevance

Page 11: New information for new journalists pt2: data

“The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.”

http://bit.ly/dj2dmz

Page 14: New information for new journalists pt2: data

Times film genres

Page 16: New information for new journalists pt2: data

Data Journalism Continuum

Page 17: New information for new journalists pt2: data

1. Finding data

Page 18: New information for new journalists pt2: data

What is data?

Page 19: New information for new journalists pt2: data

Anything that a computer can work with

• Numbers
• Text
• Connections
• Live data
• Behavioural data
• Images, audio, video

Page 21: New information for new journalists pt2: data

Passive vs active data journalism

• Start with the data and look for the stories? (MPs’ expenses)
• Or start with a lead and look for the data?

Page 22: New information for new journalists pt2: data

Some UK projects

• Data.gov.uk
• What Do They Know
• Openlylocal, Scraperwiki
• Disclosure logs
• RSS feeds, XML, structured data

Page 23: New information for new journalists pt2: data

CAR (computer-assisted reporting)

Delicious.com/paulb/car

Page 24: New information for new journalists pt2: data

Advanced search by file type

• “Performance figures” filetype:pdf
• filetype:xls
• filetype:doc
• filetype:ppt
• filetype:rdf OR filetype:xml

Page 25: New information for new journalists pt2: data

Advanced search by domain

• “Disclosure logs” site:.gov.es
• Database site:.org.cat OR site:.org
• +Tables -chairs site:
• Health, police, military domains
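
The file-type and domain searches above all combine a quoted phrase with search operators. A minimal Python sketch of assembling such queries, assuming Google-style syntax (quoted phrases, `filetype:`, `site:`, `OR`, and a leading minus to exclude a word). The helper name `build_query` is hypothetical and only builds the string; it does not contact any search engine.

```python
def build_query(phrase, filetypes=(), sites=(), exclude=()):
    """Assemble a search query string from a phrase and operators.

    Assumes Google-style operator syntax. build_query is a hypothetical
    helper written for illustration, not part of any library.
    """
    parts = [f'"{phrase}"']
    if filetypes:
        # Several file types are alternatives, so join them with OR.
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if sites:
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    # A leading minus excludes a word from the results.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


pdf_query = build_query("Performance figures", filetypes=["pdf"])
domain_query = build_query("Disclosure logs", sites=[".gov.es"])
```

Keeping the operators in lists makes it easy to loop over several file types or domains and run the same phrase against each.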

Page 26: New information for new journalists pt2: data

Use overseas sources

• US medicine databases
• EU subsidy databases
• Swedish people data
• International police agency correspondence

Page 27: New information for new journalists pt2: data

Scraping

Scraping can automate & schedule the gathering process if there are multiple sources

Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
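
As a rough illustration of what a scraper does, the sketch below pulls the rows out of an HTML table using only Python's standard library. It is not how any of the tools named above work internally; the class name `TableScraper` and the sample page fragment are hypothetical, and a real scraper would first download the page and run on a schedule.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page fragment standing in for a downloaded crime log.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Offence</th><th>Ward</th></tr>
  <tr><td>2010-03-01</td><td>Burglary</td><td>12</td></tr>
  <tr><td>2010-03-01</td><td>Theft</td><td>7</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows
# Turn each row into a dictionary keyed by the column headings.
crimes = [dict(zip(header, record)) for record in records]
```

Once the rows are dictionaries they can be written to a spreadsheet or database each time the scraper runs.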

Page 28: New information for new journalists pt2: data

Interrogating data

Page 29: New information for new journalists pt2: data

Time spent now...

• Humans collect data
• Humans enter data
• Human error

Page 30: New information for new journalists pt2: data

...Saves time later

• Different words for the same thing
• Double spaces, punctuation
• Wrong data type
• Mistyped
• Duplicate entries
• Default entries (1/1/00)

Page 31: New information for new journalists pt2: data

"Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do."

David Donald, Center for Public Integrity

Page 32: New information for new journalists pt2: data

Cleaning methods

• Group by term then sort to see duplications
• Find & replace double spaces, etc.
• Select column/row & check data type
• Sort to find unusually large/small values, and neighbouring misspellings
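
The cleaning checks listed above can also be scripted. A minimal Python sketch, assuming a small set of hypothetical rows and treating 1/1/00 (the default entry mentioned earlier) as a placeholder date; the function name `normalise` and the column names are illustrative only.

```python
import re

# Hypothetical rows illustrating the problems listed in the slides.
rows = [
    {"name": "Acme  Ltd.", "amount": "1200", "date": "14/2/10"},
    {"name": "ACME Ltd",   "amount": "1200", "date": "14/2/10"},
    {"name": "Bolt & Co",  "amount": "n/a",  "date": "1/1/00"},
    {"name": "Bolt & Co",  "amount": "950",  "date": "2/3/10"},
]

DEFAULT_DATES = {"1/1/00"}  # assumed placeholder value


def normalise(name):
    """Collapse whitespace, drop trailing punctuation, ignore case."""
    name = re.sub(r"\s+", " ", name).strip()
    return name.rstrip(".,;").casefold()


# Group by normalised term: a key with several spellings is a likely duplicate.
variants = {}
for row in rows:
    variants.setdefault(normalise(row["name"]), set()).add(row["name"])
suspect_names = {key: seen for key, seen in variants.items() if len(seen) > 1}

# Check data type: amounts that are not whole numbers.
bad_amounts = [row for row in rows if not row["amount"].isdigit()]

# Default entries: dates equal to a known placeholder.
default_dates = [row for row in rows if row["date"] in DEFAULT_DATES]
```

Each check only flags rows for a human to review; deciding which spelling or value is correct still needs reporting.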

Page 33: New information for new journalists pt2: data

Check.

Never publish a name from data without running a background check

Page 34: New information for new journalists pt2: data

Other tools

Freebase Gridworks: see http://vimeo.com/10081183

Page 35: New information for new journalists pt2: data

Visualising data

Page 37: New information for new journalists pt2: data

or http://chartchooser.juiceanalytics.com/

Page 39: New information for new journalists pt2: data

(trends, dips, correlations)

Page 41: New information for new journalists pt2: data

(comparison, themes)

Page 42: New information for new journalists pt2: data

(proportions, comparison)

Page 43: New information for new journalists pt2: data

Mashing data

Page 44: New information for new journalists pt2: data

Combining data sources

• Geocoded data with map
  - Live data (e.g. Twitter API)
  - Static data (e.g. Google Docs)
  - Dynamic data (e.g. Google Form)
• 2 spreadsheets with common data
  - Tools: MySQL, Access, etc.
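
Combining two spreadsheets with common data amounts to a join on the shared column. The sketch below does this in plain Python rather than MySQL or Access, purely for illustration; the CSV contents, the column names and the helpers `read_rows` and `join_on` are all hypothetical.

```python
import csv
import io

# Hypothetical spreadsheets exported as CSV; "ward" is the shared column.
SPENDING_CSV = """ward,spending
North,120000
South,95000
East,87000
"""
POPULATION_CSV = """ward,population
North,25000
South,19000
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Inner join: keep left rows whose key also appears in right."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


merged = join_on(read_rows(SPENDING_CSV), read_rows(POPULATION_CSV), "ward")
for row in merged:
    # A figure neither spreadsheet contains on its own.
    row["spending_per_head"] = int(row["spending"]) / int(row["population"])
```

The East row drops out because it has no match in the second spreadsheet, which is itself worth checking: a missing match is often a spelling difference rather than missing data.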

Page 52: New information for new journalists pt2: data

Combining data sources

• Twittermap
• Wikipedia map
• NYT Property
• Guardian vs Nature
• BBC Most Read
• BBC Olympic Village

Page 53: New information for new journalists pt2: data

What mashes well?

• Big events (protests, Olympics, inauguration)
• Comparisons
• Geocoded data
• Connections

Page 54: New information for new journalists pt2: data

Yahoo! Pipes

• Aggregates
• Maps
• Filters
• Counts
• Cleans or reformats (regex)
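
The same kinds of operation (aggregate, clean with a regex, filter, count) can be sketched in a few lines of Python. This is not Yahoo! Pipes itself, only an illustration of the steps; the feed items and field names are hypothetical.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical items from two feeds.
feed_a = [
    {"title": "Council  budget cut", "source": "A"},
    {"title": "School closes", "source": "A"},
]
feed_b = [
    {"title": "Budget protest planned", "source": "B"},
    {"title": "New park opens", "source": "B"},
]

# Aggregate: merge several feeds into one list.
items = list(chain(feed_a, feed_b))

# Clean (regex): collapse repeated whitespace in titles.
for item in items:
    item["title"] = re.sub(r"\s+", " ", item["title"]).strip()

# Filter: keep only items mentioning "budget", in any letter case.
budget_items = [
    item for item in items
    if re.search(r"\bbudget\b", item["title"], re.IGNORECASE)
]

# Count: how many matching items each source contributed.
per_source = Counter(item["source"] for item in budget_items)
```

Each step takes a list of items and hands a list on to the next, which is the idea of a pipe.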

Page 55: New information for new journalists pt2: data

Other tools

• Scraperwiki – mapping library
• Maptube – combine maps
• Google Docs – publish in different formats
• +++

Page 56: New information for new journalists pt2: data

Semantic web & linked data

• Computer-readable data
• Paris – France, Texas, or Hilton?
• Unique identifiers – usually URI
• RDF, RDFa, XML, etc.
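
The "Paris" problem can be shown in miniature: a label is ambiguous, while a unique identifier is not. A minimal Python sketch, assuming made-up URIs under example.org; real linked data would use identifiers published by the data provider, typically expressed in RDF or a similar format.

```python
# Hypothetical identifiers for three things that share the label "Paris".
ENTITIES = {
    "http://example.org/place/paris-france": {
        "label": "Paris", "type": "city", "country": "France"},
    "http://example.org/place/paris-texas": {
        "label": "Paris", "type": "city", "country": "United States"},
    "http://example.org/person/paris-hilton": {
        "label": "Paris", "type": "person"},
}


def candidates(label):
    """Return every identifier whose label matches: the ambiguity."""
    return sorted(uri for uri, entity in ENTITIES.items()
                  if entity["label"] == label)


matches = candidates("Paris")
# Looking up by identifier instead of label gives exactly one entity.
capital = ENTITIES["http://example.org/place/paris-france"]
```

A computer matching on the label gets three candidates; matching on the identifier gets one.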

Page 57: New information for new journalists pt2: data

API

• Application Programming Interface
• Build on top of data
• Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.

Page 58: New information for new journalists pt2: data

Q&A

• Slideshare.net/onlinejournalist
• Twitter.com/paulbradshaw

Page 59: New information for new journalists pt2: data

Bookmarks

• Delicious.com/paulb/datajournalism
• Delicious.com/paulb/visualisation
• Delicious.com/paulb/statistics