new information for new journalists pt2: data

Post on 07-May-2015

DESCRIPTION

Presentation to ESCACC, Barcelona, 2010

TRANSCRIPT

Introduction
Paul Bradshaw

Data journalism

Ivy Lee

“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”

Adrian Holovaty

Great stories
Engagement
Targeting/relevance

Why?

“The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.”

http://bit.ly/dj2dmz

Times film genres

Data Journalism Continuum

1. Finding data

What is data?

Numbers
Text
Connections
Live data
Behavioural data
Images, audio, video

Anything that a computer can work with

Start with the data and look for the stories? (MPs’ expenses)
Or start with a lead and look for the data?

Passive vs active data journalism

Data.gov.uk
What Do They Know
Openlylocal, Scraperwiki

Disclosure logs
RSS feeds, XML, structured data

Some UK projects

Delicious.com/paulb/car

CAR

Advanced search by file type

“Performance figures” filetype:pdf
filetype:xls
filetype:doc
filetype:ppt
filetype:rdf OR filetype:xml

Advanced search by domain

“Disclosure logs” site:.gov.es
Database site:.org.cat OR site:.org
+Tables -chairs site:Health, police, military domains
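
The search operators above can be combined programmatically when many variations need to be tried. The following is a minimal sketch, not a definitive tool: `build_query` and its parameters are hypothetical names introduced for illustration, and it assumes the search engine accepts `filetype:`, `site:`, `OR` and a leading minus in the usual way.

```python
def build_query(terms, filetypes=(), site=None, exclude=()):
    """Assemble a search query string from advanced operators.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Quote multi-word phrases so they are matched exactly.
    parts = [f'"{terms}"' if " " in terms else terms]
    if filetypes:
        # Several file types are alternatives, so join them with OR.
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if site:
        # Restrict results to one domain or domain suffix.
        parts.append(f"site:{site}")
    # A leading minus excludes a word from the results.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


print(build_query("performance figures", filetypes=["xls"]))
print(build_query("disclosure logs", site=".gov.es"))
print(build_query("tables", exclude=["chairs"]))
```

Generating the strings this way makes it easy to loop over a list of domains or file types and paste each query into a search engine by hand.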

Use overseas sources

• US medicine databases
• EU subsidy databases
• Swedish people data
• International police agency correspondence

Scraping

Scraping can automate & schedule the gathering process if there are multiple sources
Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
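
To make the idea concrete, here is a minimal sketch of what a scraper does once it has a page: it walks the HTML and pulls table cells into rows. It uses only Python's standard library rather than any of the tools named above; the `TableScraper` class and the sample page are made up for illustration, and a real scraper would first download the page and be run on a schedule.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # row being built, or None outside <tr>
        self._in_cell = False   # True while inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


# Made-up sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Offence</th><th>Ward</th></tr>
  <tr><td>2010-03-01</td><td>Burglary</td><td>12</td></tr>
  <tr><td>2010-03-01</td><td>Theft</td><td>7</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows
print(header)
print(records)
```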

Interrogating data

Humans collect data
Humans enter data
Human error

Time spent now...

Different words for the same thing
Double spaces, punctuation
Wrong data type
Mistyped
Duplicate entries
Default entries (1/1/00)

...Saves time later

"Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do."

David Donald, Center for Public Integrity

Group by term then sort to see duplications
Find & replace double spaces, etc.
Select column/row & check data type
Sort to find unusually large/small, and neighbouring misspellings

Cleaning methods
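
The cleaning methods above are usually done in a spreadsheet, but the same checks can be sketched in a few lines of code. This is an illustrative example only: the rows are invented, `normalise` is a hypothetical helper, and treating `1/1/00` as a default entry is an assumption taken from the earlier slide.

```python
import re
from collections import Counter

# Made-up rows: (organisation name, amount, date as entered)
rows = [
    ("Acme  Ltd.", "1200", "12/03/09"),
    ("acme ltd", "1200", "12/03/09"),
    ("Bolt & Co", "950", "1/1/00"),
    ("Bolt & Co", "95000", "04/07/09"),
    ("Crane plc", "n/a", "22/09/09"),
]


def normalise(name):
    """Lower-case, drop punctuation and collapse repeated spaces."""
    name = re.sub(r"[^\w\s&]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()


# Group by cleaned term, then sort, to see duplications.
counts = Counter(normalise(name) for name, _, _ in rows)
duplicates = sorted(term for term, n in counts.items() if n > 1)

# Check the data type: amounts should be whole numbers.
bad_amounts = [amount for _, amount, _ in rows if not amount.isdigit()]

# Flag dates that look like a default entry rather than a real value.
default_dates = [name for name, _, date in rows if date == "1/1/00"]

# Sort numeric amounts to expose unusually large or small values.
amounts = sorted(int(a) for _, a, _ in rows if a.isdigit())

print(duplicates)
print(bad_amounts)
print(default_dates)
print(amounts)
```

Each flagged value still needs a human decision: a duplicate may be a genuine repeat payment, and an outlier may be correct.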

Never publish a name from data without running a background check

Check.

Other tools

Freebase Gridworks: see http://vimeo.com/10081183

Visualising data

or http://chartchooser.juiceanalytics.com/

(trends, dips, correlations)

(comparison, themes)

(proportions, comparison)

Mashing data

Geocoded data with map
- Live data (e.g. Twitter API)
- Static data (e.g. Google Docs)
- Dynamic data (e.g. Google Form)
2 spreadsheets with common data
- Tools: MySQL, Access, etc.

Combining data sources
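
Combining two spreadsheets on a common column is what a database join does. The sketch below shows the same idea in plain Python instead of MySQL or Access; the ward names, figures and column names are invented for illustration.

```python
import csv
import io

# Made-up spreadsheets exported as CSV; "ward" is the common column.
CRIME_CSV = """ward,burglaries
Aston,42
Ladywood,57
Nechells,31
"""

POPULATION_CSV = """ward,population
Aston,32000
Ladywood,30000
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Index one sheet by the shared column so lookups are direct.
population = {row["ward"]: int(row["population"])
              for row in read_rows(POPULATION_CSV)}

merged = []      # rows found in both sheets
unmatched = []   # wards with no partner row, worth checking by hand
for row in read_rows(CRIME_CSV):
    ward = row["ward"]
    if ward in population:
        rate = int(row["burglaries"]) / population[ward] * 1000
        merged.append({"ward": ward, "burglaries_per_1000": round(rate, 2)})
    else:
        unmatched.append(ward)

print(merged)
print(unmatched)
```

Keeping the unmatched list matters: rows that fail to join often point to the spelling and formatting problems that cleaning is meant to catch.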

Twittermap
Wikipedia map
NYT Property
Guardian vs Nature
BBC Most Read
BBC Olympic Village

Combining data sources

Big events (protests, Olympics, inauguration)
Comparisons
Geocoded data
Connections

What mashes well?

Aggregates
Maps
Filters
Counts
Cleans or reformats (regex)

Yahoo! Pipes
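
The operations listed for Yahoo! Pipes can be pictured as steps in a small pipeline. The following sketch imitates four of them (aggregate, clean with a regular expression, filter, count) in plain Python; it does not use Yahoo! Pipes itself, mapping is left out, and the feed items are invented stand-ins for two RSS feeds.

```python
import re
from collections import Counter

# Made-up feed items standing in for two RSS feeds.
feed_a = [
    {"title": "Council  approves budget", "source": "A"},
    {"title": "Olympic village plans <b>revealed</b>", "source": "A"},
]
feed_b = [
    {"title": "Protest over Olympic costs", "source": "B"},
    {"title": "Weather: rain expected", "source": "B"},
]


def clean(title):
    """Strip HTML tags and collapse repeated whitespace (regex step)."""
    title = re.sub(r"<[^>]+>", "", title)
    return re.sub(r"\s+", " ", title).strip()


# Aggregate: merge both feeds into one list, cleaning titles on the way.
items = [dict(item, title=clean(item["title"])) for item in feed_a + feed_b]

# Filter: keep only items that mention the topic of interest.
olympic = [item for item in items if "olympic" in item["title"].lower()]

# Count: how many matching items each source contributed.
per_source = Counter(item["source"] for item in olympic)

print([item["title"] for item in olympic])
print(dict(per_source))
```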

Scraperwiki – mapping library
Maptube – combine maps
Google Docs – publish in different formats
+++

Other tools

Computer-readable data
Paris – France, Texas, or Hilton?
Unique identifiers – usually URI
RDF, RDFa, XML, etc.

Semantic web & linked data
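
The "Paris" problem can be illustrated with a toy lookup: the same label matches several things, and only a unique identifier tells them apart. This is a simplified sketch, not RDF; the `example.org` URIs, the `ENTITIES` table and both functions are hypothetical and exist only for illustration.

```python
# Hypothetical identifiers under example.org; real linked data would use
# URIs published by an actual data provider.
ENTITIES = {
    "http://example.org/id/paris-france": {
        "label": "Paris", "type": "city", "country": "France"},
    "http://example.org/id/paris-texas": {
        "label": "Paris", "type": "city", "country": "United States"},
    "http://example.org/id/paris-hilton": {
        "label": "Paris", "type": "person"},
}


def candidates(label):
    """Return every URI whose label matches the ambiguous text."""
    return sorted(uri for uri, entity in ENTITIES.items()
                  if entity["label"] == label)


def resolve(label, **facts):
    """Return the single URI matching the label and facts, else None."""
    matches = [uri for uri in candidates(label)
               if all(ENTITIES[uri].get(key) == value
                      for key, value in facts.items())]
    return matches[0] if len(matches) == 1 else None


print(len(candidates("Paris")))
print(resolve("Paris"))
print(resolve("Paris", country="France"))
print(resolve("Paris", type="person"))
```

The label alone is ambiguous to a computer, so `resolve("Paris")` gives no answer; once extra facts narrow it to one URI, every dataset using that URI is talking about the same thing.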

Application Programming Interface
Build on top of data
Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.

API
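
Building on top of data through an API typically means two steps: composing a request URL and reading the structured response. The sketch below shows both without touching the network. The endpoint, parameter names and response shape are assumptions made up for illustration and do not describe any of the APIs listed above.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; each real API documents its own URL and parameters.
BASE_URL = "https://api.example.org/search"


def build_request(query, api_key, page=1):
    """Compose the URL a client would fetch (parameter names are assumed)."""
    params = {"q": query, "page": page, "api-key": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


# Made-up response in the kind of JSON structure many APIs return.
SAMPLE_RESPONSE = """
{"results": [
  {"title": "First sample story", "url": "https://example.org/1"},
  {"title": "Second sample story", "url": "https://example.org/2"}
]}
"""


def headlines(response_text):
    """Pull just the titles out of the JSON response."""
    return [result["title"] for result in json.loads(response_text)["results"]]


print(build_request("data journalism", api_key="DEMO"))
print(headlines(SAMPLE_RESPONSE))
```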

Slideshare.net/onlinejournalist
Twitter.com/paulbradshaw

Q&A

Delicious.com/paulb/datajournalism
Delicious.com/paulb/visualisation
Delicious.com/paulb/statistics

Bookmarks
