The Art of Acquiring Data
Hunting for Data
Data sources
● Public agencies (local, county, state, federal)
● Data.gov sites
● Social networking sites (often APIs)
● Nonprofits/industry experts
● Academic institutions
● Manually gathered
Databases of Databases
● Paid
  ○ Accurint ($)
  ○ Nexis ($)
● Free
  ○ BRB
  ○ Online Searches
  ○ Libraries
Unknown Unknowns
Not everything is on the web. I swear.
A universe of data never sees the light of day on the Web. How do you find it?
● Seek ye the nerds
● Interview gov employees
● Academics, experts can shine light or provide custom data they compiled
If agency officials won’t help
Follow the bread crumbs:
● Gov forms
● Public contracts (esp. for vendor software)
● Software manuals
● Don’t forget about those academics/experts!
Friendly FOIAs
● Negotiate data with officials
● Craft targeted request
● Send FOIA, if at all, as a formality
Not-so-friendly FOIAs
● Negotiate first (see Friendly FOIAs)
● Know your rights
  ○ response deadlines
  ○ legit exemptions
● Seek expert advice (CalAware, CFAC)
● Follow through on requests
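Knowing the response deadline and following through both come down to tracking dates. Here is a minimal sketch that counts forward a given number of business days from the day a request was sent. The day count, the choice of business days over calendar days, and the holiday list are all assumptions to be replaced with whatever the applicable public records law says; `add_business_days` and `is_overdue` are hypothetical helper names.

```python
from datetime import date, timedelta


def add_business_days(start, days, holidays=frozenset()):
    """Return the date that falls `days` business days after `start`.

    Saturdays, Sundays, and any dates in `holidays` are not counted.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


def is_overdue(sent_on, response_days, today, holidays=frozenset()):
    """True if `today` is past the computed response deadline."""
    return today > add_business_days(sent_on, response_days, holidays)


# Assumed example: request sent Friday, March 1, 2024, with a
# 10-business-day response window (check the real statute).
deadline = add_business_days(date(2024, 3, 1), 10)
print("Follow up after:", deadline.isoformat())
```

A spreadsheet of requests with a computed deadline column makes it easy to see which agencies are due for a follow-up call.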
We have data! Let’s start writing!
Dimensions of Data
Identity
● What do the fields mean? (ask for a data dictionary)
● What are the data types in each column?
● Missing data? Dupes? Absurd values? Other mistakes?
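The identity questions above can be asked of a file before any reporting starts. The sketch below is one possible first pass using only the Python standard library: it counts blanks and guesses types per column, counts exact duplicate rows, and flags numeric values outside a plausible range. The sample data, the column names, and the 0 to 120 age range are invented for illustration; the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for a real agency export.
SAMPLE = """name,age,city
Ann,34,Oakland
Bob,,Fresno
Ann,34,Oakland
Cy,412,Davis
"""


def guess_type(value):
    """Label a string as 'int', 'float', or 'text' based on what it parses as."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except ValueError:
            continue
    return "text"


def profile_rows(rows):
    """Return (per-column report, count of exact duplicate rows)."""
    rows = list(rows)
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        values = [(row.get(col) or "").strip() for row in rows]
        non_blank = [v for v in values if v]
        report[col] = {
            "blank": len(values) - len(non_blank),
            "types": dict(Counter(guess_type(v) for v in non_blank)),
        }
    seen = Counter(tuple(row.get(c) for c in columns) for row in rows)
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    return report, duplicates


def out_of_range(rows, column, low, high):
    """Return rows whose numeric value in `column` falls outside [low, high]."""
    flagged = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if guess_type(value) in ("int", "float") and not low <= float(value) <= high:
            flagged.append(row)
    return flagged


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report, dupes = profile_rows(rows)
print(report)
print("duplicate rows:", dupes)
print("absurd ages:", out_of_range(rows, "age", 0, 120))
```

A column that reports mixed types, or a range check that flags many rows, is a question to take back to the agency along with the data dictionary.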
Provenance
What is the origin story and chain of custody for your data?
● Hand-keyed from gov forms?
● “self reported” using web form?
● Generated by automated system?
● What data validations exist?
● Data dump or output from reporting system?
Context
● What rules and regs surround the data?
● How comprehensive is the data?
● Other overlapping data sets?
● Other complementary data sets?
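One rough way to gauge comprehensiveness is to compare record identifiers against an overlapping data set. The sketch below assumes both sets share some ID field and that differences in case and spacing should be ignored; the IDs and the names `normalize` and `compare_keys` are made up for illustration.

```python
def normalize(key):
    """Uppercase and collapse whitespace so near-identical IDs match."""
    return " ".join(key.upper().split())


def compare_keys(primary, other):
    """Summarize how the IDs in `primary` overlap with those in `other`."""
    a = {normalize(k) for k in primary}
    b = {normalize(k) for k in other}
    return {
        "in_both": sorted(a & b),
        "only_primary": sorted(a - b),
        "only_other": sorted(b - a),
        # Share of primary IDs that also appear in the other set.
        "coverage": len(a & b) / len(a) if a else 0.0,
    }


# Invented IDs: one list from an agency, one compiled by an outside expert.
agency_ids = ["CA-001", "ca-002 ", "CA-003"]
expert_ids = ["CA-002", "CA-003", "CA-004"]

result = compare_keys(agency_ids, expert_ids)
print(result)
```

IDs that show up only in the other set are leads: records the agency data may be missing, or records that follow different rules and regs.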
En Fin
● Data is lurking online and off.
● More (data) bees with honey.
● Don’t just get the data. Know the data.
Ping me.
Serdar Tumgoren
@[email protected]
www.slideshare.net/serdartumgoren