Download - Lasi datawrangling
Data wrangling with open source tools
Tony Hirst
Dept of Communication & Systems
The Open University, UK
Premises
1. “I take data from wherever I can get it”
2. “Appropriate everything”
3. Conversations with data
3. Visual Conversations with data
(Accession Plot)
@mediaczar
If a picture’s worth a thousand words,
maybe it should take as long to read?
Most learning analytics won’t be
performed by learning analytics
researchers
How can we help people fashion
their own tools to support data
conversations?
Recipes
site:open.ac.uk
Have a conversation
with the data…
Ask the right questions…
xkcd.com/1138
Sometimes a question makes most sense in
the context of questions previously asked and answers previously received
DATA USERS
Educators, Learners
Planners, Marketers
Policymakers, Researchers
Press, NGOs
“DEVELOPERS”
Have dashboard,
so what?
A tools- and issues-based view
DATA
TOOLS
USERS
PROBLEMS
Example – Google Fusion Tables
Fusion Table: https://www.google.com/fusiontables/DataSource?docid=1VKG7iCbFlsEYJzTuQppf4xoIqq1ABxWTdW6O_7o#rows:id=1
http://is.gd/qhuaoA
Walkthrough: http://blog.ouseful.info/2012/11/16/a-quick-look-at-gcsealevel-certificate-awards-market-share-by-examination-board/
http://is.gd/f9YAbG
DATA
TOOLS
USERS
PROBLEMS
Access/obtain data
Make sense of data
Ask specific questions of data
Communicate in a data-centric way
Load data
Clean data
Merge/enrich data
DATA
Issues
TOOLS
DATA
Other TOOLS
Issues
TOOLS
“Tool based programming”
A barrier to access (for the tool user) is
data format
JSON, XML, CSV, XLS, TSV, .db, HTML, PDF, DOC, TXT
GLUE LOGIC (Glue code)
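As a minimal sketch of glue code that moves data between two of these formats, the snippet below turns CSV text into JSON using only the Python standard library. The function name `csv_to_json` and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Parse CSV text (first row is the header) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

# Hypothetical sample data, for illustration only.
sample = "name,students\nExample College,1200\nSample Institute,850\n"
print(csv_to_json(sample))
```

Note that every value stays a string after this step; converting types is a separate cleaning task.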
=importHTML(URL, "table", N)
HTML
QUERYABLE DATA
Try it… Example Page
http://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_the_United_States_by_endowment
http://is.gd/7Vbg6n
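For readers outside a spreadsheet, the same idea (pull the N-th table out of an HTML page as rows of cells) can be sketched with Python's standard `html.parser`. This is an illustration, not how the spreadsheet formula works internally. The names `TableExtractor` and `import_html_table` and the sample page are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # row currently being read, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

def import_html_table(html, n):
    """Return the n-th table on the page (1-indexed, as in the formula)."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[n - 1]

# Hypothetical page with two tables; we ask for the second one.
page = """
<table><tr><th>Skip</th></tr></table>
<table>
  <tr><th>Institution</th><th>Value</th></tr>
  <tr><td>Example University</td><td>10</td></tr>
</table>
"""
rows = import_html_table(page, 2)
print(rows)
```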
Google Spreadsheets as a database
Explorer: https://views.scraperwiki.com/run/google_spreadsheet_query/
http://is.gd/jiMJoh
Walkthrough: http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/
http://is.gd/qJHihu
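A rough sketch of what "spreadsheet as a database" can look like from code: build a URL that sends a SQL-like query to a shared spreadsheet and asks for CSV back. The endpoint pattern below reflects my understanding of the Google Visualization API query interface and is an assumption, not something taken from these slides. `SHEET_KEY` is a placeholder, and the column letters in the query are hypothetical.

```python
from urllib.parse import urlencode

def spreadsheet_query_url(sheet_key, query):
    """Build a query URL for a shared Google Spreadsheet (assumed endpoint)."""
    base = "https://docs.google.com/spreadsheets/d/{}/gviz/tq".format(sheet_key)
    # tqx=out:csv requests CSV output; tq carries the query text.
    return base + "?" + urlencode({"tqx": "out:csv", "tq": query})

# Placeholder key and hypothetical columns, for illustration only.
url = spreadsheet_query_url(
    "SHEET_KEY", "select A, B where B > 100 order by B desc"
)
print(url)
```

Each follow-up question is then just a different query string against the same sheet.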
=importCSV(URL, N)
HTML
INTERACTIVE DASHBOARD
Google Charts
Google Chart Visualization API
https://code.google.com/apis/ajax/playground/
http://is.gd/TTHIUh
Google Visualisation
API
googleVis (R)
https://developers.facebook.com/docs/reference/api/examples/
http://is.gd/7cRnvS
Another barrier to access (for the tool user) is data shape
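One common shape problem is "wide" data (one column per year, say) when a tool expects "long" data (one row per observation). A minimal sketch of the reshape in plain Python follows. The function name `wide_to_long` and the course figures are hypothetical.

```python
def wide_to_long(rows, id_field, var_name="variable", value_name="value"):
    """Turn one row per item into one row per (item, column) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {id_field: row[id_field], var_name: key, value_name: value}
            )
    return long_rows

# Hypothetical wide data: one column per year.
wide = [
    {"course": "A101", "2011": 120, "2012": 135},
    {"course": "B202", "2011": 80, "2012": 95},
]
long_rows = wide_to_long(wide, "course", var_name="year", value_name="enrolled")
for r in long_rows:
    print(r)
```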
Yet another barrier to access (for the tool user) is data cleanliness
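Two typical cleaning chores, sketched with the standard library: tidying stray whitespace in text, and turning formatted number strings into actual numbers. The helper names `clean_text` and `clean_number` are hypothetical, and real data will usually need more rules than these.

```python
import re

def clean_text(value):
    """Trim the ends and collapse internal runs of whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()

def clean_number(value):
    """Parse strings such as ' 1,250 ' into a float; None if nothing parses."""
    stripped = re.sub(r"[^0-9.\-]", "", value)  # drop commas, symbols, spaces
    try:
        return float(stripped)
    except ValueError:
        return None

print(clean_text("  Open   University \n"))
print(clean_number(" 1,250 "))
print(clean_number("n/a"))
```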
Clear to read?
Questions of identity
The Open University
Open University
OU
Open Uni
Open University, UK
NORMALISATION/RECONCILIATION
Reconciliation to a canonical name
and/or to a unique identifier
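A minimal sketch of reconciliation by lookup table, using the name variants listed above. The identifier `inst-001` is made up for illustration, and `normalise` and `reconcile` are hypothetical helper names. A real reconciliation service would draw on a much larger authority list.

```python
# Hypothetical identifier; the canonical name is one of the listed variants.
CANONICAL = ("The Open University", "inst-001")

# Variants stored in normalised form (lower case, single spaces).
LOOKUP = {
    "the open university": CANONICAL,
    "open university": CANONICAL,
    "ou": CANONICAL,
    "open uni": CANONICAL,
    "open university, uk": CANONICAL,
}

def normalise(name):
    """Lower-case the name and collapse whitespace before lookup."""
    return " ".join(name.lower().split())

def reconcile(name):
    """Return (canonical name, identifier), or (name, None) if unknown."""
    return LOOKUP.get(normalise(name), (name, None))

print(reconcile("OU"))
print(reconcile("  open   UNI "))
print(reconcile("Another Place"))
```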
A stumbling block (for the data user) is data enrichment
Another stumbling block (for the data user) is joining datasets
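When both datasets share an exact key, a join (which is also one way of enriching data) comes down to an index and a loop. The sketch below is a simple left join in plain Python. The name `left_join` and the sample records are hypothetical, and if the right-hand data repeats a key, the last row wins.

```python
def left_join(left, right, key):
    """Keep every left row, adding fields from the right row with the same key."""
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key], {})  # no match: keep the left row as-is
        joined.append({**row, **match})
    return joined

# Hypothetical datasets sharing a "code" column.
courses = [
    {"code": "A101", "title": "Intro"},
    {"code": "B202", "title": "Methods"},
]
enrolments = [{"code": "A101", "enrolled": 120}]

joined = left_join(courses, enrolments, "code")
for r in joined:
    print(r)
```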
A huge stumbling block (for the data user) is joining partially matched data
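When keys only nearly match (typos, dropped words), one possible approach is fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library. The name `fuzzy_lookup`, the 0.8 cutoff and the candidate names are assumptions chosen for illustration, and fuzzy matches should always be reviewed by eye.

```python
import difflib

def fuzzy_lookup(name, candidates, cutoff=0.8):
    """Return the candidate most similar to name, or None if none is close."""
    lowered = {c.lower(): c for c in candidates}  # compare case-insensitively
    hits = difflib.get_close_matches(
        name.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None

# Hypothetical candidate list and a misspelt key.
candidates = ["The Open University", "University of Example"]
print(fuzzy_lookup("Open Universty", candidates))
print(fuzzy_lookup("Completely Different", candidates))
```

A lower cutoff catches more variants but also produces more false matches.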
Rolling your own interactive data
exploration tools
R Shiny Apps
ui.R server.R
rCharts
Many chart tools do the work for
you if the data is in the right shape
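As an example of "the right shape", the sketch below converts a list of records into the `cols`/`rows` JSON layout that, as I understand it, the Google Visualization API accepts for a DataTable. The helper name `to_datatable` and the sample record are hypothetical, and the layout should be checked against the API documentation before relying on it.

```python
import json

def to_datatable(records, columns):
    """Shape records as a DataTable-style dict.

    columns is a list of (field, type) pairs, e.g. ("awards", "number").
    """
    return {
        "cols": [{"id": f, "label": f, "type": t} for f, t in columns],
        "rows": [
            {"c": [{"v": rec.get(f)} for f, _ in columns]} for rec in records
        ],
    }

# Hypothetical record, for illustration only.
table = to_datatable(
    [{"board": "Board A", "awards": 10}],
    [("board", "string"), ("awards", "number")],
)
print(json.dumps(table, indent=2))
```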
DATA
TOOLS
USERS PROBLEMS
Just
ask
…
ask.SchoolOfData.org
blog.ouseful.info
@psychemedia