i want to be a data dj!
DESCRIPTION
This talk provides an overview of my work towards enabling Data DJs: that is, enabling users to create, remix, record, and share their data analyses as easily as DJs make and share mixes. The talk touches on a variety of topics including linked data, scientific workflows, provenance, enterprise mashups and Facebook. It draws these topics into a unified research framework and discusses future research directions.
TRANSCRIPT
Paul Groth | Vrije Universiteit Amsterdam | [email protected]
Image: http://www.flickr.com/photos/tomk32/2988993409/
All images are under a Creative Commons license
1
2
Image: http://www.flickr.com/photos/lyza/2487848260/sizes/l/
3
Image: http://www.flickr.com/photos/gigi_murru/2757085392/sizes/l
4
Set of component technologies that can be combined and recombined to create new innovations
From Hal Varian http://people.ischool.berkeley.edu/~hal/ Chief Economist at Google
5
1. http://www.flickr.com/photos/restlessglobetrotter/448362507/sizes/m/
2. http://www.flickr.com/photos/cwalker71/1041784395/sizes/l/
3. http://www.flickr.com/photos/oskay/1364146497/sizes/m/
TCP/IP, XML, HTTP, Standard Libraries
1469 Web APIs http://www.programmableweb.com/
6
7
Image: http://www.flickr.com/photos/davestfu/2157396025/sizes/l/
Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/
8
By 2012, 90M end user programmers in the US alone
13M would describe themselves as programmers
55M will use spreadsheets and databases
[Scaffidi et al 06]
9
We have gone from dozens of markets of millions of users to millions of markets of dozens of users [Adams 08]
The “long tail of programming” [Anderson 08]
10
1. Records
2. Turntables and mixers
3. Recording equipment
11
Image: http://www.flickr.com/photos/melodramababs/2446537799/sizes/l/
Remixing ++
Common, flexible and usable APIs
Standard data models
Emergence of nice tools
Convergence of the Web and the Semantic Web
  RDFa, OpenCalais, more “other structured data”
12
13
No more conversion components
Shared Techniques
[WIKIAI’09 @IJCAI]
Open Task Repository
16
http://www.like.nu
57 endpoints
Over 1 billion triples
Billion Triple Challenge Dataset
We made it available on Amazon EC2
▪ See http://bit.ly/13FOWT
Built on eRDF
  evolutionary algorithm for searching over triples
Christophe Guéret and Stefan Schlobach
17
18
Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/
Declaratively capture analysis steps and their dependencies
Steps are represented by components
  Software programs (codes), Web services, …
Workflow systems enable the creation, editing and management of workflows and their executions
  Wings/Pegasus, Taverna, Yahoo Pipes, VisTrails, Kepler, …
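A minimal sketch of what "declaratively capture analysis steps and their dependencies" can look like in practice. It is not taken from Wings/Pegasus, Taverna or any other system named above: the step names, the `WORKFLOW` dictionary layout and the `run` helper are all assumptions made for illustration. The only library used is Python's standard `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical components: plain Python functions standing in for
# software programs or Web services.
def load(_inputs):
    return [3, 1, 2]

def clean(inputs):
    return sorted(inputs["load"])

def summarize(inputs):
    data = inputs["clean"]
    return {"min": data[0], "max": data[-1]}

# The workflow itself is data: each step names its component and the
# steps it depends on. Nothing here says *when* a step runs.
WORKFLOW = {
    "load": {"component": load, "needs": []},
    "clean": {"component": clean, "needs": ["load"]},
    "summarize": {"component": summarize, "needs": ["clean"]},
}

def run(workflow):
    """Execute every step after the steps it depends on."""
    order = TopologicalSorter(
        {name: step["needs"] for name, step in workflow.items()}
    ).static_order()
    results = {}
    for name in order:
        step = workflow[name]
        inputs = {dep: results[dep] for dep in step["needs"]}
        results[name] = step["component"](inputs)
    return results

results = run(WORKFLOW)
print(results["summarize"])
```

Because the description is separate from the execution, the same `WORKFLOW` could in principle be edited, shared or re-run with a different component swapped in, which is the property the slides rely on for remixing.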
19
7/10/09 SWF 2009
Visual programming is not inherently superior
  Green, Nardi, Moher
End-user development
  See Nardi, Repenning and Ioannidou, Myers, Lieberman, ...
Workflows are end-user development environments
Title: BLASTP with simplified results returned
Description: This workflow Performs a blastp search on protein sequence, extracts sequence id within the blast report and retrives the corresponding seuqences.[sic]
≅
23
- myexperiment.org
- 2300 users
- 750 workflows
- 160 groups
[IUI’09][AAAI SS 09][SWF 2009]
High level templates
Adapt to availability of components and data
Use rich descriptions
26
[e-science 09]
Data (triples)
How were they produced?
Which ones should I trust?
Who’s responsible?
From Chris Bizer
From pipes.deri.org
Enterprises need to know where and how their data was produced.
  Uptime
  Compliance with regulations
  Quality assurance
  Performance improvements
  …
28
http://www.ifixit.com/Teardown/iPod-touch-3rd-Generation/1158/1
29
Image: http://www.flickr.com/photos/seidsvag/122718624/sizes/l/
30
Oxford English Dictionary: the fact of coming from some particular source or quarter;
origin, derivation
the history or pedigree of a work of art, manuscript, rare book, etc.;
concretely, a record of the ultimate derivation and passage of an item through its various owners.
Computer representation of provenance
• Provenance is represented by documentation
• Provenance is a query answered by searching over documentation
1. Instrument & Collect
2. Collate
3. Query
4. Use
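The idea that provenance is "a query answered by searching over documentation" can be illustrated with a small sketch. This is an assumption-laden toy, not the recording protocol or data model from the cited papers: `Record`, `ProvenanceStore` and the file names are hypothetical and exist only to walk through the four phases listed above.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """One piece of documentation: a process turned inputs into an output."""
    output: str
    process: str
    inputs: list = field(default_factory=list)

class ProvenanceStore:
    def __init__(self):
        self._by_output = {}

    def record(self, output, process, inputs=()):
        # Phases 1-2: instrumented steps report what they did, and the
        # store collates those reports in one place.
        self._by_output[output] = Record(output, process, list(inputs))

    def provenance(self, item):
        # Phase 3: answer the query by walking back through the records.
        trace, pending, seen = [], [item], set()
        while pending:
            current = pending.pop()
            if current in seen or current not in self._by_output:
                continue
            seen.add(current)
            rec = self._by_output[current]
            trace.append(rec)
            pending.extend(rec.inputs)
        return trace

store = ProvenanceStore()
store.record("raw.csv", "download")
store.record("clean.csv", "clean", ["raw.csv"])
store.record("chart.png", "plot", ["clean.csv"])

# Phase 4: use the answer, here just to list the processes behind the chart.
processes = [rec.process for rec in store.provenance("chart.png")]
print(processes)
```

Note that the store holds only documentation; the provenance of `chart.png` is not stored anywhere as such, it is computed when asked for.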
32
Ensure high-quality characteristics
Protocol for recording documentation
Formalised as an abstract state machine
Proofs to ensure these characteristics
[IEEE TPDS Groth 08]
Common logical structure shared by all creating and querying actors
Enables the autonomous, asynchronous production of documentation by different application components
Open, extensible model
XML + OWL serializations
Tools can operate on it (e.g. visualisation, reasoning)
[ACM TOIT 08: Groth, Moreau, Miles]
[e-Science 08]
36
from esaw09
http://www.flickr.com/photos/newbirth/2834643961/
Reputation
http://www.flickr.com/photos/el_ramon/3804532661/
Content
http://www.flickr.com/photos/ogcodes/2095054686/
Content
Nice Letterhead
Official Seal
≈ A particular statement is present
Works well in open systems
Deals with unknown agents
  Make a decision without reputation information
Deals with new agents
  No need for a system designer to analyse new entrants
Deals with agents behaving unexpectedly
System for gaining experience about contracts (provenance)
Algorithms for assessment of contract proposals based on prior experience
Use prior experience based on provenance to ascertain trust
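One way to picture "assessment of contract proposals based on prior experience" is a similarity-weighted average over past outcomes. This is only an illustrative stand-in, not the algorithm from the cited work: the feature names are borrowed from the letterhead and seal example earlier in the talk, and `similarity`, `assess` and the neutral default of 0.5 are assumptions.

```python
def similarity(a, b):
    """Jaccard similarity between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def assess(proposal, experiences, default=0.5):
    """Similarity-weighted average of past outcomes (1.0 good, 0.0 bad).

    Falls back to `default` when no prior experience is comparable.
    """
    weighted = total = 0.0
    for features, outcome in experiences:
        w = similarity(proposal, features)
        weighted += w * outcome
        total += w
    return weighted / total if total else default

# Hypothetical prior experience: content features seen in earlier
# contracts, paired with how those contracts turned out.
experiences = [
    ({"letterhead", "official seal"}, 1.0),
    ({"letterhead"}, 0.0),
]

score = assess({"letterhead", "official seal"}, experiences)
print(round(score, 2))
```

The point of the sketch is that the decision uses only the content of the proposal and what was learned from earlier contracts, so it still yields an answer for an agent with no reputation record.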
48
[esaw 09]
Use prior experience based on provenance to ascertain trust
49
Trust of new workflow components
50
The Community
http://www.flickr.com/photos/dunechaser/142079357/sizes/o/
Provenance Challenges
  Testing the interoperability of provenance systems
  14 Teams – 3 Challenges
  http://twiki.ipaw.info/bin/view/Challenge/
Open Provenance Model
  Interoperability model for exchanging provenance
  http://twiki.ipaw.info/bin/view/OPM/
The W3C Provenance Incubator Group
  http://www.w3.org/2005/Incubator/prov/
51
Remixing = combinatorial innovation
Make remixing easier
52
1. Data and Data Discovery
2. Component exposure and composition
3. Process capture and organization
Contact: [email protected]: http://twitter.com/pgroth
Read: http://www.pgroth.com
1. Make syntactic errors hard
2. Make syntactic errors impossible
3. Use objects as language elements
4. Make domain-oriented languages
5. Meta-domain orientation
6. Support incremental development
7. Facilitate decomposable test units
8. Multiple views and incremental disclosure
9. Integrate with the web
10. Encourage syntonicity
11. Allow immersion
12. Scaffold typical designs
13. Community tools