I want to be a Data DJ!
![Page 1: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/1.jpg)
Paul Groth | Vrije Universiteit Amsterdam | [email protected]
Image: http://www.flickr.com/photos/tomk32/2988993409/
All images are under a Creative Commons license.
![Page 2: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/2.jpg)
Image: http://www.flickr.com/photos/lyza/2487848260/sizes/l/
![Page 3: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/3.jpg)
Image: http://www.flickr.com/photos/gigi_murru/2757085392/sizes/l
![Page 4: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/4.jpg)
![Page 5: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/5.jpg)
Combinatorial innovation: a set of component technologies that can be combined and recombined to create new innovations
From Hal Varian (http://people.ischool.berkeley.edu/~hal/), Chief Economist at Google
Images:
1. http://www.flickr.com/photos/restlessglobetrotter/448362507/sizes/m/
2. http://www.flickr.com/photos/cwalker71/1041784395/sizes/l/
3. http://www.flickr.com/photos/oskay/1364146497/sizes/m/
![Page 6: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/6.jpg)
TCP/IP, XML, HTTP, Standard Libraries
1469 Web APIs (http://www.programmableweb.com/)
![Page 7: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/7.jpg)
Image: http://www.flickr.com/photos/davestfu/2157396025/sizes/l/
Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/
![Page 8: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/8.jpg)
By 2012, 90M end-user programmers in the US alone:
- 13M would describe themselves as programmers
- 55M will use spreadsheets and databases
[Scaffidi et al 06]
![Page 9: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/9.jpg)
We have gone from dozens of markets of millions of users to millions of markets of dozens of users [Adams 08]
The “long tail of programming” [Anderson 08]
![Page 10: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/10.jpg)
1. Records
2. Turntables and mixers
3. Recording equipment
![Page 11: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/11.jpg)
Image: http://www.flickr.com/photos/melodramababs/2446537799/sizes/l/
![Page 12: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/12.jpg)
Remixing++
- Common, flexible and usable APIs
- Standard data models
- Emergence of nice tools
- Convergence of the Web and the Semantic Web: RDFa, OpenCalais, more “other structured data”
![Page 13: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/13.jpg)
No more conversion components
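Why a standard data model removes conversion components can be shown in a few lines. A minimal sketch, assuming both sources already deliver RDF-style (subject, predicate, object) triples; the example identifiers and the `merge_graphs` helper are hypothetical:

```python
# A triple is a (subject, predicate, object) tuple; a graph is a set of triples.
Triple = tuple[str, str, str]


def merge_graphs(*graphs: set[Triple]) -> set[Triple]:
    """Merge graphs that share the triple data model.

    Because every source uses the same model, merging is plain set union:
    no per-source conversion step is needed, and duplicate triples collapse.
    """
    merged: set[Triple] = set()
    for g in graphs:
        merged |= g
    return merged


# Hypothetical data from two independent sources.
source_a = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
}
source_b = {
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:knows", "ex:bob"),  # also present in source_a
}

merged = merge_graphs(source_a, source_b)
print(len(merged))  # 3 distinct triples
```

A real RDF merge also has to keep blank nodes from different sources apart; this sketch uses only named identifiers and literals, so plain union is enough.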
![Page 14: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/14.jpg)
Shared Techniques
[WIKIAI’09 @IJCAI]
![Page 15: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/15.jpg)
Open Task Repository
![Page 16: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/16.jpg)
![Page 17: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/17.jpg)
http://www.like.nu
- 57 endpoints
- Over 1 billion triples
- Billion Triple Challenge Dataset: we made it available on Amazon EC2 (see http://bit.ly/13FOWT)
- Built on eRDF, an evolutionary algorithm for searching over triples (Christophe Guéret and Stefan Schlobach)
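The idea of searching over triples with an evolutionary algorithm, rather than by exact matching, can be illustrated with a toy. This is not eRDF itself, only a sketch of the general technique under stated assumptions: candidate solutions are variable bindings, fitness counts how many triple patterns a binding turns into known triples, selection keeps the fitter half, and mutation rebinds one variable at random. The dataset, the pattern syntax and all parameters are hypothetical:

```python
import random

# Toy dataset of (subject, predicate, object) triples (hypothetical).
TRIPLES = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:worksAt", "ex:vu"),
    ("ex:alice", "ex:worksAt", "ex:uva"),
}

# Query: who knows someone who works at ex:vu? Terms starting with '?' are variables.
PATTERNS = [("?x", "ex:knows", "?y"), ("?y", "ex:worksAt", "ex:vu")]
VARIABLES = sorted({t for p in PATTERNS for t in p if t.startswith("?")})
RESOURCES = sorted({t for triple in TRIPLES for t in triple})


def fitness(binding: dict[str, str]) -> int:
    """Count the patterns that become known triples under this binding."""
    return sum(
        tuple(binding.get(t, t) for t in pattern) in TRIPLES
        for pattern in PATTERNS
    )


def mutate(binding: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return a copy of the binding with one variable rebound at random."""
    child = dict(binding)
    child[rng.choice(VARIABLES)] = rng.choice(RESOURCES)
    return child


def evolve(pop_size: int = 20, generations: int = 200, seed: int = 0):
    rng = random.Random(seed)
    population = [{v: rng.choice(RESOURCES) for v in VARIABLES}
                  for _ in range(pop_size)]
    for _ in range(generations):
        rng.shuffle(population)                      # random tie-breaking
        population.sort(key=fitness, reverse=True)   # fittest first
        if fitness(population[0]) == len(PATTERNS):  # every pattern satisfied
            break
        survivors = population[: pop_size // 2]      # truncation selection
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=fitness)
    return best, fitness(best)


best, score = evolve()
print(best, score)  # score == len(PATTERNS) means an exact answer was found
```

A binding that satisfies only some patterns is still returned with a lower score, which is what makes this style of search usable for approximate answers over large, messy data.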
![Page 18: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/18.jpg)
Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/
![Page 19: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/19.jpg)
- Declaratively capture analysis steps and their dependencies
- Steps are represented by components: software programs (codes), Web services, …
- Workflow systems enable the creation, editing and management of workflows and their executions: Wings/Pegasus, Taverna, Yahoo Pipes, VisTrails, Kepler, …
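At its core, such a workflow is a dependency graph whose nodes are components. A minimal sketch, assuming each step is a plain Python function and dependencies are declared by step name; the step names are hypothetical, and systems such as Taverna or Kepler add typing, distribution and provenance capture on top of this idea. It uses the standard library's `graphlib` (Python 3.9+) to run steps in dependency order:

```python
from graphlib import TopologicalSorter

# Declarative workflow: step name -> (function, names of the steps it depends on).
# Each function receives the outputs of its dependencies as positional arguments.
WORKFLOW = {
    "load":   (lambda: [3, 1, 2], []),
    "sort":   (lambda xs: sorted(xs), ["load"]),
    "total":  (lambda xs: sum(xs), ["load"]),
    "report": (lambda s, t: f"sorted={s} total={t}", ["sort", "total"]),
}


def run(workflow):
    """Execute every step once, after all of its dependencies."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        results[name] = func(*(results[d] for d in deps))
    return results


results = run(WORKFLOW)
print(results["report"])  # sorted=[1, 2, 3] total=6
```

Because the workflow is data rather than code, swapping one component for another means editing one entry, which is what makes workflows remixable.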
![Page 20: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/20.jpg)
7/10/09 SWF 2009
![Page 21: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/21.jpg)
- Visual programming is not inherently superior (Green, Nardi, Moher)
- End-user development: see Nardi, Repenning and Ioannidou, Myers, Lieberman, ...
- Workflows are end-user development environments
![Page 22: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/22.jpg)
Title: BLASTP with simplified results returned
Description: This workflow Performs a blastp search on protein sequence, extracts sequence id within the blast report and retrives the corresponding seuqences.[sic]
≅
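The middle step of that workflow, extracting sequence IDs from the BLAST report, is the kind of small glue component end users end up writing. A minimal sketch, assuming the report is in BLAST+ tabular format (`-outfmt 6`, where the second column is the subject sequence ID and the eleventh is the e-value) and that a dictionary stands in for the sequence database; the report lines, IDs, sequences and the e-value cut-off are all hypothetical additions:

```python
# Hypothetical BLAST+ tabular report (-outfmt 6): 12 tab-separated columns,
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
REPORT = """\
query1\tP12345\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400
query1\tQ67890\t45.0\t180\t90\t5\t10\t190\t20\t200\t2e-10\t80.1
query1\tP12345\t60.0\t50\t20\t1\t150\t200\t5\t55\t0.5\t30.2
"""

# Stand-in for the sequence database the workflow queries (made-up sequences).
SEQUENCES = {"P12345": "MKTAYIAKQR", "Q67890": "MSLLTEVETP"}


def subject_ids(report: str, max_evalue: float = 1e-5) -> list[str]:
    """Return unique subject IDs, in report order, for hits at or below max_evalue."""
    ids: list[str] = []
    for line in report.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        sseqid, evalue = cols[1], float(cols[10])
        if evalue <= max_evalue and sseqid not in ids:
            ids.append(sseqid)
    return ids


def retrieve(ids: list[str]) -> dict[str, str]:
    """Look up each ID; a real workflow would call a sequence-retrieval service."""
    return {i: SEQUENCES[i] for i in ids if i in SEQUENCES}


hits = subject_ids(REPORT)
print(hits)  # ['P12345', 'Q67890']
print(retrieve(hits))
```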
![Page 23: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/23.jpg)
myexperiment.org:
- 2300 users
- 750 workflows
- 160 groups
![Page 24: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/24.jpg)
![Page 25: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/25.jpg)
[IUI’09][AAAI SS 09][SWF 2009]
![Page 26: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/26.jpg)
- High-level templates
- Adapt to availability of components and data
- Use rich descriptions
[e-Science 09]
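One way to read “templates that adapt to availability of components” is that a template names abstract component classes, and rich component descriptions let the system bind each class to whatever concrete component is currently available. A minimal sketch under that assumption; the catalogue, class names and the first-available-match rule are hypothetical and are not the actual Wings algorithm:

```python
# Abstract workflow template: an ordered list of abstract component classes.
TEMPLATE = ["Clustering", "Visualisation"]

# Rich descriptions of concrete components: which abstract class each implements.
CATALOGUE = [
    {"name": "KMeans-v2", "implements": "Clustering", "available": False},
    {"name": "KMeans-v1", "implements": "Clustering", "available": True},
    {"name": "HeatmapPlot", "implements": "Visualisation", "available": True},
]


def specialise(template, catalogue):
    """Bind each abstract step to the first available matching component."""
    plan = []
    for step in template:
        candidates = [c["name"] for c in catalogue
                      if c["implements"] == step and c["available"]]
        if not candidates:
            raise LookupError(f"no available component for {step!r}")
        plan.append(candidates[0])
    return plan


print(specialise(TEMPLATE, CATALOGUE))  # ['KMeans-v1', 'HeatmapPlot']
```

If `KMeans-v2` later becomes available and is listed first, the same template yields a different executable workflow without the user editing anything.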
![Page 27: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/27.jpg)
Data (triples)
How were they produced?
Which ones should I trust?
Who’s responsible?
From Chris Bizer
From pipes.deri.org
![Page 28: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/28.jpg)
Enterprises need to know where and how their data was produced:
- Uptime
- Compliance with regulations
- Quality assurance
- Performance improvements
- …
http://www.ifixit.com/Teardown/iPod-touch-3rd-Generation/1158/1
![Page 29: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/29.jpg)
Image: http://www.flickr.com/photos/seidsvag/122718624/sizes/l/
![Page 30: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/30.jpg)
Oxford English Dictionary:
- the fact of coming from some particular source or quarter; origin, derivation
- the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.
![Page 31: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/31.jpg)
Computer representation of provenance
• Provenance is represented by documentation
• Provenance is a query answered by searching over documentation
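Read literally, these two points separate recording from answering: components record small assertions about what happened, and the provenance of an item is computed later by searching over those assertions. A minimal sketch, assuming assertions are stored as (subject, relation, object) tuples; the relation names `used` and `wasGeneratedBy` borrow the Open Provenance Model's vocabulary, while the data and the `provenance` helper are hypothetical:

```python
# Documentation recorded by components: (subject, relation, object) assertions.
DOCUMENTATION = [
    ("results.csv", "wasGeneratedBy", "analyse#1"),
    ("analyse#1", "used", "clean.csv"),
    ("analyse#1", "used", "params.json"),
    ("clean.csv", "wasGeneratedBy", "clean#1"),
    ("clean#1", "used", "raw.csv"),
]


def provenance(item: str, docs=DOCUMENTATION) -> set[str]:
    """Answer 'where did item come from?' by walking the assertions backwards."""
    causes = {}
    for subject, _relation, obj in docs:
        causes.setdefault(subject, []).append(obj)
    seen, stack = set(), [item]
    while stack:
        for cause in causes.get(stack.pop(), []):
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


print(sorted(provenance("results.csv")))
# ['analyse#1', 'clean#1', 'clean.csv', 'params.json', 'raw.csv']
```

The documentation never states the full lineage of `results.csv`; that answer only exists as the result of the query.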
![Page 32: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/32.jpg)
1. Instrument & Collect
2. Collate
3. Query
4. Use
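The first stages of this lifecycle can be sketched with a decorator that instruments ordinary functions. A minimal sketch, assuming an in-memory list is an acceptable stand-in for a provenance store and that values can be identified by their `repr`; the `documented` decorator and the record fields are hypothetical, and a real system would also record timestamps, actors and stable identifiers:

```python
import functools

STORE: list[dict] = []  # Collate: one place where all documentation ends up.


def documented(func):
    """Instrument & collect: record each call's inputs and output."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        STORE.append({
            "process": func.__name__,
            "used": [repr(a) for a in args]
                    + [f"{k}={v!r}" for k, v in kwargs.items()],
            "generated": repr(result),
        })
        return result
    return wrapper


@documented
def scale(values, factor=1):
    return [v * factor for v in values]


@documented
def total(values):
    return sum(values)


answer = total(scale([1, 2, 3], factor=10))

# Query: which process generated the answer, and from what?
record = next(r for r in STORE if r["generated"] == repr(answer))
print(record["process"], record["used"])  # total ['[10, 20, 30]']
```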
![Page 33: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/33.jpg)
Ensure high-quality characteristics
Protocol for recording documentation
Formalised as an abstract state machine
Proofs to ensure these characteristics
[IEEE TPDS Groth 08]
![Page 34: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/34.jpg)
Common logical structure shared by all creating and querying actors
Enables the autonomous, asynchronous production of documentation by different application components
Open, extensible model: XML + OWL serializations
Tools can operate on it (e.g. visualisation, reasoning)
[ACM TOIT 08: Groth, Moreau, Miles]
![Page 35: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/35.jpg)
[e-Science 08]
![Page 36: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/36.jpg)
from esaw09
![Page 37: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/37.jpg)
http://www.flickr.com/photos/newbirth/2834643961/
![Page 38: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/38.jpg)
![Page 39: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/39.jpg)
Reputation
![Page 40: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/40.jpg)
http://www.flickr.com/photos/el_ramon/3804532661/
![Page 41: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/41.jpg)
Content
Image: http://www.flickr.com/photos/ogcodes/2095054686/
![Page 42: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/42.jpg)
Content
Nice Letterhead
![Page 43: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/43.jpg)
Content
Nice Letterhead
Official Seal
![Page 44: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/44.jpg)
Content
Nice Letterhead
Official Seal
A particular statement is present
![Page 45: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/45.jpg)
Content
Nice Letterhead
Official Seal
≈ A particular statement is present
![Page 46: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/46.jpg)
Works well in open systems:
- Deals with unknown agents: makes a decision without reputation information
- Deals with new agents: no need for a system designer to analyse new entrants
- Deals with agents behaving unexpectedly
![Page 47: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/47.jpg)
System for gaining experience about contracts (provenance)
Algorithms for assessment of contract proposals based on prior experience
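The slides do not spell out the assessment algorithm, so the following only illustrates the general idea with a swapped-in technique, a plain k-nearest-neighbour average: a new contract proposal is scored by how past contracts with similar terms actually turned out, as recorded in provenance. The contract terms (`price`, `deadline_days`), the history and the helper names are all hypothetical:

```python
import math

# Prior experience distilled from provenance: terms of past contracts with a
# provider, plus whether the provider actually honoured them (1.0) or not (0.0).
HISTORY = [
    ({"price": 100.0, "deadline_days": 10.0}, 1.0),
    ({"price": 90.0, "deadline_days": 9.0}, 1.0),
    ({"price": 40.0, "deadline_days": 2.0}, 0.0),
    ({"price": 35.0, "deadline_days": 3.0}, 0.0),
]


def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two contracts over their shared terms."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def assess(proposal: dict, history=HISTORY, k: int = 2) -> float:
    """Estimate how likely a proposal is to be honoured from the k most similar past contracts."""
    nearest = sorted(history, key=lambda h: distance(proposal, h[0]))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)


print(assess({"price": 95.0, "deadline_days": 10.0}))  # 1.0
print(assess({"price": 38.0, "deadline_days": 2.0}))   # 0.0
```

The terms are not normalised here, so price dominates the distance; a real assessor would scale each term and would need a policy for providers with no history at all.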
![Page 48: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/48.jpg)
Use prior experience based on provenance to ascertain trust
[esaw 09]
![Page 49: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/49.jpg)
Use prior experience based on provenance to ascertain trust
Trust of new workflow components
![Page 50: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/50.jpg)
The Community
http://www.flickr.com/photos/dunechaser/142079357/sizes/o/
![Page 51: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/51.jpg)
- Provenance Challenges: testing the interoperability of provenance systems
  - 14 teams, 3 challenges
  - http://twiki.ipaw.info/bin/view/Challenge/
- Open Provenance Model: an interoperability model for exchanging provenance
  - http://twiki.ipaw.info/bin/view/OPM/
- The W3C Provenance Incubator Group
  - http://www.w3.org/2005/Incubator/prov/
![Page 52: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/52.jpg)
Remixing = combinatorial innovation
Make remixing easier:
1. Data and Data Discovery
2. Component exposure and composition
3. Process capture and organization
![Page 53: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/53.jpg)
Contact: [email protected]
http://twitter.com/pgroth
Read: http://www.pgroth.com
![Page 54: I want to be a Data DJ!](https://reader030.vdocuments.net/reader030/viewer/2022012918/5549c9d8b4c9051c778b45bb/html5/thumbnails/54.jpg)
1. Make syntactic errors hard
2. Make syntactic errors impossible
3. Use objects as language elements
4. Make domain-oriented languages
5. Meta-domain orientation
6. Support incremental development
7. Facilitate decomposable test units
8. Multiple views and incremental disclosure
9. Integrate with the web
10. Encourage syntonicity
11. Allow immersion
12. Scaffold typical designs
13. Community tools