I want to be a Data DJ!

Paul Groth | Vrije Universiteit Amsterdam | [email protected]
Image: http://www.flickr.com/photos/tomk32/2988993409/
All images are under a Creative Commons license

Upload: paul-groth

Post on 06-May-2015


DESCRIPTION

This talk provides an overview of my work towards enabling Data DJs: that is, enabling users to create, remix, record, and share their data analyses as easily as DJs make and share mixes. The talk touches on a variety of topics, including linked data, scientific workflows, provenance, enterprise mashups and Facebook. It draws these topics into a unified research framework and discusses future research directions.

TRANSCRIPT

Page 1: I want to be a Data DJ!

Paul Groth | Vrije Universiteit Amsterdam | [email protected]

Image: http://www.flickr.com/photos/tomk32/2988993409/
All images are under a Creative Commons license

Page 2: I want to be a Data DJ!

Image: http://www.flickr.com/photos/lyza/2487848260/sizes/l/

Page 3: I want to be a Data DJ!

Image: http://www.flickr.com/photos/gigi_murru/2757085392/sizes/l

Page 4: I want to be a Data DJ!


Page 5: I want to be a Data DJ!

Set of component technologies that can be combined and recombined to create new innovations

From Hal Varian http://people.ischool.berkeley.edu/~hal/ Chief Economist at Google

Images:
1. http://www.flickr.com/photos/restlessglobetrotter/448362507/sizes/m/
2. http://www.flickr.com/photos/cwalker71/1041784395/sizes/l/
3. http://www.flickr.com/photos/oskay/1364146497/sizes/m/

Page 6: I want to be a Data DJ!

TCP/IP, XML, HTTP, Standard Libraries

1469 Web APIs http://www.programmableweb.com/


Page 7: I want to be a Data DJ!


Image: http://www.flickr.com/photos/davestfu/2157396025/sizes/l/

Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/

Page 8: I want to be a Data DJ!

By 2012, 90M end-user programmers in the US alone

• 13M would describe themselves as programmers
• 55M will use spreadsheets and databases

[Scaffidi et al 06]

Page 9: I want to be a Data DJ!

We have gone from dozens of markets of millions of users to millions of markets of dozens of users [Adams 08]

The “long tail of programming” [Anderson 08]

Page 10: I want to be a Data DJ!


1. Records

2. Turntables and mixers

3. Recording equipment

Page 11: I want to be a Data DJ!


Image: http://www.flickr.com/photos/melodramababs/2446537799/sizes/l/

Page 12: I want to be a Data DJ!

Remixing ++

• Common, flexible and usable APIs
• Standard data models
• Emergence of nice tools
• Convergence of the Web and the Semantic Web
  ▪ RDFa, OpenCalais, more “other structured data”

Page 13: I want to be a Data DJ!

Remixing ++

• Common, flexible and usable APIs
• Standard data models
• Emergence of nice tools
• Convergence of the Web and the Semantic Web
  ▪ RDFa, OpenCalais, more “other structured data”

No more conversion components

Page 14: I want to be a Data DJ!

Shared Techniques

[WIKIAI’09 @IJCAI]

Page 15: I want to be a Data DJ!

Open Task Repository

Page 16: I want to be a Data DJ!


Page 17: I want to be a Data DJ!

http://www.like.nu

• 57 endpoints
• Over 1 billion triples
• Billion Triple Challenge Dataset
• We made it available on Amazon EC2
  ▪ See http://bit.ly/13FOWT
• Built on eRDF
  ▪ evolutionary algorithm for searching over triples

Christophe Guéret and Stefan Schlobach

Page 18: I want to be a Data DJ!


Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/

Page 19: I want to be a Data DJ!

• Declaratively capture analysis steps and their dependencies
• Steps are represented by components
  ▪ Software programs (codes), Web services, …
• Workflow systems enable the creation, editing and management of workflows and their executions
  ▪ Wings/Pegasus, Taverna, Yahoo Pipes, VisTrails, Kepler, …
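The idea of declaratively capturing steps and their dependencies can be illustrated with a small sketch. It assumes Python 3.9+ for `graphlib`; the step names (`load`, `clean`, `summarise`) and the `run` helper are hypothetical and do not come from any of the workflow systems named on this slide.

```python
from graphlib import TopologicalSorter

# Hypothetical components; each receives a dict of its dependencies' outputs.
def load(_inputs):
    return [3, 1, 2]

def clean(inputs):
    return sorted(inputs["load"])

def summarise(inputs):
    data = inputs["clean"]
    return {"min": data[0], "max": data[-1]}

# Declarative part: step name -> (component, names of steps it depends on).
WORKFLOW = {
    "load": (load, []),
    "clean": (clean, ["load"]),
    "summarise": (summarise, ["clean"]),
}

def run(workflow):
    """Execute every step after its dependencies and return all outputs."""
    graph = {name: deps for name, (_, deps) in workflow.items()}
    results = {}
    # static_order() yields each step only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        component, deps = workflow[name]
        results[name] = component({dep: results[dep] for dep in deps})
    return results

results = run(WORKFLOW)
```

Because the workflow is plain data, the same description could be edited, shared, or re-executed without touching the components themselves, which is the property the slide attributes to workflow systems.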

Page 20: I want to be a Data DJ!

7/10/09 SWF 2009

Page 21: I want to be a Data DJ!

• Visual programming not inherently superior
  ▪ Green, Nardi, Moher
• End-user development
  ▪ See Nardi, Repenning and Ioannidou, Myers, Lieberman, ...
• Workflows are end-user development environments

Page 22: I want to be a Data DJ!

Title: BLASTP with simplified results returned
Description: This workflow Performs a blastp search on protein sequence, extracts sequence id within the blast report and retrives the corresponding seuqences. [sic]

Page 23: I want to be a Data DJ!

myexperiment.org
- 2300 users
- 750 workflows
- 160 groups

Page 24: I want to be a Data DJ!
Page 25: I want to be a Data DJ!

[IUI’09] [AAAI SS 09] [SWF 2009]

Page 26: I want to be a Data DJ!

• High level templates
• Adapt to availability of components and data
• Use rich descriptions

[e-science 09]

Page 27: I want to be a Data DJ!

Data (triples)

• How were they produced?
• Which ones should I trust?
• Who’s responsible?

From Chris Bizer

From pipes.deri.org

Page 28: I want to be a Data DJ!

Enterprises need to know where and how their data was produced.

• Uptime
• Compliance with regulations
• Quality assurance
• Performance improvements
• …

http://www.ifixit.com/Teardown/iPod-touch-3rd-Generation/1158/1

Page 29: I want to be a Data DJ!


Image: http://www.flickr.com/photos/seidsvag/122718624/sizes/l/

Page 30: I want to be a Data DJ!

Oxford English Dictionary:

• the fact of coming from some particular source or quarter; origin, derivation
• the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.

Page 31: I want to be a Data DJ!

Computer representation of provenance

• Provenance is represented by documentation

• Provenance is a query answered by searching over documentation
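To make the two points above concrete, here is a minimal sketch in which documentation is a list of records and provenance is the result of a search over them. The `Record` type, the file names, and the `provenance` function are assumptions made for illustration; they are not the data model or query interface of any system cited in this talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """One piece of documentation: a process turned inputs into an output."""
    process: str
    inputs: tuple
    output: str

# Hypothetical documentation collected while an analysis ran.
DOCUMENTATION = [
    Record("download", (), "raw.csv"),
    Record("clean", ("raw.csv",), "clean.csv"),
    Record("plot", ("clean.csv",), "figure.png"),
    Record("unrelated", (), "other.txt"),
]

def provenance(item, documentation):
    """Return the records that led to `item`, in the order they are found."""
    found, pending, seen = [], [item], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for record in documentation:
            if record.output == current:
                found.append(record)
                # Follow each input back to whatever produced it.
                pending.extend(record.inputs)
    return found

trail = [record.process for record in provenance("figure.png", DOCUMENTATION)]
```

Nothing called "the provenance of figure.png" is stored anywhere; it only exists as the answer to the query, which is the distinction the slide draws.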

Page 32: I want to be a Data DJ!

1. Instrument & Collect
2. Collate
3. Query
4. Use

Page 33: I want to be a Data DJ!

Ensure high-quality characteristics

Protocol for recording documentation

Formalised as an abstract state machine

Proofs to ensure these characteristics

[IEEE TPDS Groth 08]

Page 34: I want to be a Data DJ!

Common logical structure shared by all creating and querying actors

Enables the autonomous, asynchronous production of documentation by different application components

Open, extensible model
  ▪ XML + OWL serializations

Tools can operate on it (e.g. visualisation, reasoning)

[ACM TOIT 08: Groth, Moreau, Miles]

Page 35: I want to be a Data DJ!

[e-Science 08]

Page 36: I want to be a Data DJ!


from esaw09

Page 37: I want to be a Data DJ!

http://www.flickr.com/photos/newbirth/2834643961/

Page 38: I want to be a Data DJ!
Page 39: I want to be a Data DJ!

Reputation

Page 40: I want to be a Data DJ!

http://www.flickr.com/photos/el_ramon/3804532661/

Page 41: I want to be a Data DJ!

Content

http://www.flickr.com/photos/ogcodes/2095054686/

Page 42: I want to be a Data DJ!

Content

• Nice Letterhead

Page 43: I want to be a Data DJ!

Content

• Nice Letterhead
• Official Seal

Page 44: I want to be a Data DJ!

Content

• Nice Letterhead
• Official Seal
• A particular statement is present

Page 45: I want to be a Data DJ!

Content

• Nice Letterhead
• Official Seal
• ≈ A particular statement is present

Page 46: I want to be a Data DJ!

Works well in open systems

• Deals with unknown agents
  ▪ Make a decision without reputation information
• Deals with new agents
  ▪ No need for a system designer to analyse new entrants
• Deals with agents behaving unexpectedly

Page 47: I want to be a Data DJ!

System for gaining experience about contracts (provenance)

Algorithms for assessment of contract proposals based on prior experience
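One way to picture assessment from prior experience is a similarity-weighted average over past contracts. The sketch below is an illustration only and is not the algorithm from the cited work: the feature names, the use of Jaccard similarity, and the outcome scores are all assumptions.

```python
def jaccard(a, b):
    """Similarity of two feature sets: shared features over all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical prior experience: (content features, outcome),
# where outcome 1.0 means the contract went well and 0.0 means it did not.
EXPERIENCE = [
    ({"letterhead", "official seal", "statement"}, 1.0),
    ({"letterhead", "official seal"}, 1.0),
    ({"statement"}, 0.0),
]

def assess(proposal, experience):
    """Score a proposal by the outcomes of similar past contracts.

    Returns None when no past contract shares any feature with the proposal.
    """
    weighted = [(jaccard(proposal, features), outcome)
                for features, outcome in experience]
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return None
    return sum(weight * outcome for weight, outcome in weighted) / total

full_score = assess({"letterhead", "official seal", "statement"}, EXPERIENCE)
bare_score = assess({"statement"}, EXPERIENCE)
```

Note that the score depends only on the content of the proposal and on recorded experience, so it can be computed for an agent that has never been seen before.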

Page 48: I want to be a Data DJ!

Use prior experience based on provenance to ascertain trust


[esaw 09]

Page 49: I want to be a Data DJ!

Use prior experience based on provenance to ascertain trust

Trust of new workflow components

Page 50: I want to be a Data DJ!


The Community

http://www.flickr.com/photos/dunechaser/142079357/sizes/o/

Page 51: I want to be a Data DJ!

• Provenance Challenges
  ▪ Testing the interoperability of provenance systems
  ▪ 14 Teams – 3 Challenges
  ▪ http://twiki.ipaw.info/bin/view/Challenge/
• Open Provenance Model
  ▪ Interoperability model for exchanging provenance
  ▪ http://twiki.ipaw.info/bin/view/OPM/
• The W3C Provenance Incubator Group
  ▪ http://www.w3.org/2005/Incubator/prov/

Page 52: I want to be a Data DJ!

Remixing = combinatorial innovation

Make remixing easier

1. Data and Data Discovery

2. Component exposure and composition

3. Process capture and organization

Page 53: I want to be a Data DJ!

Contact: [email protected]
http://twitter.com/pgroth
Read: http://www.pgroth.com

Page 54: I want to be a Data DJ!

1. Make syntactic errors hard
2. Make syntactic errors impossible
3. Use objects as language elements
4. Make domain-oriented languages
5. Meta-domain orientation
6. Support incremental development
7. Facilitate decomposable test units
8. Multiple views and incremental disclosure
9. Integrate with the web
10. Encourage syntonicity
11. Allow immersion
12. Scaffold typical designs
13. Community tools