i want to be a data dj!

Post on 06-May-2015

Category:

Technology

DESCRIPTION

This talk provides an overview of my work towards enabling Data DJs. That is, enabling users to create, remix, record, and share their data analyses as easily as DJs make and share mixes. The talk touches on a variety of topics, including linked data, scientific workflows, provenance, enterprise mashups and Facebook. It draws these topics into a unified research framework and discusses future research directions.

TRANSCRIPT

Paul Groth | Vrije Universiteit Amsterdam | pgroth@few.vu.nl

Image: http://www.flickr.com/photos/tomk32/2988993409/

All images are under a Creative Commons license

1

2

Image: http://www.flickr.com/photos/lyza/2487848260/sizes/l/

3

Image: http://www.flickr.com/photos/gigi_murru/2757085392/sizes/l

4

Set of component technologies that can be combined and recombined to create new innovations

From Hal Varian, Chief Economist at Google: http://people.ischool.berkeley.edu/~hal/

5

Images:
1. http://www.flickr.com/photos/restlessglobetrotter/448362507/sizes/m/
2. http://www.flickr.com/photos/cwalker71/1041784395/sizes/l/
3. http://www.flickr.com/photos/oskay/1364146497/sizes/m/

TCP/IP, XML, HTTP, Standard Libraries

1469 Web APIs http://www.programmableweb.com/

6

7

Image: http://www.flickr.com/photos/davestfu/2157396025/sizes/l/

Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/

8

By 2012, 90M end-user programmers in the US alone
- 13M would describe themselves as programmers
- 55M will use spreadsheets and databases
[Scaffidi et al 06]

9

We have gone from dozens of markets of millions of users to millions of markets of dozens of users [Adams 08]

The “long tail of programming” [Anderson 08]

10

1. Records

2. Turntables and mixers

3. Recording equipment

11

Image: http://www.flickr.com/photos/melodramababs/2446537799/sizes/l/

Remixing ++
- Common, flexible and usable APIs
- Standard data models
- Emergence of nice tools
- Convergence of the Web and the Semantic Web: RDFa, OpenCalais, more “other structured data”

12

13

No more conversion components

Shared Techniques

[WIKIAI’09 @IJCAI]

Open Task Repository

16

http://www.like.nu
- 57 endpoints
- Over 1 billion triples
- Billion Triple Challenge Dataset
- We made it available on Amazon EC2 (see http://bit.ly/13FOWT)
- Built on eRDF, an evolutionary algorithm for searching over triples

Christophe Guéret and Stefan Schlobach

17

18

Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/

Declaratively capture analysis steps and their dependencies

Steps are represented by components
- Software programs (codes), Web services, …

Workflow systems enable the creation, editing and management of workflows and their executions
- Wings/Pegasus, Taverna, Yahoo Pipes, VisTrails, Kepler, …
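To make the idea of a declarative workflow concrete, here is a minimal sketch in Python. It does not reflect how any of the systems named above work; the step names, the `WORKFLOW` dictionary layout and the three toy components are all assumptions made up for illustration. The point is only that the analysis is written down as steps plus dependencies, and a small engine derives the execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical components; in a real system these could be programs or Web services.
def fetch_sequence():
    return "MKTAYIAKQR"

def count_residues(sequence):
    return len(sequence)

def summarise(sequence, length):
    return f"{sequence[:5]}... ({length} residues)"

# Declarative description (assumed layout): each step names its component
# and the steps whose outputs it consumes, in argument order.
WORKFLOW = {
    "fetch": {"component": fetch_sequence, "needs": []},
    "count": {"component": count_residues, "needs": ["fetch"]},
    "report": {"component": summarise, "needs": ["fetch", "count"]},
}

def run(workflow):
    """Execute every step once, after all of its dependencies have run."""
    order = TopologicalSorter({name: step["needs"] for name, step in workflow.items()})
    results = {}
    for name in order.static_order():
        step = workflow[name]
        inputs = [results[dep] for dep in step["needs"]]
        results[name] = step["component"](*inputs)
    return results

results = run(WORKFLOW)
print(results["report"])
```

Because the dependencies are data rather than control flow, a step can be swapped or a new one added by editing `WORKFLOW` alone, which is the property that makes workflows easy to remix.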

19

7/10/09 SWF 2009

Visual programming is not inherently superior
- Green, Nardi, Moher

End-user development
- See Nardi, Repenning and Ioannidou, Myers, Lieberman, ...

Workflows are end-user development environments

Title: BLASTP with simplified results returned

Description: This workflow Performs a blastp search on protein sequence, extracts sequence id within the blast report and retrives the corresponding seuqences.[sic]

23

- myexperiment.org
- 2300 users
- 750 workflows
- 160 groups

[IUI’09] [AAAI SS 09] [SWF 2009]

- High level templates
- Adapt to availability of components and data
- Use rich descriptions

26

[e-science 09]

Data (triples)

How were they produced?

Which ones should I trust?

Who’s responsible?

From Chris Bizer

From pipes.deri.org

Enterprises need to know where and how their data was produced.
- Uptime
- Compliance with regulations
- Quality assurance
- Performance improvements
- …

28

http://www.ifixit.com/Teardown/iPod-touch-3rd-Generation/1158/1

29

Image: http://www.flickr.com/photos/seidsvag/122718624/sizes/l/

30

Oxford English Dictionary:
- the fact of coming from some particular source or quarter; origin, derivation
- the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.

Computer representation of provenance

• Provenance is represented by documentation

• Provenance is a query answered by searching over documentation

1. Instrument & Collect
2. Collate
3. Query
4. Use
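The view that provenance is a query over documentation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the recording protocol or data model from the cited papers: the `record` and `provenance_of` helpers, the record layout and the file names are all hypothetical. Instrumented steps append documentation to a collated store, and provenance is computed on demand by walking that store backwards from an item.

```python
# Hypothetical collated documentation store: one record per step that ran.
documentation = []

def record(process, inputs, output):
    """Instrument & collect: note which process produced `output` from `inputs`."""
    documentation.append({"process": process, "inputs": list(inputs), "output": output})

def provenance_of(item, docs):
    """Query: walk the documentation backwards from `item` to everything it derives from."""
    trail = []
    pending = [item]
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for entry in docs:
            if entry["output"] == current:
                trail.append(entry)
                pending.extend(entry["inputs"])
    return trail

# Made-up run of a small analysis, plus one step that is unrelated to the figure.
record("download", [], "raw.csv")
record("clean", ["raw.csv"], "clean.csv")
record("plot", ["clean.csv"], "figure.png")
record("unrelated", [], "other.csv")

for entry in provenance_of("figure.png", documentation):
    print(entry["process"], entry["inputs"], "->", entry["output"])
```

Note that nothing labelled "provenance" is ever stored; only documentation is. The same store can answer a different question for a different item, which is what keeping the two notions separate buys.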

32

Ensure high-quality characteristics

Protocol for recording documentation

Formalised as an abstract state machine

Proofs to ensure these characteristics

[IEEE TPDS Groth 08]

Common logical structure shared by all creating and querying actors

Enables the autonomous, asynchronous production of documentation by different application components

Open, extensible model XML + OWL serializations

Tools can operate on it (e.g. visualisation, reasoning)

[ACM TOIT 08: Groth, Moreau, Miles]

[e-Science 08]

36

from esaw09

http://www.flickr.com/photos/newbirth/2834643961/

Reputation

http://www.flickr.com/photos/el_ramon/3804532661/

Content

http://www.flickr.com/photos/ogcodes/2095054686/

Content
- Nice letterhead
- Official seal
- A particular statement is present
- ≈ A particular statement is present

Works well in open systems

Deals with unknown agents
- Make a decision without reputation information

Deals with new agents
- No need for a system designer to analyse new entrants

Deals with agents behaving unexpectedly

System for gaining experience about contracts (provenance)

Algorithms for assessment of contract proposals based on prior experience

Use prior experience based on provenance to ascertain trust
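As a rough illustration of assessing a proposal from prior experience, the Python sketch below scores a new contract by how similar its content is to past contracts with known outcomes. It is not the algorithm from the cited work: Jaccard similarity is a stand-in chosen for brevity, and the feature names, outcome values and `assess` helper are assumptions made up for this example.

```python
def similarity(features_a, features_b):
    """Jaccard similarity between two sets of content features (0.0 to 1.0)."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def assess(proposal, experience):
    """Similarity-weighted average of past outcomes; None if nothing is comparable."""
    weighted = 0.0
    total = 0.0
    for past_features, outcome in experience:
        weight = similarity(proposal, past_features)
        weighted += weight * outcome
        total += weight
    return weighted / total if total > 0 else None

# Hypothetical prior experience: (content features, outcome),
# where 1.0 means the contract was fulfilled and 0.0 means it was broken.
EXPERIENCE = [
    ({"nice letterhead", "official seal", "payment clause"}, 1.0),
    ({"nice letterhead", "official seal"}, 1.0),
    ({"no letterhead"}, 0.0),
]

score = assess({"nice letterhead", "official seal"}, EXPERIENCE)
print(score)
```

The assessment uses only the content of the proposal and the assessor's own history, so it needs no reputation information about the proposing agent. When nothing in the history is comparable, the sketch returns `None` rather than guessing.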

48

[esaw 09]

49

Trust of new workflow components

50

The Community

http://www.flickr.com/photos/dunechaser/142079357/sizes/o/

Provenance Challenges
- Testing the interoperability of provenance systems
- 14 Teams – 3 Challenges
- http://twiki.ipaw.info/bin/view/Challenge/

Open Provenance Model
- Interoperability model for exchanging provenance
- http://twiki.ipaw.info/bin/view/OPM/

The W3C Provenance Incubator Group
- http://www.w3.org/2005/Incubator/prov/

51

Remixing = combinatorial innovation

Make remixing easier

52

1. Data and Data Discovery

2. Component exposure and composition

3. Process capture and organization

Contact: pgroth@few.vu.nl
Follow: http://twitter.com/pgroth
Read: http://www.pgroth.com

1. Make syntactic errors hard
2. Make syntactic errors impossible
3. Use objects as language elements
4. Make domain-oriented languages
5. Meta-domain orientation
6. Support incremental development
7. Facilitate decomposable test units
8. Multiple views and incremental disclosure
9. Integrate with the web
10. Encourage syntonicity
11. Allow immersion
12. Scaffold typical designs
13. Community tools
