Page 1: Introduction Current Work Design & Implementation Conclusions PQLite: Provenance Query Language PQLite: An Overly Simplistic Query Language for Data Provenance

PQLite: An Overly Simplistic Query Language for Data Provenance

[email protected]@soe.ucsc.edu

CMPS203 Final Project
University of California, Santa Cruz
Jack Baskin School of Engineering

Michael {Leece, Sevilla}

Page 2: Overview

• Introduction
• Current Work
• Design and Implementation
• Conclusions

Page 3: Introduction

Terminology

• Provenance: history + ancestry of an object [1]
  – Processes
  – Data
• Provenance Aware Storage (PASS)
  – Transparent collection
• PQL: Path Query Language
  – Useful for provenance

[Figure: Ancestry Graph]

Page 4: Introduction

Applications

• Security
• File System Search
• The Cloud
• New Hierarchical File Systems
• Yan Li’s Photo Album

Page 6: Current Work

PQL Broken

• Obtained PASSv2
• Ran PQL query on provenance database
  – Infinite loops
  – Empty results ({})
• “The problem with PQL and Sage is that the implementation… is really slow, and it’s perhaps too easy to generate PQL queries that do not return any data.”
  – PASS Team

Page 7: Current Work

PQL Undocumented

Page 8: Current Work

Overview

[Figure: PASSv2 overview diagram. Labels: User Space, App1, App2, Kernel Space, VFS, Lasagna FS, PASSv2 Modules, .twig, BDB, Waldo Database Dump]

Page 9: Design & Implementation

Use Case

• What we have

    [ P ] 1.0 INODE 4 INODE 12
    [ P ] 1.0 NAME 9 "/file.txt"
    [ P ] 1.0 TYPE 4 "FILE"
    [ P ] 1.0 FREEZETIME 8 TIME 1329510432.493134083
    [ P ] 1.0 FREEZETIME 8 TIME 1329510618.420311721
    [ P ] 1.0 FREEZETIME 8 TIME 1329510676.040716382
    [AP ] 1.1 INPUT 12 --> 2.1
    [AP ] 1.2 INPUT 12 --> 8.1
    [AP ] 1.3 INPUT 12 --> 16.2
    [ PT] 2.0 ARGV 4 [1]"cat"
    [ PT] 2.0 ENV 64 [2]"SHELL=/bin/bash" [3]"TERM=xterm" [4]"XDG_SESSION_COOKIE=06c3f2775eb071081dfacb984bf6c364-1329508695.722050-291519720" [5]"USER=root" [6]"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:" [7]"MAIL=/var/mail/root" [8]"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" [9]"PWD=/test" [10]"LANG=en_US.UTF-8" [11]"SHLVL=1" [12]"HOME=/root" [13]"LOGNAME=root" [14]"LESSOPEN=| /usr/bin/lesspipe %s" [15]"LESSCLOSE=/usr/bin/lesspipe %s %s" [16]"_=/bin/cat" [17]"OLDPWD=/"
    [ ] 2.0 EXECTIME 8 TIME 1329510428.104272662
    [ P ] 2.0 TYPE 4 "PROC"
    [ ] 2.0 PID 4 INT 13739
    [ P ] 2.0 NAME 8 "/bin/cat"
    [A ] 2.0 FORKPARENT 12 --> 14762.0
    [ P ] 2.0 FREEZETIME 8 TIME 1329510428.104272662

• What we want
  – A list of files or processes that are one-step ancestors of “/file.txt”
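The dump records on this slide appear to share one layout: bracketed flags, a node.version identifier, an attribute name, a size, and a value, with `-->` marking a cross-reference to another node. A minimal Haskell sketch of reading such lines, assuming that layout holds for every record (the `Record` type, `parseRecord`, and `edge` are hypothetical names for illustration, not taken from the PQLite source, and the reading of `INPUT --> x` as "x is an ancestor" is also an assumption):

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | One dump line under the assumed layout:
--   [flags] node.version ATTRIBUTE size value...
data Record = Record
  { recFlags :: String   -- text between the brackets, e.g. " P " or "AP "
  , recNode  :: String   -- node.version identifier, e.g. "1.0"
  , recAttr  :: String   -- attribute name, e.g. "NAME" or "INPUT"
  , recValue :: String   -- rest of the line after the size field
  } deriving (Eq, Show)

-- | Parse a single dump line; Nothing if the line does not fit the layout.
parseRecord :: String -> Maybe Record
parseRecord ('[' : rest) =
  case break (== ']') rest of
    (flags, ']' : body) ->
      case words body of
        (node : attr : _size : _) ->
          Just (Record flags node attr (dropFields 3 body))
        _ -> Nothing
    _ -> Nothing
parseRecord _ = Nothing

-- | Drop the first n whitespace-separated fields, keeping the rest verbatim
--   (so quoted values that contain spaces survive intact).
dropFields :: Int -> String -> String
dropFields 0 s = dropWhile isSpace s
dropFields n s =
  dropFields (n - 1) (dropWhile (not . isSpace) (dropWhile isSpace s))

-- | Extract a (node, referenced node) pair from a cross-reference record.
edge :: Record -> Maybe (String, String)
edge r
  | "--> " `isPrefixOf` recValue r = Just (recNode r, drop 4 (recValue r))
  | otherwise                      = Nothing
```

For example, `parseRecord "[AP ] 1.1 INPUT 12 --> 2.1" >>= edge` yields the pair `("1.1", "2.1")`, which is the kind of edge an ancestry graph for "/file.txt" would be built from.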

Page 10: Design & Implementation

Use Case (cont.)

[Figure: pipeline diagram. Components: Waldo Database Dump, Dump Parser, Ancestry Graph, Label Map, Query Parser, Abstract Syntax Tree, Evaluator]

Query:

    SELECT r FROM Graph AS r WHERE r.child = "/file.txt"

Label Map:

    1 -> file.txt
    2 -> jazz.jpg
    3 -> bacon.txt
    …

Abstract Syntax Tree:

    Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]

Response:

    [(MyNode "/usr/bin/pico" 1,1,[2]),(MyNode "/usr/bin/vi" 2,3,[17,16,15]),(MyNode "/bin/cat" 1,4,[0])]
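To make the evaluator step concrete, here is a minimal Haskell sketch of evaluating the abstract syntax tree printed on this slide. The data types are reconstructed from that printed tree and may not match the real PQLite definitions; the edge-list graph, `sampleEdges`, and `runQuery` are hypothetical, and the result is a plain list of parent names rather than the `MyNode` values in the slide's response.

```haskell
-- Types reconstructed from the AST shown on the slide (an assumption).
data Query = Select String [From] [Expr]  deriving (Eq, Show)
data From  = From [Alias]                 deriving (Eq, Show)
data Alias = Alias String String          deriving (Eq, Show)
data Op    = Eq                           deriving (Eq, Show)
data Expr  = Duo Op Expr Expr
           | PathType String [String]
           | Str String                   deriving (Eq, Show)

-- | Hypothetical graph representation: one (parent, child) pair per edge.
type Edge = (String, String)

-- | Resolve a path such as r.child or r.parent on a single edge.
field :: Edge -> [String] -> Maybe String
field (_, c) ["child"]  = Just c
field (p, _) ["parent"] = Just p
field _      _          = Nothing

-- | Evaluate one side of a comparison against the edge bound to the variable.
evalExpr :: String -> Edge -> Expr -> Maybe String
evalExpr _   _ (Str s) = Just s
evalExpr var e (PathType v path)
  | v == var  = field e path
  | otherwise = Nothing
evalExpr _   _ (Duo _ _ _) = Nothing  -- nested comparisons are not handled

-- | A WHERE condition holds when both sides evaluate to the same string.
holds :: String -> Edge -> Expr -> Bool
holds var e (Duo Eq l r) =
  case (evalExpr var e l, evalExpr var e r) of
    (Just a, Just b) -> a == b
    _                -> False
holds _ _ _ = False

-- | Return the parent on every edge that satisfies all WHERE conditions.
runQuery :: [Edge] -> Query -> [String]
runQuery edges (Select var _ conds) =
  [ p | e@(p, _) <- edges, all (holds var e) conds ]

-- | Made-up sample data for illustration only.
sampleEdges :: [Edge]
sampleEdges =
  [ ("/usr/bin/pico", "/file.txt")
  , ("/usr/bin/vi",   "/file.txt")
  , ("/bin/cat",      "/file.txt")
  , ("/bin/cat",      "/jazz.jpg")
  ]

-- | The AST from the slide.
exampleQuery :: Query
exampleQuery =
  Select "r" [From [Alias "Graph" "r"]]
    [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]
```

With this sample data, `runQuery sampleEdges exampleQuery` returns the three one-step ancestors of "/file.txt": "/usr/bin/pico", "/usr/bin/vi", and "/bin/cat".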

Page 13: Design & Implementation

Language Specification

Select Statement
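The grammar on this slide did not survive in the transcript, so the following is only a sketch inferred from the single example query in the use case, `SELECT r FROM Graph AS r WHERE r.child = "/file.txt"`. It uses `Text.ParserCombinators.ReadP` from the base library instead of the Parsec library the authors mention, and every parser name here (`selectStmt`, `condition`, `parseQuery`, and so on) is hypothetical. The AST types are reconstructed from the tree printed in the use case.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Text.ParserCombinators.ReadP

-- Types reconstructed from the AST printed in the use case (an assumption).
data Query = Select String [From] [Expr]  deriving (Eq, Show)
data From  = From [Alias]                 deriving (Eq, Show)
data Alias = Alias String String          deriving (Eq, Show)
data Op    = Eq                           deriving (Eq, Show)
data Expr  = Duo Op Expr Expr
           | PathType String [String]
           | Str String                   deriving (Eq, Show)

-- | Run a parser, then skip trailing whitespace.
token :: ReadP a -> ReadP a
token p = p <* skipSpaces

-- | Match a fixed keyword such as SELECT.
keyword :: String -> ReadP ()
keyword w = token (string w) >> return ()

-- | A letter followed by letters or digits, without trailing whitespace.
rawIdent :: ReadP String
rawIdent = (:) <$> satisfy isAlpha <*> munch isAlphaNum

identifier :: ReadP String
identifier = token rawIdent

-- | Dotted path such as r.child: a variable followed by field names.
path :: ReadP Expr
path = token $ do
  v  <- rawIdent
  fs <- many (char '.' *> rawIdent)
  return (PathType v fs)

-- | Double-quoted string literal; escape sequences are not supported.
stringLit :: ReadP Expr
stringLit = token (Str <$> between (char '"') (char '"') (munch (/= '"')))

-- | A single equality comparison, the only operator in the example query.
condition :: ReadP Expr
condition = do
  l <- path
  _ <- token (char '=')
  r <- stringLit +++ path
  return (Duo Eq l r)

selectStmt :: ReadP Query
selectStmt = do
  skipSpaces
  keyword "SELECT"
  var <- identifier
  keyword "FROM"
  src <- identifier
  keyword "AS"
  alias <- identifier
  keyword "WHERE"
  cond <- condition
  eof
  return (Select var [From [Alias src alias]] [cond])

-- | Succeeds only when the whole input parses in exactly one way.
parseQuery :: String -> Maybe Query
parseQuery s = case readP_to_S selectStmt s of
  [(q, "")] -> Just q
  _         -> Nothing
```

Applied to the example query, `parseQuery` produces the same `Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]` tree that the use case shows.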

Page 15: Design & Implementation

Language Specification

Expression

Page 18: Conclusions

What We Did Well

• Functional
  – It works. (PQLite > PQL)
• Easy to use
  – Intuitive (SQL-like) way of querying a provenance graph
  – Getting stuff we care about

Page 19: Conclusions

Lessons Learned

• Infinite recursion in parsing
  – Left recursion in a recursive descent parser
  – Refined syntax
• Began coding too soon
• Monads are useful
  – IO(), Maybe, State, Parsec
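The left-recursion lesson can be illustrated with a small example. A rule such as `path ::= path '.' name | name` makes a recursive descent parser call itself before consuming any input, so it never makes progress; the usual fix is to rewrite the rule as `path ::= name ('.' name)*`. The sketch below shows that rewrite in Haskell with `Text.ParserCombinators.ReadP` from the base library. The grammar and the `Path` type are hypothetical, chosen for illustration; the transcript does not say which PQLite rule was affected or how the authors refined their syntax.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP

-- | Hypothetical expression type: a name, or a field access on a path.
data Path = Name String | Field Path String deriving (Eq, Show)

name :: ReadP String
name = munch1 isAlpha

-- Left-recursive grammar:   path ::= path '.' name | name
-- Transcribed directly, the parser would re-enter itself before consuming
-- any input:
--
--   pathBad = (Field <$> pathBad <* char '.' <*> name) +++ (Name <$> name)

-- Refactored grammar:       path ::= name ('.' name)*
-- Parse the first name, then fold the remaining names in from the left so
-- the resulting tree is still left-nested.
path :: ReadP Path
path = do
  first <- name
  rest  <- many (char '.' *> name)
  return (foldl Field (Name first) rest)

-- | Succeeds only when the whole input is a well-formed path.
parsePath :: String -> Maybe Path
parsePath s = case readP_to_S (path <* eof) s of
  [(p, "")] -> Just p
  _         -> Nothing
```

For instance, `parsePath "r.child.name"` gives `Just (Field (Field (Name "r") "child") "name")`, the same left-nested tree the left-recursive rule describes. Parsec offers `chainl1` for the same purpose.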

Page 20: Conclusions

References

1) Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie. Provenance-Aware Storage Systems. Harvard University Computer Science Technical Report TR-18-05, July 2005.

2) Stephanie Jones, Christina Strong, Darrell D. E. Long, and Ethan L. Miller. Tracking Emigrant Data via Transient Provenance. In Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP '11), June 2011.

3) Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor. Layering in Provenance Systems. In Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.

4) PQL Language Guide and Reference