pads/ml: a functional data description language
![Page 1: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/1.jpg)
Pads/ML: A Functional Data Description Language
David Walker
Princeton University
with: Yitzhak Mandelbaum (Princeton),
Kathleen Fisher and Mary Fernandez (AT&T)
![Page 2: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/2.jpg)
Data, data everywhere!
Incredible amounts of data stored in well-behaved formats:
• Databases
• XML
Tools:
• Schema
• Browsers
• Query languages
• Standards
• Libraries
• Books, documentation
• Conversion tools
• Vendor support
• Consultants…
![Page 3: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/3.jpg)
Ad hoc Data
• Vast amounts of data in ad hoc formats.
• Ad hoc data is semi-structured:
  – Not free text.
  – Not as rigid as data in relational databases.
• Examples from many different areas:
  – Physics
  – Computer system maintenance and administration
  – Biology
  – Finance
  – Government
  – Healthcare
  – More!
![Page 4: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/4.jpg)
Ad Hoc Data in Biology
format-version: 1.0
date: 11:11:2005 14:24
auto-generated-by: DAG-Edit 1.419 rev 3
default-namespace: gene_ontology
subsetdef: goslim_goa "GOA and proteome slim"

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria\, including the mitochondrial genome\, into daughter cells after mitosis or meiosis\, mediated by interactions between mitochondria and the cytoskeleton." [PMID:10873824,PMID:11389764, SGD:mcc]
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

www.geneontology.org
![Page 5: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/5.jpg)
Ad Hoc Data in Chemistry
O=C([C@@H]2OC(C)=O)[C@@]3(C)[C@]([C@](CO4)(OC(C)=O)[C@H]4C[C@@H]3O)([H])[C@H](OC(C7=CC=CC=C7)=O)[C@@]1(O)[C@@](C)(C)C2=C(C)[C@@H](OC([C@H](O)[C@@H](NC(C6=CC=CC=C6)=O)C5=CC=CC=C5)=O)C1
[Figure: chemical structure diagram of the molecule]
![Page 6: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/6.jpg)
Ad Hoc Data in Finance
HA00000000START OF TEST CYCLE
aA00000001BXYZ U1AB0000040000100B0000004200
HL00000002START OF OPEN INTEREST
d 00000003FZYX G1AB0000030000300000
HM00000004END OF OPEN INTEREST
HE00000005START OF SUMMARY
f 00000006NYZX B1QB00052000120000070000B000050000000520000 00490000005100+00000100B00000005300000052500000535000
HF00000007END OF SUMMARY

www.opradata.com
![Page 7: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/7.jpg)
Ad Hoc Data from Web Server Logs (CLF)
207.136.97.49 - - [15/Oct/1997:18:46:51 -0700] "GET /tk/p.txt HTTP/1.0" 200 30
tj62.aol.com - - [16/Oct/1997:14:32:22 -0700] "POST /scpt/[email protected]/confirm HTTP/1.0" 200 941
234.200.68.71 - - [15/Oct/1997:18:53:33 -0700] "GET /tr/img/gift.gif HTTP/1.0" 200 409
240.142.174.15 - - [15/Oct/1997:18:39:25 -0700] "GET /tr/img/wool.gif HTTP/1.0" 404 178
188.168.121.58 - - [16/Oct/1997:12:59:35 -0700] "GET / HTTP/1.0" 200 3082
214.201.210.19 ekf - [17/Oct/1997:10:08:23 -0700] "GET /img/new.gif HTTP/1.0" 304 -
![Page 8: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/8.jpg)
Ad Hoc Data: DNS packets
00000000: 9192 d8fb 8480 0001 05d8 0000 0000 0872 ...............r
00000010: 6573 6561 7263 6803 6174 7403 636f 6d00 esearch.att.com.
00000020: 00fc 0001 c00c 0006 0001 0000 0e10 0027 ...............'
00000030: 036e 7331 c00c 0a68 6f73 746d 6173 7465 .ns1...hostmaste
00000040: 72c0 0c77 64e5 4900 000e 1000 0003 8400 r..wd.I.........
00000050: 36ee 8000 000e 10c0 0c00 0f00 0100 000e 6...............
00000060: 1000 0a00 0a05 6c69 6e75 78c0 0cc0 0c00 ......linux.....
00000070: 0f00 0100 000e 1000 0c00 0a07 6d61 696c ............mail
00000080: 6d61 6ec0 0cc0 0c00 0100 0100 000e 1000 man.............
00000090: 0487 cf1a 16c0 0c00 0200 0100 000e 1000 ................
000000a0: 0603 6e73 30c0 0cc0 0c00 0200 0100 000e ..ns0...........
000000b0: 1000 02c0 2e03 5f67 63c0 0c00 2100 0100 ......_gc...!...
000000c0: 0002 5800 1d00 0000 640c c404 7068 7973 ..X.....d...phys
000000d0: 0872 6573 6561 7263 6803 6174 7403 636f .research.att.co
![Page 9: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/9.jpg)
Properties of Ad hoc Data
• Data arrives “as is”: you don’t choose the format.
• Documentation is often out-of-date or nonexistent.
  – Hijacked fields.
  – Undocumented “missing value” representations.
• Data is buggy.
  – Missing data, “extra” data, …
  – Human error, malfunctioning machines, software bugs (e.g. race conditions on log entries), …
  – Errors are sometimes the most interesting portion of the data.
• Data sources often have high volume.
  – Data might not fit into main memory.
• Data can be created by malicious sources attempting to exploit software vulnerabilities.
  – cf. Ethereal network monitoring system
![Page 10: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/10.jpg)
The Goal(s)
• What can we do about ad hoc data?
  – how do we read it into programs?
  – how do we detect errors?
  – how do we correct errors?
  – how do we query it?
  – how do we discover its structure and properties?
  – how do we view it?
  – how do we transform it into standard formats like CSV, XML?
  – how do we merge multiple data sources?
• In short: how do we do all the things we take for granted when dealing with standard formats, in a fault-tolerant and efficient, yet nearly effortless way?
![Page 11: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/11.jpg)
Enter Pads
• Pads: a system for Processing Ad hoc Data Sources
• Three main components:
  – a data description language
    • for concise and precise specifications of ad hoc data formats and semantic properties
  – a compiler that automatically generates a suite of programming libraries & end-to-end applications
  – a visual interface to support both novice and expert users
![Page 12: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/12.jpg)
One Description, Many Tools
[Diagram: a data description (type T) is fed to the compiler, which generates a parser, printer, query engine, visual data browser, XML translator, and other tools, available as a programming library or a complete application.]
![Page 13: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/13.jpg)
Some Advantages Over Ad Hoc Methods
• Big bang for buck: 1 description, many tools
• Descriptions document data sources
  – the documentation IS the tool generator, so documentation is automatically kept up-to-date with implementation
• Descriptions are easy to write, easy to understand.
  – descriptions are high-level & declarative
  – description syntax exploits programmer intuition concerning types
• Tools are robust
  – Error handling code generated automatically; doesn’t clutter documentation.
• Descriptions & generated tools can be analyzed and reasoned about
  – e.g. data size, tool termination & safety properties, coherence of generated parsers & printers
![Page 14: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/14.jpg)
The PADS Project
• PADS/C [PLDI 05; POPL 06]
  – Based on C type structure.
  – Generates C libraries.
    • too bad C doesn’t actually support libraries ....
  – LaunchPADS visual interface [Daly et al., SIGMOD 06]
• PADS/ML (Mandelbaum’s thesis)
  – Based on the ML type structure.
    • polymorphic, dependent datatypes
  – Generates ML modules.
    • better reuse & library structure
    • functional data processing = far greater programmer productivity
  – New framework for tool development.
    • Format-independent algorithms architected using functors vs macros
  – Implementation status.
    • Version 1.0 up and running
    • Many more exciting things to do
• Describe real formats:
  – Newick tree-structured data
  – Reglens galaxy catalogues
  – Palm PDA databases
  – AT&T call-detail data
  – AT&T billing data
  – Web server logs
  – Gene ontologies
  – DNS packets
  – OPRA data
  – More …
![Page 15: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/15.jpg)
Outline
• Motivation and PADS Overview
• Data Description in PADS/ML
• Implementation architecture
• The Semantics of PADS
• Conclusions
![Page 16: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/16.jpg)
Base Types and Records
• Base types: C (e).
  – Describe atomic portions of data.
  – Parameterized by host-language expression.
  – Examples:
    • Pint, Pchar, Pstring_FW(n), Pstring(c).
• Tuples and Records: t * t’ and {x:t; y:t’}.
  – Record fields are dependent: field names can be referenced by types of later fields.
  – Example to follow.
![Page 17: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/17.jpg)
Base Types and Records
Movie-director Bowling Score (MBS) Format:
122Joe|Wright|45|95|79
n/aEd|Wood|10|47|31
124Chris|Nolan|80|93|85
125Tim|Burton|30|82|71
126George|Lucas|32|62|40
The prefix 125 Tim | of the fourth entry has type Pint * Pstring(‘|’) * Pchar
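To make the tuple type concrete, here is a minimal Python sketch of how Pint, Pstring(‘|’) and Pchar could be read in sequence. It is not the PADS/ML implementation; the function names `p_int`, `p_string`, `p_char` and `p_tuple` are hypothetical, and the sketch assumes that Pstring(c) stops before the terminator without consuming it.

```python
def p_int(s, pos):
    """Pint (sketch): read one or more decimal digits starting at pos."""
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    if end == pos:
        raise ValueError(f"expected a digit at offset {pos}")
    return int(s[pos:end]), end


def p_string(stop):
    """Pstring(c) (sketch): read up to, but not including, the stop character."""
    def parse(s, pos):
        end = s.find(stop, pos)
        if end == -1:
            raise ValueError(f"terminator {stop!r} not found after offset {pos}")
        return s[pos:end], end
    return parse


def p_char(s, pos):
    """Pchar (sketch): read exactly one character."""
    if pos >= len(s):
        raise ValueError("unexpected end of input")
    return s[pos], pos + 1


def p_tuple(*parsers):
    """t * t' (sketch): run the component parsers left to right."""
    def parse(s, pos):
        values = []
        for parser in parsers:
            value, pos = parser(s, pos)
            values.append(value)
        return tuple(values), pos
    return parse


# Pint * Pstring('|') * Pchar applied to the start of the fourth MBS entry.
prefix = p_tuple(p_int, p_string("|"), p_char)
value, rest = prefix("125Tim|Burton|30|82|71", 0)
print(value, rest)
```

The second component of the result is the offset at which the next parser would continue.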
![Page 18: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/18.jpg)
Base Types and Records
Bookshelf Listing (BL) Format:
13C Programming
31Types and Programming Languages
20Twenty Years of PLDI
36Modern Compiler Implementation in ML
27Elements o f ML Programming
{ width: Pint; title: Pstring_FW(width) }
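The dependency in the BL record, where the type of `title` mentions the earlier field `width`, can be illustrated with a short Python sketch. The function name `parse_bl_entry` is hypothetical and the code only mimics the behaviour the description implies; it is not generated PADS/ML code.

```python
def parse_bl_entry(s, pos=0):
    """Sketch of { width: Pint; title: Pstring_FW(width) }."""
    # width: Pint
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    if end == pos:
        raise ValueError(f"expected a width at offset {pos}")
    width = int(s[pos:end])
    # title: Pstring_FW(width); the length comes from the field just parsed.
    title = s[end:end + width]
    if len(title) != width:
        raise ValueError("input ended before the title was complete")
    return {"width": width, "title": title}, end + width


shelf = "13C Programming31Types and Programming Languages20Twenty Years of PLDI"
entries = []
pos = 0
while pos < len(shelf):
    entry, pos = parse_bl_entry(shelf, pos)
    entries.append(entry)

for entry in entries:
    print(entry["width"], entry["title"])
```

No delimiter between entries is needed because each width tells the parser exactly where the title ends.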
![Page 19: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/19.jpg)
Constraints
• Constrained types: [x:t | e].
  – Enforce the constraint e on the underlying type t.
Example entry: 125Tim|Burton|30|82|71
Types for the separator: Pchar, [c:Pchar | c = ‘|’], ‘|’
ptype Scores = { min:Pint; ‘|’; max: [m:Pint | min ≤ m]; ‘|’; avg: [a:Pint | min ≤ a & a ≤ max] }
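As an illustration of the Scores constraints, the following Python sketch parses a `min|max|avg` triple and reports constraint violations alongside the parsed values instead of aborting. The function name `parse_scores` and the error strings are assumptions made for this example; the real PADS/ML parse descriptor is richer than a list of strings.

```python
import re

SCORES = re.compile(r"(\d+)\|(\d+)\|(\d+)")


def parse_scores(text):
    """Sketch of ptype Scores: return (representation, list of errors)."""
    match = SCORES.fullmatch(text)
    if match is None:
        return None, ["syntax error"]
    lo, hi, avg = (int(group) for group in match.groups())
    errors = []
    # max: [m:Pint | min <= m]
    if not lo <= hi:
        errors.append("max: constraint min <= m violated")
    # avg: [a:Pint | min <= a & a <= max]
    if not lo <= avg <= hi:
        errors.append("avg: constraint min <= a & a <= max violated")
    return {"min": lo, "max": hi, "avg": avg}, errors


good = parse_scores("30|82|71")
bad = parse_scores("30|82|95")
print(good)
print(bad)
```

Keeping the data even when a constraint fails matches the deck's point that errors are sometimes the most interesting part of the data.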
![Page 20: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/20.jpg)
Datatypes
122Joe|Wright|45|95|79
n/aEd|Wood|10|47|31
124Chris|Nolan|80|93|85
125Tim|Burton|30|82|71
126George|Lucas|32|62|40
• Describe alternatives in data source with datatypes.
  – Parser tries each alternative in order.
pdatatype Id = None of “n/a” | Some of Pint
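The "tries each alternative in order" rule for the Id datatype can be sketched in Python as follows. The function name `parse_id` and the tagged-tuple encoding of the result are choices made for this example only.

```python
def parse_id(s, pos=0):
    """Sketch of pdatatype Id = None of "n/a" | Some of Pint."""
    # First alternative: the literal "n/a".
    if s.startswith("n/a", pos):
        return ("None",), pos + 3
    # Second alternative: an integer, tried only if the first one failed.
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    if end > pos:
        return ("Some", int(s[pos:end])), end
    raise ValueError(f"no alternative of Id matches at offset {pos}")


print(parse_id("n/aEd|Wood|10|47|31"))
print(parse_id("124Chris|Nolan|80|93|85"))
```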
![Page 21: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/21.jpg)
Recursive Datatypes
• Describe inductively-defined formats.
pdatatype IntList = Cons of Pint * ‘|’ * IntList
                  | Last of Pint
79| 4031|71|
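A recursive description such as IntList corresponds naturally to a recursive parsing function. The Python sketch below tries the Cons alternative first and falls back to Last; the names `parse_int` and `parse_int_list` and the input string are assumptions for illustration.

```python
def parse_int(s, pos):
    """Pint (sketch): return (value, next offset), or None on failure."""
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    return (int(s[pos:end]), end) if end > pos else None


def parse_int_list(s, pos=0):
    """Sketch of IntList = Cons of Pint * '|' * IntList | Last of Pint."""
    head = parse_int(s, pos)
    if head is None:
        return None
    value, after = head
    # Cons alternative: an integer, a bar, then the rest of the list.
    if after < len(s) and s[after] == "|":
        tail = parse_int_list(s, after + 1)
        if tail is not None:
            rest, end = tail
            return ("Cons", value, rest), end
    # Last alternative: a single integer.
    return ("Last", value), after


print(parse_int_list("31|71|40"))
```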
![Page 22: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/22.jpg)
Polymorphic Types
• Parameterize types by other types.
pdatatype IntList = Cons of Pint * ‘|’ * IntList
                  | Last of Pint
pdatatype (Elt) List = Cons of Elt * ‘|’ * (Elt) List
                     | Last of Elt
ptype IntList = Pint List
ptype CharList = Pchar List
![Page 23: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/23.jpg)
Dependent Types
• Parameterize types by values.
pdatatype IntList = Cons of Pint * ‘|’ * IntList
                  | Nil of Pint
pdatatype (Elt) List (x:char) = Cons of Elt * x * (Elt) List(x)
                              | Nil of Elt
ptype IntListBar = Pint List(‘|’)
ptype CharListComma = Pchar List (‘,’)
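Type parameters and value parameters both become ordinary function arguments in a combinator-style sketch. The Python below is an assumption-laden illustration, not PADS/ML output: `p_list` is a hypothetical helper that takes the element parser (the type parameter Elt) and the separator character (the value parameter x), and it returns a flat Python list rather than a Cons/Nil structure.

```python
def p_int(s, pos):
    """Pint (sketch): return (value, next offset), or None on failure."""
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    return (int(s[pos:end]), end) if end > pos else None


def p_char(s, pos):
    """Pchar (sketch): read exactly one character."""
    return (s[pos], pos + 1) if pos < len(s) else None


def p_list(elt, sep):
    """Sketch of (Elt) List (x:char): elements separated by the character sep."""
    def parse(s, pos=0):
        first = elt(s, pos)
        if first is None:
            return None
        value, pos = first
        items = [value]
        # Keep going while a separator is followed by another element.
        while pos < len(s) and s[pos] == sep:
            nxt = elt(s, pos + 1)
            if nxt is None:
                break
            value, pos = nxt
            items.append(value)
        return items, pos
    return parse


int_list_bar = p_list(p_int, "|")      # Pint List('|')
char_list_comma = p_list(p_char, ",")  # Pchar List(',')
print(int_list_bar("31|71|40"))
print(char_list_comma("a,b,c"))
```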
![Page 24: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/24.jpg)
More Dependent Types
• “Switched” datatypes:
pdatatype GuidedOption (tag: int) =
  pmatch tag of
    0 => Zero of Pstring
  | 1 => One of Pint
  | 2 => Two of Pint * Pint
  | _ => None
ptype source = {tag: Pint; payload: GuidedOption (tag)}
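The idea behind a switched datatype is that a previously parsed value selects which alternative to parse next. The slide does not say how the tag and payload are delimited, so the Python sketch below assumes the fields have already been split into a list of strings; `parse_source` and that input shape are hypothetical.

```python
def parse_source(fields):
    """Sketch of {tag: Pint; payload: GuidedOption(tag)} over pre-split fields."""
    tag = int(fields[0])
    payload = fields[1:]
    if tag == 0:
        rep = ("Zero", payload[0])                         # Zero of Pstring
    elif tag == 1:
        rep = ("One", int(payload[0]))                     # One of Pint
    elif tag == 2:
        rep = ("Two", int(payload[0]), int(payload[1]))    # Two of Pint * Pint
    else:
        rep = ("None",)                                    # default branch
    return {"tag": tag, "payload": rep}


print(parse_source(["0", "hello"]))
print(parse_source(["2", "10", "20"]))
print(parse_source(["7"]))
```

Unlike the ordered alternatives of a plain datatype, no backtracking is involved: the tag alone decides the branch.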
![Page 25: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/25.jpg)
PADS/ML Regulus Format:
ptype Timestamp = Ptimestamp_explicit_FW(8, "%H:%M:%S", gmt)
ptype Pip = Puint8 * ’.’ * Puint8 * ’.’ * Puint8 * ’.’ * Puint8
ptype (Alpha) Pnvp(p : string -> bool) = { name : [name : Pstring(’=’) | p name]; ’=’; value : Alpha }
ptype (Alpha) Nvp(name:string) = Alpha Pnvp(fun s -> s = name)
ptype SVString = Pstring_SE("/;|\\|/")
ptype Nvp_a = SVString Pnvp(fun _ -> true)
ptype Details = {
  source : Pip Nvp("src_addr"); ’;’;
  dest : Pip Nvp("dest_addr"); ’;’;
  start_time : Timestamp Nvp("start_time"); ’;’;
  end_time : Timestamp Nvp("end_time"); ’;’;
  cycle_time : Puint32 Nvp("cycle_time")
}
ptype Semicolon = Pcharlit(’;’)
ptype Vbar = Pcharlit(’|’)
pdatatype Info(alarm_code : int) =
  Pmatch alarm_code with
    5074 -> Details of Details
  | _ -> Generic of (Nvp_a,Semicolon,Vbar) Plist
pdatatype Service = Dom of "DOMESTIC" | Int of "INTERNATIONAL" | Spec of "SPECIAL"
ptype Raw_alarm = {
  alarm : [ i : Puint32 | i = 2 or i = 3]; ’:’;
  start : Timestamp Popt; ’|’;
  clear : Timestamp Popt; ’|’;
  code : Puint32; ’|’;
  src_dns : SVString Nvp("dns1"); ’;’;
  dest_dns : SVString Nvp("dns2"); ’|’;
  info : Info(code); ’|’;
  service : Service
}
let checkCorr ra = ...
ptype Alarm = [x:Raw_alarm | checkCorr x]
ptype Source = (Alarm,Peor,Peof) Plist

Sample Regulus Data:
2:3004092508||5001|dns1=abc.com;dns2=xyz.com|c=slow link;w=lost packets|INTERNATIONAL
3:|3004097201|5074|dns1=bob.com;dns2=alice.com|src_addr=192.168.0.10;dst_addr=192.168.23.10;start_time=1234567890;end_time=1234568000;cycle_time=17412|SPECIAL
![Page 26: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/26.jpg)
Outline
• Motivation and PADS Overview
• Data Description in PADS/ML
• Implementation architecture
• The Semantics of PADS
• Conclusions
![Page 27: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/27.jpg)
Parsing With PADS
[Diagram: the compiler turns a data description (type T) into a parser. The parser reads raw data (0100100100111) and hands user code a data rep (type ~ T) and a parse descriptor (type ~ T).]
![Page 28: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/28.jpg)
Example: MBS Representation
n/aEd|Wood|10|47|31
Description:
ptype MBS-Entry = { id: Id; first: Pstring(‘|’); ‘|’; last: Pstring(‘|’); ‘|’; scores: Scores }
pdatatype Id = None of “n/a” | Some of Pint
Generated representation:
type MBS-Entry = { id: Id; first: string; last: string; scores: Scores }
datatype Id = None | Some of int
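The split between a representation and a parse descriptor can be sketched in Python as a function that returns both. This is only an analogy: the `ParseDescriptor` class with a bare list of error strings, the regular expression, and the name `parse_mbs_entry` are assumptions for this example and do not reflect the actual generated ML types.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParseDescriptor:
    """Hypothetical, simplified parse descriptor: just a list of error messages."""
    errors: list = field(default_factory=list)


ENTRY = re.compile(r"(n/a|\d+)([^|]*)\|([^|]*)\|(\d+)\|(\d+)\|(\d+)")


def parse_mbs_entry(line):
    """Sketch: return (representation, parse descriptor) for one MBS entry."""
    pd = ParseDescriptor()
    match = ENTRY.fullmatch(line)
    if match is None:
        pd.errors.append("syntax error")
        return None, pd
    raw_id, first, last, lo, hi, avg = match.groups()
    scores = {"min": int(lo), "max": int(hi), "avg": int(avg)}
    rep = {
        # Id = None | Some of int is mapped onto Python's None / int.
        "id": None if raw_id == "n/a" else int(raw_id),
        "first": first,
        "last": last,
        "scores": scores,
    }
    if not scores["min"] <= scores["avg"] <= scores["max"]:
        pd.errors.append("scores: avg outside [min, max]")
    return rep, pd


rep, pd = parse_mbs_entry("n/aEd|Wood|10|47|31")
print(rep)
print(pd.errors)
```

Note that the literal separators appear in the description but not in the representation, just as in the generated type above.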
![Page 29: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/29.jpg)
Tool Generation With PADS/ML
[Diagram: as in the parsing pipeline, the compiler turns a data description (type T) into a parser that produces a data rep (type ~ T) and a parse descriptor (type ~ T) from raw data (0100100100111). The compiler also generates a format-specific traversal functor, which is combined with a format-independent tool module.]
Tools in this pattern: accumulator, debugger, histograms, clusters, format converters
![Page 30: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/30.jpg)
Types as Modules
• PADS/ML generates a module for each type/description
• Parameterized types ==> Functors
• Recursive types ==> Recursive modules
  – sigh: combination of recursive modules & functors not supported in O’Caml, so we’re reduced to a bit of a hack for recursion
sig
  type rep
  type pd
  val parser : Pads.handle -> rep * pd
  module Traverse (tool : TOOL) : sig ... end
end
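The separation between a format-specific traversal and a format-independent tool can be approximated in Python, with a generic `traverse` function standing in for the generated Traverse functor and a small class standing in for the TOOL module. Both names, the `on_leaf` callback, and the dictionary-based representation are assumptions for this sketch; the real mechanism uses ML functors.

```python
class CountingTool:
    """Hypothetical format-independent tool: tallies leaf values by Python type."""

    def __init__(self):
        self.counts = {}

    def on_leaf(self, path, value):
        name = type(value).__name__
        self.counts[name] = self.counts.get(name, 0) + 1


def traverse(rep, tool, path=()):
    """Walk a nested representation and report every leaf to the tool."""
    if isinstance(rep, dict):
        for key, child in rep.items():
            traverse(child, tool, path + (key,))
    else:
        tool.on_leaf(path, rep)


rep = {
    "id": None,
    "first": "Ed",
    "last": "Wood",
    "scores": {"min": 10, "max": 47, "avg": 31},
}
tool = CountingTool()
traverse(rep, tool)
print(tool.counts)
```

The tool never mentions MBS fields, so the same class could be reused with the representation of any other format.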
![Page 31: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/31.jpg)
Outline
• Motivation and PADS Overview
• Data Description in PADS/ML
• Implementation architecture
• The Semantics of PADS
• Conclusions
![Page 32: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/32.jpg)
Motivation
• To crystallize design principles.
  – Example: error counting methodology in PADS/C.
• To ensure system correctness.
  – Example: parsers return data of expected type.
• As basis for evolution and experimentation.
  – Critical to design of PADS/ML.
• To communicate core ideas.
  – Designing the next 700 data description languages.
![Page 33: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/33.jpg)
PADS and DDC
• Developed semantic framework based on Data Description Calculus (DDC).
• Explain PADS/ML and other languages with DDC.
• Give denotational semantics to DDC.
[Diagram: PADS/C, PADS/ML, and “The Next 700” languages, each pointing to DDC]
![Page 34: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/34.jpg)
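Data Description Calculus
• DDC: calculus of dependent types for describing data.
• Expressions e with type σ drawn from F-omega
• A kinding judgment specifies well-formed descriptions.

$$
\begin{array}{rcl}
t & ::= & \mathsf{unit} \mid \mathsf{bottom} \mid C(e) \\
  & \mid & \Sigma x{:}t.\,t \mid t + t \mid t \mathbin{\&} t \mid \{x{:}t \mid e\} \\
  & \mid & t\ \mathsf{seq}(t, e, t) \mid \lambda x.\,t \mid t\ e \\
  & \mid & \lambda\alpha.\,t \mid t\ t \mid \alpha \mid \mu\alpha.\,t \\
  & \mid & \mathsf{compute}(e{:}\sigma) \mid \mathsf{absorb}(t) \mid \mathsf{scan}(t)
\end{array}
$$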
![Page 35: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/35.jpg)
Choosing a Semantics
• The semantics of REs and CFGs is given as sets of strings, but this fails to account for:
  – Relationship between internal and external data.
  – Error handling.
  – Types of representation and parse descriptor.
• DDC
  – Denotational semantics of types as parsers in F-omega
![Page 36: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/36.jpg)
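A 3-Fold Semantics
[Diagram: a description t has three interpretations: the parser [t], which maps the input bits 0100100100... to a representation and a parse descriptor; the representation type [t]rep; and the parse descriptor type [t]pd.]

$$
\begin{aligned}
[\{x{:}t \mid e\}]_{\mathrm{rep}} &= [t]_{\mathrm{rep}} + [t]_{\mathrm{rep}} \\
[\Sigma x{:}t.\,t']_{\mathrm{rep}} &= [t]_{\mathrm{rep}} * [t']_{\mathrm{rep}} \\
[\{x{:}t \mid e\}]_{\mathrm{pd}} &= \mathrm{hdr} * [t]_{\mathrm{pd}} \\
[\Sigma x{:}t.\,t']_{\mathrm{pd}} &= \mathrm{hdr} * [t]_{\mathrm{pd}} * [t']_{\mathrm{pd}}
\end{aligned}
$$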
![Page 37: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/37.jpg)
Type Correctness
[Diagram: the three interpretations of a description t: the parser [t], the representation type [t]rep, and the parse descriptor type [t]pd.]

Theorem:

$$
[t] : \mathrm{bits} \to [t]_{\mathrm{rep}} * [t]_{\mathrm{pd}}
$$
![Page 38: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/38.jpg)
Outline
• Motivation and PADS Overview
• Data Description in PADS/ML
• Implementation architecture
• The Semantics of PADS
• Conclusions
![Page 39: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/39.jpg)
Related Work
• Parser generator technology:
  – Lex & Yacc
    • no dependency
    • semantic actions entwined with data description
    • no higher-level tools
  – Parser combinators
    • semantic actions entwined with data description
    • no higher-level tools
![Page 40: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/40.jpg)
Reminder: One Description, Many Tools
[Diagram: a data description (type T) is fed to the compiler, which generates a parser, printer, query engine, visual data browser, XML translator, and other tools, available as a programming library or a complete application.]
![Page 41: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/41.jpg)
Parser combinators: One algorithm, One Tool
parser
![Page 42: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/42.jpg)
Related Work
• Other “data description” languages
  – Data Format Description Language (DFDL)
  – Binary Format Description Language (BFD)
  – PacketTypes [SIGCOMM ’00]
  – DataScript [GPCE ’02]
• None have a well-defined semantics or Pads tool support
![Page 43: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/43.jpg)
Current & Future Work
• Tools and Applications
  – Description inference.
  – Support for specific domains (microbiology)
• Language Design
  – Transformation language for ad hoc data.
  – Description language for distributed
    • Describe locations, versions, timing, relationships, etc.
• Theory
  – Analyze data descriptions for interesting properties, e.g. equivalence, data size, termination, emptiness (always fails).
  – Coherence of parsing & printing
![Page 44: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/44.jpg)
Summary
• The PADS vision: reliable, efficient and effortless ad hoc data processing
• PADS/ML:
  – Data description based on polymorphic, dependent datatypes
  – “Types as modules” implementation
  – Solid theoretical basis.
• Visit www.padsproj.org
![Page 45: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/45.jpg)
The End
Questions?
![Page 46: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/46.jpg)
Existing Approaches
• C, Perl, or shell scripts: most popular.
  – Time consuming & error prone to hand code parsers.
  – Difficult to maintain (worse than the ad hoc data itself in some cases!).
  – Often incomplete, particularly with respect to errors.
    • Error code, if written, swamps main-line computation.
    • If not written, errors can corrupt “good” data.
• Lex & Yacc
  – Good match for programming languages.
  – Bad match for ad hoc data.
• Compiler converts descriptions into robust, format-specific tools.
![Page 47: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/47.jpg)
Parsing With PADS
• Robust parser at the core of generated tools.
![Page 48: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/48.jpg)
Using Ad hoc Data
• Parsing only brings you part way.
  – Queries must be written in ML.
  – A lot of work.
• What about a declarative query?
• Can Ed Wood bowl?
122Joe|Wright|45|95|79
n/aEd|Wood|10|47|31
124Chris|Nolan|80|93|85
125Tim|Burton|30|82|71
126George|Lucas|32|62|40
![Page 49: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/49.jpg)
From Ad hoc Data To XML
• XML
  – Encoding for semi-structured data.
  – Good match!
• XQuery
  – Declarative XML query language for semi-structured sources.
  – Standardized by W3C, many implementations.
![Page 50: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/50.jpg)
PADX = PADS + XQuery
• Galax [Fernandez, et al.]
  – Complete, open-source XQuery implementation.
• PADX
  – Integrates PADS and Galax.
  – Supports declarative queries over ad hoc data sources.
![Page 51: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/51.jpg)
Using PADX
• User describes format in PADS.
• PADX provides
  – XML “view” of data in XML Schema.
  – Customized XQuery engine.
    • Query PADS-specific and other XML sources.
• User provides
  – Ad hoc data
  – Queries expressed in XQuery.
![Page 52: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/52.jpg)
Describing MBS Format
• Example Movie-director Bowling Score data
n/aEd|Wood|10|47|31
• PADS/ML Description
ptype MBS-Entry = { id: Id; first: Pstring(‘|’); ‘|’; last: Pstring(‘|’); ‘|’; scores: Scores }
![Page 53: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/53.jpg)
Viewing and Querying MBS
n/aEd|Wood|10|47|31
ptype MBS-Entry = { id: Id; first: Pstring(‘|’); ‘|’; last: Pstring(‘|’); ‘|’; scores: Scores }
• Virtual XML view
<MBS-Entry>
  <id><None>n/a</None></id>
  <first>Ed</first>
  <last>Wood</last>
  <scores>
    <min>10</min>
    <max>47</max>
    <avg>31</avg>
  </scores>
</MBS-Entry>
• Query: What is Ed Wood’s maximum score?
$pads/Psource/MBS-Entry[first = "Ed"][last = "Wood"]/scores/max
• Query: Which directors have scored less than 50?
$pads/Psource/MBS-Entry[scores/min < 50]
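To show what the two queries compute, the following Python sketch evaluates their equivalents over a small materialized XML document using the standard library's `xml.etree.ElementTree`. It does not use PADX or Galax, and PADX itself provides a virtual view rather than a materialized one. The `Psource` wrapper is taken from the query paths, while the `Some` element for numeric ids and the helper names are assumptions.

```python
import xml.etree.ElementTree as ET

# Materialized stand-in for the virtual XML view (three MBS entries).
XML_VIEW = """
<Psource>
  <MBS-Entry>
    <id><None>n/a</None></id><first>Ed</first><last>Wood</last>
    <scores><min>10</min><max>47</max><avg>31</avg></scores>
  </MBS-Entry>
  <MBS-Entry>
    <id><Some>124</Some></id><first>Chris</first><last>Nolan</last>
    <scores><min>80</min><max>93</max><avg>85</avg></scores>
  </MBS-Entry>
  <MBS-Entry>
    <id><Some>125</Some></id><first>Tim</first><last>Burton</last>
    <scores><min>30</min><max>82</max><avg>71</avg></scores>
  </MBS-Entry>
</Psource>
"""

root = ET.fromstring(XML_VIEW)


def max_score(first, last):
    """Equivalent of MBS-Entry[first = ...][last = ...]/scores/max."""
    for entry in root.findall("MBS-Entry"):
        if entry.findtext("first") == first and entry.findtext("last") == last:
            return int(entry.findtext("scores/max"))
    return None


def directors_with_min_below(limit):
    """Equivalent of MBS-Entry[scores/min < limit], returning last names."""
    return [
        entry.findtext("last")
        for entry in root.findall("MBS-Entry")
        if int(entry.findtext("scores/min")) < limit
    ]


print(max_score("Ed", "Wood"))
print(directors_with_min_below(50))
```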
![Page 54: Pads/ML: A Functional Data Description Language](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681441d550346895db0b966/html5/thumbnails/54.jpg)
Challenges & Solutions
• Semantics
  – Map PADS language to XML Schema.
• Re-engineer Galax Data Model
  – Create abstract data model.
  – Generate description-specific concrete data models.
• Efficiently query large-scale data sources.
  – Provide lazy access to data.
  – Implement custom memory-management.