The History of Datalog

Origins, Failure, Resurrection

Upload: mikkel

Post on 10-Feb-2016







An Odd Encounter

Several years ago, I met a colleague, Monica Lam, in the hallway at Stanford.

“I hear you were involved in the early work on Datalog.”

She had discovered this work and used it in her system for large-scale data-flow analysis.

Odd Encounter – (2)

The application is naturally recursive.

It is very large-scale (the analyzed code ran to 800K lines).

They (Monica and her student John Whaley) had an implementation, bddbddb, that compiled Datalog rules into BDDs (binary decision diagrams).



Where Did Datalog Come From?

1. Codd’s tuple and domain calculus (1972).

2. Gallaire and Minker’s “Logic and Databases” (1978).

3. Prolog (1976).

Codd’s Logics

TRC (tuple relational calculus): { t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }

Implemented by Stonebraker as QUEL.

DRC (domain relational calculus): { ac | R(ab) and S(bc) }

Implemented by Zloof as Query-by-Example.
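Both expressions describe the same query: join R(A,B) with S(B,C) on B and keep columns A and C. A minimal Python sketch of the DRC form as a set comprehension follows; the sample tuples and the name join_project are made up for illustration.

```python
# Hypothetical sample relations; the tuples are illustrative only.
R = {(1, 10), (2, 20), (3, 30)}            # R(A, B)
S = {(10, "x"), (20, "y"), (40, "z")}      # S(B, C)


def join_project(r, s):
    """{ ac | R(ab) and S(bc) }: pair a with c whenever some b links them."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


result = join_project(R, S)
```

The tuple (3, 30) in R and the tuple (40, "z") in S have no partner on B, so they contribute nothing to the result.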

“Logic and Databases”

Viewed queries as the result of an entire logical theory.

Thus allows recursion, negation, theories with multiple minimal models.

Closed/open-world evaluations.

Prolog

A conventional programming language with predicates as function calls.

Bizarre execution rule.

Example: you have to write TC (transitive closure) as:

path(X,Y) :- arc(X,Y).
path(X,Y) :- arc(X,Z), path(Z,Y).

Implementation of Logical Query Languages for Databases

In 1984 I took a sabbatical at Hebrew University and wrote a paper with the above title.

It has some crazy stuff that makes me wonder “what was I thinking?”

Much was fixed by others, later.

Published in SIGMOD (no real


Implementation – (2)

Key idea: Prolog notation + Horn-clause, unique fixedpoint semantics.

Key idea: It’s about algorithms for query execution, not logical models.

Original thought in that direction was really by Henschen and Naqvi.
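To make "unique fixedpoint semantics" concrete, here is a minimal sketch of naive bottom-up evaluation for the two path rules from the Prolog slide. The function name naive_path and the representation of arc as a set of pairs are assumptions for illustration, not the paper's algorithm.

```python
def naive_path(arcs):
    """Least fixedpoint of:
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- arc(X,Z), path(Z,Y).
    `arcs` is a set of (from, to) pairs.
    """
    path = set()
    while True:
        # Apply both rules to everything known so far.
        derived = set(arcs) | {
            (x, y) for (x, z) in arcs for (z2, y) in path if z == z2
        }
        if derived == path:      # nothing new: the fixedpoint is reached
            return path
        path = derived
```

Because the rules only ever add tuples and the set of possible pairs is finite, the loop terminates on any finite graph, including cyclic ones, and rule order does not affect the result.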

Enter “Datalog”

The term “Datalog” to refer to positive Horn clauses without function symbols was first proposed by Dave Maier and David S. (“the other”) Warren.

Appears in their book Computing with Logic (1988), but in common use before that.



Good Implementation Ideas

1. Seminaive evaluation (Bancilhon and Ramakrishnan, 1986 – also in SIGMOD).

2. Specialized linear-recursion implementations (many people including Naughton, Ramakrishnan, Sagiv, Vardi,…).

3. Magic sets (Beeri and Ramakrishnan, 1987 – finally something got into PODS).
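As an illustration of the first idea, the sketch below evaluates the path rules seminaively: each round joins arc only with the tuples that were new in the previous round, instead of with all of path. The function name and the index structure are assumptions made for this example.

```python
from collections import defaultdict


def seminaive_path(arcs):
    """Seminaive evaluation of:
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- arc(X,Z), path(Z,Y).
    """
    # Index arcs by their target Z so that a new path(Z,Y) tuple can be
    # joined with every arc(X,Z).
    preds = defaultdict(set)
    for x, z in arcs:
        preds[z].add(x)

    path = set(arcs)     # result of the non-recursive rule
    delta = set(arcs)    # tuples discovered in the previous round
    while delta:
        new = {(x, y) for (z, y) in delta for x in preds[z]}
        delta = new - path          # keep only genuinely new tuples
        path |= delta
    return path
```

The result is the same relation that naive evaluation produces; the saving is that no round rederives tuples from facts that were already used in an earlier round.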

Magic Sets

A query-rewriting scheme.

Similar in effect to a number of query-execution ideas such as

1. Query-Subquery (Rohmer, Lescoeur, and Kerisit, 1986).

2. Memoing (Dietrich and Warren,
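A rough sketch of what the rewriting buys, for the query path(source, Y) over the path rules shown earlier. The rewritten program assumed here is the usual textbook form (magic(source). magic(Z) :- magic(X), arc(X,Z). and each path rule guarded by magic(X)); the function name magic_path is hypothetical, and this is not claimed to match any one paper's formulation.

```python
def magic_path(arcs, source):
    """Answer path(source, Y) while deriving only relevant tuples."""
    # magic(source).   magic(Z) :- magic(X), arc(X,Z).
    magic = {source}
    frontier = {source}
    while frontier:
        frontier = {z for (x, z) in arcs if x in frontier} - magic
        magic |= frontier

    # path(X,Y) :- magic(X), arc(X,Y).
    # path(X,Y) :- magic(X), arc(X,Z), path(Z,Y).
    # Only arcs that leave a "magic" node can fire either rule.
    relevant = {(x, z) for (x, z) in arcs if x in magic}
    path = set()
    while True:
        derived = relevant | {
            (x, y) for (x, z) in relevant for (z2, y) in path if z == z2
        }
        if derived == path:
            break
        path = derived
    return {y for (x, y) in path if x == source}
```

With arcs {(1,2), (2,3), (4,5), (5,6)} and source 1, the component containing 4, 5, and 6 is never touched, which is the point of the rewriting: bottom-up evaluation is restricted to facts the query can actually use.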


Negation

With negated subgoals in Datalog, you run the risk of multiple minimal models.

Example: bachelor(X) :- male(X), NOT married(X,Y)

Stratified model (Chandra-Harel, 1982; Apt, Blair, Walker, 1985).

Well-founded semantics (Van Gelder, Ross, Schlipf, 1988).
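A minimal sketch of the stratified reading of the bachelor rule. It assumes that NOT married(X,Y) means "there is no Y with married(X,Y)", and the sample facts are invented for illustration.

```python
# Hypothetical sample facts.
male = {"al", "bob", "carl"}
married = {("al", "ann"), ("ann", "al")}


def bachelors(male, married):
    """bachelor(X) :- male(X), NOT married(X,Y), evaluated by strata."""
    # Stratum 0: married is completely known before negation is applied
    # (here it is simply a stored relation).
    has_spouse = {x for (x, _) in married}
    # Stratum 1: negation only looks at the finished lower stratum.
    return {x for x in male if x not in has_spouse}


result = bachelors(male, married)
```

Because married is fixed before bachelor is computed, there is exactly one answer. The other minimal models, such as one that declares bob married to someone in order to avoid calling him a bachelor, are ruled out by the ordering of the strata.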

The Death of Datalog

Recursion turned out not to be all that important in the world of the 1980s.

In the AI community, where logic was taken more seriously than in DB, the emphasis was on expressiveness, not tractability.

The Rebirth

Datalog slept, but nothing could take away its important virtues:

Simplicity and declarativeness.

Tractability.

Simple execution engine.

While “rule-based systems” were long an AI staple, they never got these features of Datalog.

bddbddb

Why did Monica Lam think of Datalog for data-flow analysis?

Classical DFA (data-flow analysis) was for code optimization.

Only inner loops are important, so data never needed to get really large.

bddbddb – (2)

Monica was looking at a different application: software security.

Example: can a string read at one point be passed to a SQL call without first being the argument of a function that checks safety?

Entire program analyzed as a whole.

Example: 800K lines of Apache.

Now it’s a database problem.
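The security question above has the shape of a recursive query. The sketch below is only an illustration of that shape, not bddbddb's actual rules: the relations source, flow, sanitized, and sink, and the function name unsafe_sinks, are hypothetical.

```python
def unsafe_sinks(flows, sources, sanitized, sinks):
    """Assumed rules, for illustration only:
         tainted(V) :- source(V).
         tainted(V) :- tainted(U), flow(U,V), NOT sanitized(V).
         unsafe(V)  :- tainted(V), sink(V).
    `flows` is a set of (from, to) pairs between program values.
    """
    tainted = set(sources)
    frontier = set(sources)
    while frontier:
        # Propagate taint one step, stopping at values that passed a check.
        frontier = {
            v for (u, v) in flows if u in frontier and v not in sanitized
        } - tainted
        tainted |= frontier
    return tainted & set(sinks)


# Hypothetical program: "input" reaches sql1 directly and sql2 only
# through a value that went through a safety check.
flows = {("input", "q"), ("q", "sql1"), ("input", "clean"), ("clean", "sql2")}
result = unsafe_sinks(flows, {"input"}, {"clean"}, {"sql1", "sql2"})
```

Once flow is extracted from the whole program, the analysis is a transitive-closure-style query over a large relation, which is why it becomes a database problem.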

Overlog and Dedalus

At about the same time, Joe Hellerstein was experimenting with Datalog, first for prototyping and later for the real implementation.

General direction: protocols for distributed systems.

Overlog and Dedalus – (2)

Two important additions: time and space as first-class concepts.

Example (space): Assume each node has a table of arcs out.

arc(@n, h) means the table at node n contains an arc to node h.

Example – Continued

Each node n computes the set of nodes it can reach by consulting the reach sets for the nodes to which n has arcs.

reach(@n, m) :- arc(@n, h), reach(@h, m).
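A single-process sketch of how that rule could play out across nodes. Two things here are assumptions rather than part of the slide: a base case reach(@n, h) :- arc(@n, h), without which the recursive rule alone derives nothing, and the simulation of "node n reads node h's table" as a dictionary lookup instead of a network message.

```python
def distributed_reach(arc_tables):
    """arc_tables[n] is node n's local set of successors h, i.e. arc(@n, h).
    Returns reach[n], the set of nodes m with reach(@n, m).
    """
    # Assumed base case: reach(@n, h) :- arc(@n, h).
    reach = {n: set(hs) for n, hs in arc_tables.items()}
    changed = True
    while changed:
        changed = False
        for n, hs in arc_tables.items():
            for h in hs:
                # reach(@n, m) :- arc(@n, h), reach(@h, m).
                # Node n consults the reach table stored at node h.
                new = reach.get(h, set()) - reach[n]
                if new:
                    reach[n] |= new
                    changed = True
    return reach


# Hypothetical three-node chain a -> b -> c.
tables = {"a": {"b"}, "b": {"c"}, "c": set()}
result = distributed_reach(tables)
```

Each node only ever writes to its own reach table and only reads the tables of its direct successors, which is the locality that the @ notation expresses.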

Some Other Datalog Directions

1. Webdamlog (Abiteboul et al., these proceedings). Adds creation of rules at remote sites.

2. PrPl (Lam et al.). Social networking in Datalog.

3. SecPAL (Becker et al.). Microsoft authorization language translated to Datalog.

Other Directions – (2)

4. LogicBlox (Molham Aref, CEO). Startup in Atlanta, GA. One of several Datalog-based startups.

Uses Datalog for customized decision-support systems.

Many extensions, including controlled second-order predicates.

Still has a tractable, straightforward execution model.

Conclusions

Too early to tell how important Datalog will be.

Will simplicity and tractability beat expressiveness?

But moving in the right direction(s) now.

From Datalog 2.0 Workshop: needs an open-source standard, like MySQL.