the history of datalog

1
The History of Datalog
Origins
Failure
Resurrection

2
An Odd Encounter
Several years ago, I met a colleague, Monica Lam, in the hallway at Stanford.
“I hear you were involved in the early work on Datalog.”
She had discovered this work and used it in her system for large-scale data-flow analysis.

3
Odd Encounter – (2)
The application is naturally recursive.
Very large-scale (analyzed code of 800K lines).
They (Monica and her student John Whaley) had an implementation, bddbddb, that compiled Datalog rules into BDDs (binary decision diagrams).

4
Where Did Datalog Come From?
1. Codd’s tuple and domain calculus (1972).
2. Gallaire and Minker’s “Logic and Databases” (1978).
3. Prolog (1976).

5
Codd’s Logics
TRC. { t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }
Implemented by Stonebraker as QUEL.
DRC. { ac | R(ab) and S(bc) }
Implemented by Zloof as Query-by-Example.
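
As a rough illustration of what the DRC expression above denotes, the following Python sketch evaluates { ac | R(ab) and S(bc) } as a set comprehension. It is not from the talk; the sample tuples and the function name `drc_join` are made up for the example.

```python
# Hypothetical sample relations; the tuples are invented for illustration.
R = {("a1", "b1"), ("a2", "b2")}   # R(a, b)
S = {("b1", "c1"), ("b1", "c2")}   # S(b, c)

def drc_join(r, s):
    """Return { (a, c) | R(a, b) and S(b, c) } for some shared b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

result = drc_join(R, S)
```

The shared variable b in the calculus expression becomes the equality test `b == b2`; only a and c survive into the result.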

6
“Logic and Databases”
Viewed queries as the result of an entire logical theory.
Thus allows recursion, negation, theories with multiple minimal models.
Closed/open-world evaluations.

7
Prolog
A conventional programming language with predicates as function calls.
Bizarre execution rule.
Example: you have to write TC (transitive closure) as:
path(X,Y) :- arc(X,Y).
path(X,Y) :- arc(X,Z), path(Z,Y).

8
Implementation of Logical Query Languages for Databases
In 1984 I took sabbatical at Hebrew University and wrote a paper with the above title.
It has some crazy stuff that makes me wonder “what was I thinking?”
Much was fixed by others, later.
Published in SIGMOD (no real theorems!).

9
Implementation – (2)
Key idea: Prolog notation + Horn-clause, unique fixedpoint semantics.
Key idea: It’s about algorithms for query execution, not logical models.
Original thought in that direction was really by Henschen and Naqvi.
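
To make "unique fixedpoint semantics" concrete, here is a minimal Python sketch of naive bottom-up evaluation for the two transitive-closure rules from the Prolog slide. It is an illustration only, not the algorithm from the paper; the function name and sample graph are assumptions.

```python
def transitive_closure(arcs):
    """Least fixedpoint of
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- arc(X,Z), path(Z,Y).
    computed by naive bottom-up iteration."""
    path = set()
    while True:
        # Apply both rules to everything derived so far.
        derived = set(arcs)
        derived |= {(x, y) for (x, z) in arcs for (z2, y) in path if z == z2}
        if derived <= path:        # nothing new: the fixedpoint is reached
            return path
        path |= derived

# Hypothetical cyclic graph; the cycle does not affect termination here.
arcs = {(1, 2), (2, 3), (3, 1)}
tc = transitive_closure(arcs)
```

Because the rules only ever add facts and the set of possible facts is finite, the loop terminates regardless of the order in which the rules or subgoals are written.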

10
Enter “Datalog”
The term “Datalog” to refer to positive Horn clauses without function symbols was first proposed by Dave Maier and David S. (“the other”) Warren.
Appears in their book Computing with Logic (1988), but in common use before that.

11
Good Implementation Ideas
1. Seminaive evaluation (Bancilhon and Ramakrishnan, 1986 – also in SIGMOD).
2. Specialized linear-recursion implementations (many people including Naughton, Ramakrishnan, Sagiv, Vardi,…).
3. Magic sets (Beeri and Ramakrishnan, 1987 – finally something got into PODS).
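
The idea behind seminaive evaluation (item 1) can be sketched in a few lines of Python: in each round, the recursive rule is joined only against the facts that were new in the previous round, rather than against all facts. This is a simplified illustration for the transitive-closure rules only; the function name and sample data are assumptions, not taken from the cited paper.

```python
def seminaive_tc(arcs):
    """Transitive closure where each round uses only the newly derived facts."""
    path = set(arcs)     # round 0: path(X,Y) :- arc(X,Y).
    delta = set(arcs)    # facts first derived in the previous round
    while delta:
        # path(X,Y) :- arc(X,Z), path(Z,Y), with path restricted to delta.
        candidates = {(x, y) for (x, z) in arcs for (z2, y) in delta if z == z2}
        delta = candidates - path    # keep only facts not seen before
        path |= delta
    return path

# Hypothetical chain graph 1 -> 2 -> 3 -> 4.
chain = {(1, 2), (2, 3), (3, 4)}
closure = seminaive_tc(chain)
```

On the chain above, the second round derives (1,3) and (2,4), the third derives only (1,4), and the fourth finds nothing new and stops.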

12
Magic Sets
A query-rewriting scheme.
Similar in effect to a number of query-execution ideas such as
1. Query-Subquery (Rohmer, Lescoeur, and Kerisit, 1986).
2. Memoing (Dietrich and Warren, 1985).
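
A small Python sketch may help show the effect of the rewriting. For the query path(src, Y) over the transitive-closure rules, a hand-written rewritten program adds a "magic" predicate holding the nodes relevant to the query, and guards each path rule with it. This covers one query form only and is not the general algorithm; all names and the sample graph are assumptions.

```python
def full_tc(arcs):
    """All path facts, computed without regard to any query."""
    path, delta = set(arcs), set(arcs)
    while delta:
        delta = {(x, y) for (x, z) in arcs for (z2, y) in delta if z == z2} - path
        path |= delta
    return path

def magic_tc(arcs, src):
    """Evaluate a rewritten program for the query path(src, Y):
         magic(src).
         magic(Z)  :- magic(X), arc(X,Z).
         path(X,Y) :- magic(X), arc(X,Y).
         path(X,Y) :- magic(X), arc(X,Z), path(Z,Y).
    """
    magic, frontier = {src}, {src}
    while frontier:
        frontier = {z for (x, z) in arcs if x in frontier} - magic
        magic |= frontier
    # Only arcs leaving a magic node can contribute to a path fact.
    relevant = {(x, z) for (x, z) in arcs if x in magic}
    path, delta = set(relevant), set(relevant)
    while delta:
        delta = {(x, y) for (x, z) in relevant for (z2, y) in delta if z == z2} - path
        path |= delta
    return path

# Hypothetical graph with two disconnected chains.
arcs = {("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")}
answers_magic = {y for (x, y) in magic_tc(arcs, "a") if x == "a"}
answers_full = {y for (x, y) in full_tc(arcs) if x == "a"}
```

Both evaluations give the same answers for the query, but the rewritten program never derives facts about the x, y, z chain, which is unreachable from the query constant.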

13
Negation
With negated subgoals in Datalog, you run the risk of multiple minimal models.
Example: bachelor(X) :- male(X), NOT married(X,Y)
Stratified model (Chandra-Harel, 1982; Apt, Blair, Walker, 1985).
Well-founded semantics (Van Gelder, Ross, Schlipf, 1988).
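
A minimal Python sketch of stratified evaluation for the bachelor example: the negated predicate is computed completely in a lower stratum before the rule that negates it is applied. Two things here are assumptions and not on the slide: the base relation `wed` with its two rules for married, and the reading of NOT married(X,Y) as "there is no Y such that married(X,Y)".

```python
# Hypothetical base facts, invented for illustration.
male = {"al", "bob", "carl"}
wed = {("dee", "al"), ("bob", "eve")}

# Stratum 0 (assumed rules): married(X,Y) :- wed(X,Y).  married(X,Y) :- wed(Y,X).
married = wed | {(y, x) for (x, y) in wed}

# Stratum 1: bachelor(X) :- male(X), NOT married(X,Y).
# married is now complete, so its complement is well defined.
has_spouse = {x for (x, _) in married}
bachelor = {x for x in male if x not in has_spouse}
```

Evaluating the strata in this order picks out one particular minimal model; applying the negation before married is fully computed could wrongly classify al as a bachelor.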

14
The Death of Datalog
Recursion turned out not to be all that important in the world of the 1980’s.
In the AI community, where logic was taken more seriously than in DB, the emphasis was on expressiveness, not tractability.

15
The Rebirth
Datalog slept, but nothing could take away its important virtues:
Simplicity and declarativeness.
Tractability.
Simple execution engine.
While “rule-based systems” were long an AI staple, they never got these features of Datalog.

16
bddbddb
Why did Monica Lam think of Datalog for data-flow analysis?
Classical DFA was for code optimization.
Only inner loops are important, so data never needed to get really large.

17
bddbddb – (2)
Monica was looking at a different application: software security.
Example: can a string read at one point be passed to a SQL call without first being the argument of a function that checks safety?
Entire program analyzed as a whole.
Example: 800K lines of Apache.
Now it’s a database problem.

18
Overlog and Dedalus
At about the same time, Joe Hellerstein was experimenting with Datalog, first for prototyping and later for the real implementation.
General direction: protocols for distributed systems.

19
Overlog and Dedalus – (2)
Two important additions: time and space as first-class concepts.
Example (space): Assume each node has a table of arcs out.
arc(@n, h) means the table at node n contains an arc to node h.

20
Example – Continued
Each node n computes the set of nodes it can reach by consulting the reach sets for the nodes to which n has arcs.
reach(@n, m) :- arc(@n, h), reach(@h, m).
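
The rule above can be simulated in a single Python process to show how the per-node tables evolve. This is only a sketch under stated assumptions: it is not Overlog or Dedalus, message passing is modeled as a direct dictionary lookup, and the base rule reach(@n, h) :- arc(@n, h) is added here because the slide shows only the recursive rule.

```python
def distributed_reach(arc_tables):
    """arc_tables maps each node n to the set of nodes h with arc(@n, h),
    i.e. the arc table stored at n. Returns each node's reach set."""
    # Assumed base rule (not on the slide): reach(@n, h) :- arc(@n, h).
    reach = {n: set(out) for n, out in arc_tables.items()}
    changed = True
    while changed:
        changed = False
        for n, out in arc_tables.items():
            for h in out:
                # Stand-in for a message from h to n carrying h's reach set.
                new = reach.get(h, set()) - reach[n]
                if new:
                    reach[n] |= new
                    changed = True
    return reach

# Hypothetical three-node network: a -> b -> c.
tables = {"a": {"b"}, "b": {"c"}, "c": set()}
reach = distributed_reach(tables)
```

Each node only ever reads its own arc table and the reach sets of its direct neighbors, which mirrors the locality expressed by the @ annotations in the rule.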

21
Some Other Datalog Directions
1. Webdamlog (Abiteboul et al., these proceedings). Adds creation of rules at remote sites.
2. PrPl (Lam et al.). Social networking in Datalog.
3. SecPAL (Becker et al.). Microsoft authorization language translated to Datalog.

22
Other Directions – (2)
4. LogicBlox (Molham Aref, CEO).
Startup in Atlanta GA.
One of several Datalog-based startups.
Uses Datalog for customized decision-support systems.
Many extensions, including controlled 2nd-order predicates.
Still has a tractable, straightforward execution model.

23
Conclusions
Too early to tell how important Datalog will be.
Will simplicity and tractability beat expressiveness?
But moving in the right direction(s) now.
From Datalog 2.0 Workshop: needs an open-source standard, like MySQL.