Deductive Databases
CS 95 Advanced Database Systems
Handout 6
Deductive Databases
An area at the intersection of databases, logic, and artificial intelligence or knowledge bases.
A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database.
Because part of the theoretical foundation for some deductive database systems is mathematical logic, such systems are often referred to as logic databases.
They may also be referred to as intelligent databases, expert database systems, or knowledge-based systems.
These systems also incorporate reasoning and inferencing capabilities using techniques that were developed in the field of artificial intelligence.
Knowledge-based Systems vs Deductive Database Systems
Knowledge-based expert systems have traditionally assumed that the data needed resides in main memory; hence secondary storage management is not an issue.
Deductive database systems attempt to remove this restriction, so that either a DBMS is enhanced to handle an expert system interface or an expert system is enhanced to handle data resident in secondary storage.
The knowledge in an expert or knowledge-based system is extracted from application experts and refers to an application domain rather than to knowledge inherent in the data.
Deductive Databases Terminology
A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database.
Rules are specified using a declarative language - a language in which we specify what to achieve rather than how to achieve it.
An inference engine (or deduction mechanism) within the system can deduce new facts from the database by interpreting these rules.
The model used for deductive databases is closely related to the relational model, and particularly to the domain relational calculus formalism.
Deductive Databases Terminology (cont’d)
Deductive databases are also related to the field of logic programming and the Prolog language.
Deductive database work based on logic has used Prolog as a starting point.
Datalog - a variation of Prolog which is used to define rules declaratively in conjunction with an existing set of relations, which are themselves treated as literals in the language.
Although the language structure of Datalog resembles that of Prolog, its operational semantics - that is, how a Datalog program is to be executed - is still a topic of active research.
Deductive Databases: Facts and Rules
A deductive database uses two main types of specifications: facts and rules.
Facts are specified in a manner similar to the way relations are specified, except that it is not necessary to include attribute names.
Recall that a tuple in a relation describes some real-world fact whose meaning is partly determined by the attribute names.
In a deductive database, the meaning of an attribute value in a tuple is determined solely by its position within the tuple.
Rules are somewhat similar to relational views. They specify virtual relations that are not actually stored but can be formed from the facts by applying inference mechanisms based on the rule specifications.
The main difference between rules and views is that rules may involve recursion and hence may yield virtual relations that cannot be defined in terms of standard relational views.
Deductive Databases: Evaluation of Prolog Programs
The evaluation of Prolog programs is based on a technique called backward chaining which involves a top-down evaluation of goals.
A goal in Prolog is equivalent to a query in a relational database system.
In deductive databases that use Datalog, attention has been devoted to handling large volumes of data stored in a relational database. Hence, evaluation techniques have been devised that resemble bottom-up evaluation (forward chaining).
Prolog suffers from the limitation that the order of specifications of facts and rules is significant in evaluation; moreover, the order of literals within a rule is significant.
The execution techniques for Datalog programs attempt to circumvent these problems.
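To make the contrast concrete, here is a minimal Python sketch of naive bottom-up (forward chaining) evaluation. The `edge` and `path` predicates and the sample facts are hypothetical and not part of the handout; the sketch illustrates the general idea only and is not how any particular Datalog system is implemented.

```python
def naive_bottom_up(edges):
    """Compute path(X, Y) from edge facts by naive forward chaining.

    Hypothetical rules used for illustration:
        path(X, Y) :- edge(X, Y).
        path(X, Y) :- path(X, Z), edge(Z, Y).
    """
    path = set()
    while True:
        # First rule: every edge fact is also a path fact.
        derived = set(edges)
        # Second rule: join the path facts known so far with edge on Z.
        derived |= {(x, y)
                    for (x, z) in path
                    for (z2, y) in edges
                    if z == z2}
        # Stop at the fixpoint, when an iteration derives nothing new.
        if derived <= path:
            return path
        path |= derived


edges = {("a", "b"), ("b", "c"), ("c", "d")}
result = naive_bottom_up(edges)
```

Because every rule is applied to the whole set of known facts until nothing new appears, the result does not depend on the order in which facts or rules are listed. A left-recursive rule such as the second one, which can send a top-down, depth-first evaluator into an infinite loop, terminates here.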
Prolog Programming System
Prolog is a logic programming system that is based on a resolution theorem prover. The system consists of two main components: the Prolog database and the inference engine.
The Prolog database contains the sequence of Horn clauses that defines the logic program.
The Prolog inference engine provides the control mechanism for proof construction using a theorem proving algorithm based on unification and backtracking.
Prolog is not a pure logic programming language but rather a practical and partial implementation of logic programming. Apart from Horn clause logic, Prolog also incorporates evaluable predicates that have only a procedural interpretation and second-order predicate logic features which allow the representation and manipulation of lists.
Components of the Prolog System
[Figure: users send queries to, and receive results from, the Prolog System, which consists of the Prolog Database (PDB) and the Prolog Inference Engine (PIE).]
Prolog Programming System (cont’d)
The data objects of Prolog, called terms, can be either a constant, a variable, a structure or a list. Prolog is a function-free language; functional expressions are not valid terms, but structures are allowed and can be used to the same effect as functional expressions. Each type of term is briefly described below:
- Constants include integers (e.g. 0, 1, 10), reals (e.g. 1.45, 10.04), strings (e.g. "Hello") and atoms (e.g. like, john, 'New York'), which normally begin with a lower-case letter or are enclosed in single quotes. Some special symbol combinations are also considered atoms (e.g. -). The special underline character '_' may be inserted in the middle of an atom to improve its legibility.
- Variables are similar to atoms except that they begin with a capital letter or an underline character '_' (e.g. X, Name, _address). The underline character '_' on its own denotes an anonymous variable whose instances are always unique within the Prolog system.
Prolog Programming System (cont’d)
- Structures are more complex data objects. A structure comprises a functor and a sequence of one or more terms called arguments. A functor is characterized by its name, which is an atom, and its arity or number of arguments. In contrast to functional expressions, structures are not evaluated when used as arguments. However, the use of structures as arguments allows meta-programming in Prolog, since a structure can both be manipulated as a datum when used as an argument and be evaluated as a procedure when taken independently as a predicate. For example, the structure point/3 with arguments X, Y and Z, which is written as point(X, Y, Z), can be used as an argument to line/2 as follows: line(point(X1, Y1, Z1), point(X2, Y2, Z2)).
- Lists are concatenations of Prolog terms that have the form .(a, .(b, .(c, []))) or simply [a, b, c].
Similar to logic programs, a clause in Prolog can be either a fact, a rule or a query. In Prolog, ':-' is used instead of '←' as the implication symbol, and the naming convention for atoms and variables is the reverse of that of the standard logic program notation: atoms start with a lower-case character and variables start with an upper-case character. Prolog has a declarative and a procedural semantics, which are basically similar to those of logic programs.
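The term types described above can be modelled with a small data structure. The following Python sketch is illustrative only: the class names `Var` and `Struct` and the helpers `make_list` and `show` are hypothetical, and atoms are simply assumed to be Python strings, which glosses over the distinction between atoms and strings.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A variable, e.g. X or Name."""
    name: str


@dataclass(frozen=True)
class Struct:
    """A structure: a functor name plus a sequence of argument terms."""
    functor: str
    args: Tuple["Term", ...] = ()

    @property
    def arity(self) -> int:
        return len(self.args)


# Assumption: atoms are plain Python strings, numbers are Python numbers.
Term = Union[Var, Struct, str, int, float]


def make_list(items):
    """Build the '.'(Head, Tail) chain for a list, ending in '[]'."""
    result = "[]"
    for item in reversed(items):
        result = Struct(".", (item, result))
    return result


def show(term):
    """Render a term in functor(arg, ...) notation."""
    if isinstance(term, Var):
        return term.name
    if isinstance(term, Struct):
        return f"{term.functor}({', '.join(show(a) for a in term.args)})"
    return str(term)


line = Struct("line", (
    Struct("point", (Var("X1"), Var("Y1"), Var("Z1"))),
    Struct("point", (Var("X2"), Var("Y2"), Var("Z2"))),
))
abc = make_list(["a", "b", "c"])
```

Here `show(line)` gives `line(point(X1, Y1, Z1), point(X2, Y2, Z2))` and `show(abc)` gives `.(a, .(b, .(c, [])))`, the long form of the list [a, b, c].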
Prolog Comparison Predicates
Prolog Representation of Entity-Relationship Database Schemes
Prolog Representation of Entity-Relationship Database Relations
Prolog Evaluation Strategy
As mentioned above, the Prolog inference engine (PIE) is based on a resolution theorem-prover that is based on unification and backtracking. Briefly, resolution is an inference pattern that permits the taking of arbitrarily large inference steps which require very considerable computational effort to carry out (Robinson, 1992); unification is the process of matching a subgoal with the head of a clause; and backtracking is a non-deterministic process of reviewing the goals which have been satisfied and attempting to resatisfy these goals by finding alternative solutions (Cohen, 1992). The Prolog goal evaluation strategy is by default top-down and proceeds from left to right (see also Figure 4.2).
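Unification, the matching step described above, can be sketched in a few lines of Python. The representation is an assumption made for illustration: a structure is a tuple whose first element is the functor, a variable is a string starting with an upper-case letter or '_', and any other string is an atom. Like most Prolog systems by default, this sketch performs no occurs check.

```python
def is_var(term):
    """Variables are strings starting with an upper-case letter or '_'."""
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")


def walk(term, subst):
    """Follow variable bindings until reaching a non-variable or unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(x, y, subst):
    """Return a substitution extending subst that makes x and y equal, or None."""
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return {**subst, x: y}
    if is_var(y):
        return {**subst, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        # Same arity: unify functor and arguments position by position.
        for xi, yi in zip(x, y):
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None  # clash between different atoms, functors, or arities


match = unify(("q", "b", "Z"), ("q", "b", "d"), {})
clash = unify(("q", "b", "Z"), ("q", "c", "f"), {})
```

With these inputs, `match` is `{"Z": "d"}` (the subgoal q(b,Z) matches the fact q(b,d) by binding Z to d) and `clash` is `None` (q(b,Z) cannot match q(c,f)).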
Prolog Evaluation Strategy (cont’d)
(a)
p(a,b).
q(b,d).
p(a,c).
q(c,f).
r(A,B,C) :- p(A,B), q(B,C).
(b)
:- r(X,Y,Z).
(c)
(1) 0 CALL: r(X,Y,Z)?
(2) 1 CALL: p(X,Y)?
(2) 1 EXIT: p(a,b)
(3) 1 CALL: q(b,Z)?
(3) 1 EXIT: q(b,d)
(1) 0 EXIT: r(a,b,d)
(1) 0 REDO: r(a,b,d)?
(3) 1 REDO: q(b,d)?
(3) 1 FAIL: q(b,Z)
(2) 1 REDO: p(a,b)?
(2) 1 EXIT: p(a,c)
(4) 1 CALL: q(c,Z)?
(4) 1 EXIT: q(c,f)
(1) 0 EXIT: r(a,c,f)
(1) 0 REDO: r(a,c,f)?
(4) 1 REDO: q(c,f)?
(4) 1 FAIL: q(c,Z)
(2) 1 REDO: p(a,c)?
(2) 1 FAIL: p(X,Y)
(1) 0 FAIL: r(X,Y,Z)
Prolog Evaluation: (a) Database, (b) Query, (c) Evaluation Trace
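The left-to-right, depth-first search with backtracking shown in the trace can be imitated with Python generators. This is only an analogy under simplifying assumptions, not how a Prolog engine is built: the facts are hard-coded lists kept in their textual order, and the function names `solve_p`, `solve_q` and `solve_r` are hypothetical.

```python
# Facts from part (a), kept in their textual order.
P_FACTS = [("a", "b"), ("a", "c")]
Q_FACTS = [("b", "d"), ("c", "f")]


def solve_p():
    """Yield each solution of p(X, Y), in clause order."""
    for fact in P_FACTS:
        yield fact


def solve_q(b):
    """Yield each Z such that q(b, Z) holds, in clause order."""
    for (first, second) in Q_FACTS:
        if first == b:
            yield second


def solve_r():
    """r(A,B,C) :- p(A,B), q(B,C), with subgoals tried left to right."""
    for (a, b) in solve_p():      # CALL p, later REDO p
        for c in solve_q(b):      # CALL q with B bound, later REDO q
            yield (a, b, c)       # EXIT r with one solution


solutions = list(solve_r())
```

Asking a generator for its next value plays the role of REDO, and a generator running out of values plays the role of FAIL. The list `solutions` contains r(a,b,d) followed by r(a,c,f), the same order as in the trace.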
Prolog/Datalog Notation
Notation is based on providing predicates with unique names.
A predicate has an implicit meaning, which is suggested by the predicate name, and a fixed number of arguments.
If the arguments are all constant values, the predicate simply states that a certain fact is true.
If the predicate has variables as arguments, it is considered either as a query or as part of a rule or constraint.
Prolog convention - all constant values in a predicate are either numeric or character strings; they are represented as identifiers (or names) starting with lowercase letters only, whereas variable names always start with an uppercase letter.
Prolog/Datalog: Example
(a) Prolog notation
Facts
supervise(franklin, john).
supervise(franklin, ramesh).
supervise(franklin, joyce).
supervise(jennifer, alicia).
supervise(jennifer, ahmad).
supervise(james, franklin).
supervise(james, jennifer).
Rules
superior(X,Y) :- supervise(X,Y).
superior(X,Y) :- supervise(X,Z), superior(Z,Y).
subordinate(X,Y) :- superior(Y,X).
Queries
superior(james,Y)?
superior(james,joyce)?
james
    franklin
        john
        ramesh
        joyce
    jennifer
        alicia
        ahmad
(b) The supervisory tree
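The two example queries can be answered bottom-up by first computing the whole superior relation. The Python sketch below uses semi-naive evaluation, in which each round joins supervise only with the superior facts derived in the previous round. It is an illustration under assumptions rather than the method of any specific system: the function name `superior_relation` is hypothetical, and the employee ramesh is spelled as in the supervisory tree.

```python
SUPERVISE = {
    ("franklin", "john"), ("franklin", "ramesh"), ("franklin", "joyce"),
    ("jennifer", "alicia"), ("jennifer", "ahmad"),
    ("james", "franklin"), ("james", "jennifer"),
}


def superior_relation(supervise):
    """Semi-naive bottom-up evaluation of:
        superior(X,Y) :- supervise(X,Y).
        superior(X,Y) :- supervise(X,Z), superior(Z,Y).
    """
    superior = set(supervise)   # first rule
    delta = set(supervise)      # facts that were new in the last round
    while delta:
        # Second rule, joining supervise with the newest superior facts only.
        new = {(x, y)
               for (x, z) in supervise
               for (z2, y) in delta
               if z == z2}
        delta = new - superior
        superior |= delta
    return superior


superior = superior_relation(SUPERVISE)

# superior(james, Y)?  All Y that james is superior to.
answers = sorted(y for (x, y) in superior if x == "james")

# superior(james, joyce)?  A yes/no query.
is_superior = ("james", "joyce") in superior

# subordinate(X,Y) :- superior(Y,X).
subordinate = {(y, x) for (x, y) in superior}
```

For the first query, `answers` lists all seven other employees, since james is at the root of the tree. For the second, `is_superior` is `True`.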
Deductive Databases Summary
stores knowledge with the DB
different methods of storing knowledge provide the terms:
KBMS or Expert Databases - use expert system IF..THEN..ELSE type rules
Deductive or Logic-Based databases often use Prolog-type rules
Expert databases generally incorporate knowledge extracted from experts in the field to provide reasoning and inferencing capabilities.
Logic-based databases use axioms (logic theory) to store the data and deductive axioms (rules) to extend that information
e.g. to store the facts that Anne is the parent of Betty and Betty is the parent of Cameron, use: parent(anne, betty). parent(betty, cameron).
Now a grandparent can be defined by the rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
Many forms of deductive databases exist including Deductive Object-Oriented Databases
Applications include:
Enterprise modelling
Hypothesis testing
Electronic commerce