cs95 deductive databases

Deductive Databases

CS 95 Advanced Database Systems

Handout 6

Deductive Databases

An area at the intersection of databases, logic, and artificial intelligence or knowledge bases.

A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database.

Because part of the theoretical foundation for some deductive database systems is mathematical logic, such systems are often referred to as logic databases.

They may also be referred to as intelligent databases, expert database systems or knowledge-based systems.

These systems also incorporate reasoning and inferencing capabilities using techniques that were developed in the field of artificial intelligence.

Knowledge-based Systems vs Deductive Database Systems

Knowledge-based expert systems have traditionally assumed that the data needed resides in main memory; hence secondary storage management is not an issue.

Deductive database systems attempt to relax this restriction, so that either a DBMS is enhanced to handle an expert system interface or an expert system is enhanced to handle data that resides in secondary storage.

The knowledge in an expert or knowledge-based system is extracted from application experts and refers to an application domain rather than to knowledge inherent in the data.

Deductive Databases Terminology

A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database.

Rules are specified using a declarative language - a language in which we specify what to achieve rather than how to achieve it.

An inference engine (or deduction mechanism) within the system can deduce new facts from the database by interpreting these rules.

The model used for deductive databases is closely related to the relational model, and particularly to the domain relational calculus formalism.

Deductive Databases Terminology (cont’d)

Deductive databases are also related to the field of logic programming and the Prolog language.

Deductive database work based on logic has used Prolog as a starting point.

Datalog - a variation of Prolog which is used to define rules declaratively in conjunction with an existing set of relations, which are themselves treated as literals in the language.

Although the language structure of Datalog resembles that of Prolog, its operational semantics - that is, how a Datalog program is to be executed - is still a topic of active research.

Deductive Databases: Facts and Rules

A deductive database uses two main types of specifications: facts and rules.

Facts are specified in a manner similar to the ways relations are specified, except that it is not necessary to include attribute names. Recall that a tuple in a relation describes some real-world fact whose meaning is partly determined by the attribute name. In a deductive database, the meaning of an attribute value in a tuple is determined solely by its position within the tuple.

Rules are somewhat similar to relational views. They specify virtual relations that are not actually stored but can be formed from the facts by applying inference mechanisms based on the rule specifications. The main difference between rules and views is that rules may involve recursion and hence may yield virtual relations that cannot be defined in terms of standard relational views.
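As a rough illustration of the difference between named attributes and positional facts, and of a rule acting like a view, the following Python sketch may help. The relation works_in, the rule colleague and all the data are hypothetical and not part of this handout.

```python
# Hypothetical data, for illustration only.

# Relational style: each value is tied to an attribute name.
employee_row = {"name": "alice", "dept": "research"}

# Deductive style: the same information as positional facts.
# The meaning of each value comes only from its position:
#   works_in(Name, Dept).
works_in = {("alice", "research"), ("bob", "research"), ("carol", "sales")}

def colleague():
    # Virtual relation for the (hypothetical) rule
    #   colleague(X, Y) :- works_in(X, D), works_in(Y, D), X \= Y.
    # Nothing is stored; the tuples are derived from the facts on demand.
    return {(x, y)
            for (x, d1) in works_in
            for (y, d2) in works_in
            if d1 == d2 and x != y}

print(sorted(colleague()))  # [('alice', 'bob'), ('bob', 'alice')]
```

This rule is non-recursive, so it could equally be written as a relational view; a recursive rule would need repeated application until no new tuples appear.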

Deductive Databases: Evaluation of Prolog Programs

The evaluation of Prolog programs is based on a technique called backward chaining which involves a top-down evaluation of goals.

A goal in Prolog is equivalent to a query in a relational database system.

In deductive databases that use Datalog, attention has been devoted to handling large volumes of data stored in a relational database. Hence, evaluation techniques have been devised that resemble bottom-up evaluation (forward chaining).

Prolog suffers from the limitation that the order of specifications of facts and rules is significant in evaluation; moreover, the order of literals within a rule is significant.

The execution techniques for Datalog programs attempt to circumvent these problems.
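A minimal sketch of naive bottom-up (forward chaining) evaluation, written in Python under the assumption of a hypothetical edge/path program that does not appear in this handout. The second rule is left-recursive, a form on which a top-down, left-to-right Prolog evaluation can fail to terminate, whereas the bottom-up loop below stops as soon as no new tuple can be derived.

```python
# Hypothetical facts: edge(X, Y).
edge = {("a", "b"), ("b", "c"), ("c", "d")}

def path_bottom_up(edge):
    # Naive bottom-up evaluation of the (hypothetical) rules
    #   path(X, Y) :- edge(X, Y).
    #   path(X, Y) :- path(X, Z), edge(Z, Y).
    # Start from the facts and apply both rules until nothing new appears.
    path = set()
    while True:
        derived = set(edge)                          # first rule
        derived |= {(x, y)                           # second rule
                    for (x, z) in path
                    for (z2, y) in edge
                    if z == z2}
        if derived <= path:                          # fixpoint reached
            return path
        path |= derived

print(sorted(path_bottom_up(edge)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

The result is a set, so it does not depend on the order in which facts, rules or literals are written. On finite data the loop always terminates, even when the facts contain a cycle.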

Prolog Programming System

Prolog is a logic programming system that is based on a resolution theorem prover. The system consists of two main components: the Prolog database and the inference engine. The Prolog database contains the sequence of Horn clauses that defines the logic program. The Prolog inference engine provides the control mechanism for proof construction using a theorem proving algorithm based on unification and backtracking.

Prolog is not a pure logic programming language but rather a practical and partial implementation of logic programming. Apart from Horn clause logic, Prolog also incorporates evaluable predicates that have only a procedural interpretation and second-order predicate logic features which allow the representation and manipulation of lists.

Components of the Prolog System

[Figure: users submit queries to the Prolog system and receive results from it; the Prolog system consists of the Prolog Database (PDB) and the Prolog Inference Engine (PIE).]

Prolog Programming System (cont’d)

The data objects of Prolog, called terms, can be either a constant, a variable, a structure or a list. Prolog is a function-free language; functional expressions are not valid terms, but structures are allowed and can be used to the same effect as functional expressions. Each type of term is briefly described below:

- Constants include integers (e.g. 0, 1, 10), reals (e.g. 1.45, 10.04), strings (e.g. "Hello") and atoms (e.g. like, john, 'New York'), which normally begin with a lower-case letter or are enclosed in single quotes. Some special symbol combinations (e.g. -) are also considered atoms. The underscore character '_' may be inserted in the middle of an atom to improve its legibility.

- Variables are similar to atoms except that they begin with a capital letter or an underscore character '_' (e.g. X, Name, _address). The underscore character '_' on its own denotes an anonymous variable whose instances are always unique within the Prolog system.

- Structures are more complex data objects. A structure comprises a functor and a sequence of one or more terms called arguments. A functor is characterized by its name, which is an atom, and its arity or number of arguments. In contrast to functional expressions, structures are not evaluated when used as arguments. However, the use of structures as arguments allows meta-programming in Prolog, since a structure can both be manipulated as a datum when used as an argument and be evaluated as a procedure when taken independently as a predicate. For example, the structure point/3 with arguments X, Y and Z, which is written as point(X, Y, Z), can be used as an argument to line/2 as follows: line(point(X1, Y1, Z1), point(X2, Y2, Z2)).

- Lists are sequences of Prolog terms that have the form .(a, .(b, .(c, []))) or simply [a, b, c].

Similar to logic programs, a clause in Prolog can be either a fact, a rule or a query. In Prolog, ':-' is used instead of '←' as the implication symbol, and the naming convention for atoms and variables is the reverse of that of the standard logic program notation: atoms start with a lower-case character and variables start with an upper-case character. Prolog has both a declarative and a procedural semantics, which are basically similar to those of logic programs.

Prolog Comparison Predicates

Prolog Representation of Entity-Relationship Database Schemes

Prolog Representation of Entity-Relationship Database Relations

Prolog Evaluation Strategy

As mentioned above, the Prolog inference engine (PIE) is based on a resolution theorem-prover that is based on unification and backtracking. Briefly, resolution is an inference pattern that permits the taking of arbitrarily large inference steps which require very considerable computational effort to carry out (Robinson, 1992); unification is the process of matching a subgoal with the head of a clause; and backtracking is a non-deterministic process of reviewing the goals which have been satisfied and attempting to resatisfy these goals by finding alternative solutions (Cohen, 1992). The Prolog goal evaluation strategy is by default top-down and proceeds from left-to-right (see also Figure 4.2).
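Unification, the matching of a subgoal with the head of a clause, can be sketched in a few lines of Python. The term encoding used here is an assumption made for illustration: variables are strings starting with an upper-case letter or '_', constants are other strings, and structures are tuples whose first element is the functor name. Like the default behaviour of many Prolog systems, the sketch performs no occurs check.

```python
def is_var(t):
    # Assumed encoding: a variable is a non-empty string that starts
    # with an upper-case letter or an underscore.
    return isinstance(t, str) and (t[0].isupper() or t[0] == "_")

def walk(t, subst):
    # Follow variable bindings until an unbound variable or a non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    # Return a substitution extending subst that makes a and b equal,
    # or None if the two terms cannot be matched.
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):          # functor first, then arguments
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

# Matching the subgoal p(X, Y) with the fact p(a, b):
print(unify(("p", "X", "Y"), ("p", "a", "b"), {}))  # {'X': 'a', 'Y': 'b'}
# q(b, Z) does not match q(c, f), so the engine must try another clause:
print(unify(("q", "b", "Z"), ("q", "c", "f"), {}))  # None
```

When unify returns None, a Prolog-style engine would backtrack and try the next clause for the same subgoal.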

Prolog Evaluation Strategy

(a)
p(a,b).
p(a,c).
q(b,d).
q(c,f).
r(A,B,C) :- p(A,B), q(B,C).

(b)
:- r(X,Y,Z).

(c)
(1) 0 CALL: r(X,Y,Z)?
(2) 1 CALL: p(X,Y)?
(2) 1 EXIT: p(a,b)
(3) 1 CALL: q(b,Z)?
(3) 1 EXIT: q(b,d)
(1) 0 EXIT: r(a,b,d)
(1) 0 REDO: r(a,b,d)?
(3) 1 REDO: q(b,d)?
(3) 1 FAIL: q(b,Z)
(2) 1 REDO: p(a,b)?
(2) 1 EXIT: p(a,c)
(4) 1 CALL: q(c,Z)?
(4) 1 EXIT: q(c,f)
(1) 0 EXIT: r(a,c,f)
(1) 0 REDO: r(a,c,f)?
(4) 1 REDO: q(c,f)?
(4) 1 FAIL: q(c,Z)
(2) 1 REDO: p(a,c)?
(2) 1 FAIL: p(X,Y)
(1) 0 FAIL: r(X,Y,Z)

Prolog Evaluation: (a) Database, (b) Query, (c) Evaluation Trace
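The order in which the trace produces solutions can be imitated with nested loops in Python. This is only a sketch of the left-to-right, depth-first strategy for this particular rule, not a general Prolog interpreter; the function name solve_r is an assumption made for illustration.

```python
# Facts in the order they appear in the database.
p = [("a", "b"), ("a", "c")]      # p(a,b).  p(a,c).
q = [("b", "d"), ("c", "f")]      # q(b,d).  q(c,f).

def solve_r():
    # Enumerate solutions of  r(A,B,C) :- p(A,B), q(B,C).
    # The leftmost subgoal is tried first and facts are tried top to bottom.
    # Moving to the next loop iteration plays the role of REDO (backtracking).
    for (a, b) in p:              # CALL / REDO p(A,B)
        for (b2, c) in q:         # CALL / REDO q(b,C)
            if b2 == b:           # q's first argument must match B
                yield (a, b, c)   # EXIT r(a,b,c)

print(list(solve_r()))  # [('a', 'b', 'd'), ('a', 'c', 'f')]
```

The two solutions appear in the same order as the two EXIT lines for r in the trace, and the generator is exhausted where the trace reports the final FAIL.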

Prolog/Datalog Notation

Notation is based on providing predicates with unique names.

A predicate has an implicit meaning, which is suggested by the predicate name, and a fixed number of arguments.

If the arguments are all constant values, the predicate simply states that a certain fact is true.

If the predicate has variables as arguments, it is considered either as a query or as part of a rule or constraint.

Prolog convention - all constant values in a predicate are either numeric or character strings; they are represented as identifiers (or names) starting with lowercase letters only, whereas variable names always start with an uppercase letter.

Prolog/Datalog: Example

(a) Prolog notation

Facts
supervise(franklin,john).
supervise(franklin,ramesh).
supervise(franklin,joyce).
supervise(jennifer,alicia).
supervise(jennifer,ahmad).
supervise(james,franklin).
supervise(james,jennifer).

Rules
superior(X,Y) :- supervise(X,Y).
superior(X,Y) :- supervise(X,Z), superior(Z,Y).
subordinate(X,Y) :- superior(Y,X).

Queries
superior(james,Y)?
superior(james,joyce)?

(b) The supervisory tree

james
    franklin
        john
        ramesh
        joyce
    jennifer
        alicia
        ahmad
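One way to see what the two queries return is to compute the superior relation bottom-up in Python and then look the answers up. This is a sketch under the assumption that the whole relation fits in memory; the names follow the supervisory tree, and the function name superior_relation is hypothetical.

```python
# supervise(Supervisor, Employee), with names as in the supervisory tree.
supervise = {
    ("franklin", "john"), ("franklin", "ramesh"), ("franklin", "joyce"),
    ("jennifer", "alicia"), ("jennifer", "ahmad"),
    ("james", "franklin"), ("james", "jennifer"),
}

def superior_relation(supervise):
    # superior(X,Y) :- supervise(X,Y).
    superior = set(supervise)
    while True:
        # superior(X,Y) :- supervise(X,Z), superior(Z,Y).
        new = {(x, y)
               for (x, z) in supervise
               for (z2, y) in superior
               if z == z2} - superior
        if not new:               # no new tuples: the relation is complete
            return superior
        superior |= new

superior = superior_relation(supervise)

# subordinate(X,Y) :- superior(Y,X).
subordinate = {(y, x) for (x, y) in superior}

# superior(james,Y)?
print(sorted(y for (x, y) in superior if x == "james"))
# ['ahmad', 'alicia', 'franklin', 'jennifer', 'john', 'joyce', 'ramesh']

# superior(james,joyce)?
print(("james", "joyce") in superior)  # True
```

The first query has a variable and returns every Y that james is superior to; the second has only constants and returns a yes/no answer.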

Deductive Databases Summary

stores knowledge with the DB

different methods of storing knowledge provide the terms:

KBMS or Expert Databases - use expert system IF..THEN..ELSE type rules

Deductive or Logic-Based databases often use Prolog-type rules

Expert databases generally incorporate knowledge extracted from experts in the field to provide reasoning and inferencing capabilities.

Logic-based databases use axioms (logic theory) to store the data and deductive axioms (rules) to extend that information

eg: to store the facts that Anne is the parent of Betty and Betty is the parent of Cameron use: parent(anne, betty). parent(betty, cameron).

Now a grandparent can be defined by the rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

Many forms of deductive databases exist including Deductive Object-Oriented Databases

Applications include:

Enterprise modelling

Hypothesis testing

Electronic commerce
