The need for Temporal Databases
(1) Need for coping with temporal data (both VT and TT)
(2) Just adding 1 (or 2, or 4) temporal attributes (and maybe some ad-hoc procedures) does not work!
(3) First, a rigorous semantic framework is needed, to give formal specification to the implementation. Properties: clearness, expressiveness, upward compatibiliy. Ex. BCDM
(4) Second, the implementation must be proven to respect the semantics. Core issue here: efficient (1-NF)
implementations hardly grant uniqueness of representation. Ex TSQL2
Ad-hoc approaches are complex and are not likely to work
Example 1. Projection (and temporal coalescing)
Name Salary Title VT_start VT_end
Andrea 60000 Ass. Provost 1/1/1993 30/5/1993
Andrea 70000 Ass. Provost 1/6/1993 30/9/1993
Andrea 70000 Provost 1/10/1993 31/1/1994
Andrea 70000 Professor 1/2/1994 31/12/1994
Question: salary history of Andrea
“Intuitive” SQL query:SELECT Salary,VT_start,VT_endFROM EmployeeWHERE Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work
Result obtained: Salary VT_start VT_end
60000 1/1/1993 30/5/1993
70000 1/6/1993 30/9/1993
70000 1/10/1993 31/1/1994
70000 1/2/1994 31/12/1994
Desired result: Salary VT_start VT_end
60000 1/1/1993 30/5/1993
70000 1/6/1993 31/12/1994
Ad-hoc approaches are complex and are not likely to work
How to get the desired result using SQL92:
CREATE TABLE Temp(Salary, VT_start,VT_end)AS SELECT Salary, VT_start, VT_end
FROM EmployeeWHERE Name = “Andrea”;
RepeatUPDATE Temp T1SET (T1.VT_end)=SELECT MAX(T2.VT_end)WHERE T1.Salary=T2.Salary AND T1.VT_start < T2.VT_Start AND
T1.VT_end >= T2.VT_start AND T1.VT_end < T2.VT_endWHERE EXISTS (SELECT *
FROM Temp AS T2WHERE T1.Salary=T2.Salary AND T1.VT_start < T2.VT_Start
AND T1.VT_end >= T2.VT_start AND T1.VT_end < T2.VT_end)
Until no tuples updated
Ad-hoc approaches are complex and are not likely to work
How to get the desired result using SQL92 (continues!):
DELETE FROM Temp T1WHERE EXISTS (SELECT *
FROM Temp AS T2WHERE T1.Salary = T2.Salary AND ((T1.VT_start > T2.VT_Start) AND (T1.VT_end <= T2.VT_end)) OR ((T1.VT_start >= T2.VT_Start) AND (T1.VT_end < T2.VT_end))
Ad-hoc approaches are complex and are not likely to work
Underlying semantic phenomenon:
Projection on temporal relations involves temporal coalescing about value equivalent tuples
When it occurs (SQL):
Whenever a proper subset of the attributes of the relations is chosen in the SELECT part of the query
Ad-hoc approaches are complex and are not likely to work
How to get the desired result using a Temporal DB (ex. TSQL2)
SELECT SalaryFROM EmployeeWHERE Name = “Andrea”
Ad-hoc approaches are complex and are not likely to work
Example 2. Join (and temporal intersection)
Name Salary VT_start VT_end
Andrea 60000 1/1/1993 30/5/1993
Andrea 70000 1/6/1993 31/12/1994
Employee1
Name Title VT_start VT_end
Andrea Ass. Provost 1/1/1993 30/9/1993
Andrea Provost 1/10/1993 31/1/1994
Andrea Professor 1/2/1994 31/12/1994
Employee2
Query: “combined” history of both Andrea’s salary and title
Ad-hoc approaches are complex and are not likely to work
“Intuitive” SQL query:SELECT Salary, Title, Emp1.VT_start, Emp1.VT_end
Emp2.VT_start, Emp2.VT_end FROM Employee1, Employee2WHERE Employee1.Name=“Andrea” AND
Employee1.Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work
Result obtained:
Salary Emp1.VT_start
Emp1.VT_end
Title Emp2.VT_start
Emp2.VT_end
60000 1/1/1993 30/5/1993 Ass. Provost 1/1/1993 30/9/1993
60000 1/1/1993 30/5/1993 Provost 1/10/1993 31/1/1994
60000 1/1/1993 30/5/1993 Professor 1/2/1994 31/12/1994
70000 1/6/1993 31/12/1994 Ass. Provost 1/1/1993 30/9/1993
70000 1/6/1993 31/12/1994 Provost 1/10/1993 31/1/1994
70000 1/6/1993 31/12/1994 Professor 1/2/1994 31/12/1994
Ad-hoc approaches are complex and are not likely to work
Result desired:
Salary Title VT_start VT_end
60000 Ass. Provost 1/1/1993 30/5/1993
70000 Ass. Provost 1/6/1993 30/9/1993
70000 Provost 1/10/1993 31/1/1994
70000 Professor 1/2/1994 31/12/1994
Ad-hoc approaches are complex and are not likely to work
How to get the desired result using SQL92:SELECT Employee1.Name,Salary,Dept,Employee1.VT_start,Employee1.VT_endFROM Employee1, Employee2WHERE Employee1.Name=Employee2.Name AND Employee2.VT_start <= Employee1.VT_start AND Employee1.VT_end < Employee2.VT_endUNIONSELECT Employee1.Name,Salary,Dept,Employee1.VT_start,Employee2.VT_endFROM Employee1, Employee2WHERE Employee1.Name=Employee2.Name AND Employee1.VT_start >= Employee2.VT_start AND Employee2.VT_end < Employee1.VT_end AND Employye1.VT_start < Employee2.VT_endUNIONSELECT Employee1.Name,Salary,Dept,Employee2.VT_start,Employee1.VT_endFROM Employee1, Employee2WHERE Employee1.Name=Employee2.Name AND Employee2.VT_start > Employee2.VT_start AND Employee1.VT_end < Employee2.VT_end AND Employye2.VT_start < Employee1.VT_endUNIONSELECT Employee1.Name,Salary,Dept,Employee2.VT_start,Employee2.VT_endFROM Employee1, Employee2WHERE Employee1.Name=Employee2.Name AND Employee2.VT_start > Employee1.VT_start AND Employee2.VT_end < Employee1.VT_end
Ad-hoc approaches are complex and are not likely to work
Underlying semantic phenomenon:
Join (Cartesian product) on temporal relations involves temporal intersection
When it occurs (SQL):
Whenever more than one relation is used in the FROM part of the query
Note: the number of terms in the SQL union is 2n, where n is the number of relations in the FROM part
Ad-hoc approaches are complex and are not likely to work
How to get the desired result using a Temporal DB (ex. TSQL2)
SELECT Salary, TitleFROM Employee1, Employee2WHERE Employee1.Name=“Andrea” AND
Employee1.Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work
And what about:- Union, difference, …, nested queries- Temporal predicates- Primary\secondary keys- Aggregate functions- Integrity constraints- Multiple (user-defined!) granularities- ……- arbitrary combinations of all the above issues
Until now, just two simple examples concerning:- SELECT a subset of attributes ( loop to do colaescing)- FROM with >1 relations (exponential union to do intersection)
?
Ad-hoc approaches are complex and are not likely to work
Key message:- Dealing with temporal data is a general problem in DB’s- Difficult problems (often “hidden” ones) have to be faced
WHY? (informally)
(1) Relational model assumes independence of attribute values
Time attributes predicates about when the values of the other attributes hold in the modeled reality (VT) or are current in the DB (TT)
Ad-hoc approaches are complex and are not likely to work
WHY? (informally)
(2) Relational model assumes all data\knowledge is explicit
The association of time to tuples involves “semantic assumptions”
Such assumptions are part of our “commonsense” way of looking at time. However, they are NOT supported by the relational model (as long as they are left implicit)
Making the semantic of time explicit is a complex task (everywhere, not only in DBs)
Ad-hoc approaches are complex and are not likely to work
From a Software Engeneering point of view:Letting applications solve the problem in an ad-hoc way is:- Both cost and time expensive- Likely to lead to errors in the applications- Likely to make integration (shared data) between applications impossible
Ad-hoc approaches are complex and are not likely to work
Temporal DB: an area of research aiming at
providing once-and-forall principled and integrated solution to the problem