Confoo 2015 Tracking your data across the fourth dimension Jeremy Cook

Upload: jeremy-cook

Post on 17-Jul-2015



This talk is about temporal databases

“A temporal database is a database with built-in support for handling data involving time…”

–Wikipedia

“Tonight @JCook21 explained temporal databases and I’m sure my brain is now leaking out of my nose.”

– Jeff Carouth, https://twitter.com/jcarouth/status/496842218674470912

What problem do temporal databases solve?

Databases are good at ‘now’

❖ Create

❖ Read

❖ Update

❖ Delete

❖ At any point we only see the current state of the data

Databases are good at ‘now’

❖ How many people work in each department of the company?

❖ For each product category how many products are in stock? Where is the stock located?

❖ How many orders are currently in each fulfilment state?

The fourth dimension

❖ Show me how salaries paid have changed by department for each quarter over the last 4 years and how they’re forecast to change next year

❖ Show me how stock levels have changed over time. How much stock are we forecast to have at any point in the future?

❖ For audit purposes show me a complete history of every change to this data, what period of time each change was valid for and when we knew about any changes

The fourth dimension

Some Temporal Database Theory

Temporal aspects

Decision Time

❖ Records the time at which a decision was made

❖ Modelled as a single value

❖ Allows for granularity through the data type used

Decision Time

EmpId  Name    Hire Date   Decision to Hire
1      Jeremy  2014-03-03  2014-01-20
2      Anna    2015-01-02  2013-12-15
3      Yann    2013-08-20  2013-08-20

Valid Time

“In temporal databases, valid time (VT) is the time period during which a database fact is valid in the modelled reality.”

–Wikipedia

Valid Time

❖ Modelled as a period of time between two dates

❖ Lower bound is always closed but upper bound can be open

Valid Time

EmpId  Name    Hire date   Termination date
1      Jeremy  2014-03-03  2015-01-20
2      Anna    2015-01-02  ∞
3      Yann    2013-08-20  2015-12-22
4      Colin   2015-05-01  ∞
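A minimal sketch of how such a period could be modelled in application code. The class name `ValidPeriod` and the use of `None` for the open upper bound (∞) are assumptions made for illustration, and the end date is treated as inclusive here, unlike the closed-open periods of SQL:2011 described later.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ValidPeriod:
    """A valid-time period; end=None stands for the open upper bound (∞)."""
    start: date
    end: Optional[date] = None

    def contains(self, day: date) -> bool:
        # The lower bound is closed. Treating the upper bound as inclusive
        # is an assumption of this sketch.
        if day < self.start:
            return False
        return self.end is None or day <= self.end


# Two rows from the employee table above.
jeremy = ValidPeriod(date(2014, 3, 3), date(2015, 1, 20))
anna = ValidPeriod(date(2015, 1, 2))  # no termination date: open upper bound
```

With this, `jeremy.contains(date(2015, 1, 21))` is false while `anna.contains(...)` is true for any date on or after her hire date.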

Valid Time

EmpId  Name    Dept  Hire date   Term date   StartVT     EndVT
1      Jeremy  Dev   2014-03-03  ∞           2014-03-03  2014-07-30
1      Jeremy  QA    2014-03-03  2015-01-20  2014-07-31  2015-01-20
2      Anna    Dev   2015-01-02  ∞           2015-01-02  2015-01-30
2      Anna    Mgmt  2015-01-02  ∞           2015-01-31  ∞
3      Yann    Mgmt  2013-08-20  2015-12-22  2013-08-20  ∞
4      Colin   Dev   2015-05-01  ∞           2015-05-01  ∞

Job done?

Valid-time on its own may not be enough!

Name    Type    StartVT                EndVT
Saturn  Planet  Billions of years ago  ∞
Pluto   Planet  Billions of years ago  ∞

Valid-time on its own may not be enough!

Name    Type          StartVT                EndVT
Saturn  Planet        Billions of years ago  ∞
Pluto   Dwarf planet  Billions of years ago  ∞

Valid-time on its own may not be enough!

Name    Type     StartVT                EndVT
Saturn  Planet   Billions of years ago  ∞
Pluto   Plutoid  Billions of years ago  ∞

Valid-time on its own may not be enough!

Name    Type          StartVT                EndVT
Saturn  Planet        Billions of years ago  ∞
Pluto   Planet        Billions of years ago  2006
Pluto   Dwarf planet  2006                   2008
Pluto   Plutoid       2008                   ∞

Transaction Time

“In temporal databases, transaction time (TT) is the time period during which a fact stored in the database is considered to be true.”

–Wikipedia

Transaction Time

❖ Modelled as a period of time between two dates

❖ Lower bound is always closed but upper bound can be open

Transaction Time

Name   Type          StartVT                EndVT  StartTT  EndTT
Pluto  Planet        Billions of years ago  ∞      1930     2006
Pluto  Dwarf planet  Billions of years ago  ∞      2006     2008
Pluto  Plutoid       Billions of years ago  ∞      2008     ∞
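To make the table concrete, here is a small sketch that answers "what did the database say Pluto was in a given year?". The names `Fact` and `believed_type` are hypothetical, years stand in for timestamps, and the end of each transaction-time period is assumed to be exclusive so that 2006 and 2008 each match exactly one row.

```python
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    name: str
    type: str
    tt_start: int            # year the database started recording this fact
    tt_end: Optional[int]    # year it was superseded; None means still current


# Transaction-time rows from the Pluto table. Valid time is left out because
# it is the same on every row.
FACTS: List[Fact] = [
    Fact("Pluto", "Planet", 1930, 2006),
    Fact("Pluto", "Dwarf planet", 2006, 2008),
    Fact("Pluto", "Plutoid", 2008, None),
]


def believed_type(name: str, year: int) -> Optional[str]:
    """Return the type recorded for `name` as the database stood in `year`."""
    for fact in FACTS:
        # Assumption: tt_start is inclusive and tt_end is exclusive.
        if fact.name == name and fact.tt_start <= year and (
            fact.tt_end is None or year < fact.tt_end
        ):
            return fact.type
    return None
```

Asking for 1990 gives "Planet" and asking for 2015 gives "Plutoid", even though the valid time of every row still stretches back billions of years.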

Valid Time != Transaction Time

Name              Clothing  StartVT          EndVT  StartTT  EndTT
Father Christmas  null      A long time ago  ∞      1973     1975
Santa Claus       red       A long time ago  ∞      1975     1980
Saint Nicholas    red       270 AD           ∞      1980     1982

How many temporal aspects should you use?

❖ As many or few as your application needs!

❖ Tables that implement two aspects are bi-temporal

❖ You can implement more aspects, in which case you have multi-temporal tables

Is your head spinning?

❖ Decision time records when a decision was taken

❖ Valid Time records the period of time for which the fact is valid

❖ Transaction Time records the period of time for which the fact is considered to be true

SQL:2011 Temporal

A note on the example tables

CREATE TABLE dept (
    DNo INTEGER,
    DName VARCHAR(255)
);

CREATE TABLE emp (
    ENo INTEGER,
    EName VARCHAR(255),
    EDept INTEGER
);

Periods

❖ Table component, capturing a pair of columns defining a start and end date

❖ Not a new data type, but metadata about columns in the table

❖ Closed-open constraint

❖ Enforces that end time > start time

Valid time

❖ Also called application time in SQL:2011

❖ Modelled as a pair of date time columns with a period

❖ Name of the columns and period is up to you

Valid time

ALTER TABLE emp ADD (
    EStart DATE,
    EEnd DATE,
    PERIOD FOR EPeriod (EStart, EEnd)
);

Temporal primary keys

❖ SQL:2011 allows a valid time period to be named as part of a primary key

❖ Can also enforce that the valid time periods do not overlap

Temporal primary keys

ALTER TABLE emp
    ADD PRIMARY KEY (ENo, EPeriod);

Temporal primary keys

ALTER TABLE emp
    ADD PRIMARY KEY (ENo, EPeriod WITHOUT OVERLAPS);

Temporal foreign keys

❖ What happens if a parent and child table both define valid time periods?

❖ It doesn’t make sense to allow a row in a child table to reference a row in a parent table where the valid time does not overlap

❖ SQL:2011 allows valid time periods to be part of foreign key constraints

Temporal foreign keys

ALTER TABLE dept ADD (
    DStart DATE,
    DEnd DATE,
    PERIOD FOR DPeriod (DStart, DEnd)
);

ALTER TABLE emp
    ADD FOREIGN KEY (EDept, PERIOD EPeriod)
    REFERENCES dept (DNo, PERIOD DPeriod);
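One way to picture what a temporal foreign key has to verify: the child row's period must be fully covered by the periods of the matching parent rows, with no gaps. The sketch below assumes closed-open periods represented as `(start, end)` tuples; the function name `is_covered` is hypothetical and this is not how any particular database implements the check.

```python
from datetime import date
from typing import List, Tuple

Period = Tuple[date, date]  # (start included, end excluded)


def is_covered(child: Period, parents: List[Period]) -> bool:
    """True if every day of `child` falls inside some parent period."""
    covered_until = child[0]
    for start, end in sorted(parents):
        if start > covered_until:
            break  # there is a gap before this parent period begins
        covered_until = max(covered_until, end)
        if covered_until >= child[1]:
            return True
    return False


# Hypothetical data: an employee period and two sets of department periods.
emp_period = (date(2015, 1, 1), date(2015, 3, 1))
dept_contiguous = [(date(2014, 12, 1), date(2015, 2, 1)),
                   (date(2015, 2, 1), date(2015, 4, 1))]
dept_with_gap = [(date(2014, 12, 1), date(2015, 1, 15)),
                 (date(2015, 2, 1), date(2015, 4, 1))]
```

Here `is_covered(emp_period, dept_contiguous)` holds, while the two-week gap in `dept_with_gap` makes the reference invalid.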

Querying valid time tables

❖ Can query against valid time columns as normal - they’re just normal table columns

❖ Updates and deletes can be performed for a portion of a valid time period

Querying valid time tables

❖ SQL:2011 allows you to create periods to use in your queries and use new predicates:

❖ CONTAINS

❖ OVERLAPS

❖ EQUALS

❖ PRECEDES

❖ SUCCEEDS

❖ IMMEDIATELY SUCCEEDS and IMMEDIATELY PRECEDES
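The predicates are easier to reason about once written out as comparisons on the period bounds. The sketch below assumes closed-open periods (start included, end excluded); the `Period` type and the function names are made up for illustration and simply mirror the predicate names.

```python
from datetime import date
from typing import NamedTuple


class Period(NamedTuple):
    start: date  # included in the period
    end: date    # excluded from the period


def contains(x: Period, y: Period) -> bool:
    return x.start <= y.start and x.end >= y.end


def contains_date(x: Period, day: date) -> bool:
    return x.start <= day < x.end


def overlaps(x: Period, y: Period) -> bool:
    return x.start < y.end and x.end > y.start


def equals(x: Period, y: Period) -> bool:
    return x.start == y.start and x.end == y.end


def precedes(x: Period, y: Period) -> bool:
    return x.end <= y.start


def succeeds(x: Period, y: Period) -> bool:
    return x.start >= y.end


def immediately_precedes(x: Period, y: Period) -> bool:
    return x.end == y.start


def immediately_succeeds(x: Period, y: Period) -> bool:
    return x.start == y.end


january = Period(date(2015, 1, 1), date(2015, 2, 1))
february = Period(date(2015, 2, 1), date(2015, 3, 1))
```

Because the end bound is excluded, `january` immediately precedes `february` without overlapping it.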

Querying valid time tables

UPDATE Emp
    FOR PORTION OF EPeriod
    FROM DATE '2011-02-03' TO DATE '2011-09-10'
SET EDept = 4
WHERE ENo = 22217;
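An update for a portion of a period generally has to split existing rows: the part before the portion and the part after it keep their old values, and only the middle is changed. Below is a rough sketch of that splitting, assuming closed-open periods. `EmpRow` and `update_dept_for_portion` are hypothetical names, and the starting row (department 3 for 2010 and 2011) is invented sample data.

```python
from datetime import date
from typing import List, NamedTuple


class EmpRow(NamedTuple):
    eno: int
    edept: int
    estart: date  # included
    eend: date    # excluded


def update_dept_for_portion(rows: List[EmpRow], eno: int, new_dept: int,
                            p_from: date, p_to: date) -> List[EmpRow]:
    """Set the department for [p_from, p_to), splitting rows where needed."""
    result: List[EmpRow] = []
    for row in rows:
        touches = row.estart < p_to and row.eend > p_from
        if row.eno != eno or not touches:
            result.append(row)  # untouched row
            continue
        if row.estart < p_from:
            # Leading part keeps the old department.
            result.append(row._replace(eend=p_from))
        # Middle part, clipped to the requested portion, gets the new value.
        result.append(row._replace(edept=new_dept,
                                   estart=max(row.estart, p_from),
                                   eend=min(row.eend, p_to)))
        if row.eend > p_to:
            # Trailing part keeps the old department.
            result.append(row._replace(estart=p_to))
    return result


before = [EmpRow(22217, 3, date(2010, 1, 1), date(2012, 1, 1))]
after = update_dept_for_portion(before, 22217, 4,
                                date(2011, 2, 3), date(2011, 9, 10))
```

The single input row becomes three: department 3 up to 2011-02-03, department 4 for the updated portion, and department 3 again from 2011-09-10.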

Querying valid time tables

DELETE FROM Emp
    FOR PORTION OF EPeriod
    FROM DATE '2011-02-03' TO DATE '2011-09-10'
WHERE ENo = 22217;

Querying valid time tables

SELECT EName, EDept
FROM Emp
WHERE ENo = 22217
    AND EPeriod CONTAINS DATE '2015-01-23';

Querying valid time tables

SELECT EName, EDept
FROM Emp
WHERE ENo = 31
    AND EPeriod OVERLAPS PERIOD (DATE '2015-01-01', DATE '2015-01-31');

Transaction time

❖ Also known as system time in SQL:2011

❖ Modelled as two DATE or TIMESTAMP columns

❖ Management of the columns for the period is handled by the database for you

Transaction time

❖ When data is inserted:

❖ Start of transaction time is set to current time

Transaction time

❖ When data is updated:

❖ Transaction time end is set to current time on the existing row

❖ A new row is added with the updated data and a transaction time start of the current time

Transaction time

❖ When data is deleted:

❖ Transaction time end is set to current time in the existing row

Transaction time

❖ Because the system manages transaction time:

❖ Not possible to alter transaction time values in the past

❖ Not possible to add future dated transaction time values

❖ Referential constraints on historical data are never checked
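The insert, update and delete rules above can be simulated in a few lines. This is only a sketch of the bookkeeping, not of any vendor's implementation: `SystemVersionedTable` is a hypothetical class, an integer counter stands in for the system clock, and the employee data is invented.

```python
from typing import Dict, List, NamedTuple, Optional


class Version(NamedTuple):
    key: int
    data: Dict[str, str]
    sys_start: int            # a clock tick stands in for a timestamp
    sys_end: Optional[int]    # None means the row is current


class SystemVersionedTable:
    def __init__(self) -> None:
        self.versions: List[Version] = []
        self.clock = 0

    def _tick(self) -> int:
        # Callers never supply the time themselves; the table manages it.
        self.clock += 1
        return self.clock

    def _close_current(self, key: int, now: int) -> None:
        for i, v in enumerate(self.versions):
            if v.key == key and v.sys_end is None:
                self.versions[i] = v._replace(sys_end=now)

    def insert(self, key: int, data: Dict[str, str]) -> None:
        self.versions.append(Version(key, dict(data), self._tick(), None))

    def update(self, key: int, data: Dict[str, str]) -> None:
        now = self._tick()
        self._close_current(key, now)  # end the existing row
        self.versions.append(Version(key, dict(data), now, None))

    def delete(self, key: int) -> None:
        self._close_current(key, self._tick())  # nothing is physically removed

    def as_of(self, key: int, tick: int) -> Optional[Dict[str, str]]:
        for v in self.versions:
            if v.key == key and v.sys_start <= tick and (
                v.sys_end is None or tick < v.sys_end
            ):
                return v.data
        return None


table = SystemVersionedTable()
table.insert(22, {"EName": "Anna"})      # tick 1
table.update(22, {"EName": "Anna B."})   # tick 2
table.delete(22)                         # tick 3
```

After these three operations the table still holds two versions, and `table.as_of(22, 1)` returns the original name even though the row has since been deleted.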

Transaction time

CREATE TABLE emp (
    …,
    Sys_start TIMESTAMP(12) GENERATED ALWAYS AS ROW START,
    Sys_end TIMESTAMP(12) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (Sys_start, Sys_end)
) WITH SYSTEM VERSIONING;

Querying transaction time tables

❖ New predicates to be used with transaction time:

❖ FOR SYSTEM_TIME AS OF

❖ FOR SYSTEM_TIME FROM ... TO ...

❖ FOR SYSTEM_TIME BETWEEN ... AND ...

❖ If none of the above is supplied, the database should only return rows for the current system time

Querying transaction time tables

SELECT ENo, EName
FROM emp
WHERE ENo = 22;

Querying transaction time tables

SELECT ENo, EName
FROM emp
    FOR SYSTEM_TIME AS OF TIMESTAMP '2015-01-28 12:45:00'
WHERE ENo = 22;

Querying transaction time tables

SELECT ENo, EName
FROM emp
    FOR SYSTEM_TIME AS OF TIMESTAMP '2015-01-28 12:45:00'
WHERE ENo = 22
    AND EPeriod CONTAINS DATE '2014-08-27';

Grey areas/not implemented yet

❖ Evolving schema over time

❖ Support for period joins

❖ Support for period aggregates or period grouped queries

❖ Support for period normalization

❖ Support for multiple valid time periods per table

Which vendors support SQL:2011?

Current support

❖ Oracle 12c

❖ SQL:2011 compliant but not even nearly complete

❖ PostgreSQL

❖ 9.1 and earlier: temporal contributed package

❖ 9.2: native range data types

❖ IBM DB2 through ‘time travel query’ feature

❖ Teradata 13.10 and 14

❖ Handful of others implemented as extensions

How do I add this stuff to my current schema?

Implementing valid time

❖ Add a pair of date time columns to your table for the valid time period.

❖ Can make these part of your primary key

Implementing valid time

❖ Things to consider:

❖ Have to check for end time > start time

❖ Have to check for overlaps in valid time periods

❖ Temporal foreign keys have to be implemented yourself

❖ Queries become potentially more complex
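As a rough illustration of the first two checks, here is a sketch using SQLite through Python's `sqlite3` module. The choice of SQLite, the trigger name `emp_no_overlap`, the helper `try_insert` and the sample rows are all assumptions made for the example. Dates are stored as ISO-8601 text so that string comparison matches date order, periods are closed-open, and only inserts are guarded (an equivalent trigger would be needed for updates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (
    ENo    INTEGER NOT NULL,
    EDept  INTEGER,
    EStart TEXT NOT NULL,   -- ISO-8601 date, included in the period
    EEnd   TEXT NOT NULL,   -- ISO-8601 date, excluded from the period
    CHECK (EEnd > EStart),
    PRIMARY KEY (ENo, EStart)
);

-- Reject a new row whose period overlaps an existing row for the same ENo.
CREATE TRIGGER emp_no_overlap BEFORE INSERT ON emp
BEGIN
    SELECT RAISE(ABORT, 'overlapping valid time period')
    WHERE EXISTS (
        SELECT 1 FROM emp
        WHERE ENo = NEW.ENo
          AND EStart < NEW.EEnd
          AND EEnd > NEW.EStart
    );
END;
""")


def try_insert(eno: int, edept: int, estart: str, eend: str) -> bool:
    """Insert a row; return False if a constraint or the trigger rejects it."""
    try:
        conn.execute(
            "INSERT INTO emp (ENo, EDept, EStart, EEnd) VALUES (?, ?, ?, ?)",
            (eno, edept, estart, eend),
        )
        return True
    except sqlite3.DatabaseError:
        return False


results = [
    try_insert(1, 1, "2014-03-03", "2014-07-31"),  # first period
    try_insert(1, 2, "2014-07-31", "2015-01-21"),  # adjacent, no overlap
    try_insert(1, 3, "2014-06-01", "2014-09-01"),  # overlaps both rows
    try_insert(2, 1, "2015-02-01", "2015-01-01"),  # end before start
]
```

The first two inserts succeed and the last two are rejected, so the table ends up with two rows.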

Implementing transaction time

❖ Add a column recording transaction time start to your table

❖ For each table create a backup table mirroring the columns in the main table, adding a transaction time end column too

❖ Create a trigger that fires on each update or delete to copy old values from the main table to the backup table

❖ Should add transaction time end to the backup table

❖ Should also update the transaction time start to now in the main table if the operation is an update
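The steps above can be sketched with SQLite triggers driven from Python's `sqlite3` module. SQLite, the table name `emp_history`, the column names `tt_start` and `tt_end`, the trigger names and the sample employee are assumptions for illustration; other databases have their own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (
    ENo      INTEGER PRIMARY KEY,
    EName    TEXT,
    EDept    INTEGER,
    tt_start TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now'))
);

-- Backup table: same columns as emp plus the transaction time end.
CREATE TABLE emp_history (
    ENo      INTEGER,
    EName    TEXT,
    EDept    INTEGER,
    tt_start TEXT NOT NULL,
    tt_end   TEXT NOT NULL
);

-- On update: copy the old values out, then restart the clock on the live row.
CREATE TRIGGER emp_after_update AFTER UPDATE OF EName, EDept ON emp
BEGIN
    INSERT INTO emp_history
    VALUES (OLD.ENo, OLD.EName, OLD.EDept, OLD.tt_start,
            strftime('%Y-%m-%d %H:%M:%f', 'now'));
    UPDATE emp
    SET tt_start = strftime('%Y-%m-%d %H:%M:%f', 'now')
    WHERE ENo = NEW.ENo;
END;

-- On delete: copy the old values out with an end time.
CREATE TRIGGER emp_after_delete AFTER DELETE ON emp
BEGIN
    INSERT INTO emp_history
    VALUES (OLD.ENo, OLD.EName, OLD.EDept, OLD.tt_start,
            strftime('%Y-%m-%d %H:%M:%f', 'now'));
END;
""")

conn.execute("INSERT INTO emp (ENo, EName, EDept) VALUES (22, 'Anna', 1)")
conn.execute("UPDATE emp SET EDept = 2 WHERE ENo = 22")
conn.execute("DELETE FROM emp WHERE ENo = 22")

history = conn.execute(
    "SELECT ENo, EName, EDept, tt_start, tt_end FROM emp_history ORDER BY rowid"
).fetchall()
```

After the update and the delete, `history` holds one row per superseded version of employee 22, and the main table is empty again.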

Implementing transaction time

❖ Things to consider:

❖ Extra complexity

❖ How long should backup data be kept for?

❖ Do you optimize for fast reads or writes?

❖ Should truncating the main table delete the data from the backup?

More information

❖ Wikipedia article on Temporal Databases

❖ Temporal features in SQL:2011 (PDF)

❖ Time and Relational Theory

Thanks for listening!

❖ Any questions?

❖ I’d love some feedback

❖ https://joind.in/talk/view/13294

❖ Contact me:

❖ @JCook21
