PostgreSQL: Beyond "Standard" Relational Model
Igor A. Gaponenko
Lawrence Berkeley National Laboratory
November 14, 2003 Igor A. Gaponenko: PostgreSQL: Beyond "Standard" Relational Model
Motivation…
Anti-goals
  Not to cover the SQL92 or SQL99 standards.
  Not to compare MySQL and PostgreSQL directly, nor any other (object-)relational databases such as Oracle 9i.
Goals
  Cognitive: learning advanced features of the object-relational model and of one particular implementation (what lies beyond the original relational model and primitive data types).
  Practical: looking for an adequate persistence technology to re-implement the Condition/DB and other non-Event Store databases of the BaBar Experiment. Note that the requirements for the migration are beyond the scope of this talk; they are "implied" by the problem domain.
Object-Relational Model Foundation (1)
Why?
  An attempt to benefit from both "Real World" modeling and the performance superiority of ODBMS over RDBMS.
  An attempt to address shortcomings of SQL and of "Object Oriented" database systems.
  A way of rethinking Codd's original (relational) model.
The Third Object-Relational Database Manifesto
  A proposal for the future direction of data and database management systems.
  Provides a foundation for integrating relational and object technologies.
  Published:
    C.J. Date and H. Darwen. "Foundation for Object/Relational Databases: The Third Manifesto". Addison-Wesley, 1998.
    C.J. Date and H. Darwen. "Foundation for Future Database Systems: The Third Manifesto (2nd Edition)". Addison-Wesley, 2000.
Standards:
  SQL92
  SQL99 (the object-relational one)
Object-Relational Model Foundation (2)
The relation remains THE cornerstone concept; in an ORDBMS, however, it is extended with:
  Structured types for attributes (in addition to atomic types): structures, sets, arrays, bags, etc.
  Methods: special operations defined on, and applied to, values of user-defined types.
  Identifiers for tuples (similar to "object identifiers" in an ODBMS). They are generally invisible to users.
  References (to tuples).
  Nested relations ("inclusive polymorphism") as a way to extend relations.
SQL99 defines both single and multiple inheritance
PostgreSQL: History, Versions and Platforms
Derives from UC Berkeley's "Ingres" and "Postgres95" academic database projects.
Is freely available with source code and is distributed under the BSD license.
Current stable version: 7.3.2 (beginning of 2003).
Current development version: 7.4 (to be completed by the end of 2003).
Next stable version: 7.5 or 8.0 (2004).
Is constantly being improved (there are real people behind it!).
Commercial flavors are also available.
All UNIX flavors are supported (including Mac OS X).
There are "native" ports of some earlier (7.2.x) versions to MS Windows NT/2000/XP. Cygwin is also an option for newer versions. Full support for MS Windows will be added as of 7.5 or 8.0.
PostgreSQL vs MySQL: Which one is better (“better” for what)?
See a very interesting discussion on this subject at:
“A Response to the Featurewise Comparison of MySQL and PostgreSQL” by Peter Eisentraut, PostgreSQL Global Development Group
http://developer.postgresql.org/~petere/comparison.html
Conclusion (no surprise!): PostgreSQL is better :-)
Now let's take a tour of the advanced features of PostgreSQL…
Client-Server Architecture
Current state of affairs:
  There is a (post-)"master" process that launches a number of "work" processes, which do the actual work on behalf of clients. Their number is controlled through a configuration file.
  It fits well into SMP architectures by relying on the automatic load balancing done by the operating system.
  It does not benefit from multi-threading. Is this really needed?
  There is no support for cluster-based installations (running "work" processes on different hosts).
Ongoing developments:
  "Replicated" DBMS. There are a few projects; see GBorg at WWW.PostgreSQL.org for details.
Limitations of PostgreSQL
Theoretical limits
  Maximum size for a database: unlimited (4 TB databases exist)
  Maximum size for a table: 16 TB on all operating systems
  Maximum size for a row: 1.6 TB
  Maximum size for a field: 1 GB
  Maximum number of rows in a table: unlimited
  Maximum number of columns in a table: 250 - 1600 (depending on column types)
  Maximum number of indexes on a table: unlimited
System configuration limits
  They are imposed by available disk space and memory/swap space. This is also related to the performance of the database.
  The maximum table size and the maximum number of columns can be increased if the default block size is increased to 32k.
2 GB file limit issue (not an issue):
  The maximum table size of 16 TB does not require large file support from the operating system. Large tables are stored as multiple 1 GB files, so file system size limits are not important.
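To see how close existing tables are to these limits, the page counts kept in the system catalog can be inspected. This is a minimal sketch: it assumes the default 8 kB block size, and the `relpages` figure is only an estimate that is refreshed by VACUUM or ANALYZE.

```sql
-- List the ten largest ordinary tables by estimated size.
-- Assumption: default block size of 8192 bytes; relpages is an
-- estimate maintained by VACUUM / ANALYZE, not an exact figure.
SELECT relname,
       relpages,
       relpages::bigint * 8192 AS approx_bytes  -- cast avoids 32-bit overflow
FROM pg_class
WHERE relkind = 'r'                             -- 'r' = ordinary table
ORDER BY relpages DESC
LIMIT 10;
```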
Concurrency Control: MVCC Transaction Model
"...Unlike traditional database systems which use locks for concurrency control, PostgreSQL maintains data consistency by using a multi-version model (Multi-version Concurrency Control, MVCC). This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session..."
"...The main advantage to using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading..."
"...Table- and row-level locking facilities are also available in PostgreSQL for applications that cannot adapt easily to MVCC behavior. However, proper use of MVCC will generally provide better performance than locks..."
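The snapshot behavior described in these quotes can be observed with two concurrent client sessions. The sketch below is illustrative only: the `account` table and its values are hypothetical, and the statements must be typed into two separate connections in the order shown.

```sql
-- Setup (hypothetical table, not part of the talk)
CREATE TABLE account (id integer PRIMARY KEY, balance integer);
INSERT INTO account VALUES (1, 500);

-- Session A: open a transaction that works from a single snapshot
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM account WHERE id = 1;       -- reads 500

-- Session B: a concurrent writer; it is not blocked by A's read
-- and commits immediately (autocommit)
UPDATE account SET balance = 400 WHERE id = 1;

-- Session A again: still the same snapshot, so B's committed
-- change is not visible here
SELECT balance FROM account WHERE id = 1;       -- still reads 500
COMMIT;

-- Session A, new transaction: the new row version is now visible
SELECT balance FROM account WHERE id = 1;       -- reads 400
```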
Concurrency Control: SQL Transaction Isolation Levels (1)
Three known phenomena:
  "dirty read": A transaction reads data written by a concurrent uncommitted transaction.
  "nonrepeatable read": A transaction re-reads data it has previously read and finds that the data has been modified by another transaction (that committed since the initial read).
  "phantom read": A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently committed transaction.
Phenomena possible at each isolation level (x = possible):

Level              Dirty read   Nonrepeatable read   Phantom read
Read uncommitted   x            x                    x
Read committed                  x                    x
Repeatable read                                      x
Serializable
Concurrency Control: SQL Transaction Isolation Levels (2)
START TRANSACTION
    [ ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE } ]
    [ READ WRITE | READ ONLY ]
ROLLBACK
COMMIT
SELECT FOR UPDATE
PostgreSQL provides two isolation levels: READ COMMITTED and SERIALIZABLE.
Implicit locking (tables). Explicit locking (rows).
The problem of "deadlocks".
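A short sketch of how these commands fit together is given below. The `wallet` table is hypothetical. Locking the rows in a fixed order (here, ascending `id`) is a common convention for reducing the chance of deadlocks between concurrent transfers; it is shown as an illustration, not as a guarantee.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE wallet (id integer PRIMARY KEY, balance integer);
INSERT INTO wallet VALUES (1, 500);
INSERT INTO wallet VALUES (2, 100);

-- Transfer 100 from wallet 1 to wallet 2
START TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Explicit row locks: other transactions that try to update or
-- lock these rows wait until this transaction ends
SELECT id, balance
FROM wallet
WHERE id IN (1, 2)
ORDER BY id
FOR UPDATE;

UPDATE wallet SET balance = balance - 100 WHERE id = 1;
UPDATE wallet SET balance = balance + 100 WHERE id = 2;

COMMIT;   -- or ROLLBACK to discard both updates
```

If two transactions nevertheless end up waiting on each other, PostgreSQL detects the deadlock and aborts one of them, which the application can then retry.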
Tables with OID-s
A non-standard option (in neither SQL92 nor SQL99)!
OID := unsigned 4-byte integer; too small to uniquely address rows in large databases or even in large tables.
Their use as primary keys is discouraged, except in system tables.
OID-s are used internally by PostgreSQL as primary keys for various system tables.
User-defined tables (tuples) may have OID-s explicitly visible to clients:
CREATE TABLE <name> ... [ WITHOUT OIDS ] ...
SELECT oid, * FROM sample;
  oid   |      name
--------+----------------
 123456 | Igor Gaponenko
(1 row)
Arrays
CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
);
Any atomic type can be used here. Arrays can be multidimensional.
CREATE TABLE tictactoe (
    squares integer[3][3]
);
A size can be declared, but it is not actually enforced.
Special syntax for inserting multiple values:
INSERT INTO sal_emp
VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"talk", "consult"}, {"meeting"}}'
);
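Array elements can also be read and updated individually. The following sketch uses a hypothetical `quarterly_pay` table so that it stands on its own; note that PostgreSQL array subscripts start at 1.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE quarterly_pay (name text, pay integer[]);
INSERT INTO quarterly_pay VALUES ('Bill',  '{10000, 10000, 10000, 10000}');
INSERT INTO quarterly_pay VALUES ('Carol', '{20000, 25000, 25000, 25000}');

-- Who was paid differently in the first and second quarters?
SELECT name FROM quarterly_pay WHERE pay[1] <> pay[2];   -- Carol

-- Third-quarter pay for every employee
SELECT name, pay[3] FROM quarterly_pay;

-- Change a single element in place
UPDATE quarterly_pay SET pay[4] = 15000 WHERE name = 'Bill';
```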
Creating new types (1)
CREATE TYPE typename (
    INPUT = input_function,
    OUTPUT = output_function,
    INTERNALLENGTH = { internallength | VARIABLE }
    [ , EXTERNALLENGTH = { externallength | VARIABLE } ]
    [ , DEFAULT = "default" ]
    [ , ELEMENT = element ]
    [ , DELIMITER = delimiter ]
    [ , SEND = send_function ]
    [ , RECEIVE = receive_function ]
    [ , PASSEDBYVALUE ]
    [ , ALIGNMENT = alignment ]
    [ , STORAGE = storage ]
)
Creating new types (2)
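-- Output function: converts the internal representation to text
CREATE FUNCTION zero_out(opaque) RETURNS opaque
    AS '/usr/local/pgsql/lib/zero.so' LANGUAGE 'C';
-- Input function: converts the text representation to the internal one
CREATE FUNCTION zero_in(opaque) RETURNS zero
    AS '/usr/local/pgsql/lib/zero.so' LANGUAGE 'C';
-- The new type: a fixed-length, 16-byte value
CREATE TYPE zero (
    internallength = 16,
    input = zero_in,
    output = zero_out
);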
Creating new types (3)
Where to see real-life examples:
  The "PostGIS" Web site, http://postgis.refractions.net/, is an example of extending PostgreSQL with 3-D geographic objects.
  The built-in geometric types in PostgreSQL are another example.
Defining operators and functions…
To be finished…
Triggers…
To be finished…
Cursors...
Stored procedures…
Available languages: PL/pgSQL, PL/Perl, PL/Python, C
Rules
User-defined rules are a PostgreSQL extension that makes it possible to change the semantics of SELECT, INSERT, UPDATE, or DELETE commands.
A rule is a way of doing something extra in addition to the original command, or even of substituting another command for it.
CREATE RULE "_RETURN" AS
ON SELECT TO t1 DO INSTEAD
SELECT * FROM t2;
SELECT * FROM t1;
CREATE RULE notify_me AS ON UPDATE TO mytable DO NOTIFY mytable;
UPDATE mytable SET name = 'foo' WHERE id = 42;
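As an illustration of "doing something extra in addition to the original command", a rule can record changes in a log table. This is a minimal sketch; the `item` and `item_log` tables and the rule name are hypothetical.

```sql
-- Hypothetical tables used only for this illustration
CREATE TABLE item (
    id    integer PRIMARY KEY,
    price numeric(10,2)
);
CREATE TABLE item_log (
    item_id    integer,
    old_price  numeric(10,2),
    new_price  numeric(10,2),
    changed_by text,
    changed_at timestamp
);

-- In addition to the UPDATE itself, log every price change.
-- OLD and NEW refer to the row before and after the update.
CREATE RULE log_price_change AS
    ON UPDATE TO item
    WHERE NEW.price <> OLD.price
    DO INSERT INTO item_log
       VALUES (OLD.id, OLD.price, NEW.price,
               current_user, current_timestamp);

INSERT INTO item VALUES (1, 9.99);
UPDATE item SET price = 12.50 WHERE id = 1;

-- One log row is expected: item 1, 9.99 -> 12.50
SELECT item_id, old_price, new_price FROM item_log;
```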
Indexes
Available indexes:
  B-Tree (Lehman-Yao high-concurrency algorithm)
  R-Tree (standard R-trees using Guttman's quadratic split algorithm, for spatial information; deprecated in favor of GiST)
  GiST
  Hash
Multicolumn indexes are also allowed (up to 32 columns).
The query optimizer will use the appropriate ones when performing queries.
CREATE TABLE test (
    id integer,
    ...
);
CREATE INDEX test_id_index ON test (id);
CREATE INDEX test_id_hash ON test USING HASH (id);
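Whether the optimizer actually picks an index can be checked with EXPLAIN. The sketch below uses a hypothetical `test2` table with a multicolumn index; the plan that is printed depends on the table contents and statistics, so no particular output is assumed.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE test2 (
    major integer,
    minor integer,
    name  text
);

-- Multicolumn B-Tree index (B-Tree is the default index type)
CREATE INDEX test2_mm_idx ON test2 (major, minor);

-- Refresh planner statistics, then ask for the query plan.
-- On a small or empty table the planner may still prefer a
-- sequential scan over the index.
ANALYZE test2;
EXPLAIN SELECT name FROM test2 WHERE major = 1 AND minor = 2;
```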
Database Management : Requirements
General problems to be solved in the context of HEP databases:
  Managing the database installation(s) at a site.
  Providing database integrity: backup, restore, contents management.
  Controlling access to the database: authentication, authorization, ACL-s, etc.
  Distributing data between database installations around a collaboration: "master" and "mirrors".
  Sharing (exchanging) data between database installations: "user1" and "user2".
Database Management : Contents Management
PostgreSQL has:
A certain degree of control over data clustering:
  A server can serve multiple databases ('DATABASE' in SQL); together they form a "database cluster".
  Each database is self-sufficient (schema + tables).
  Databases may be spread across different file systems.
"Garbage collection" mechanism: the VACUUM command (not SQL standard)
  It removes leftover data from rollbacks and other processes that can leave temporary data behind (garbage collection).
  With ANALYZE, it collects statistics about table contents to assist PostgreSQL in planning efficient queries.
  It is meant for space/performance optimization.
  It does not interfere with normal database operations. It will slow them down, though!
  It is supposed to be run in periods of "natural" inactivity.
"Schema documentation": the COMMENT command (not SQL standard)
  It complements naming conventions for database schema components.
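Both commands are ordinary SQL statements issued from a client session. A minimal sketch follows; the `calib_constant` table and the comment texts are hypothetical.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE calib_constant (
    id    integer,
    value double precision
);

-- Attach documentation to schema components
COMMENT ON TABLE calib_constant IS 'Calibration constants, one row per channel';
COMMENT ON COLUMN calib_constant.value IS 'Constant in detector units';

-- Read the table comment back from the system catalogs
SELECT obj_description('calib_constant'::regclass, 'pg_class');

-- Reclaim space left by dead row versions in one table
VACUUM calib_constant;

-- Same, and also refresh the statistics used by the query planner
VACUUM ANALYZE calib_constant;
```

VACUUM cannot be run inside a transaction block, so it is issued as a standalone statement.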
Database Management : Copy/Backup/Restore
Storing the contents of tables in files in binary/text format and loading them back into tables.
COPY [ BINARY ] table [ WITH OIDS ]
FROM { 'filename' | stdin }
...
COPY [ BINARY ] table [ WITH OIDS ]
TO { 'filename' | stdout }
...
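A minimal round trip through COPY might look as follows. The `run_info` table is hypothetical, and the client's standard input and output are used instead of server-side files, since file paths are resolved on the server and typically require superuser rights.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE run_info (run integer, label text);
INSERT INTO run_info VALUES (1, 'alpha');
INSERT INTO run_info VALUES (2, 'beta');

-- Dump the table as text to the client (tab-separated by default)
COPY run_info TO stdout;

-- Same, using '|' as the column delimiter
COPY run_info TO stdout WITH DELIMITER '|';

-- Load rows back from the client; in psql the data lines follow
-- the command and end with a backslash-period line
COPY run_info FROM stdin WITH DELIMITER '|';
3|gamma
\.
```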
"Hot" backup/restore operations, without interrupting users of a database:
  'pg_dump' creates a set of SQL commands to back up/restore a whole database.
  The actual restore is done with 'psql' for plain-text dumps and with 'pg_restore' for other (compressed, binary) dumps.
  "Hot" restore mode is supported for data backups only.
  Hot operation is possible because of the multi-version concurrency control (MVCC) system: it works even with active users in the middle of transactions.
Database Management : Authentication, etc.
Encryption of the client-server protocol:
  Built-in SSL (compilation option)
  SSH/OpenSSH tunneling (requires shell access to the database server)
  Stunnel (no shell access is required)
"Host-based authentication"
  Special files in each "database cluster": pg_hba.conf, pg_ident.conf
  From the PostgreSQL documentation: "...Put simply, the pg_hba.conf file allows you to determine who is allowed to connect to which databases from what machines, and to what degree they must prove their authenticity to gain access..."
host  all        127.0.0.1       255.255.255.255  trust
host  template1  192.167.123.15  255.255.255.255  reject
host  gapon      192.167.123.14  255.255.255.255  crypt
host  template1  192.167.123.13  255.255.255.255  ident sales
Database Management : Access Control Lists (1)
From the PostgreSQL documentation:
"...users and groups can allow for fine-grained, versatile access control to your database objects..."
"...PostgreSQL stores both user and group data within its own system catalogs. These are different from the users and groups defined within the operating system on which the software is installed. Any connection to PostgreSQL must be made with a specific user, and any user may belong to one or more defined groups..."
"...Users control the allocation of rights and track who is allowed to perform actions on the system (and which actions they may perform). Groups exist as a means to simplify the allocation of these rights. Both users and groups exist as global database objects, which means they are not tied to any particular database..."
Database Management : Access Control Lists (2)
CREATE USER ... WITH PASSWORD '...'
ALTER USER ... WITH PASSWORD '...'
SELECT * FROM pg_shadow;
Users and passwords are stored in a special system table (pg_shadow).
GRANT privilege [, ...] ON object [, ...] TO { PUBLIC | username | GROUP groupname }
REVOKE privilege [, ...] ON object [, ...] FROM { PUBLIC | username | GROUP groupname }
Users own the database objects they create. They can also grant privileges to, and revoke them from, other users or groups.
Database Management : Access Control Lists (3)
Using views to grant access to subsets of tables:
CREATE VIEW stock_view
    AS SELECT isbn, retail, stock FROM stock;
GRANT SELECT ON stock_view TO GROUP sales;
CREATE USER barbara;
GRANT SELECT ON stock_view TO barbara;
SELECT * FROM stock_view;
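The effect of such grants can be checked with the built-in `has_table_privilege` function. The sketch below is self-contained and uses hypothetical names (`book_stock`, `book_stock_public`, `clerk`); it assumes it is run by a user who is allowed to create users.

```sql
-- Hypothetical objects used only for this illustration
CREATE TABLE book_stock (
    isbn   text,
    cost   numeric(5,2),   -- column hidden from the view
    retail numeric(5,2),
    stock  integer
);
CREATE VIEW book_stock_public
    AS SELECT isbn, retail, stock FROM book_stock;

CREATE USER clerk;
GRANT SELECT ON book_stock_public TO clerk;

-- The view is readable by clerk ...
SELECT has_table_privilege('clerk', 'book_stock_public', 'SELECT');

-- ... but the underlying table, including the cost column, is not,
-- because no privilege on it was ever granted to clerk
SELECT has_table_privilege('clerk', 'book_stock', 'SELECT');
```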