working with databases in perl

11.04.23 -

DépartementOffice

Working with databases in Perl

Tutorial for FPW::2011, Paris

[email protected]


Overview

• intended audience : beginners
  – in Perl
  – in databases

• main topics
  – Relational databases
  – Perl DBI basics
  – Advanced Perl DBI
  – Object-Relational Mappings

• disclaimer
  – didn't have personal exposure to everything mentioned in this tutorial


Relational databases

RDBMS = Relational Database Management System

Relational model

Table (rows + columns); operations : filter, projection, join

first table

c1  c2   c3
1   foo  1
2   foo  2
3   bar  1

second table

c3  c4
1   xx
2   yy

result of the join on c3

c1  c2   c3  c4
1   foo  1   xx
2   foo  2   yy
3   bar  1   xx
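
The three operations of the relational model can be tried out with a few lines of DBI. This is only a minimal sketch : it assumes that DBD::SQLite is installed, uses an in-memory database, and the table names t1 / t2 are hypothetical names for the two tables of the diagram.

```perl
use strict;
use warnings;
use DBI;

# in-memory SQLite database (assumption : DBD::SQLite is available)
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

# t1 and t2 are hypothetical names for the two tables of the diagram
$dbh->do("CREATE TABLE t1 (c1 INTEGER, c2 VARCHAR(20), c3 INTEGER)");
$dbh->do("CREATE TABLE t2 (c3 INTEGER, c4 VARCHAR(20))");

my $ins1 = $dbh->prepare("INSERT INTO t1 (c1, c2, c3) VALUES (?, ?, ?)");
$ins1->execute(@$_) for [1, 'foo', 1], [2, 'foo', 2], [3, 'bar', 1];

my $ins2 = $dbh->prepare("INSERT INTO t2 (c3, c4) VALUES (?, ?)");
$ins2->execute(@$_) for [1, 'xx'], [2, 'yy'];

# join on c3 : every row of t1 is completed with the matching row of t2
my $joined = $dbh->selectall_arrayref(
  "SELECT t1.c1, t1.c2, t1.c3, t2.c4
     FROM t1 INNER JOIN t2 ON t1.c3 = t2.c3
    ORDER BY t1.c1");

# filter (WHERE clause) + projection (list of columns)
my $filtered = $dbh->selectall_arrayref(
  "SELECT c1, c2 FROM t1 WHERE c2 = ? ORDER BY c1", undef, 'foo');

print join(" ", @$_), "\n" for @$joined;
```

Here $joined holds the three rows of the joined table, and $filtered only the two 'foo' rows, restricted to columns c1 and c2.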

Maybe you don't want a RDBMS

• Other solutions for persistency in Perl:
  • BerkeleyDB : persistent hashes / arrays
  • Judy : persistent dynamic arrays / hashes
  • Redis : persistent arrays / hashes / sets / sorted sets
  • CouchDB : OO/hierarchical database
  • MongoDB : document-oriented database
  • KiokuDB : persistent objects, front-end to BerkeleyDB / CouchDB / etc.
  • Plain Old File (using for example File::Tabular)
  • KinoSearch : bunch of fields with fulltext indexing
  • LDAP : directory
  • Net::Riak : buckets and keys

– See http://en.wikipedia.org/wiki/NoSQL

Features of RDBMS

• Relational
• Indexing
• Concurrency
• Distributed
• Transactions (commit / rollback)
• Authorization
• Triggers and stored procedures
• Internationalization
• Fulltext
• …

Choosing a RDBMS

• Sometimes there is no choice (enforced by context) !

• Criteria
  – cost, proprietary / open source
  – volume
  – features
  – resources (CPU, RAM, etc.)
  – ease of installation / deployment / maintenance
  – stored procedures

• Common choices (open source)
  – SQLite (file-based)
  – MySQL
  – Postgres

• Postgres can have server-side procedures in Perl !

Talking to a RDBMS

• SQL : Structured Query Language, an ISO standard. Except that
  – the standard is hard to find (not publicly available)
  – vendors rarely implement the full standard
  – most vendors have non-standard extensions
  – it's not only about queries
    • DML : Data Manipulation Language
    • DDL : Data Definition Language

Writing SQL

" SQL is too low-level, I don't ever want to see it "

" SQL is the most important part of my application, I won't let anybody write it for me "

Data Definition Language (DDL)

CREATE TABLE author (
  author_id   INTEGER PRIMARY KEY,
  author_name VARCHAR(20),
  e_mail      VARCHAR(20),
  …
);

CREATE / ALTER / DROP / RENAME
DATABASE / INDEX / VIEW / TRIGGER

Data Manipulation Language (DML)

SELECT author_name, distribution_name
  FROM author INNER JOIN distribution
       ON author.author_id = distribution.author_id
 WHERE distribution_name like 'DBD::%';

INSERT INTO author ( author_id, author_name, e_mail )
  VALUES ( 123, 'JFOOBAR', '[email protected]' );

UPDATE author
   SET e_mail = '[email protected]'
 WHERE author_id = 3456;

DELETE FROM author WHERE author_id = 3456;

Best practice : placeholders

SELECT author_name, distribution_name
  FROM author INNER JOIN distribution
       ON author.author_id = distribution.author_id
 WHERE distribution_name like ? ;

INSERT INTO author ( author_id, author_name, e_mail )
  VALUES ( ?, ?, ? );

UPDATE author
   SET e_mail = ?
 WHERE author_id = ? ;

DELETE FROM author WHERE author_id = ?;

• no type distinction (int/string)
• statements can be cached
• avoid SQL injection problems

    SELECT * FROM foo WHERE val = $x;

    $x eq '123; DROP TABLE foo'

• sometimes other syntax (for ex. $1, $2)
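
The injection example above can be replayed safely with a placeholder. This is a minimal sketch, assuming DBD::SQLite and an in-memory database; the sample rows are invented for the illustration.

```perl
use strict;
use warnings;
use DBI;

# in-memory SQLite database (assumption : DBD::SQLite is available)
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

$dbh->do("CREATE TABLE foo (val VARCHAR(40))");
$dbh->do("INSERT INTO foo (val) VALUES (?)", undef, $_) for '123', 'abc';

my $x = "123; DROP TABLE foo";   # hostile input

# the value is bound to the placeholder : it is never parsed as SQL
my $sth = $dbh->prepare("SELECT COUNT(*) FROM foo WHERE val = ?");

$sth->execute($x);
my ($hostile_matches) = $sth->fetchrow_array;
$sth->finish;

# the same prepared statement is reused with another value
$sth->execute('123');
my ($normal_matches) = $sth->fetchrow_array;
$sth->finish;

# the table is still there, with all its rows
my ($rows_left) = $dbh->selectrow_array("SELECT COUNT(*) FROM foo");

print "hostile: $hostile_matches, normal: $normal_matches, ",
      "rows left: $rows_left\n";
```

The hostile string is simply compared to the column as an ordinary value, so it matches nothing and the table survives.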


Perl DBI Basics

Architecture

Perl program
    |
Object-Relational Mapper
    |
DBI
    |
DBD driver
    |
Database

TIOOWTDI : There is only one way to do it

TAMMMWTDI : There are many, many, many ways to do it

TIMTOWTDI : There is more than one way to do it

DBD Drivers

– Databases
  • Adabas DB2 DBMaker Empress Illustra Informix Ingres InterBase MaxDB Mimer Oracle Ovrimos PO Pg PrimeBase QBase Redbase SQLAnywhere SQLite Solid Sqlflex Sybase Unify mSQL monetdb mysql

– Other kinds of data stores
  • CSV DBM Excel File iPod LDAP

– Proxy, relay, etc
  • ADO Gofer JDBC Multi Multiplex ODBC Proxy SQLRelay

– Fake, test
  • NullP Mock RAM Sponge

When SomeExoticDB has no driver

• Quotes from DBI::DBD :
  " The first rule for creating a new database driver for the Perl DBI is very simple: DON'T! "
  " The second rule for creating a new database driver for the Perl DBI is also very simple: Don't -- get someone else to do it for you! "

• nevertheless there is good advice/examples
  – see DBI::DBD

• Other solution : forward to other drivers
  – ODBC (even on Unix)
  – JDBC
  – SQLRelay

DBI API

• handles
  – the whole package (DBI)
  – driver handle ($drh)
  – database handle ($dbh)
  – statement handle ($sth)

• interacting with handles
  – object-oriented
    • ->connect(…), ->prepare(…), ->execute(…), …
  – tied hash
    • ->{AutoCommit}, ->{NAME_lc}, ->{CursorName}, …

Connecting

my $dbh = DBI->connect($connection_string);

my $dbh = DBI->connect($connection_string, $user, $password, { %attributes } );

my $dbh = DBI->connect_cached( @args );

Some dbh attributes

• AutoCommit
  – if true, every statement is immediately committed, unless a transaction is opened explicitly :
      $dbh->begin_work();
      … # inserts, updates, deletes
      $dbh->commit();
  – if false, changes stay pending until $dbh->commit() or $dbh->rollback()

• RaiseError
  – like autodie for standard Perl functions : errors raise exceptions

• see also
  – PrintError
  – HandleError
  – ShowErrorStatement

• and also
  – LongReadLen
  – LongTruncOk
  – RowCacheSize
  – …

• hash API : attributes can be set dynamically

    [local] $dbh->{$attr_name} = $val

• peek at $dbh internals (in the Perl debugger)

    DB<1> x $dbh
    {}
    DB<2> x tied %$dbh
    {…}
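
A typical use of the hash API with "local" is to switch off RaiseError for a few statements only. The sketch below assumes DBD::SQLite and an in-memory database; the table name no_such_table is deliberately one that does not exist.

```perl
use strict;
use warnings;
use DBI;

# in-memory SQLite database (assumption : DBD::SQLite is available)
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, PrintError => 0, AutoCommit => 1 });

my $error_message;
{
  # exceptions are switched off for this block only;
  # the previous value comes back when the block is left
  local $dbh->{RaiseError} = 0;

  # fails because the table does not exist; returns undef instead of dying
  my $rv = $dbh->do("SELECT * FROM no_such_table");
  $error_message = defined $rv ? "" : $dbh->errstr;
}

# outside of the block, RaiseError is on again
my $raise_error_restored = $dbh->{RaiseError} ? 1 : 0;

print "error was : $error_message\n";
print "RaiseError restored : $raise_error_restored\n";
```

Without "local", the attribute would have to be saved and restored by hand, including when the block is left through an exception.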

Data retrieval

my $sth = $dbh->prepare($sql);
$sth->execute( @bind_values );

my @columns = @{$sth->{NAME}};

while (my $row_aref = $sth->fetch) {
  …
}

# or
$dbh->do($sql);

• see also : prepare_cached

Other ways of fetching

• single row
  • fetchrow_array
  • fetchrow_arrayref (a.k.a. fetch)
  • fetchrow_hashref

• lists of rows (with optional slicing)
  • fetchall_arrayref
  • fetchall_hashref

• prepare, execute and fetch
  • selectall_arrayref
  • selectall_hashref

• vertical slice
  • selectcol_arrayref

• little DBI support for cursors
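
A few of these fetching methods side by side, as a minimal sketch : it assumes DBD::SQLite with an in-memory database, and the two authors are invented sample data.

```perl
use strict;
use warnings;
use DBI;

# in-memory SQLite database (assumption : DBD::SQLite is available)
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

$dbh->do("CREATE TABLE author (author_id   INTEGER PRIMARY KEY,
                               author_name VARCHAR(20))");
my $ins = $dbh->prepare(
  "INSERT INTO author (author_id, author_name) VALUES (?, ?)");
$ins->execute(@$_) for [1, 'ALICE'], [2, 'BOB'];   # invented sample data

my $sql = "SELECT author_id, author_name FROM author ORDER BY author_id";

# single row, as a hashref keyed by column name
my $sth = $dbh->prepare($sql);
$sth->execute;
my $first = $sth->fetchrow_hashref;
$sth->finish;

# prepare, execute and fetch in one call; slicing gives one hashref per row
my $rows = $dbh->selectall_arrayref($sql, { Slice => {} });

# same, but rows are indexed by the value of a key column
my $by_id = $dbh->selectall_hashref($sql, 'author_id');

# vertical slice : values of the first column only
my $names = $dbh->selectcol_arrayref(
  "SELECT author_name FROM author ORDER BY author_id");

print "first author : $first->{author_name}\n";
print "all names    : @$names\n";
```

The choice is mostly a trade-off between convenience (hashrefs, everything in one call) and speed or memory (arrayrefs, row by row).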


Advanced Perl DBI

Transactions

$dbh->{RaiseError} = 1; # errors will raise exceptions

eval {
  $dbh->begin_work(); # will turn off AutoCommit
  … # inserts, updates, deletes
  $dbh->commit();
};
if ($@) {
  my $err = $@;
  eval {$dbh->rollback()};
  my $rollback_result = $@ || "SUCCESS";
  die "FAILED TRANSACTION : $err" . "; ROLLBACK: $rollback_result";
}

• encapsulated in DBIx::Transaction or ORMs
    $schema->transaction( sub {…} );

• nested transactions : must keep track of transaction depth

• savepoint / release : only in DBIx::Class
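
The eval / rollback pattern can be wrapped in a small helper. The sketch below is only an illustration : in_transaction is a hypothetical function (not part of the DBI), and the example assumes DBD::SQLite with an in-memory database and invented sample data.

```perl
use strict;
use warnings;
use DBI;

# in-memory SQLite database (assumption : DBD::SQLite is available)
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, PrintError => 0, AutoCommit => 1 });

$dbh->do("CREATE TABLE author (author_id   INTEGER PRIMARY KEY,
                               author_name VARCHAR(20))");

# hypothetical helper : runs $code within a transaction;
# returns "" on success, or the error message after a rollback
sub in_transaction {
  my ($dbh, $code) = @_;
  my $ok = eval {
    $dbh->begin_work;    # turns off AutoCommit until commit / rollback
    $code->($dbh);
    $dbh->commit;
    1;
  };
  return "" if $ok;
  my $err = $@ || "unknown error";
  eval { $dbh->rollback };   # undo everything since begin_work
  return $err;
}

my $insert = "INSERT INTO author (author_id, author_name) VALUES (?, ?)";

# succeeds : both rows are committed
my $err1 = in_transaction($dbh, sub {
  $_[0]->do($insert, undef, 1, 'ALICE');
  $_[0]->do($insert, undef, 2, 'BOB');
});

# fails on the duplicate primary key : the first insert is rolled back too
my $err2 = in_transaction($dbh, sub {
  $_[0]->do($insert, undef, 3, 'CAROL');
  $_[0]->do($insert, undef, 1, 'DUPLICATE');
});

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM author");
print "rows in table : $count\n";
```

After the failed second transaction the table only contains the two rows of the first one : CAROL was inserted, then undone by the rollback.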

Efficiency

my $sth = $dbh->prepare(<<'');
  SELECT author_id, author_name, e_mail FROM author

my ($id, $name, $e_mail);
$sth->execute;
$sth->bind_columns(\ ($id, $name, $e_mail));

while ($sth->fetch) {
  print "author $id is $name at $e_mail\n";
}

• avoids cost of allocating / deallocating Perl variables
• don't store a reference and reuse it after another fetch

Metadata

• data_sources
    my @sources = DBI->data_sources($driver);

• table_info
    my $sth = $dbh->table_info(@search_criteria);
    while (my $row = $sth->fetchrow_hashref) {
      print "$row->{TABLE_NAME} : $row->{TABLE_TYPE}\n";
    }

• others
  – column_info()
  – primary_key_info()
  – foreign_key_info()

• many drivers only have partial implementations

Lost connection

• manual recovery
    if ($dbh->errstr =~ /broken connection/i) { … }

• DBIx::RetryOverDisconnects
  – intercepts requests (prepare, execute, …)
  – filters errors
  – attempts to reconnect and restart the transaction

• some ORMs have their own layer for recovering connections

• some drivers have their own mechanism
    $dbh->{mysql_auto_reconnect} = 1;

Datatypes

• NULL ↔ undef

• INTEGER, VARCHAR, DATE ↔ perl scalar
  – usually DWIM works
  – if needed, can specify explicitly (type constants come from : use DBI qw(:sql_types);)
      $sth->bind_param($col_num, $value, SQL_DATETIME);

• BLOB ↔ perl scalar

• ARRAY (Postgres) ↔ arrayref
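
The NULL / undef correspondence works in both directions, with one classical trap when comparing. A minimal sketch, assuming DBD::SQLite with an in-memory database and an invented row :

```perl
use strict;
use warnings;
use DBI;

# in-memory SQLite database (assumption : DBD::SQLite is available)
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

$dbh->do("CREATE TABLE author (author_id INTEGER PRIMARY KEY,
                               e_mail    VARCHAR(20))");

# writing : an undef bind value is stored as NULL
$dbh->do("INSERT INTO author (author_id, e_mail) VALUES (?, ?)",
         undef, 1, undef);

# reading : NULL comes back as undef
my ($e_mail) = $dbh->selectrow_array(
  "SELECT e_mail FROM author WHERE author_id = ?", undef, 1);

# comparing : in SQL, "= NULL" is never true, so binding undef finds nothing
my ($found_with_equal) = $dbh->selectrow_array(
  "SELECT COUNT(*) FROM author WHERE e_mail = ?", undef, undef);

# ... IS NULL must be written explicitly
my ($found_with_is_null) = $dbh->selectrow_array(
  "SELECT COUNT(*) FROM author WHERE e_mail IS NULL");

print "e_mail is ", (defined $e_mail ? "defined" : "undef"), "\n";
print "found with '= ?' : $found_with_equal, ",
      "with 'IS NULL' : $found_with_is_null\n";
```

So a query that may receive undef as a search value usually needs two variants of its WHERE clause.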

Large objects

• usually : just scalars in memory

• when reading : control BLOB size
    $dbh->{LongReadLen} = $max_bytes;
    $dbh->{LongTruncOk} = 1;

• when writing : can inform the driver
    $sth->bind_param($ix, $blob, SQL_BLOB);

• driver-specific stream API. Ex :
  – Pg : pg_lo_open, pg_lo_write, pg_lo_lseek
  – Oracle : ora_lob_read(…), ora_lob_write(…), ora_lob_append(…)

Tracing / profiling

• $dbh->trace($trace_setting, $trace_where)
  – 0 - Trace disabled.
  – 1 - Trace top-level DBI method calls returning with results or errors.
  – 2 - As above, adding tracing of top-level method entry with parameters.
  – 3 - As above, adding some high-level information from the driver and some internal information from the DBI.

• $dbh->{Profile} = 2; # profile at the statement level
  – many powerful options
  – see L<DBI::Profile>

Stored procedures

my $sth = $dbh->prepare($db_specific_sql);

# prepare params to be passed to the called procedure
$sth->bind_param(1, $val1);
$sth->bind_param(2, $val2);

# prepare memory locations to receive the results
# ($max_len : maximum expected length of a result, required by the DBI)
$sth->bind_param_inout(3, \$result1, $max_len);
$sth->bind_param_inout(4, \$result2, $max_len);

# execute the whole thing
$sth->execute;


Object-Relational Mapping (ORM)

ORM Principle

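[Figure : on the RDBMS side, table1 (columns c1, c2, c3; rows r1, r2, ...) and table2 (columns c3, c4); on the application side, class1 (+c1: String, +c2: String, +c3: class2) with instances r1 : class1 and r2 : class1]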

ORM: What for ?

[catalyst list] On Thu, 2006-06-08, Steve wrote:

Not intending to start any sort of rancorous discussion, but I was wondering whether someone could illuminate me a little?

I'm comfortable with SQL, and with DBI. I write basic SQL that runs just fine on all databases, or more complex SQL when I want to target a single database (usually postgresql).

What value does an ORM add for a user like me?

ORM useful for …

• dynamic SQL
  – navigation between tables
  – generate complex SQL queries from Perl datastructures
  – better than phrasebook or string concatenation

• automatic data conversions (inflation / deflation)
• expansion of tree data structures coded in the relational model
• transaction encapsulation
• data validation
• computed fields
• caching
• schema deployment
• …

See Also : http://lists.scsys.co.uk/pipermail/catalyst/2006-June/008059.html

Impedance mismatch

• SELECT c1, c2 FROM table1
  – missing c3, so cannot navigate to class2
  – is it a valid instance of class1 ?

• SELECT * FROM table1 LEFT JOIN table2 ON …
  – what to do with the c4 column ?
  – is it a valid instance of class1 ?

• SELECT c1, c2, length(c2) AS l_c2 FROM table1
  – no predeclared method in class1 for accessing l_c2


ORM Landscape

• Leader
  – DBIx::Class (a.k.a. DBIC)

• Also discussed here
  – DBIx::DataModel

• Many others
  – Rose::DB, Jifty::DBI, Fey::ORM, ORM, DBIx::ORM::Declarative, Tangram, Coat::Persistent, DBR, DBIx::Sunny, DBIx::Skinny, DBI::Easy, …

Model (UML)

Artist (1) ---- (*) CD (1) ---- (*) Track

DBIx::Class Schema

package MyDatabase::Main;
use base qw/DBIx::Class::Schema/;
__PACKAGE__->load_namespaces;

package MyDatabase::Main::Result::Artist;
use base qw/DBIx::Class/;
__PACKAGE__->load_components(qw/PK::Auto Core/);
__PACKAGE__->table('artist');
__PACKAGE__->add_columns(qw/ artistid name /);
__PACKAGE__->set_primary_key('artistid');
__PACKAGE__->has_many('cds' => 'MyDatabase::Main::Result::Cd');

package ...
...

DBIx::Class usage

my $schema = MyDatabase::Main->connect('dbi:SQLite:db/example.db');

my @artists = (['Michael Jackson'], ['Eminem']);
$schema->populate('Artist', [ [qw/name/], @artists, ]);

my $rs = $schema->resultset('Track')->search(
  { 'cd.title' => $cdtitle },
  { join => [qw/ cd /], }
);
while (my $track = $rs->next) {
  print $track->title . "\n";
}

DBIx::DataModel Schema

package MyDatabase;
use DBIx::DataModel;

DBIx::DataModel->Schema(__PACKAGE__)

->Table(qw/Artist artist artistid/)
->Table(qw/CD     cd     cdid    /)
->Table(qw/Track  track  trackid /)

->Association([qw/Artist artist 1    /],
              [qw/CD     cds    0..* /])
->Composition([qw/CD     cd     1    /],
              [qw/Track  tracks 1..* /]);

DBIx::DataModel usage

my $dbh = DBI->connect('dbi:SQLite:db/example.db');

MyDatabase->dbh($dbh);

my @artists = (['Michael Jackson'], ['Eminem']);
MyDatabase::Artist->insert(['name'], @artists);

my $statement = MyDatabase->join(qw/CD tracks/)->select(
  -columns  => [qw/track.title|trtitle …/],
  -where    => { 'cd.title' => $cdtitle },
  -resultAs => 'statement', # default : arrayref of rows
);

while (my $track = $statement->next) {
  print "$track->{trtitle}\n";
}


Conclusion

Further info

• Database textbooks
• DBI manual (L<DBI>, L<DBI::FAQ>, L<DBI::Profile>)
• Book : "Programming the Perl DBI"
• Vendor's manuals
• ORMs
  – DBIx::Class::Manual
  – DBIx::DataModel

mastering databases requires a lot of reading !


Bonus slides

Names for primary / foreign keys

• primary : unique; foreign : same name
    author.author_id    distribution.author_id
  • RDBMS knows how to perform joins ( "NATURAL JOIN" )

• primary : constant; foreign : unique based on table + column name
    author.id    distribution.author_id
  • ORM knows how to perform joins (RoR ActiveRecord)
  • SELECT * FROM table1, table2 … : which id ?

• primary : constant; foreign : just table name
    author.id    distribution.author
  • $a_distrib->author() : foreign key or related record ?

columns for joins should always be indexed

Locks and isolation levels

• Locks on rows
  – shared
    • other clients can also get a shared lock
    • requests for exclusive lock must wait
  – exclusive
    • all other requests for locks must wait

• Intention locks (on whole tables)
  – Intent shared
  – Intent exclusive

• Isolation levels
  – read-uncommitted
  – read-committed
  – repeatable-read
  – serializable

SELECT … FOR READ ONLY
SELECT … FOR UPDATE
SELECT … LOCK IN SHARE MODE

LOCK TABLE(S) … READ/WRITE

SET TRANSACTION ISOLATION LEVEL …

Cursors

my $sql  = "SELECT * FROM SomeTable FOR UPDATE";
my $sth1 = $dbh->prepare($sql);
$sth1->execute();
my $curr = "WHERE CURRENT OF $sth1->{CursorName}";

while (my $row = $sth1->fetch) {
  if (…) {
    $dbh->do("DELETE FROM SomeTable $curr");
  } else {
    my $sth2 = $dbh->prepare("UPDATE SomeTable SET col = ? $curr");
    $sth2->execute($new_val);
    …
  }
}

Modeling (UML)

Author (1) ---- (*) Distribution
Distribution (1) ---- (*) Module         ► contains
Distribution (*) ---- (*) Module         ► depends on

Terminology

[Figure : the same diagram (Author, Distribution, Module, "contains", "depends on"), annotated with the terms : class, association, composition, association name, multiplicity]

Implementation

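Author
  author_id
  author_name
  e_mail

Distribution
  distrib_id
  distrib_name
  d_release
  author_id

Module
  module_id
  module_name
  distrib_id

Dependency (link table for n-to-n association)
  distrib_id
  module_id

Author (1) ---- (*) Distribution
Distribution (1) ---- (*) Module
Distribution (1) ---- (*) Dependency (*) ---- (1) Module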
