Databases and SQL programming overview
Databases: Digital collections of data
A database system has:
➢ Data + supporting data structures
➢ The management system (DBMS)
Popular DBMS
Commercial: Oracle, IBM, Microsoft
GUI: Microsoft Access, OpenOffice Base
Terminal: MySQL, Postgres, SQLite
Why use a database system?
➢ Large amounts of diverse data
Example: sequence identifiers and expression data
➢ Many concurrent end-users or collaborators
Biological databases
➢ Complex data
➢ Relationships (1-1, 1-many, many-many)
➢ End users
SQL (Structured Query Language)
Language for building, accessing, and manipulating a relational database management system (RDBMS)
➢ Database
➢ Schema: tables, relationships, permissions
➢ The data
➢ Access the data – with SQL queries
PostgreSQL
http://www.postgresql.org/
➢ Install postgres (version 8.4 is the current one in Ubuntu)
➢ Log into your psql terminal
$ psql --help
$ psql -h localhost -U postgres -d my_database
➢ Get help with SQL commands
my_database=>\h
➢ Get help with psql commands
my_database=>\?
SQL server basics
➢ Keywords are not case sensitive.
➢ Unquoted table and column names are folded to lower case; names in double quotes are stored as entered (use lower case as naming convention).
➢ SQL statements are terminated with a semicolon.
➢ Elements are comma separated.
➢ Comments are enclosed between /* and */ or preceded by -- .
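The rules above can be tried out in a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a PostgreSQL server (the syntax rules shown are the same in both); the sample row 'leaf_01' is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# executescript runs several semicolon-terminated statements in one call.
conn.executescript("""
/* Block comment. Keywords are case-insensitive, so lower-case create table works. */
create table sample (sample_id INTEGER PRIMARY KEY, sample_name TEXT);
-- Line comment. Column names and values are comma separated.
INSERT INTO sample (sample_id, sample_name) VALUES (1, 'leaf_01');
""")

# Mixed-case keywords are accepted as well.
rows = conn.execute("SeLeCt sample_id, sample_name FrOm sample").fetchall()
print(rows)
```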
SQL (Structured Query Language)
➢ Database
➢ Create a new database from your psql terminal
➢ Create or drop a database from your Linux terminal (createdb and dropdb commands)
* Mind the owner of the database, and permissions!
SQL (Structured Query Language)
➢ Database
➢ Schema: tables, relationships, permissions (DDL)
➢ The data
➢ Access the data – with SQL queries
SQL (Structured Query Language)
➢ Data definition (DDL)
➢ Data manipulation
➢ SELECT statements
➢ Joins
➢ Row functions
➢ Aggregate functions
➢ Subqueries
➢ Views
➢ * PostgreSQL, and database connection with Perl
Data definition
Data types
http://www.postgresql.org/docs/8.4/static/datatype.html
Data definition
Create table
Defines the structure of the table:
CREATE TABLE sample (
    sample_id serial NOT NULL PRIMARY KEY,
    sample_name character varying NOT NULL
);
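Parts of the statement:
➢ SQL command: CREATE TABLE
➢ Table name: sample
➢ Each column: column name, data type, constraint(s), e.g. sample_id serial NOT NULL PRIMARY KEY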
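To see what a CREATE TABLE statement actually defines, the table can be created and inspected from a script. A minimal sketch, assuming Python's built-in sqlite3 module instead of PostgreSQL: SQLite has no serial type, so INTEGER PRIMARY KEY stands in for it here, and PRAGMA table_info is SQLite-specific (in psql you would use \d sample).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample (
        sample_id   INTEGER NOT NULL PRIMARY KEY,  -- stands in for PostgreSQL's serial
        sample_name VARCHAR NOT NULL
    )
""")

# One row per column: (cid, name, type, notnull, default, pk)
columns = conn.execute("PRAGMA table_info(sample)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "", "PRIMARY KEY" if pk else "")

# The id is assigned automatically when it is left out of the INSERT.
cur = conn.execute("INSERT INTO sample (sample_name) VALUES ('leaf_01')")
new_id = cur.lastrowid
print(new_id)
```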
Data definition
Constraints are used to enforce valid data in columns:
NOT NULL
CHECK
PRIMARY KEY
FOREIGN KEY
http://www.postgresql.org/docs/8.4/static/ddl-constraints.html
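A short sketch of constraints rejecting invalid rows, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. The weight_mg column and the sample names are hypothetical and only there to give CHECK something to test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample (
        sample_id   INTEGER PRIMARY KEY,
        sample_name VARCHAR NOT NULL,
        weight_mg   REAL CHECK (weight_mg > 0)   -- hypothetical column
    )
""")

def try_insert(name, weight):
    """Return 'ok' or the constraint error message for one INSERT."""
    try:
        conn.execute(
            "INSERT INTO sample (sample_name, weight_mg) VALUES (?, ?)",
            (name, weight),
        )
        return "ok"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"

results = [
    try_insert("leaf_01", 12.5),   # valid row
    try_insert(None, 12.5),        # violates NOT NULL
    try_insert("leaf_02", -3.0),   # violates CHECK
]
stored = conn.execute("SELECT count(*) FROM sample").fetchall()[0][0]
print(results, stored)
```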
Data definition
Create table
CREATE TABLE sample (
    sample_id serial NOT NULL PRIMARY KEY,
    sample_name character varying NOT NULL,
    species character varying
);
Or create the table without the column and add it afterwards:
CREATE TABLE sample (
    sample_id serial NOT NULL PRIMARY KEY,
    sample_name character varying NOT NULL
);
ALTER TABLE sample ADD COLUMN species character varying;
Data definition
Alter table
modifies the structure of the table:
ALTER TABLE sample ALTER COLUMN species SET NOT NULL;
ALTER TABLE sample DROP COLUMN species;
ALTER TABLE sample ADD COLUMN species text DEFAULT NULL;
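A minimal sketch of what ADD COLUMN does to rows that already exist, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. Only ADD COLUMN is shown because SQLite does not support ALTER COLUMN ... SET NOT NULL; the row 'leaf_01' is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_name VARCHAR NOT NULL)"
)
conn.execute("INSERT INTO sample (sample_name) VALUES ('leaf_01')")

# Add a column to a table that already holds data.
conn.execute("ALTER TABLE sample ADD COLUMN species TEXT DEFAULT NULL")

column_names = [row[1] for row in conn.execute("PRAGMA table_info(sample)").fetchall()]
existing = conn.execute("SELECT sample_name, species FROM sample").fetchall()
print(column_names)   # the new column is appended at the end
print(existing)       # existing rows get the default (NULL) for the new column
```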
Data definition
Drop tables
DROP TABLE sample; Oops!
ALWAYS USE TRANSACTIONS!!
=> BEGIN;
=> SQL statement 1; SQL statement 2 .... ;
-- I made some mistake...
=> ROLLBACK;
=> BEGIN;
=> SQL statement 1; SQL statement 2 .... ;
-- Looks good!
=> COMMIT;
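PostgreSQL runs DDL inside transactions, so even a DROP TABLE can be undone with ROLLBACK. The sketch below reproduces the "Oops!" scenario with Python's built-in sqlite3 module, which is an assumption made to keep the example self-contained (SQLite also rolls back DDL). The helper table_exists and the sqlite_master lookup are SQLite-specific.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction handling,
# so BEGIN / ROLLBACK below are sent exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_name VARCHAR NOT NULL)"
)
conn.execute("INSERT INTO sample (sample_name) VALUES ('leaf_01')")

def table_exists(name):
    """True if a table with this name is listed in SQLite's catalog."""
    rows = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchall()
    return rows[0][0] == 1

conn.execute("BEGIN")
conn.execute("DROP TABLE sample")            # Oops!
dropped_inside = not table_exists("sample")  # gone, but only inside the transaction
conn.execute("ROLLBACK")                     # undo the mistake

restored = table_exists("sample")
rows_after = conn.execute("SELECT sample_name FROM sample").fetchall()
print(dropped_inside, restored, rows_after)
```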
Data definition
Foreign keys
-- The referenced table must exist before the table that references it.
CREATE TABLE species (
    species_id serial PRIMARY KEY,
    species_name character varying NOT NULL
);
CREATE TABLE sample (
    sample_id serial NOT NULL PRIMARY KEY,
    sample_name character varying NOT NULL,
    species_id integer REFERENCES species(species_id)
);
Foreign key constraint
species: species_id (PK), species_name
animal: animal_id (PK), animal_name, species_id (FK)
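A sketch of a foreign key constraint refusing a row that points at a missing species, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. The PRAGMA line is SQLite-specific (PostgreSQL always enforces foreign keys), and the sample names and the id 99 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only: enforcement is off by default

conn.execute(
    "CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name VARCHAR NOT NULL)"
)
conn.execute("""
    CREATE TABLE sample (
        sample_id   INTEGER PRIMARY KEY,
        sample_name VARCHAR NOT NULL,
        species_id  INTEGER REFERENCES species(species_id)
    )
""")

conn.execute("INSERT INTO species (species_name) VALUES ('Solanum lycopersicum')")  # gets id 1
conn.execute("INSERT INTO sample (sample_name, species_id) VALUES ('leaf_01', 1)")  # valid reference

try:
    # There is no species with id 99, so the constraint rejects this row.
    conn.execute("INSERT INTO sample (sample_name, species_id) VALUES ('leaf_02', 99)")
    fk_error = None
except sqlite3.IntegrityError as err:
    fk_error = str(err)

sample_count = conn.execute("SELECT count(*) FROM sample").fetchall()[0][0]
print(fk_error, sample_count)
```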
SQL (Structured Query Language)
➢ Database
➢ Schema: tables, relationships, permissions (DDL)
➢ The data (INSERT, UPDATE, DELETE)
➢ Access the data – with SQL queries
Data manipulation
Insert – add new rows to your table
INSERT INTO species (species_name) VALUES ('Solanum lycopersicum');
Delete – remove rows from a table
DELETE FROM species WHERE species_id = 1;
Update – modify column value(s) of existing rows
UPDATE species SET species_name = 'Solanum tuberosum' WHERE species_id = 1;
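The three statements can be followed step by step in a script. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the UPDATE is run before the DELETE here so that the row with species_id 1 still exists when it is modified, and snapshot is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name VARCHAR NOT NULL)"
)

def snapshot():
    """Current contents of the species table, ordered by id."""
    return conn.execute(
        "SELECT species_id, species_name FROM species ORDER BY species_id"
    ).fetchall()

conn.execute("INSERT INTO species (species_name) VALUES ('Solanum lycopersicum')")
after_insert = snapshot()

conn.execute("UPDATE species SET species_name = 'Solanum tuberosum' WHERE species_id = 1")
after_update = snapshot()

# rowcount reports how many rows the statement affected.
deleted = conn.execute("DELETE FROM species WHERE species_id = 1").rowcount
after_delete = snapshot()

print(after_insert, after_update, deleted, after_delete)
```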
Data manipulation
Transactions
BEGIN;
UPDATE.......;
DELETE.....;
INSERT .....;
COMMIT; or ROLLBACK;
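A sketch of the all-or-nothing behaviour of a transaction, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. Using the connection as a context manager commits when the block succeeds and rolls back when it raises; the UNIQUE constraint is a hypothetical addition used only to provoke a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE species (
        species_id   INTEGER PRIMARY KEY,
        species_name VARCHAR NOT NULL UNIQUE   -- UNIQUE added for this example
    )
""")
conn.commit()

# Both inserts succeed, so the block ends with a COMMIT.
with conn:
    conn.execute("INSERT INTO species (species_name) VALUES ('Solanum melongena')")
    conn.execute("INSERT INTO species (species_name) VALUES ('Capsicum annuum')")

# The second insert fails, so the block ends with a ROLLBACK
# and the first insert of this block is undone as well.
try:
    with conn:
        conn.execute("INSERT INTO species (species_name) VALUES ('Solanum tuberosum')")
        conn.execute("INSERT INTO species (species_name) VALUES ('Capsicum annuum')")
except sqlite3.IntegrityError:
    pass

names = [
    row[0]
    for row in conn.execute(
        "SELECT species_name FROM species ORDER BY species_id"
    ).fetchall()
]
print(names)
```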
Data manipulation – copy command
Large dataset?
➢ Write SQL file with INSERT commands
➢ Use COPY with a delimited text file
http://www.postgresql.org/docs/8.4/interactive/app-psql.html
=> \copy species (species_name) FROM 'species_list.txt'
BEGIN;
INSERT INTO species (species_name) VALUES ('Solanum melongena');
INSERT INTO species (species_name) VALUES ('Solanum tuberosum');
INSERT INTO species (species_name) VALUES ('Capsicum annuum');
..
COMMIT;
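SQLite has no COPY command, so the following is only an analogy for bulk loading: a sketch, assuming Python's built-in sqlite3 module, that reads one species name per line and inserts all rows in a single transaction with executemany. The io.StringIO object is a hypothetical stand-in for a file such as species_list.txt.

```python
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name VARCHAR NOT NULL)"
)

# Stand-in for open('species_list.txt'): one species name per line.
species_file = io.StringIO("Solanum melongena\nSolanum tuberosum\nCapsicum annuum\n")

# One parameter tuple per non-empty line.
rows = [(line.strip(),) for line in species_file if line.strip()]

# A single transaction for the whole batch.
with conn:
    conn.executemany("INSERT INTO species (species_name) VALUES (?)", rows)

loaded = conn.execute("SELECT count(*) FROM species").fetchall()[0][0]
print(loaded)
```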
SQL (Structured Query Language)
➢ Database
➢ Schema: tables, relationships, permissions (DDL)
➢ The data (INSERT, UPDATE, DELETE)
➢ Access the data – with SQL queries (SELECT)
SELECT statements
Select everything:
SELECT * FROM sample;
Count rows, group by column, with a condition:
SELECT count(sample_id), species.species_name
FROM sample
JOIN species USING (species_id)
GROUP BY species_name
HAVING species_name LIKE 'Solanum%';
Select with a condition, sort results:
SELECT sample_name
FROM sample
WHERE species_id = 1
ORDER BY sample_name ASC;
Conditional operators:
SELECT .... FROM .... WHERE ....
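A small sketch of WHERE conditions and sorting, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample names are made up, and the operators shown (=, IN, LIKE, AND) are only a few common examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_name VARCHAR, species_id INTEGER)"
)
conn.executemany(
    "INSERT INTO sample (sample_name, species_id) VALUES (?, ?)",
    [("root_02", 1), ("leaf_01", 1), ("leaf_03", 2)],
)

# Equality condition, results sorted alphabetically.
by_species = conn.execute(
    "SELECT sample_name FROM sample WHERE species_id = 1 ORDER BY sample_name ASC"
).fetchall()

# Conditions can be combined with AND / OR; LIKE matches a pattern, IN a list.
leaves = conn.execute(
    "SELECT sample_name FROM sample "
    "WHERE species_id IN (1, 2) AND sample_name LIKE 'leaf%' "
    "ORDER BY sample_name"
).fetchall()

print(by_species)
print(leaves)
```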
Joins
➢ Inner joins are default, so the word 'inner' can be omitted
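➢ When the foreign key column has the same name as the referenced column, the join condition can be written with USING (a NATURAL JOIN goes further and matches on all columns that share a name)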
SELECT * FROM sample
JOIN species USING (species_id);
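A sketch comparing JOIN ... USING with the equivalent explicit INNER JOIN ... ON, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows are made up; 'unknown_01' has no species, so an inner join leaves it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name VARCHAR NOT NULL)"
)
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_name VARCHAR, species_id INTEGER)"
)
conn.executemany(
    "INSERT INTO species (species_id, species_name) VALUES (?, ?)",
    [(1, "Solanum lycopersicum"), (2, "Capsicum annuum")],
)
conn.executemany(
    "INSERT INTO sample (sample_name, species_id) VALUES (?, ?)",
    [("leaf_01", 1), ("leaf_02", 2), ("unknown_01", None)],
)

# JOIN without a qualifier is an inner join.
inner_using = conn.execute(
    "SELECT sample_name, species_name FROM sample "
    "JOIN species USING (species_id) ORDER BY sample_name"
).fetchall()

# The same join with the condition spelled out.
inner_on = conn.execute(
    "SELECT sample_name, species_name FROM sample "
    "INNER JOIN species ON sample.species_id = species.species_id "
    "ORDER BY sample_name"
).fetchall()

# With USING, SELECT * returns the shared column only once.
star_columns = [
    d[0]
    for d in conn.execute(
        "SELECT * FROM sample JOIN species USING (species_id)"
    ).description
]

print(inner_using)
print(star_columns)
```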
Row functions
Math functions
String functions
http://www.postgresql.org/docs/8.4/interactive/functions-math.html
http://www.postgresql.org/docs/8.4/interactive/functions-string.html
Date and time
http://www.postgresql.org/docs/8.4/interactive/functions-datetime.html
Data type conversion
# SELECT now(); --this is a timestamp!
# SELECT cast (now() AS text) ; --the output is now a 'text' data type
http://www.postgresql.org/docs/8.4/static/sql-createcast.html
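Row functions are applied to each row's values individually. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL: SQLite has no now(), so the cast is shown on an integer instead, and typeof() is a SQLite-specific way to inspect the resulting type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# String and math functions evaluated on literal values.
text_upper, text_length, rounded, absolute = conn.execute(
    "SELECT upper('solanum'), length('Solanum'), round(3.14159, 2), abs(-4)"
).fetchall()[0]

# CAST changes the data type of a value: integer 42 becomes text '42'.
type_before, type_after, as_text = conn.execute(
    "SELECT typeof(42), typeof(CAST(42 AS TEXT)), CAST(42 AS TEXT)"
).fetchall()[0]

print(text_upper, text_length, rounded, absolute)
print(type_before, type_after, repr(as_text))
```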
PostgreSQL – the basics
$ psql -h hostname -U username -d dbname
Users: postgres, your_user_name, other_user
* User permissions:
➢ postgres is the database superuser
➢ Grant permissions to other users as required
Resources
➢ http://en.wikipedia.org/wiki/SQL
➢ http://www.postgresql.org/docs/8.4/
➢ In your psql terminal:
=>\h [SQL command name]
=>\?
SQL – advanced querying
Row functions - Case
The SQL CASE expression is a generic conditional expression, similar to if/else statements in other languages:
CASE WHEN condition THEN result
     [WHEN ...]
     [ELSE result]
END

SELECT * FROM test;
 a
---
 1
 2
 3

SELECT a,
       CASE WHEN a=1 THEN 'one'
            WHEN a=2 THEN 'two'
            ELSE 'other'
       END
FROM test;
 a | case
---+-------
 1 | one
 2 | two
 3 | other
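The CASE example can be reproduced in a script. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table test and its values follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER)")
conn.executemany("INSERT INTO test (a) VALUES (?)", [(1,), (2,), (3,)])

# The first WHEN branch whose condition is true supplies the result;
# ELSE covers every remaining row.
labelled = conn.execute("""
    SELECT a,
           CASE WHEN a = 1 THEN 'one'
                WHEN a = 2 THEN 'two'
                ELSE 'other'
           END
    FROM test
    ORDER BY a
""").fetchall()
print(labelled)
```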
Row functions – more conditional expressions
The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display, for example:
SELECT COALESCE(description, short_description, '(none)') ...
The NULLIF function returns a null value if value1 and value2 are equal; otherwise it returns value1. This can be used to perform the inverse operation of the COALESCE example given above:
SELECT NULLIF(value, '(none)') ...
In this example, if value is '(none)', NULLIF returns null; otherwise it returns value.
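A sketch of COALESCE and NULLIF working as inverses, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table name item and its rows are hypothetical; the column names follow the COALESCE example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (description TEXT, short_description TEXT)")
conn.executemany(
    "INSERT INTO item (description, short_description) VALUES (?, ?)",
    [("long text", "short"), (None, "short only"), (None, None)],
)

# COALESCE: first non-null argument, with '(none)' as the display default.
shown = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(description, short_description, '(none)') "
        "FROM item ORDER BY rowid"
    ).fetchall()
]

# NULLIF: turn the display default back into a null.
restored = [
    row[0]
    for row in conn.execute(
        "SELECT NULLIF(COALESCE(description, short_description, '(none)'), '(none)') "
        "FROM item ORDER BY rowid"
    ).fetchall()
]

print(shown)
print(restored)
```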
Aggregate functions
Aggregate functions - grouping
GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions
HAVING clause – filters groups after aggregation; use it instead of WHERE for conditions on aggregate functions
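A sketch of GROUP BY with HAVING, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. The species and sample rows are made up: three samples for Solanum lycopersicum, one for Solanum tuberosum, two for Capsicum annuum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name VARCHAR NOT NULL)"
)
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_name VARCHAR, species_id INTEGER)"
)
conn.executemany(
    "INSERT INTO species (species_id, species_name) VALUES (?, ?)",
    [(1, "Solanum lycopersicum"), (2, "Solanum tuberosum"), (3, "Capsicum annuum")],
)
conn.executemany(
    "INSERT INTO sample (sample_name, species_id) VALUES (?, ?)",
    [("s1", 1), ("s2", 1), ("s3", 1), ("s4", 2), ("s5", 3), ("s6", 3)],
)

# One output row per species; HAVING keeps only the Solanum groups.
solanum_counts = conn.execute("""
    SELECT count(sample_id), species.species_name
    FROM sample
    JOIN species USING (species_id)
    GROUP BY species_name
    HAVING species_name LIKE 'Solanum%'
    ORDER BY species_name
""").fetchall()

# A condition on the aggregate itself cannot go in WHERE; it belongs in HAVING.
well_sampled = conn.execute("""
    SELECT count(sample_id), species.species_name
    FROM sample
    JOIN species USING (species_id)
    GROUP BY species_name
    HAVING count(sample_id) >= 2
    ORDER BY species_name
""").fetchall()

print(solanum_counts)
print(well_sampled)
```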
Subqueries
Views
CREATE VIEW myview AS
    SELECT city, temp_lo, temp_hi, prcp, date, location
    FROM weather, cities
    WHERE city = name;
A view is a virtual table based on the results of an SQL statement
http://www.postgresql.org/docs/8.4/interactive/tutorial-views.html
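Because a view is a stored query rather than a copy of the data, it reflects later changes to the underlying tables. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the column types and the inserted rows are assumptions made for illustration (location is stored as plain text here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name VARCHAR, location TEXT)")
conn.execute(
    "CREATE TABLE weather (city VARCHAR, temp_lo INTEGER, temp_hi INTEGER, prcp REAL, date TEXT)"
)
conn.execute("INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)')")
conn.execute("INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27')")

conn.execute("""
    CREATE VIEW myview AS
        SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name
""")

# The view is queried like a table.
before = conn.execute("SELECT city, temp_hi FROM myview").fetchall()

# A new row in the base table shows up in the view without recreating it.
conn.execute("INSERT INTO weather VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29')")
after_count = conn.execute("SELECT count(*) FROM myview").fetchall()[0][0]

print(before, after_count)
```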
PostgreSQL
Programming with a procedural language: write your own PostgreSQL functions!
➢ PL/pgSQL (similar to Oracle's PL/SQL, http://en.wikipedia.org/wiki/PL_SQL )
➢ Many other languages:
➢ PL/Perl http://www.postgresql.org/docs/8.1/static/plperl.html
➢ PL/Java, plPHP, PL/Python, PL/R and more...
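The idea of calling your own function from SQL can be illustrated without a PostgreSQL server. The sketch below is only an analogy, not PL/pgSQL or PL/Python: it assumes Python's built-in sqlite3 module, whose create_function registers a Python function so that queries can call it. The function genus is hypothetical.

```python
import sqlite3

def genus(species_name):
    """Return the first word of a binomial name, or None for null input."""
    if species_name is None:
        return None
    return species_name.split()[0]

conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name 'genus', taking 1 argument.
conn.create_function("genus", 1, genus)

conn.execute(
    "CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name VARCHAR NOT NULL)"
)
conn.executemany(
    "INSERT INTO species (species_name) VALUES (?)",
    [("Solanum lycopersicum",), ("Capsicum annuum",)],
)

# The user-defined function is used like any built-in row function.
genera = conn.execute(
    "SELECT genus(species_name) FROM species ORDER BY species_id"
).fetchall()
print(genera)
```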