Post on 26-Mar-2015

Page 1: Introductory to database handling Endre Sebestyén

Introductory to database handling

Endre Sebestyén

Page 2

What is a database?

- A database is a bunch of information
  - It is a structured collection of information
  - It contains basic objects, called records or entries
  - The records contain fields, which contain defined types of data, somehow related to that record

A nucleotide sequence database would contain, for example, all kinds of nucleotide sequences as records, and sequence properties (length, name, origin, etc.) as fields.

Page 3

What is a database?

- A database is searchable
  - It contains an index (table of contents, catalog)
- It is updated regularly (releases)
  - New data goes in
  - Obsolete, old data goes out
- It is cross-referenced
  - To other databases

Page 4

Why databases?

- The main purpose of databases is not only to collect and organize data, but to allow advanced data retrieval and analysis
- A database query is a method to retrieve information from the database
- The organization of records into fields allows us to use queries on fields
  - Example: all mouse RNA sequences between 1000 and 1500 bp in length
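The example query above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `sequences`, its columns and the five records are made up for the sketch and are not taken from any real sequence database.

```python
import sqlite3

# Hypothetical records: each row is one record, each column is one field.
RECORDS = [
    ("seq1", "mouse", "RNA", 1200),
    ("seq2", "mouse", "RNA", 800),
    ("seq3", "mouse", "DNA", 1300),
    ("seq4", "human", "RNA", 1400),
    ("seq5", "mouse", "RNA", 1500),
]


def build_demo_db():
    """Create an in-memory database holding the made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences "
        "(name TEXT, species TEXT, molecule TEXT, length INTEGER)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?, ?)", RECORDS)
    return conn


def mouse_rna_between(conn, min_len, max_len):
    """Names of mouse RNA sequences with min_len <= length <= max_len."""
    rows = conn.execute(
        "SELECT name FROM sequences "
        "WHERE species = ? AND molecule = ? AND length BETWEEN ? AND ? "
        "ORDER BY name",
        ("mouse", "RNA", min_len, max_len),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(mouse_rna_between(build_demo_db(), 1000, 1500))
```

Because species, molecule type and length are separate fields, the query can combine conditions on all three; in free text the same search would need pattern matching.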

Page 5

Databases on the internet

[Diagram: USER, WEB SERVERS, DATABASE SERVER]

Page 6

Databases on the internet

Page 7

Databases on the internet

- Book
- Book title
- Sequence
- Temperature
- Picture
- Video
- Log files of web servers
- etc.

Page 8

Databases on the internet

- Bookshelves
- Boxes
- Text files/directories
- Binary files
- MySQL database
- Oracle database

Page 9

Types of databases

- Hierarchical model
  - Tree-like structures
  - Parent -> child
  - One to many relations
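A tree-like, one-to-many structure of this kind might look as follows in Python. The `Node` class, the `paths` helper and the taxonomy-style names are assumptions made for the sketch, not part of any database product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One record: it has a single parent and any number of children."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child record and return it, so trees can be built step by step."""
        self.children.append(child)
        return child


def paths(node, prefix=()):
    """Yield the path from the root to every node, walking parent -> child."""
    here = prefix + (node.name,)
    yield "/".join(here)
    for child in node.children:
        yield from paths(child, here)


# Made-up example tree.
root = Node("chordate")
mammals = root.add(Node("mammals"))
mammals.add(Node("mouse"))
mammals.add(Node("human"))
root.add(Node("fish"))

print(list(paths(root)))
```

Every record is reached through exactly one parent, which is why lookups follow a path from the root.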

Page 10

Types of databases

- Network model
  - More complex than the previous
  - Parent -> child
  - One to many
  - Many to one

Page 11

Types of databases

- Relational model
  - Most widely used
  - Fast and efficient (if the data structure is designed correctly)
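A minimal sketch of the relational idea, using Python's sqlite3 module: data is split into tables that refer to each other by key, and a JOIN puts them back together. The `species` and `sequence` tables and their rows are hypothetical and only serve the example.

```python
import sqlite3


def build_db():
    """Create two related tables in an in-memory database (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE species (
            species_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE sequence (
            sequence_id INTEGER PRIMARY KEY,
            species_id  INTEGER NOT NULL REFERENCES species(species_id),
            length      INTEGER NOT NULL
        );
        INSERT INTO species VALUES (1, 'mouse'), (2, 'human');
        INSERT INTO sequence VALUES (10, 1, 1200), (11, 1, 900), (12, 2, 1400);
        """
    )
    return conn


def sequences_per_species(conn):
    """For each species: number of sequences and the longest length."""
    rows = conn.execute(
        """
        SELECT sp.name, COUNT(*), MAX(sq.length)
        FROM sequence AS sq
        JOIN species AS sp ON sp.species_id = sq.species_id
        GROUP BY sp.name
        ORDER BY sp.name
        """
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(sequences_per_species(build_db()))
```

The species name is stored once and referenced by `species_id`; this is the kind of design decision the "if the data structure is designed correctly" remark refers to.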

Page 12

Databases on the internet

- Lists
- Catalogues
- Librarian
- Index files
- SQL language
- grep command

Page 13

Query systems for databases

- SQL query language
  - Querying and modifying data
  - Managing the database
  - Optimize queries

SELECT * FROM sequence_feature WHERE sequence_primary_id LIKE '%$variable%' ORDER BY sequence_primary_id LIMIT 10;

- Multiple operating systems
- Different programming languages
- Different storage systems (MySQL, PostgreSQL, etc.)
- Use an SQL terminal
- Through programming languages
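As a rough illustration of using SQL through a programming language, the sketch below runs the three kinds of statements named on this slide (managing, modifying, querying) from Python. It uses the standard-library sqlite3 module as a stand-in for MySQL or PostgreSQL; the table and the `sequence_primary_id` column come from the slide, while the `feature_type` column, the index name and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Managing the database: define a table and an index on the searched column.
conn.execute(
    "CREATE TABLE sequence_feature (sequence_primary_id TEXT, feature_type TEXT)"
)
conn.execute(
    "CREATE INDEX idx_feature_id ON sequence_feature (sequence_primary_id)"
)

# Modifying data: insert, update and delete rows (made-up values).
conn.executemany(
    "INSERT INTO sequence_feature VALUES (?, ?)",
    [("81002", "utr"), ("81001", "motif"), ("92001", "motif")],
)
conn.execute(
    "UPDATE sequence_feature SET feature_type = ? WHERE sequence_primary_id = ?",
    ("tss", "81002"),
)
conn.execute(
    "DELETE FROM sequence_feature WHERE sequence_primary_id = ?", ("92001",)
)

# Querying data: sorted and limited, as in the slide's SELECT statement.
rows = conn.execute(
    "SELECT sequence_primary_id, feature_type FROM sequence_feature "
    "ORDER BY sequence_primary_id LIMIT 10"
).fetchall()
print(rows)
```

The `?` placeholders let the database driver insert the values, which is generally safer than pasting a variable into the SQL string.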

Page 14

Databases on the internet

- Library
- NCBI Entrez
- Google
- Lots of other general and specialized databases with search interfaces on the web

Page 15

Case study: the DoOP database

- Tries to collect and analyze the promoter regions of different genes and orthologous gene clusters
  - http://doop.abc.hu
- 2 main sections: plant and chordate
  - Chordate: v1.4
  - Plant: v1.5, v1.6
- Integrates different kinds of data
  - Sequence data
  - Sequence annotation
  - Cross-references to external databases
  - Multiple alignments
  - Conserved sequence regions
- Goal: easily accessible and searchable interface on the web

Page 16

Data processing

Page 17

MySQL tables

Page 18

MySQL tables

Page 19

MySQL table

Page 20

MySQL tables

Page 21

Data processing

API for the MySQL database Application Programming Interface

We want to convert the MySQL data into nice webpages

MySQL query to get data: SELECT * FROM sequence_feature WHERE

sequence_primary_id LIKE ‘%$variable%’ SORT BY sequence_primary_id LIMIT 10;

And so on… Process the data

OR with n API $data = $sequence_feature_object->get_data;
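The difference between writing the query by hand and calling an API can be sketched in Python. The `SequenceFeatures` class below is hypothetical and is not the Bio::DOOP API (which is a set of Perl modules); only the table name, the `sequence_primary_id` column and the `get_data` method name are borrowed from the slide, and sqlite3 stands in for MySQL.

```python
import sqlite3


class SequenceFeatures:
    """Hypothetical API object: callers never see the SQL behind get_data()."""

    def __init__(self, conn, id_fragment, limit=10):
        self._conn = conn
        self._id_fragment = id_fragment
        self._limit = limit

    def get_data(self):
        """Return matching rows as dictionaries keyed by column name."""
        cursor = self._conn.execute(
            "SELECT * FROM sequence_feature "
            "WHERE sequence_primary_id LIKE ? "
            "ORDER BY sequence_primary_id LIMIT ?",
            (f"%{self._id_fragment}%", self._limit),
        )
        columns = [column[0] for column in cursor.description]
        return [dict(zip(columns, row)) for row in cursor]


def build_demo_db():
    """Create a small in-memory table with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence_feature (sequence_primary_id TEXT, feature_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO sequence_feature VALUES (?, ?)",
        [("81002", "utr"), ("81001", "motif"), ("92001", "motif")],
    )
    return conn


if __name__ == "__main__":
    features = SequenceFeatures(build_demo_db(), "810")
    for row in features.get_data():
        print(row)
```

Code that builds the web page only calls `get_data()`; if the table layout changes, only the class has to be updated.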

Page 23

Bio::DOOP API

- (More or less) simple representations of the sequence and other data -> modules and objects
- The API “hides” the MySQL queries and other stuff from us, so we can concentrate on the web pages
- It works well only if we have good API design with all the necessary features

Bio::DOOP API modules

- Clusters
- Subsets
- Sequences
- Sequence features
- Motifs
- Other modules for managing, sorting and filtering the data

Page 24

Search page

- Search types
  - Sequence ID
  - Gene ID
  - Keywords
  - Species
  - Sequence

Page 25

Search results

- Cluster ID
- Description
- Conserved motifs
- Taxonomical groups
- Download sequences

Page 26

Promoter cluster

- Sequences
- Gene annotation
- Sequence alignment
- Cross-references
- Conserved regions

Page 27

Promoter cluster

UTR region

Species, size

Motifs

Page 28

Motifs

- Further search in the motif collection
- A table similar to the one in the previous search results

Page 29

Thank you for your attention!