Introduction to database handling
Endre Sebestyén
What is a database?
A database is a bunch of information. More precisely, it is a structured collection of information.
It contains basic objects, called records or entries.
The records contain fields, which hold defined types of data related to that record.
For example, a nucleotide sequence database would contain nucleotide sequences as records, and sequence properties (length, name, origin, etc.) as fields.
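To make the record/field idea concrete, here is a minimal sketch of one record from a hypothetical nucleotide sequence database. The class and field names (SequenceRecord, accession, and so on) and the example values are made up for illustration. Each attribute is a field with a defined type, and the length is derived from the sequence itself.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    """One record (entry) of a hypothetical nucleotide sequence database.

    Each attribute is a field holding a defined type of data.
    """
    accession: str  # hypothetical identifier field
    name: str
    origin: str     # source organism
    sequence: str

    @property
    def length(self) -> int:
        # Derived from the sequence, so it cannot drift out of sync with it
        return len(self.sequence)


record = SequenceRecord("SEQ0001", "example promoter", "Mus musculus", "ATGCGTACGT")
print(record.origin, record.length)  # Mus musculus 10
```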
What is a database?
A database is searchable: it contains an index (table of contents, catalog).
It is updated regularly in releases: new data goes in, and obsolete, old data goes out.
It is cross-referenced to other databases.
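An index is what makes lookups fast. The sketch below builds a simple inverted index from a field value (species) to record identifiers and then rebuilds it for a new release. The records are made-up toy data, and a real database server maintains its indexes internally rather than through code like this.

```python
from collections import defaultdict

# Toy records as (accession, species) pairs; the values are made up
release_1 = [
    ("SEQ0001", "Mus musculus"),
    ("SEQ0002", "Homo sapiens"),
    ("SEQ0003", "Mus musculus"),
]


def build_index(records):
    """Map each species to the accessions of its records (an inverted index)."""
    index = defaultdict(list)
    for accession, species in records:
        index[species].append(accession)
    return dict(index)  # plain dict: looking up a missing key raises instead of adding it


index = build_index(release_1)
print(index["Mus musculus"])  # ['SEQ0001', 'SEQ0003']

# Next release: the obsolete entry goes out, new data goes in, and the index is rebuilt
release_2 = [r for r in release_1 if r[0] != "SEQ0002"] + [("SEQ0004", "Danio rerio")]
index = build_index(release_2)
print(sorted(index))  # ['Danio rerio', 'Mus musculus']
```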
Why databases?
The main purpose of databases is not only to collect and organize data, but also to allow advanced data retrieval and analysis.
A database query is a method to retrieve information from the database.
Because records are organized into fields, we can run queries on those fields.
Example: all mouse RNA sequences between 1000 and 1500 bp in length.
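The example query can be written in SQL. This sketch uses an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for a real database server. The table name, column names and rows are hypothetical. Note that SQL's BETWEEN is inclusive, so a 1500 bp sequence is returned.

```python
import sqlite3

# In-memory SQLite database standing in for a real sequence database server;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (id TEXT PRIMARY KEY, species TEXT, mol_type TEXT, length INTEGER)"
)
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?, ?)",
    [
        ("SEQ1", "Mus musculus", "RNA", 1200),
        ("SEQ2", "Mus musculus", "RNA", 900),
        ("SEQ3", "Mus musculus", "DNA", 1300),
        ("SEQ4", "Homo sapiens", "RNA", 1100),
        ("SEQ5", "Mus musculus", "RNA", 1500),
    ],
)

# "All mouse RNA sequences between 1000 and 1500 bp" as a query on three fields
rows = conn.execute(
    "SELECT id, length FROM sequence "
    "WHERE species = ? AND mol_type = ? AND length BETWEEN ? AND ? "
    "ORDER BY id",
    ("Mus musculus", "RNA", 1000, 1500),
).fetchall()
print(rows)  # [('SEQ1', 1200), ('SEQ5', 1500)]
```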
Databases on the internet
[Diagram: user -> web servers -> database server]
Databases on the internet
The data: books, book titles, sequences, temperatures, pictures, videos, log files of web servers, etc.
Databases on the internet
Where the data is stored: bookshelves, boxes, text files/directories, binary files, a MySQL database, an Oracle database.
Types of databases: hierarchical model
Tree-like structures, parent -> child, one-to-many relations.
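In a hierarchical model every child has exactly one parent, so the data forms a tree. The minimal sketch below stores each node's single parent link and walks it. The node names (a small taxonomy) are chosen only for illustration.

```python
# Hierarchical model: every child has exactly one parent (a tree).
# The node names are made up for illustration.
parent_of = {
    "Chordata": None,
    "Mammalia": "Chordata",
    "Aves": "Chordata",
    "Mus musculus": "Mammalia",
    "Homo sapiens": "Mammalia",
}


def path_to_root(node):
    """Follow the single parent link from a record up to the root."""
    path = [node]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path


def children(node):
    # One-to-many: a parent may have several children
    return [child for child, parent in parent_of.items() if parent == node]


print(path_to_root("Mus musculus"))  # ['Mus musculus', 'Mammalia', 'Chordata']
print(children("Mammalia"))          # ['Mus musculus', 'Homo sapiens']
```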
Types of databases: network model
More complex than the hierarchical model: parent -> child, with one-to-many and many-to-one relations.
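The network model lifts the single-parent restriction. A record can be linked to several parents, so a relation can be one-to-many from the parent's side and many-to-one from the child's side. In this sketch the links are stored as (parent, child) pairs. The cluster and gene names are hypothetical.

```python
# Network model: a record may be linked to several parents.
# Cluster and gene names are hypothetical.
links = [
    ("cluster_A", "gene_1"),
    ("cluster_A", "gene_2"),
    ("cluster_B", "gene_2"),  # gene_2 has two parents
]


def parents(child):
    return [p for p, c in links if c == child]


def children(parent):
    return [c for p, c in links if p == parent]


print(children("cluster_A"))  # ['gene_1', 'gene_2']
print(parents("gene_2"))      # ['cluster_A', 'cluster_B']
```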
Types of databases: relational model
The most widely used model. Fast and efficient (if the data structure is designed correctly).
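In the relational model, data lives in tables, and rows in different tables are related through shared key columns. The sketch below again uses SQLite as a stand-in for MySQL. The schema (cluster and sequence tables) and the rows are hypothetical and are not the real DoOP tables. The index on the join column is one example of the "designed correctly" part: it lets the database find related rows without scanning the whole table.

```python
import sqlite3

# Relational model: tables related through shared keys.
# Table and column names are hypothetical, not the DoOP schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cluster (cluster_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE sequence (
        seq_id TEXT PRIMARY KEY,
        species TEXT,
        cluster_id INTEGER REFERENCES cluster(cluster_id)
    );
    -- An index on the join column keeps lookups fast as the table grows
    CREATE INDEX idx_sequence_cluster ON sequence(cluster_id);
    INSERT INTO cluster VALUES (1, 'heat shock protein'), (2, 'actin');
    INSERT INTO sequence VALUES
        ('S1', 'Homo sapiens', 1),
        ('S2', 'Mus musculus', 1),
        ('S3', 'Mus musculus', 2);
    """
)

# Join the two tables on the shared key
rows = conn.execute(
    "SELECT c.description, s.seq_id FROM sequence AS s "
    "JOIN cluster AS c ON c.cluster_id = s.cluster_id "
    "WHERE s.species = ? ORDER BY s.seq_id",
    ("Mus musculus",),
).fetchall()
print(rows)  # [('heat shock protein', 'S2'), ('actin', 'S3')]
```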
Databases on the internet
How the data is found: lists, catalogues, a librarian, index files, the SQL language, the grep command.
Query systems for databases: the SQL query language
SQL is used for querying and modifying data, managing the database, and optimizing queries. For example:
SELECT * FROM sequence_feature WHERE sequence_primary_id LIKE '%$variable%' ORDER BY sequence_primary_id LIMIT 10;
SQL is available on multiple operating systems, from different programming languages, and with different storage systems (MySQL, PostgreSQL, etc.).
It can be used from an SQL terminal or through programming languages.
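Here is a sketch of running the sequence_feature query through a programming language. The slide's query pastes $variable straight into the SQL string. A safer pattern is to pass the value as a bound parameter with "?", so user input cannot change the query (SQL injection). Perl DBI supports the same "?" placeholders. SQLite stands in for MySQL here. The table and column names come from the slide, but the extra feature column and the example identifiers are assumptions.

```python
import sqlite3


def search_features(conn, term, limit=10):
    """Run the sequence_feature search with bound parameters instead of string pasting."""
    return conn.execute(
        "SELECT sequence_primary_id FROM sequence_feature "
        "WHERE sequence_primary_id LIKE ? "
        "ORDER BY sequence_primary_id LIMIT ?",
        (f"%{term}%", limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Only the table name and sequence_primary_id come from the slide; the rest is assumed
conn.execute("CREATE TABLE sequence_feature (sequence_primary_id TEXT, feature TEXT)")
conn.executemany(
    "INSERT INTO sequence_feature VALUES (?, ?)",
    [("At1g01010", "motif"), ("At1g01020", "utr"), ("Os01g0100100", "motif")],
)
print(search_features(conn, "At1g"))  # [('At1g01010',), ('At1g01020',)]
```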
Databases on the internet
Where we search: a library, NCBI Entrez, Google, and lots of other general and specialized databases with search interfaces on the web.
Case study: DoOP, the Database of Orthologous Promoters
It aims to collect and analyze the promoter regions of different genes and orthologous gene clusters: http://doop.abc.hu
Two main sections: plant and chordate (chordate: v1.4; plant: v1.5, v1.6).
It integrates different kinds of data: sequence data, sequence annotation, cross-references to external databases, multiple alignments and conserved sequence regions.
Goal: an easily accessible and searchable interface on the web.
Data processing [figures]
MySQL tables [figures]
API (Application Programming Interface) for the MySQL database
We want to convert the MySQL data into nice web pages.
Without an API, we write a MySQL query to get the data:
SELECT * FROM sequence_feature WHERE sequence_primary_id LIKE '%$variable%' ORDER BY sequence_primary_id LIMIT 10;
...and so on, and then we process the data ourselves.
Or, with an API: $data = $sequence_feature_object->get_data;
Bio::DOOP API
(More or less) simple representations of the sequences and other data, as modules and objects.
The API “hides” the MySQL queries and other details from us, so we can concentrate on the web pages.
This works well only if the API is well designed and has all the necessary features.
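The idea of an API "hiding" the queries can be sketched as a small class whose get_data method contains the SQL. The web page code calls the method and never sees the query. The method name mirrors the slide's $sequence_feature_object->get_data call, but everything else is hypothetical: this is a Python illustration, not the real Bio::DOOP interface, and the column names are assumed.

```python
import sqlite3


class SequenceFeature:
    """Hypothetical API object: callers ask for data, the class writes the SQL.

    Loosely modelled on the slide's get_data call; not the real Bio::DOOP interface.
    """

    def __init__(self, conn, primary_id):
        self._conn = conn
        self.primary_id = primary_id

    def get_data(self):
        # The query lives here, so web page code never needs to see it
        row = self._conn.execute(
            "SELECT sequence_primary_id, feature_type, start_pos, end_pos "
            "FROM sequence_feature WHERE sequence_primary_id = ?",
            (self.primary_id,),
        ).fetchone()
        if row is None:
            return None
        return dict(zip(("id", "type", "start", "end"), row))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence_feature "
    "(sequence_primary_id TEXT, feature_type TEXT, start_pos INTEGER, end_pos INTEGER)"
)
conn.execute("INSERT INTO sequence_feature VALUES ('F1', 'motif', 120, 135)")

data = SequenceFeature(conn, "F1").get_data()
print(data)  # {'id': 'F1', 'type': 'motif', 'start': 120, 'end': 135}
```

If the table layout changes, only get_data needs updating. That benefit depends on the API having "all the necessary features", as the slide says.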
Bio::DOOP API modules: clusters, subsets, sequences, sequence features, motifs, and other modules for managing, sorting and filtering the data.
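The module list suggests an object model where clusters contain sequences and sequences carry motifs, with helpers for sorting and filtering. The sketch below shows that shape under stated assumptions. The class names, attributes and methods are illustrative only and are not taken from Bio::DOOP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Motif:
    start: int  # position within the sequence (hypothetical field)
    end: int


@dataclass
class Sequence:
    seq_id: str
    species: str
    length: int
    motifs: List[Motif] = field(default_factory=list)


@dataclass
class Cluster:
    cluster_id: str
    sequences: List[Sequence] = field(default_factory=list)

    def filter_by_species(self, species: str) -> List[Sequence]:
        # Filtering helper: keep only sequences from one species
        return [s for s in self.sequences if s.species == species]

    def sorted_by_length(self) -> List[Sequence]:
        # Sorting helper: shortest sequence first
        return sorted(self.sequences, key=lambda s: s.length)


cluster = Cluster("C1", [
    Sequence("S1", "Homo sapiens", 1000, [Motif(10, 25)]),
    Sequence("S2", "Mus musculus", 500),
    Sequence("S3", "Mus musculus", 3000),
])
print([s.seq_id for s in cluster.sorted_by_length()])                # ['S2', 'S1', 'S3']
print([s.seq_id for s in cluster.filter_by_species("Mus musculus")])  # ['S2', 'S3']
```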
Search page
Search types: sequence ID, gene ID, keywords, species, sequence.
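One way a search form like this could map its search types onto database columns is sketched below. The column names and schema are assumptions, and the sequence search type is not covered here. Column names come only from a fixed whitelist, and the user's term is always passed as a bound parameter. Keyword search uses a substring match, while the other types use an exact match.

```python
import sqlite3

# Hypothetical mapping from search types to columns (sequence search not covered)
SEARCH_COLUMNS = {
    "sequence_id": "seq_id",
    "gene_id": "gene_id",
    "keyword": "description",
    "species": "species",
}


def search(conn, search_type, term):
    column = SEARCH_COLUMNS[search_type]  # only whitelisted column names reach the SQL
    if search_type == "keyword":
        sql = f"SELECT seq_id FROM sequence WHERE {column} LIKE ? ORDER BY seq_id"
        value = f"%{term}%"
    else:
        sql = f"SELECT seq_id FROM sequence WHERE {column} = ? ORDER BY seq_id"
        value = term
    return [row[0] for row in conn.execute(sql, (value,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (seq_id TEXT, gene_id TEXT, description TEXT, species TEXT)")
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?, ?)", [
    ("S1", "G1", "heat shock protein", "Homo sapiens"),
    ("S2", "G2", "actin", "Mus musculus"),
    ("S3", "G3", "heat shock factor", "Mus musculus"),
])
print(search(conn, "keyword", "heat shock"))    # ['S1', 'S3']
print(search(conn, "species", "Mus musculus"))  # ['S2', 'S3']
```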
Search results: cluster ID, description, conserved motifs, taxonomical groups
Download sequences
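Sequence downloads are often offered in FASTA format. Whether DoOP uses exactly this layout is an assumption. The sketch below turns (identifier, sequence) pairs into FASTA text: a ">" header line per record, followed by the sequence wrapped at a fixed width (60 characters here, a common but arbitrary choice).

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text, wrapping lines at `width`."""
    lines = []
    for seq_id, seq in records:
        lines.append(f">{seq_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


text = to_fasta([("S1", "ACGT" * 20), ("S2", "TTGACA")])
print(text)
```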
Promoter cluster: sequences, gene annotation, sequence alignment, cross-references, conserved regions
Promoter cluster: UTR region, species, size
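The cluster page shows conserved regions on top of a multiple alignment. The sketch below uses a naive stand-in for how such regions might be located, not DoOP's actual method. It scans the alignment for runs of columns where every sequence has the same base (gaps excluded) and keeps runs of at least min_len columns. The alignment and the threshold are made up for illustration.

```python
def conserved_regions(alignment, min_len=4):
    """Return half-open (start, end) column ranges where all aligned sequences agree."""
    length = len(alignment[0])
    regions, start = [], None
    for col in range(length + 1):  # the extra step closes a run that reaches the end
        conserved = (
            col < length
            and len({seq[col] for seq in alignment}) == 1
            and alignment[0][col] != "-"
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                regions.append((start, col))
            start = None
    return regions


aln = [
    "ACGTTTGACAAT",
    "ACGTATGACAGT",
    "ACCTTTGACAAT",
]
print(conserved_regions(aln))  # [(5, 10)], i.e. the block "TGACA"
```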
Motifs
Further search in the motif collection, with a results table similar to the previous search results.
Thank you for your attention!