designing biological databases

How do you solve a problem like a biological database? (BNF 216 - Database Modeling and Design for Bioinformatics) Arjei Balandra Software Developer National Telehealth Center University of the Philippines Manila http://bumblebest.net

Upload: arjei-balandra

Post on 17-Jul-2015


TRANSCRIPT

Page 1: Designing Biological Databases

How do you solve a problem like a biological database?

(BNF 216 - Database Modeling and Design for Bioinformatics)

Arjei Balandra Software Developer

National Telehealth Center University of the Philippines – Manila

http://bumblebest.net

Page 2: Designing Biological Databases

Database

• A database is a set of data that has a regular structure and that is organized in such a way that a computer can easily find the desired information.

– The Linux Information Project

(http://www.linfo.org/database.html)

Page 3: Designing Biological Databases

Biological Database

• Biological databases are libraries of life sciences information collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses.

- Wikipedia (en.wikipedia.org/wiki/Biological_database)

Page 4: Designing Biological Databases

NCBI - GenBank

Page 5: Designing Biological Databases

European Nucleotide Archive – EMBL-EBI

Page 6: Designing Biological Databases

DDBJ – DNA Data Bank Of Japan

Page 7: Designing Biological Databases

Why Database?

• Data-intensive techniques such as high-throughput screening and gene expression experiments demand methods to correlate large and diverse datasets.

• Databases integrate information from a variety of sources, allowing faster and more powerful searches.

Page 8: Designing Biological Databases

Tip #1: DO A “GOOD” DATABASE DESIGN

Page 9: Designing Biological Databases

Good Database Design

• Provides easy access to previous results.

• Supports both expert- and machine-guided searches for novel correlations in data.

Page 10: Designing Biological Databases

Bad Database Design

• Obfuscates the correlations for which the user is searching.

• Makes it difficult for biologists to fit their data into the database or to find previously stored data, resulting in user contempt.

• Is ‘brittle’.

Page 11: Designing Biological Databases

Tip #2: LEARN FROM EXISTING LITERATURE

Page 12: Designing Biological Databases

• Generalizations

• Incorporate existing schema into the database design

• Use existing structures for common data

Page 13: Designing Biological Databases

Generalizations

Page 14: Designing Biological Databases

aMAZE (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC308873/figure/gkh139f2/)

Page 15: Designing Biological Databases

Tip #3: RESPECT THE UNIQUE NEEDS OF BIOLOGISTS (AND USERS)

Page 16: Designing Biological Databases

Business rules

• constraints

– based on data derived from the real-world entities

– specific to the needs of the organization.

Page 17: Designing Biological Databases

Dealing with Business Rules

What do they need?

– Use free-text comments

– Create user-specific categories

Page 18: Designing Biological Databases

User-Specific Categories
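
One way the two needs above (free-text comments and user-specific categories) could be modeled is sketched here with Python's built-in sqlite3 module. The table and column names (sample, category, sample_category, owner, label) are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import sqlite3

def build_category_schema(conn):
    """Create a hypothetical schema: samples with free-text comments,
    plus categories that each user defines for themselves."""
    conn.executescript("""
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        comments  TEXT                 -- free text, no structure imposed
    );
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        owner       TEXT NOT NULL,     -- the user who created the label
        label       TEXT NOT NULL,
        UNIQUE (owner, label)          -- each user names a label once
    );
    CREATE TABLE sample_category (
        sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
        category_id INTEGER NOT NULL REFERENCES category(category_id),
        PRIMARY KEY (sample_id, category_id)
    );
    """)

def labels_for(conn, sample_id, owner):
    """Return the labels that one user attached to one sample."""
    rows = conn.execute(
        """SELECT c.label
           FROM category c
           JOIN sample_category sc ON sc.category_id = c.category_id
           WHERE sc.sample_id = ? AND c.owner = ?
           ORDER BY c.label""",
        (sample_id, owner),
    ).fetchall()
    return [row[0] for row in rows]

# Illustrative data only (invented for this sketch).
category_db = sqlite3.connect(":memory:")
build_category_schema(category_db)
category_db.execute(
    "INSERT INTO sample (sample_id, name, comments) VALUES (?, ?, ?)",
    (1, "liver biopsy 7", "slightly degraded RNA"),
)
category_db.execute(
    "INSERT INTO category (category_id, owner, label) VALUES (?, ?, ?)",
    (1, "maria", "needs re-run"),
)
category_db.execute("INSERT INTO sample_category VALUES (?, ?)", (1, 1))
```

Because categories are rows rather than columns, a user can add a new label without any change to the schema, and labels from different users do not collide.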

Page 19: Designing Biological Databases

Tip #4: DESIGN THE DATABASE BEFORE BUILDING IT

Page 20: Designing Biological Databases

Tip #5: USE THE DATABASE TO ENFORCE DATA INTEGRITY
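
As a minimal sketch of this tip, the example below lets the database itself reject bad rows instead of relying on application code. It assumes SQLite through Python's sqlite3 module; the organism and gene tables and the rule that a sequence may contain only A, C, G, T, or N are hypothetical choices made for illustration.

```python
import sqlite3

integrity_db = sqlite3.connect(":memory:")
# SQLite does not check foreign keys unless this pragma is switched on.
integrity_db.execute("PRAGMA foreign_keys = ON")
integrity_db.executescript("""
CREATE TABLE organism (
    organism_id     INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    symbol      TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    -- hypothetical rule: non-empty, and only the letters A, C, G, T, N
    sequence    TEXT NOT NULL
                CHECK (length(sequence) > 0
                       AND sequence NOT GLOB '*[^ACGTN]*'),
    UNIQUE (symbol, organism_id)
);
""")

def try_insert(sql, params):
    """Run one INSERT; return True if the database accepted it,
    False if a constraint rejected it."""
    try:
        with integrity_db:  # commits on success, rolls back on error
            integrity_db.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

GENE_SQL = "INSERT INTO gene (symbol, organism_id, sequence) VALUES (?, ?, ?)"

ok_organism = try_insert(
    "INSERT INTO organism (organism_id, scientific_name) VALUES (?, ?)",
    (1, "Homo sapiens"),
)
ok_gene = try_insert(GENE_SQL, ("BRCA1", 1, "ATGGATTTATC"))
bad_letters = try_insert(GENE_SQL, ("TP53", 1, "ATGXYZ"))    # fails CHECK
bad_organism = try_insert(GENE_SQL, ("TP53", 99, "ATGGAG"))  # fails foreign key
duplicate = try_insert(GENE_SQL, ("BRCA1", 1, "ATGC"))       # fails UNIQUE
```

Every program that writes to these tables is held to the same rules, so a bug in one loader script cannot quietly introduce invalid sequences or genes that point to a missing organism.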

Page 21: Designing Biological Databases

Normalization

Page 22: Designing Biological Databases

Normalization

Page 23: Designing Biological Databases

Normalization
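
The normalization slides carry no text in this transcript, so the following is only an illustrative sketch of the general idea, not a reconstruction of the original figures. It takes a flat list in which the organism name is repeated on every gene row and splits it into two related tables, so each organism is stored once. The table and column names are hypothetical; the taxon IDs are the NCBI Taxonomy identifiers for human and mouse.

```python
import sqlite3

# Flat, un-normalized rows: the organism details repeat for every gene.
flat_rows = [
    ("BRCA1", "Homo sapiens", 9606),
    ("TP53", "Homo sapiens", 9606),
    ("Trp53", "Mus musculus", 10090),
]

def normalize(rows):
    """Load flat (symbol, organism name, taxon id) rows into two tables."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript("""
    CREATE TABLE organism (
        taxon_id        INTEGER PRIMARY KEY,
        scientific_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE gene (
        gene_id  INTEGER PRIMARY KEY,
        symbol   TEXT NOT NULL,
        taxon_id INTEGER NOT NULL REFERENCES organism(taxon_id)
    );
    """)
    for symbol, name, taxon_id in rows:
        # Store each organism once; later repeats are ignored.
        db.execute(
            "INSERT OR IGNORE INTO organism (taxon_id, scientific_name) "
            "VALUES (?, ?)",
            (taxon_id, name),
        )
        db.execute(
            "INSERT INTO gene (symbol, taxon_id) VALUES (?, ?)",
            (symbol, taxon_id),
        )
    db.commit()
    return db

def rebuild_flat(db):
    """Join the two tables back into the original flat shape."""
    return db.execute(
        """SELECT g.symbol, o.scientific_name, o.taxon_id
           FROM gene g JOIN organism o ON o.taxon_id = g.taxon_id"""
    ).fetchall()

norm_db = normalize(flat_rows)
```

After the split, correcting an organism name means updating one row in organism rather than every gene row that mentions it, and the join shows that no information was lost.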

Page 24: Designing Biological Databases

Tip #6: KEEP THE DATABASE SCOPE MANAGEABLE

Page 25: Designing Biological Databases

Keep the database scope manageable

• In biology, one size does not fit all

• Focus on a subset of biology (e.g., genes, proteins)

• For large subsets, tackle one part at a time

• Inclusive

Page 26: Designing Biological Databases

Tip #7: LISTEN TO THE PEOPLE WHO HAVE TO WRITE AND USE THE INTERFACE

Page 27: Designing Biological Databases

• Databases are successful only when people use them

Users know what they want and need

+ Developers know what they can do

+ Designers know what must be done

---------------------------------------------------------

= Collaborative approach to develop a successful database

Page 28: Designing Biological Databases

Tip #8: TEST THE DESIGN WITH REALISTIC DATA

Page 29: Designing Biological Databases

Tip #9: MAKE THE DATABASE STRUCTURE UNDERSTANDABLE AND EASY TO MAINTAIN

Page 30: Designing Biological Databases
Page 31: Designing Biological Databases

THANK YOU!

REPLACE(quote, "pagmamahal", "data");

quote

Page 32: Designing Biological Databases

References

• The Linux Information Project (http://www.linfo.org/database.html)

• Nelson, M.R., Reisinger, S.J., Henry, S. (2003). Designing databases to store biological information. BIOSILICO, Vol. 1, No. 4.

• Wikipedia (en.wikipedia.org/wiki/Biological_database)

• Lemer, C., Antezana, E., Couche, F., Fays, F., Santolaria, X., Janky, R., … Wodak, S. J. (2004). The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Research, 32(Database issue), D443–D448. doi:10.1093/nar/gkh139