presentation2013


Uploaded by: sucheta-tripathy

Posted on: 13-Jun-2015


DESCRIPTION

Database, genomics

TRANSCRIPT

Page 1: Presentation2013

EuMicrobeDB-M: A lightweight microbial genome database based on MySQL with a C++ API

Sucheta Tripathy, Akash Gupta, Brett M Tyler

This is a work in progress…

Page 2

Yet another database? It's raining sequences!

EuMicrobeDB-M

Sequencing is outrunning the ability to store, transmit and analyze the data (The New York Times).

Page 3

Background

FungiDB

VMD -> EuMicrobedb.org

Transcriptomicsdb

Page 4

Background

EuMicrobeDB
◦ Based on Oracle and GUS (Genomics Unified Schema)
◦ Administered at Virginia Tech

Page 5

Lightweight EuMicrobedb.org

Based on MySQL
Front end remains the same (based on Perl CGI, GD and PHP)
Namespaces downsized
Number of tables and views reduced
Dependencies removed

Page 6

Lightweight EuMicrobedb.org

Oracle-based: 5 namespaces; 329 tables (179+39+40+15+56); 127 views (84+4+15+24); needs an Oracle license; needs BioPerl

Lightweight: 3 namespaces; 37 tables (20+7+10); 18 views; independent of Oracle; independent of BioPerl

Transcriptomics database

Page 7

Feature                  EuMicrobeDB-Oracle   EuMicrobeDB-Light
Total number of tables   329                  37
Total number of views    127                  18
Query time               10 s                 1.2 s
Time for genome upload   12-14 hours          2 hours
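To make the size of these improvements explicit, here is a small C++ sketch that recomputes the totals from the per-namespace counts given for the Oracle-based and lightweight versions and expresses each change as a fold reduction (old value divided by new value). The variable and function names are hypothetical and only illustrate the arithmetic; they are not part of EuMicrobeDB-M.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Per-namespace counts as listed on the slides (hypothetical variable names).
// Oracle/GUS-based EuMicrobeDB:
const std::vector<int> kOracleTables = {179, 39, 40, 15, 56};
const std::vector<int> kOracleViews = {84, 4, 15, 24};
// Lightweight MySQL-based EuMicrobeDB-M:
const std::vector<int> kLightTables = {20, 7, 10};
const std::vector<int> kLightViews = {18};

// Sum per-namespace counts into a database-wide total.
int total(const std::vector<int>& counts) {
    return std::accumulate(counts.begin(), counts.end(), 0);
}

// Ratio old/new: how many times smaller (or faster) the new figure is.
double fold_reduction(double before, double after) {
    assert(after > 0.0);  // a zero denominator would make the ratio meaningless
    return before / after;
}
```

With these figures, `total(kOracleTables)` gives 329 and `total(kLightTables)` gives 37, matching the table; `fold_reduction(10.0, 1.2)` is about 8.3 for query time, and the 12 to 14 hour genome upload shrinks by a factor of 6 to 7.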

Page 8

[Diagram: Annotation; Sequence FASTA; GFF; C++ API; Database; Toolkit]

Genomes: P. sojae V1.0, P. sojae V5.0, P. ramorum V1.0, H. arabidopsidis V8.3
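The slide names the inputs (annotation as sequence FASTA and GFF), the C++ API and the database, but the API itself is not shown. As a hedged illustration only, the sketch below parses a single GFF3 feature line into a struct that a loader could then write to a MySQL table. `GffFeature`, `split` and `parse_gff_line` are hypothetical names, not EuMicrobeDB-M code, and the sketch assumes the standard nine-column GFF3 layout without decoding percent-escapes.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one GFF3 feature line (the nine GFF3 columns).
struct GffFeature {
    std::string seqid, source, type;
    long start = 0, end = 0;           // 1-based, inclusive coordinates
    std::string score, strand, phase;  // kept as text; "." means undefined
    std::map<std::string, std::string> attributes;  // e.g. ID, Parent, Name
};

// Split on a single-character delimiter. Empty fields between delimiters are
// kept; a trailing delimiter does not add an empty last field.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(s);
    std::string field;
    while (std::getline(in, field, delim)) parts.push_back(field);
    return parts;
}

// Parse one GFF3 data line. Returns std::nullopt for blank lines, comments,
// directives ("##..."), or lines that are not nine valid columns.
std::optional<GffFeature> parse_gff_line(const std::string& line) {
    if (line.empty() || line[0] == '#') return std::nullopt;
    const std::vector<std::string> cols = split(line, '\t');
    if (cols.size() != 9) return std::nullopt;

    GffFeature f;
    f.seqid = cols[0];
    f.source = cols[1];
    f.type = cols[2];
    try {
        f.start = std::stol(cols[3]);
        f.end = std::stol(cols[4]);
    } catch (const std::logic_error&) {  // invalid_argument or out_of_range
        return std::nullopt;
    }
    if (f.start > f.end) return std::nullopt;  // GFF3 requires start <= end
    f.score = cols[5];
    f.strand = cols[6];
    f.phase = cols[7];

    // Column 9: semicolon-separated tag=value pairs (percent-escapes not decoded).
    for (const std::string& pair : split(cols[8], ';')) {
        const auto eq = pair.find('=');
        if (eq != std::string::npos) f.attributes[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    return f;
}
```

Score, strand and phase are kept as text so the `.` placeholder survives; a real loader would presumably map `.` to NULL and insert rows through prepared statements rather than string-built SQL.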

Page 9

Future Plan

[Diagram: Annotation; Sequence FASTA; GFF; Database; Toolkit; C++ API; Genome tools]

Page 10

Computational facilities

3 Dell PowerEdge R420 servers (16 GB RAM and 1.5 TB storage each, shared over NFS):
◦ Data analysis server
◦ Web server
◦ Data storage

PowerEdge R820 server: 128 GB RAM, 16 TB storage.

IICB has 2 compute clusters with 64 nodes, each node having 192 GB of memory.

CSIR-cMMACS has India's fastest supercomputer.

Support: one for sequencing and one for bioinformatics.

Page 11

Who are the users?

Labs sequencing genomes at a rapid pace
◦ Draft assembly
◦ Data not yet in GenBank
◦ GFF annotation available: view on browser

Labs with limited hosting facilities
◦ Data hosting
◦ Data analysis

Page 12

Package Release Policy

The source code will be released soon so that others can replicate the database.

◦ No Oracle license needed ($$$)
◦ Independent of many packages
◦ Reduced installation time
◦ Simple user experience

Page 13

Prof. Brett Tyler

Prof. Siddhartha Roy

Page 14

Thank You