bioinformatics at iita
www.iita.org
A member of CGIAR consortium
Bioinformatics @ IITA
Andreas Gisel
IITA – Bioscience & Bioinformatics
Bioinformatics @ IITA
Bioinformatics – definition and introduction
Bioinformatics @ IITA
Bioinformatics & IITA
Bioinformatics - definition
Bio – Biology, Life Sciences
Informatics – computational sciences
DATA INTERPRETATIONS RESULTS
Bioinformatics
Data Repositories
Knowledge
Bioinformatics is an interdisciplinary science that develops and improves methods for analyzing, storing, retrieving, organizing, and visualizing biological data.
Its aim is to help solve biological problems and to uncover the wealth of information hidden in biological data.
?
Biological Data
Descriptions, Pictures
Sequences: Protein, RNA, DNA
First fully sequenced biological molecule: the amino acid sequence of insulin (51 aa), 1955
First fully sequenced nucleic acid: tRNA (75 nt), 1965
First sequenced DNA: bacteriophage (5375 nt), 1977
DNA sequencing: Sanger sequencing technology (1975), pyrosequencing (Next Generation Sequencing, 2004)
Structures: Protein, RNA
Interactions, Expressions
Data Explosion
Descriptions, Pictures
Sequences: Protein, RNA, DNA
Structures: Protein, RNA
Interactions, Expressions
Microarray: up to 1 million data points per experiment
High-throughput sequencing, NGS (Next Generation Sequencing): up to 600’000’000’000 bases (600 gigabases) per experiment
Data Analysis – DNA/RNA sequences
A sequence without knowledge connected to it is meaningless! What to do?
Sequence similarity
Finding genes and regulatory elements
Functional analysis of genes
Homology
Polymorphism
BIOINFORMATICS
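As a small illustration of the first item, sequence similarity, the sketch below computes percent identity between two sequences. It assumes the sequences are already aligned and of equal length; the function name and the two example sequences are made up for illustration, and real similarity searches rely on alignment tools rather than this position-by-position comparison.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring upper/lower case differences.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical example sequences that differ at a single position.
query = "ATGGCGTACGTTAG"
reference = "ATGGCGTTCGTTAG"
print(f"{percent_identity(query, reference):.1f}% identity")
```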
Data Analysis
So we need bioinformatics tools and reference data
Hardware – Computing infrastructure (CPU, RAM, Storage)
Tools – Programs that process your data
Reference data – Databases for existing data
Internet – connection to external databases
Bioinformatics @ IITA
Personnel
Livia Stavolone – molecular biologist
Deborah Adeyele – student (training in bioinformatics and non-coding RNA)
Toyin Abdulsalam – research fellow (bioinformatics and transcriptome analysis)
Andreas Gisel
Whole Bioscience Team
Bioinformatics @ IITA
Hardware – Computing infrastructure (CPU, RAM, Storage)
HP Blade system with 3 blades, each with two 16-core processors (AMD Opteron Processor 6272); 384 GB RAM
2 TB direct-attached storage (DAS)
8 TB network-attached storage (NAS)
The operating system is Ubuntu 14.04.1 LTS, installed via Bio-Linux 8.
Bioinformatics @ IITA
Tools – Programs that process your data
Basic bioinformatics services mainly based on sequence analysis
Next Generation Sequencing data analysis pipelines including:
GBS (genotyping by sequencing) data analysis and SNP calling
Transcriptomics (RNA-seq) mapping, assembly and expression profiling
smallRNA data analysis: discovery and expression profiling
DNA methylation (BS-seq) data analysis
DNA (shotgun) assembly and variation calling
Genome annotation using different data pipelines and visualization
Customized approaches using Perl and shell scripting
Bioinformatics @ IITA
Tools – Programs that process your data
GBS (genotyping by sequencing) data analysis and SNP calling
From raw sequencing data to SNP matrix
Cassava: 1200 GB compressed sequence data (~5500 accessions); SNP matrix 5500 x ~160’000 SNPs
Yam: 200 GB compressed sequence data (~800 accessions); SNP matrix 800 x ~25’000 SNPs
Cornell SNP calling (TASSEL)
Broad SNP calling (GATK)
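To make the term "SNP matrix" concrete, here is a minimal sketch of one common layout: one row per accession, one column per SNP site, each cell holding the number of non-reference alleles (0, 1 or 2) or None for a missing call. The function name, accession names, site names and genotypes are all invented for illustration; this is not the output format of TASSEL or GATK.

```python
def build_snp_matrix(calls, accessions, sites, ref_alleles):
    """Return rows (accessions) x columns (sites) of non-reference allele counts."""
    matrix = []
    for accession in accessions:
        row = []
        for site in sites:
            genotype = calls.get(accession, {}).get(site)
            if genotype is None:
                row.append(None)  # no call for this accession at this site
            else:
                # Count alleles that differ from the reference allele (0, 1 or 2).
                row.append(sum(allele != ref_alleles[site] for allele in genotype))
        matrix.append(row)
    return matrix


# Hypothetical toy data: two accessions, two SNP sites.
ref_alleles = {"Chr1:1200": "A", "Chr1:5300": "G"}
calls = {
    "acc_1": {"Chr1:1200": ("A", "A"), "Chr1:5300": ("G", "T")},
    "acc_2": {"Chr1:1200": ("C", "C")},
}
accessions = ["acc_1", "acc_2"]
sites = ["Chr1:1200", "Chr1:5300"]

matrix = build_snp_matrix(calls, accessions, sites, ref_alleles)
for accession, row in zip(accessions, matrix):
    print(accession, row)

# Size of the cassava matrix quoted on the slide (5500 accessions x ~160'000 SNPs).
print(f"{5500 * 160_000:,} cells")
```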
GBS (genotyping by sequencing) data analysis and SNP calling
Ismail Rabbi
Bioinformatics @ IITA
Tools – Programs that process your data
SNP matrix
Cornell
Bioinformatics @ IITA
Tools – Programs that process your data
GBS (genotyping by sequencing) data analysis and SNP calling
SNP matrix
In-house
Bioinformatics @ IITA
Tools – Programs that process your data
GBS (genotyping by sequencing) data analysis and SNP calling
SNP matrix
External data
In-house developed scripts
GBS (genotyping by sequencing) data analysis and SNP calling
Bioinformatics @ IITA
Tools – Programs that process your data
Cassava Assembly & Annotation Version 6.1: chromosomes Chr1 to Chr18
Cassava Assembly & Annotation Version 6.1
GBS (genotyping by sequencing) data analysis and SNP calling
Bioinformatics @ IITA
Tools – Programs that process your data
Gene Distribution
SNP Distribution
GBS Coverage
Heterozygosity
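One simple way to obtain a heterozygosity value from genotype calls is sketched below: the fraction of called SNP sites at which an accession carries two different alleles. The sketch assumes genotypes coded as non-reference allele counts (0, 1, 2, with None for missing), so a value of 1 marks a heterozygous site. The function name and the genotype rows are hypothetical, and the slide does not say how its heterozygosity track was computed.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called sites that are heterozygous (coded 1); None if nothing was called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(g == 1 for g in called) / len(called)


# Hypothetical genotype rows: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternative, None = missing call.
rows = {
    "acc_1": [0, 1, 1, None, 2],
    "acc_2": [None, None],
}
for accession, genotypes in rows.items():
    print(accession, observed_heterozygosity(genotypes))
```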
Bioinformatics @ IITA
Tools – Programs that process your data
Transcriptomics (RNA-seq) mapping, assembly and expression profiling
What is RNA-seq?
Bioinformatics @ IITA
Tools – Programs that process your data
Transcriptomics (RNA-seq) mapping, assembly and expression profiling
Automated pipeline for reference-supported and de novo transcriptome assembly and expression profiling
Bioinformatics @ IITA
Tools – Programs that process your data
smallRNA data analysis: discovery and expression profiling
Small RNAs are short (21-200 nt) RNAs that do not code for proteins and have gene regulatory effects.
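A typical first look at small RNA sequencing data is the read length distribution. The sketch below keeps only reads inside the 21-200 nt range given above and counts them per length. The function name and the example reads are invented for illustration and do not describe the pipeline used at IITA.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 21, 200  # length range quoted on the slide


def length_profile(reads):
    """Count reads per length, keeping only reads within the small RNA range."""
    return Counter(len(read) for read in reads if MIN_LEN <= len(read) <= MAX_LEN)


# Hypothetical reads, already trimmed of adapters.
reads = [
    "TGAGGTAGTAGGTTGTATAGT",   # 21 nt
    "TGAGGTAGTAGGTTGTATAGTT",  # 22 nt
    "ACGTACGTACGT",            # 12 nt, below the range and dropped
    "TTGACAGAAGATAGAGAGCAC",   # 21 nt
]
profile = length_profile(reads)
for length in sorted(profile):
    print(length, profile[length])
```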
Bioinformatics @ IITA
Tools – Programs that process your data
smallRNA data analysis: discovery and expression profiling
Automated pipeline for non-coding RNA classification and expression profiling.
Bioinformatics @ IITA
Tools – Programs that process your data
DNA methylation (BS-seq) data analysis
What is BS-seq?
DNA methylation is another gene regulation mechanism which can be inherited.
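In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines are still read as C. The methylation level at a cytosine can therefore be estimated as C reads divided by C plus T reads. The sketch below shows that arithmetic; the function name, positions and read counts are hypothetical.

```python
def methylation_level(c_reads: int, t_reads: int):
    """Estimated methylation level at one cytosine: C / (C + T); None without coverage."""
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# Hypothetical per-cytosine counts: (position, reads showing C, reads showing T).
sites = [("Chr1:1050", 18, 2), ("Chr1:1093", 0, 25), ("Chr1:1120", 0, 0)]
for position, c_reads, t_reads in sites:
    level = methylation_level(c_reads, t_reads)
    print(position, "no coverage" if level is None else f"{level:.2f}")
```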
Bioinformatics @ IITA
Tools – Programs that process your data
DNA (shotgun) assembly and variation calling
Genome annotation using different data pipelines and visualization
Bioinformatics @ IITA
Reference data – Databases for existing data
Genomic Reference Data
Cassava (sequence, annotation, function)
D. rotundata (sequence, working on annotation and function)
D. alata (waiting for sequence and annotation)
Maize (ready sequence and annotation)
Banana (ready sequence and annotation)
Archive
Cassava (GBS, WGS, RNA-seq)
D. rotundata (GBS, smallRNA)
Maize (GBS)
Bioinformatics @ IITA
Internet – connection to external databases
Automated pipelines and strategies for big data downloads
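One building block that automated download pipelines often include is an integrity check after each transfer. The sketch below computes a file's MD5 checksum in chunks, so large files never need to fit in memory, and compares it with a published value. The function names are hypothetical, the demo uses a small temporary file in place of a real download, and the slide does not describe which checks the IITA pipelines perform.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_md5):
    """True if the file's checksum matches the checksum published by the data provider."""
    return file_md5(path) == expected_md5.strip().lower()


# Demo with a temporary file standing in for a downloaded archive.
payload = b"ACGT" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
published = hashlib.md5(payload).hexdigest()  # stands in for the provider's checksum
print("intact:", download_is_intact(tmp.name, published))
os.remove(tmp.name)
```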
Bioinformatics & IITA
Development of Bioinformatics Capacity
IITA Projects
Involvement in planning of data production, analysis - financing of data storage and analysis
Bioinformatics, Bioscience
Data analysis, Data repositories, Visualization
Bioinformatics & IITA
Development of Bioinformatics Capacity
In projects with sequencing activities:
We need to identify the bioinformatics part
We need to take over at least a part of the bioinformatics activities
We need to have Bioscience involved in the planning of data production, to optimize data analysis and knowledge building
Capacity building to strengthen the bioinformatics facility
Thank you!
Data from:
Ranjana Bhattacharjee
Livia Stavolone
Morag Ferguson
Ismail Rabbi