The way to systems biology

Next Generation Sequencing in Systems Biology Lec1 ALI KISHK

Upload: ali-kishk

Post on 12-Jan-2017

TRANSCRIPT

Page 1: The way to Systems Biology

Next Generation Sequencing in Systems Biology

Lec 1, ALI KISHK

Page 2: The way to Systems Biology

Index
• Why we need Systems Biology
• Genetic Dilemma
• Puzzle of Next generation sequencing
• Why we need modeling
• Types of biological models
• Roles of Networks
• Types of Biological networks
• Network vs Pathway vs Biomodels
• Primary Database vs Secondary Database

Page 3: The way to Systems Biology

"Holy trinity of systems biology is biology, computational science and technology."

Lee Hood, Institute for Systems Biology

Page 4: The way to Systems Biology

Systems biology

The apple tastes good!

Traditional biology

The hard texture of the apple does not fit in the salad!

Systems biology 101

Page 5: The way to Systems Biology

What is Systems Biology?

• A holistic ("whole-istic") approach to understanding biology.

• It aims at a system-level understanding of biology, that is, understanding biological systems as systems.

http://www.sysbio.de/info/background/WhatIs.shtml

Page 6: The way to Systems Biology

Puzzle of Next generation sequencing

Page 7: The way to Systems Biology

Types of Interaction Networks

Vidal, Cusick and Barabási, Cell 144, 2011.

Page 8: The way to Systems Biology

In neuroscience

Page 9: The way to Systems Biology

In pharmacology

Page 10: The way to Systems Biology

In Ecological Genomics

Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing

Page 11: The way to Systems Biology

Strategies to Build a Model

Chapter 2: Modeling Approaches in Systems Biology, Including Silicon Cell Models

Page 12: The way to Systems Biology

Where Do Gene Lists Come From?

• Molecular profiling, e.g. mRNA, protein
– Identification: gene list
– Quantification: gene list + values
– Ranking, clustering (biostatistics)
• Interactions: protein interactions, microRNA targets, transcription factor binding sites (ChIP)
• Genetic screen, e.g. of a knockout library
• Association studies (genome-wide)
– Single nucleotide polymorphisms (SNPs)
– Copy number variants (CNVs)

Other examples?

Page 13: The way to Systems Biology

What Do Gene Lists Mean?

• Biological system: complex, pathway, physical interactors
• Similar gene function, e.g. protein kinase
• Similar cell or tissue location
• Chromosomal location (linkage, CNVs)

Data

Page 14: The way to Systems Biology

"One of the big problems in systems biology is separating signal from the noise in the data."

~ Lee Hood, Institute for Systems Biology

Page 15: The way to Systems Biology

Before Analysis

• Normalization
• Background adjustment
• Quality control (garbage in, garbage out)
• Use statistics that will increase signal and reduce noise specifically for your experiment
• Gene list size
• Make sure your gene IDs are compatible with the software
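As a rough illustration of the normalization step, the sketch below log2-transforms raw intensities and centers each sample on its median. This is only one simple scheme chosen for illustration, not necessarily the method intended in the lecture; the function name, the pseudocount and the sample values are all hypothetical.

```python
import math
from statistics import median

def log2_median_center(sample, pseudocount=1.0):
    """Log2-transform raw intensities, then subtract the sample median.

    The pseudocount avoids log2(0); its value here is an assumption.
    """
    logged = [math.log2(value + pseudocount) for value in sample]
    center = median(logged)
    return [x - center for x in logged]

# Hypothetical raw values: sample_b was measured at a higher overall
# intensity than sample_a.
sample_a = [1.0, 3.0, 7.0, 15.0, 31.0]
sample_b = [3.0, 7.0, 15.0, 31.0, 63.0]

norm_a = log2_median_center(sample_a)
norm_b = log2_median_center(sample_b)

# After centering, both samples are on the same scale and can be compared.
print(norm_a)
print(norm_b)
```

Real pipelines usually rely on platform-specific methods, so this should be read as a toy example of why samples need a common scale before gene lists are compared.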

Page 16: The way to Systems Biology

Biological Questions

• Step 1: What do you want to accomplish with your list? (Hopefully part of experiment design!)
– Summarize biological processes or other aspects of gene function
– Perform differential analysis: what pathways are different between samples?
– Find a controller for a process (TF, miRNA)
– Find new pathways or new pathway members
– Discover new gene function
– Correlate with a disease or phenotype (candidate gene prioritization)
– Find a drug

Page 17: The way to Systems Biology

Biological Answers

*Pathway enrichment analysis: summarize and compare

*Network analysis: predict gene function, find new pathway members, identify functional modules (new pathways)

*Regulatory network analysis: find and analyze controllers
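One common way to "predict gene function" from a network is guilt by association: an unannotated gene is given the function most frequent among its interaction partners. The sketch below is a minimal version of that idea, assuming a tiny undirected network; the gene names, edges and annotations are placeholders, not real data.

```python
from collections import Counter

# Hypothetical interaction edges and annotations (placeholders only).
EDGES = [
    ("geneX", "geneA"),
    ("geneX", "geneB"),
    ("geneC", "geneX"),
    ("geneA", "geneB"),
]
ANNOTATIONS = {
    "geneA": "protein kinase",
    "geneB": "protein kinase",
    "geneC": "transporter",
}

def neighbors(gene, edges):
    """Return the set of genes directly linked to `gene` (undirected edges)."""
    linked = set()
    for a, b in edges:
        if a == gene:
            linked.add(b)
        elif b == gene:
            linked.add(a)
    return linked

def predict_function(gene, edges, annotations):
    """Majority vote over the annotations of the gene's direct neighbors."""
    votes = Counter(
        annotations[n] for n in sorted(neighbors(gene, edges)) if n in annotations
    )
    if not votes:
        return None  # no annotated neighbors, so no prediction
    return votes.most_common(1)[0][0]

prediction = predict_function("geneX", EDGES, ANNOTATIONS)
print(prediction)
```

A majority vote ignores edge confidence and network topology beyond direct neighbors, so it is best read as the simplest possible baseline.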

Page 18: The way to Systems Biology

Pathway enrichment analysis

Gene list from experiment: genes down-regulated in drug-sensitive brain cancer cell lines

Pathway information: all genes known to be involved in neurotransmitter signaling

Statistical test: are there more annotations in the gene list than expected?

Hypothesis: drug sensitivity in brain cancer is related to reduced neurotransmitter signaling

Test many pathways: p < 0.05?
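The question "are there more annotations in the gene list than expected?" is often answered with a one-sided hypergeometric test, and testing many pathways calls for a multiple-testing correction. The sketch below assumes that choice of test and uses the Benjamini-Hochberg adjustment; the slides do not say which test or correction the lecture has in mind, and all counts and p-values in the example are made up.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe that contains pathway_size pathway genes."""
    max_k = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 20000 genes measured, 150 in the pathway,
# 300 in the gene list, 12 of which belong to the pathway.
p_single = hypergeom_enrichment_p(20000, 150, 300, 12)

# Hypothetical raw p-values for four pathways tested on the same list.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_single, adjusted)
```

The gene universe should be the set of genes that could have appeared in the list (for example, everything on the array), since the p-value depends strongly on that choice.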

Page 19: The way to Systems Biology

Pathway Enrichment Analysis

• Gene identifiers
• Pathways and other gene annotation
– Gene Ontology
  • Ontology structure
  • Annotation
– BioMart + other sources

Page 20: The way to Systems Biology

Gene and Protein Identifiers

• Identifiers (IDs) are ideally unique, stable names or numbers that help track database records
– E.g. Social Insurance Number, Entrez Gene ID 41232
• Gene and protein information is stored in many databases
– Genes have many IDs
• Records for: Gene, DNA, RNA, Protein
– Important to recognize the correct record type
– E.g. Entrez Gene records don't store sequence. They link to DNA regions, RNA transcripts and proteins, e.g. in RefSeq, which stores sequence.

Page 21: The way to Systems Biology

Why does Systems Biology need Networks, Pathways and Biomodels?

Page 22: The way to Systems Biology

Which pathway do red algae choose under global warming?

Which miRNA / transcription factor can be used as a biomarker in prostate cancer?

Why does a specific herbicide not affect my plant?

What is my biological question?

Page 23: The way to Systems Biology

Types of Biological Networks

Page 24: The way to Systems Biology
Page 25: The way to Systems Biology

What do we mean by pathway?

• Biological process or molecular function
• Metabolic processes
• Signaling cascades
• Genes are categorized based on some criteria

Central Dogma Involvement of Gene Products

Page 26: The way to Systems Biology

Gene sets (biological categories)

• Genes in a set have something in common
– On the same cytogenetic band
– Coding for proteins that are part of the same cellular component
– Can be part of the same biochemical pathway
– Co-expressed under certain conditions
– Putative targets of the same regulatory factor
– …

Page 27: The way to Systems Biology

What is the Gene Ontology (GO)?

• Set of biological phrases (terms) which are applied to genes:
– protein kinase
– apoptosis
– membrane
• Dictionary: term definitions
• Ontology: a formal system for describing knowledge
• www.geneontology.org

Jane Lomax @ EBI

Page 28: The way to Systems Biology
Page 29: The way to Systems Biology

Gene Ontology

• The Gene Ontology (GO) Consortium was established in 1998 to develop a shared, structured vocabulary (an ontology) for the annotation of molecular characteristics across different organisms.
– A collaborative effort to address the need for consistent descriptions of genes and gene products in different databases
– Original members of the consortium: SGD, FlyBase and MGD
• Two primary purposes for an ontology:
1. to facilitate communication between people and organizations
2. to improve upon the interoperability between systems

Page 30: The way to Systems Biology

GO structure

• The ontologies are structured vocabularies in the form of directed acyclic graphs (DAGs)
• The DAG represents a network (not a tree) in which each term may be a child of one or more parents
• The relationship of child to parent can be of the "is a" type or the "part of" type

[Figure: example DAG with the terms telomere, chromosome and mitotic chromosome, linked by "is a" and "part of" relationships]
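A DAG of this kind can be stored as a mapping from each term to its (relationship, parent) pairs. The sketch below assumes a toy wiring of the three terms named on this slide plus one hypothetical term, "mitotic telomere", added only to show a child with two parents; it is not a copy of the real GO graph.

```python
# Toy ontology: child term -> list of (relationship, parent term).
# The wiring is an illustrative assumption; "mitotic telomere" is a
# hypothetical term, not taken from the slides.
PARENTS = {
    "chromosome": [],
    "telomere": [("part_of", "chromosome")],
    "mitotic chromosome": [("is_a", "chromosome")],
    "mitotic telomere": [
        ("is_a", "telomere"),
        ("part_of", "mitotic chromosome"),
    ],
}

def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("mitotic telomere")))
```

Because a term can have several parents, the traversal tracks visited terms so that a shared ancestor such as "chromosome" is counted only once. Enrichment tools typically use this kind of upward walk to propagate a gene's annotation from a specific term to its more general ancestors.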

Page 31: The way to Systems Biology

Ontologies within GO

• molecular function: describes activities, such as catalytic or binding activities, at the molecular level
• biological process: refers to a biological objective to which the gene product contributes
• cellular component: refers to the place in the cell (i.e. the location) where a gene product is found

Page 32: The way to Systems Biology
Page 33: The way to Systems Biology
Page 34: The way to Systems Biology
Page 35: The way to Systems Biology

Primary Database vs Secondary Database

Page 36: The way to Systems Biology

How to save time finding a biological database

BioSharing: a database of biological databases

Page 37: The way to Systems Biology

KEGG: Kyoto Encyclopedia of Genes and Genomes

About KEGG

• The Kyoto Encyclopedia of Genes and Genomes (KEGG) knowledgebase was developed in 1996, consisting of genetic building blocks of genes and proteins.
• A collection of manually drawn pathway maps representing current knowledge on the molecular interaction and reaction networks
• Manually curated based on published literature
• Constructed as wiring diagrams with enzymes and proteins, processes and reactions, and substrates, co-factors, intermediates, metabolites and end products

Categories in KEGG

• Metabolism: carbohydrates, energy, lipid, nucleotides, amino acid, xenobiotics
• Genetic information processing
• Environmental information processing
• Cellular processes
• Human diseases
• Drug development: the structure relationships

Page 38: The way to Systems Biology
Page 39: The way to Systems Biology
Page 40: The way to Systems Biology
Page 41: The way to Systems Biology

More on gene set collections

• Gene Ontology (GO)
– Cellular components (CC)
– Biological processes (BP)
– Molecular functions (MF)
• Well-curated pathway databases
– KEGG pathway
– Biocarta
– Reactome
– GenMAPP
– IPA pathway database
• Gene set collections
– MSigDB
– GAzer

Page 42: The way to Systems Biology

Secondary databases in Plant Systems Biology

STRING

A database of protein-protein interactions, with known as well as predicted information.

http://string-db.org/

Page 43: The way to Systems Biology

Secondary databases in Plant Systems Biology

DIP

The DIP database provides experimentally determined interactions between proteins.

http://dip.doe-mbi.ucla.edu/dip/Main.cgi

Page 44: The way to Systems Biology

Secondary databases in Plant Systems Biology

Plant Metabolic Network (PMN)

A database that provides information on metabolic pathways in plants.

http://www.plantcyc.org/

Page 45: The way to Systems Biology

Common Identifiers

Species-specific
• HUGO HGNC: BRCA2
• MGI: MGI:109337
• RGD: 2219
• ZFIN: ZDB-GENE-060510-3
• FlyBase: CG9097
• WormBase: WBGene00002299 or ZK1067.1
• SGD: S000002187 or YDL029W

Annotations
• InterPro: IPR015252
• OMIM: 600185
• Pfam: PF09104
• Gene Ontology: GO:0000724
• SNPs: rs28897757

Experimental Platform
• Affymetrix: 208368_3p_s_at
• Agilent: A_23_P99452
• CodeLink: GE60169
• Illumina: GI_4502450-S

Gene
• Ensembl: ENSG00000139618
• Entrez Gene: 675
• Unigene: Hs.34012

RNA transcript
• GenBank: BC026160.1
• RefSeq: NM_000059
• Ensembl: ENST00000380152

Protein
• Ensembl: ENSP00000369497
• RefSeq: NP_000050.2
• UniProt: BRCA2_HUMAN or A1YBP1_HUMAN
• IPI: IPI00412408.1
• EMBL: AF309413
• PDB: 1MIU

Red = Recommended
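Many of these identifiers can be told apart by their prefix, which helps with recognizing the correct record type before loading a gene list into software. The sketch below covers only a handful of well-known human-oriented patterns and uses the example IDs from this slide; the pattern list is deliberately incomplete and the function name is hypothetical.

```python
import re

# A small, incomplete set of prefix patterns (human Ensembl IDs, RefSeq,
# GO terms, dbSNP rs numbers). Extend it for other species or databases.
ID_PATTERNS = [
    ("Ensembl gene", re.compile(r"ENSG\d{11}")),
    ("Ensembl transcript", re.compile(r"ENST\d{11}")),
    ("Ensembl protein", re.compile(r"ENSP\d{11}")),
    ("RefSeq mRNA", re.compile(r"NM_\d+(\.\d+)?")),
    ("RefSeq protein", re.compile(r"NP_\d+(\.\d+)?")),
    ("GO term", re.compile(r"GO:\d{7}")),
    ("dbSNP variant", re.compile(r"rs\d+")),
]

def classify_identifier(identifier):
    """Return a label for the first pattern that matches the whole ID."""
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"

for example in ["ENSG00000139618", "NM_000059", "NP_000050.2",
                "GO:0000724", "rs28897757", "675"]:
    print(example, "->", classify_identifier(example))
```

Purely numeric IDs such as Entrez Gene 675 cannot be recognized by pattern alone, which is one reason to record the source database alongside every ID column.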

Page 46: The way to Systems Biology

Identifier Mapping

• So many IDs!
– Software tools recognize only a handful
– May need to map from your gene list IDs to standard IDs
• Four main uses
– Searching for a favorite gene name
– Linking to related resources
– Identifier translation
  • E.g. proteins to genes, Affy ID to Entrez Gene
– Merging data from different sources
  • Find equivalent records
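Identifier translation can be pictured as a lookup in a mapping table, with unmapped IDs reported rather than silently dropped. The sketch below assumes a one-row in-memory table built from the BRCA2 identifiers on the previous slide; the table layout, the function name and the probe ID "FAKE_PROBE_1" are hypothetical.

```python
# One-row mapping table built from the BRCA2 examples in this deck.
# Real tables come from a mapping resource and are often many-to-many.
MAPPING_TABLE = [
    {
        "symbol": "BRCA2",
        "entrez": "675",
        "ensembl_gene": "ENSG00000139618",
        "affymetrix": "208368_3p_s_at",
    },
]

def translate_ids(ids, source, target, table=MAPPING_TABLE):
    """Map IDs from the `source` column to the `target` column.

    Returns (mapped, unmapped) so that lost IDs stay visible.
    """
    index = {row[source]: row[target] for row in table}
    mapped = {i: index[i] for i in ids if i in index}
    unmapped = [i for i in ids if i not in index]
    return mapped, unmapped

mapped, unmapped = translate_ids(
    ["208368_3p_s_at", "FAKE_PROBE_1"], source="affymetrix", target="entrez"
)
print(mapped, unmapped)
```

Keeping the unmapped list makes it easy to see how much of a gene list survives translation, which matters because gene list size affects downstream enrichment statistics.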

Page 47: The way to Systems Biology

Thank You

[email protected]

LinkedIn: https://eg.linkedin.com/in/ali-kishk-997423a9