bda2015 tutorial-part1-intro

Genomics 3.0: Big Data in Precision Medicine Asoke K Talukder, Ph.D InterpretOmics, Bangalore, India Big Data Analytics 2015 Hyderabad 16-18 December, 2015


Page 1: Bda2015 tutorial-part1-intro

16th December 2015

Genomics 3.0: Big Data in Precision Medicine

Asoke K Talukder, Ph.D

InterpretOmics, Bangalore, India

17th December 2009

Big Data Analytics 2015, Hyderabad, 16-18 December 2015

Page 2: Bda2015 tutorial-part1-intro


Acknowledgement

• BDA2015 Technical committee

• Authors & Publishers making their articles Open Access on the Web

• Open Source Software/Foundation

• Authors of Open Source & Open Domain software

• NCBI & other open domain databases

• Wikipedia & other sites that believe in Bhikshu Economy


Page 3: Bda2015 tutorial-part1-intro


Disclaimer

• During my research for this tutorial, I referred to many texts and presentations available on the Web or obtained from colleagues and professionals. I have tried to credit the creators of the artifacts used in this presentation; if I have missed a citation to an original author, the omission is unintentional and regretted.

Page 4: Bda2015 tutorial-part1-intro


About the Speaker

• Dr. Asoke K. Talukder is a computer scientist who has worked for companies such as Fujitsu-ICIM, Microsoft, Oracle, Informix, Digital, Hewlett Packard, ICL, Sequoia, Northern Telecom, NEC, KredietBank, iGate, and Cellnext. He has authored or edited six books, two of which have been translated into Chinese, and has published many peer-reviewed research papers. He is the recipient of many international awards, including the All India Radio/Doordarshan award, ICIM Professional Excellence Award, ICL Excellence Award, IBM Solutions Excellence Award, and Simagine GSMWorld Award. He has been listed in “Who’s Who in the World”, “Who’s Who in Science and Engineering”, and “Outstanding Scientists of 21st Century”. He holds an M.Sc. (Physics, with a Biophysics major) and a Ph.D. in Computer Science. He was the DaimlerChrysler Chair Professor at IIIT, Adjunct Professor in the Department of CSE at NIT Warangal, and Adjunct Faculty, CE, at NITK Surathkal. He is Co-founder and Chief Scientific Officer of InterpretOmics, the Data Sciences and Systems Biology company.


Page 5: Bda2015 tutorial-part1-intro


Part I - Introduction

Page 6: Bda2015 tutorial-part1-intro


Everyday Newspaper Headlines


Page 7: Bda2015 tutorial-part1-intro


Structure of the Tutorial

• Introduction to Omic Sciences

• Omic Sciences Challenges

• Computational Biology

• Algorithms and Data Mining in Biology

• Blood Biopsy – a case study


Page 8: Bda2015 tutorial-part1-intro


Goal of this Tutorial

• This tutorial defines the role of Big Data and Data Sciences in biology and the life sciences. Chemistry and physics have given us some understanding of biology, and advances in technology are now making the next leap possible. We need mathematics and computers to solve grand challenges in biology: a better understanding of life and of genomics, the building block of life. This will help solve real-world problems such as disease management and the management of food and the environment.


Page 9: Bda2015 tutorial-part1-intro

Classification of Disease

Leading causes of death (U.S., 1999)

Rank  Cause                      Number of deaths  % of total deaths
1     heart disease              725,192           30.3
2     malignant neoplasm         549,192           23.0
3     cerebrovascular disease    167,366           7.0
4     chronic lower respiratory  124,181           5.2
5     accidents                  97,860            4.1
6     diabetes mellitus          68,399            2.9
7     influenza, pneumonia       63,730            2.7
8     Alzheimer’s disease        44,536            1.9
9     nephritis & related        35,525            1.5
10    septicemia                 30,680            1.3
11 …  all other                  2,391,399         20.2

Source: National Vital Statistics Reports 49(11):1-87, 2001.
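
The percentage column for the ten leading causes can be reproduced from the death counts. The sketch below assumes that 2,391,399 is the total number of deaths from all causes in 1999; that reading is an assumption, although it is consistent with the percentages listed for ranks 1 to 10. The dictionary and function names are illustrative only.

```python
# Death counts for the ten leading causes, copied from the table above.
DEATHS = {
    "heart disease": 725_192,
    "malignant neoplasm": 549_192,
    "cerebrovascular disease": 167_366,
    "chronic lower respiratory": 124_181,
    "accidents": 97_860,
    "diabetes mellitus": 68_399,
    "influenza, pneumonia": 63_730,
    "Alzheimer's disease": 44_536,
    "nephritis & related": 35_525,
    "septicemia": 30_680,
}

# Assumption: this figure is the total number of deaths from all causes.
TOTAL_DEATHS = 2_391_399


def percent_of_total(count, total=TOTAL_DEATHS):
    """Return a cause's share of all deaths, in percent, rounded to one decimal."""
    return round(100.0 * count / total, 1)


shares = {cause: percent_of_total(n) for cause, n in DEATHS.items()}
for cause, share in shares.items():
    print(f"{cause:28s}{share:5.1f}")
```

Heart disease and malignant neoplasm together account for more than half of all deaths under this assumption, which is why the slides that follow concentrate on the genetic components of common diseases.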

Page 10: Bda2015 tutorial-part1-intro


Genomics and World Health

• “It is now believed that the information generated by genomics will, in the long-term, have major benefits for the prevention, diagnosis and management of many diseases which hitherto have been difficult or impossible to control. These include communicable and genetic diseases, together with other common killers or causes of chronic ill health, including cardiovascular disease, cancer, diabetes, the major psychoses, dementia, rheumatic disease, asthma, and many others.”

– Genomics and World Health, Report of the Advisory Committee on Health Research, presented to the Director-General of WHO on 20 December 2001; Ref - Jeffrey D. Sachs, WHO, Geneva, 2002


Page 11: Bda2015 tutorial-part1-intro


Genomics and Food Chain

• To develop highly nutritious food and high-yield crops, we need to understand the genetic structure of plants and their disease vectors.

• We also need GMO (Genetically Modified Organism) crops that can grow and produce in hostile environments such as drought-affected or highly saline areas


Page 12: Bda2015 tutorial-part1-intro


Genomics and Energy

• Most of our energy comes from fossil fuels such as coal and petroleum, which formed from living organisms over millions of years

• Can we culture organisms that will shorten this cycle from millions of years to a few years?

• Can we generate bio-fuels that are economical and commercially viable?


Page 13: Bda2015 tutorial-part1-intro


Genomics and Environment

• Can we culture organisms that will help the carbon cycle and reduce CO2?

• Can we culture organisms or plants that will desalinate sea water and produce fresh drinking water?

• Can we culture organisms or plants that will clean the environment and accelerate the biodegradation of waste?


Page 14: Bda2015 tutorial-part1-intro


Genetic Components of Disease

Alzheimer’s Disease


Page 15: Bda2015 tutorial-part1-intro


Landmark Discoveries

• 1941 Genes code for single proteins

• 1944 Proof that DNA carries genetic information

• 1949 The concept of sickle cell anaemia as a “molecular disease”

• 1953 Structure of insulin determined

• 1953 Multistage mutational theory of cancer by Nordling

• 1953 Field Cancerization theory of cancer

• 1953 Structure of nucleic acid (DNA) determined

• 1956 Monogenic disease due to a single amino acid substitution of the β-chain of haemoglobin

• 1960 The X-ray crystallographic structure of haemoglobin

• 1961 The genetic code, messenger RNA, gene regulation

• 1972 Recombinant DNA, cloning and gene isolation

• 1974 Direct demonstration of a human gene deletion

• 1975 Southern blotting*

• 1976 Proto-oncogenes

• 1977 DNA sequencing

• 1978 Human gene library

• 1979 Restriction fragment length polymorphism used for prenatal diagnosis; stop codon mutation demonstrated in human globin messenger RNA; cellular oncogenes

• 1979–81 Human genes cloned and sequenced

• 1985 “Disease genes” isolated by positional cloning; polymerase chain reaction (PCR)

• 2000 The Human Genome Project — completion of 90% draft


Page 16: Bda2015 tutorial-part1-intro


Questions Biologists Often Ask

Biologists need answers to a number of questions:

• How can we extract all the knowledge contained in given sequence or structural data?

– Analysis

– Prediction of certain properties

• How can software tools help in designing drugs and curing diseases based on available data?

– Tools for the early drug discovery process

– Tools to predict and treat diseases before they manifest


Page 17: Bda2015 tutorial-part1-intro


Omic Sciences

• Genomics – the study of the "basic recipe" book that defines the characteristics of an individual, a population, or a living species

• Transcriptomics – the study of how the "basic recipes" are translated into a final product: the proteins

• Proteomics – the study of all proteins produced by the expression of the genome

• Metabolomics – the study of interactions between proteins and all "metabolites" (sugar, fat, biomolecules, etc.) of a cell or a biological entity

• Physiomics – the study of interaction with physiology

• Fluxomics – the study of dynamic changes of molecules within a cell over time

• Sociomics – the study of all social and cultural ecosystems that interact with the genomes

• Epigenomics – the study of the influence of the environmental imprint on the "coat" that covers the genetic material in the genome

• Phenomics – the study of phenotype

• Bibliomics – the study of literature


Page 18: Bda2015 tutorial-part1-intro


Genomics

• Genomics is the study of the genomes of organisms. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.


Page 19: Bda2015 tutorial-part1-intro


Gene

• With the exception of viruses, which are intracellular parasites, living organisms are divided into two general classes. First, there are eukaryotes, whose cells have a complex compartmentalized internal structure; they comprise algae, fungi, plants and animals. Second, there are prokaryotes, single-celled microorganisms with a simple internal organization, which comprise bacteria and related organisms. Genetic information is transferred from one generation to the next by subcellular structures called chromosomes. Prokaryotes usually have a single circular chromosome, while most eukaryotes have more than two and in some cases up to several hundred. For example, humans have 23 pairs; one chromosome of each pair is inherited from each parent. Twenty-two pairs are called autosomes and one pair is called the sex chromosomes. The latter are designated X and Y; females have two X chromosomes (XX) while males have an X and a Y (XY).


Page 20: Bda2015 tutorial-part1-intro


Genetics Vs Genomics

• Genetics is Biology

• Genomics is Statistical Data Mining

• Genetics is Confirmatory

• Genomics is Exploratory

• Genetics is hypothesis driven

• Genomics is hypothesis creating


Page 21: Bda2015 tutorial-part1-intro


Genomics 3.0

• Genomics 1.0: started with the Human Genome Project and was used by academics and researchers to understand disease dynamics and the genotype-phenotype association of a living system, at a time when clinicians treated the symptoms of a disease (the phenotype)

• Genomics 2.0: entered the clinic and pharmaceutical companies through translational genomics. It is used today as a tool for the diagnosis of non-communicable and genetic diseases. Clinicians use Genomics 2.0 not just to treat symptoms, but to treat the disease

• Genomics 3.0: will deal with holistic precision medicine and will be driven by the big-data genomic analytics of the 21st century. Genomics 3.0 will be used at the asymptomatic onset of disease. It will not just treat a disease, but treat a patient and cure a disease

Page 22: Bda2015 tutorial-part1-intro


Reduction Vs Integration


Page 23: Bda2015 tutorial-part1-intro


What is a System?

• A system is a whole entity made of a set of interacting or interdependent components that form an integrated whole

• It can be a collection of elements (often called 'components') and relationships among them, which differ from the relationships of the set or its elements to other elements or sets

• Interdependent components may have some property of their own, or may not exhibit any property outside the whole

• When these components are combined, they become a system whose static and dynamic properties are completely different from the properties of the individual components


Page 24: Bda2015 tutorial-part1-intro


Systems Biology

• Systems Biology is about the integration of modeling, simulation, experimentation, databases, and bioinformatic approaches

• It aims at a predictive understanding of microbial and plant systems for advancing clinical medicine, high-yield crops, highly nutritious produce, biofuel, biological control of carbon cycling, cleaning up contaminated environments, etc.


Page 25: Bda2015 tutorial-part1-intro


The Synergy

Genomics

Transcriptomics

Proteomics

Metabolomics

Fluxomics

Sociomics

Epigenomics

Systems Biology

........

Bibliomics


Page 26: Bda2015 tutorial-part1-intro


Model

• Scientific modelling is an activity to make a particular function or entity of the real world easier to define, quantify, visualize, understand, or simulate by referencing it to existing and usually commonly accepted knowledge

• A simulator should be able to model the actual system in Reduced or Enlarged Space & Time

• Key issues in simulation include representation of the true characteristics, function, and behaviours of the original system in a space that can be manipulated or changed as desired

• However, in many cases the similarity is only approximate or even intentionally distorted.


Page 27: Bda2015 tutorial-part1-intro


Biological System


Page 28: Bda2015 tutorial-part1-intro


Ways To Study A System*


Page 29: Bda2015 tutorial-part1-intro


Deductive and Inductive Science

Ref: Sylvia Wassertheil-Smoller, Biostatistics and Epidemiology, Springer, 2003

Physical Science

Law of Gravitation,

Newton's Law of Motion

E = mc²

Chemical/Molecular Properties

Statistics

Biological Phenomenon

Simulation (Model fitting)

Wireless Mobile Communication

Clinical Trial


Page 30: Bda2015 tutorial-part1-intro


Technical Attractions of Simulation

• Ability to compress time, expand time

• Ability to control sources of variation

• Avoids errors in measurement

• Ability to stop and review

• Ability to restore system state

• Facilitates replication

• Modeler can control level of detail

Discrete-Event Simulation: Modeling, Programming, and Analysis by G. Fishman, 2001
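
Several of these attractions can be seen in a few lines of code. The sketch below is a toy stochastic birth-death model, not an example taken from Fishman's book; the function names, probabilities and seeds are all hypothetical. It uses a single seeded random generator as the only source of variation, saves the state part-way through, restores it, and shows that the replayed run and a replicated run match exactly.

```python
import random


def step(population, rng, birth=0.3, death=0.2):
    """Advance the toy model by one time step (probabilities are made up)."""
    u = rng.random()
    if birth > u:
        return population + 1
    if birth + death > u and population > 0:
        return population - 1
    return population


def run(population, rng, steps):
    """Run `steps` time steps and return the population after each one."""
    history = []
    for _ in range(steps):
        population = step(population, rng)
        history.append(population)
    return history


# Control sources of variation: all randomness comes from one seeded generator.
rng = random.Random(42)
first_half = run(10, rng, 50)

# Stop and review: save the full system state (population + generator state).
checkpoint = (first_half[-1], rng.getstate())
second_half = run(first_half[-1], rng, 50)

# Restore system state and replay: the continuation is reproduced exactly.
population, rng_state = checkpoint
rng.setstate(rng_state)
replay = run(population, rng, 50)
print(replay == second_half)  # True

# Replication: the same seed always yields the same trajectory.
print(run(10, random.Random(7), 20) == run(10, random.Random(7), 20))  # True
```

A hundred simulated time steps run in a fraction of a second regardless of what a step stands for in the real system, which is the sense in which a simulation compresses or expands time.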


Page 31: Bda2015 tutorial-part1-intro


Simulation System


Page 32: Bda2015 tutorial-part1-intro


Part II – Some Biology

Page 33: Bda2015 tutorial-part1-intro


Precision Medicine Will Transform the Health Care Industry

Will impact the health care system significantly:

• Pharmaceuticals

• Biotechnology

• Healthcare industry

• Health insurance

• Medicine: diagnostics, therapy, prevention, wellness

• Nutrition

• Assessments of environmental toxicities

• Academia and medical schools

Healthcare System

New ideas need new organizational structures


Page 34: Bda2015 tutorial-part1-intro


Instruments to Decipher Various Types of Biological Information


Page 35: Bda2015 tutorial-part1-intro


Protein interactions: Yeast two-hybrid method


Page 36: Bda2015 tutorial-part1-intro


Watson-Crick Model of DNA

• Based on X-ray data from Rosalind Franklin, recognized that the 3.4 Angstrom period suggested a double helix.

• Based on Chargaff’s rule ([A]=[T] and [C]=[G]), recognized that the two strands must be held together by H-bonds between purine and pyrimidine pairs.

• Accepted the assumption that nucleotides were held together by phosphodiester bonds with phosphate as the chain backbone.


Page 37: Bda2015 tutorial-part1-intro


Watson & Crick

• James D. Watson and Francis Crick, using X-ray data collected by Rosalind Franklin, proposed the double helix structure of the DNA molecule in 1953. Their article, Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid, is celebrated for its treatment of the B form of DNA (B-DNA), and as the source of Watson-Crick base pairing of nucleotides. Together with Maurice Wilkins, they were awarded the Nobel Prize in Physiology or Medicine in 1962.


Page 38: Bda2015 tutorial-part1-intro


The Journal Article that Won the Nobel Prize


Page 39: Bda2015 tutorial-part1-intro


Interactions within a Cell

Animal Plant

Nucleus

Ribosome

Endoplasmic Reticulum

Golgi Body

Ribosome: site where proteins are made


Page 40: Bda2015 tutorial-part1-intro


Nucleus

Chromosome

DNA

Nucleic Acid

Nucleotide

Inside the Nucleus


Page 41: Bda2015 tutorial-part1-intro


Nucleic Acids

• Deoxyribonucleic acid (DNA)

– DNA is found in the nucleus, with small amounts in mitochondria and chloroplasts

• Ribonucleic acid (RNA)

– RNA is found throughout the cell

© 2007 Paul Billiet ODWS

Page 42: Bda2015 tutorial-part1-intro


Watson-Crick Model of DNA

• Chains were in an antiparallel orientation

• Bases stacked perpendicular to helix axis and associate through hydrogen bonds

• Each turn of the helix spans 34 Angstroms and contains 10 bases

• Major and minor grooves within the helix

• Double helix has a 20 Angstrom diameter
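
The geometry above implies a rise of 3.4 Angstroms per base pair (34 Angstroms per turn divided by 10 bases per turn). A minimal sketch of that arithmetic follows; the function names and the 1000-base-pair fragment are hypothetical and serve only as an illustration.

```python
ANGSTROMS_PER_TURN = 34.0   # length of one helical turn in the model
BASE_PAIRS_PER_TURN = 10    # bases per turn in the model

# Distance between neighbouring stacked base pairs along the helix axis.
RISE_PER_BASE_PAIR = ANGSTROMS_PER_TURN / BASE_PAIRS_PER_TURN


def helix_length_angstroms(n_base_pairs):
    """Length along the helix axis of a duplex with n_base_pairs base pairs."""
    return n_base_pairs * RISE_PER_BASE_PAIR


def helix_turns(n_base_pairs):
    """Number of full or partial helical turns in the duplex."""
    return n_base_pairs / BASE_PAIRS_PER_TURN


n = 1000  # hypothetical fragment size
print(f"{n} bp: {helix_length_angstroms(n):.0f} Angstroms, {helix_turns(n):.0f} turns")
```

With a 20 Angstrom diameter, such a fragment is about 170 times longer than it is wide, which gives a sense of how thin and thread-like the molecule is.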


Page 43: Bda2015 tutorial-part1-intro

ADDING IN THE BASES

• The bases are attached to the 1st carbon

• Their order is important: it determines the genetic information of the molecule

[Figure: phosphate (P) backbone with attached bases G, C, C, A, T, T]

© 2007 Paul Billiet ODWS

Page 44: Bda2015 tutorial-part1-intro


Nucleotide Base Pairing

Nucleotides pair by forming H-bonds between bases. The pairing is the basis for the antiparallel strands associating with each other.
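
Because A pairs with T and C pairs with G, and the two strands run in opposite directions, the partner of any strand is its reverse complement. The sketch below illustrates this; the function name and the six-base sequence are hypothetical. It also checks that the duplex as a whole satisfies Chargaff's rule, [A]=[T] and [C]=[G].

```python
from collections import Counter

# Watson-Crick pairs: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the partner strand, read 5' to 3', for a strand given 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


strand = "GCCATT"                     # hypothetical sequence, written 5' to 3'
partner = reverse_complement(strand)  # antiparallel partner strand

# Count bases over both strands of the duplex.
counts = Counter(strand + partner)
print(partner, counts["A"] == counts["T"], counts["C"] == counts["G"])
```

Reversing the complemented string is what encodes the antiparallel orientation: both strands are conventionally written from the 5' end to the 3' end.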


Page 45: Bda2015 tutorial-part1-intro

[Figure: Single Stranded DNA and Double Stranded DNA, with the 5’ and 3’ ends of the strands labelled]

Page 46: Bda2015 tutorial-part1-intro


Proteins play key roles in a living system

• Three examples of protein functions

– Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.

– Transport: Some proteins transport various substances, such as oxygen and ions. Example: haemoglobin carries oxygen.

– Information transfer: For example, hormones. Example: insulin controls the amount of sugar in the blood.


Page 47: Bda2015 tutorial-part1-intro

Amino acid: Basic unit of protein

[Figure: an amino acid, with a central carbon (C) bonded to an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H), and a side chain (R)]

Different side chains, R, determine the properties of the 20 amino acids.

Page 48: Bda2015 tutorial-part1-intro

Proteins are linear polymers of amino acids

[Figure: amino acids with side chains R1, R2, R3 joined by peptide bonds, with H2O released at each bond]

A carboxylic acid group condenses with an amino group, releasing a water molecule and forming a peptide bond.

The amino acid sequence is called the primary structure.

[Figure: an example chain of one-letter amino acid codes (A, A, F, N, G, G, S, T, S, D, K)]
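
Since each peptide bond releases one water molecule, a linear chain of n amino acids contains n - 1 peptide bonds and weighs n - 1 waters less than its free amino acids combined. The sketch below illustrates that bookkeeping; the function names are hypothetical, and the molar masses are approximate values for glycine (G) and alanine (A) supplied here only as an illustration.

```python
# Approximate average molar masses in g/mol (illustrative values only).
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09}  # glycine, alanine
WATER_MASS = 18.02


def peptide_bonds(sequence):
    """A linear chain of n residues has n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence, masses=FREE_AMINO_ACID_MASS):
    """Sum of the free amino acid masses minus one water per peptide bond."""
    total = sum(masses[residue] for residue in sequence)
    return total - peptide_bonds(sequence) * WATER_MASS


print(peptide_bonds("GAG"), round(peptide_mass("GA"), 2))
```

The sequence string passed to these functions is the primary structure written in one-letter codes.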

Page 49: Bda2015 tutorial-part1-intro

Gene is protein’s blueprint, genome is life’s blueprint

[Figure: DNA, gene and protein; the genome shown as many genes, each with its protein]

Page 50: Bda2015 tutorial-part1-intro

Gene is protein’s blueprint, genome is life’s blueprint

[Figure: the genome shown as many genes, each with its protein; labelled "Glycolysis network"]

Page 51: Bda2015 tutorial-part1-intro

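Amino acid sequence is encoded by DNA base sequence in a gene

Codon table (DNA codons, grouped by first letter and then by second letter; the third letter varies within each group)

First letter G
  Second letter G: GGG, GGA, GGC, GGT = Gly
  Second letter A: GAG, GAA = Glu; GAC, GAT = Asp
  Second letter C: GCG, GCA, GCC, GCT = Ala
  Second letter T: GTG, GTA, GTC, GTT = Val

First letter A
  Second letter G: AGG, AGA = Arg; AGC, AGT = Ser
  Second letter A: AAG, AAA = Lys; AAC, AAT = Asn
  Second letter C: ACG, ACA, ACC, ACT = Thr
  Second letter T: ATG = Met; ATA, ATC, ATT = Ile

First letter C
  Second letter G: CGG, CGA, CGC, CGT = Arg
  Second letter A: CAG, CAA = Gln; CAC, CAT = His
  Second letter C: CCG, CCA, CCC, CCT = Pro
  Second letter T: CTG, CTA, CTC, CTT = Leu

First letter T
  Second letter G: TGG = Trp; TGA = Stop; TGC, TGT = Cys
  Second letter A: TAG, TAA = Stop; TAC, TAT = Tyr
  Second letter C: TCG, TCA, TCC, TCT = Ser
  Second letter T: TTG, TTA = Leu; TTC, TTT = Phe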
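
Reading a gene three bases at a time and looking each codon up in the table gives the amino acid sequence. The sketch below builds the standard codon table in compact form and translates a short, made-up coding sequence; it uses one-letter amino acid codes with "*" for Stop, whereas the slide uses three-letter codes. The function name and the example sequence are hypothetical.

```python
BASES = "TCAG"
# One-letter amino acid codes ("*" = stop) for the 64 codons, listed in the
# order TTT, TTC, TTA, TTG, TCT, ... with the third letter changing fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA sequence from its first base until a stop codon."""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCTGATTGGTAA"))  # MADW
```

Here ATG, GCT, GAT and TGG encode Met, Ala, Asp and Trp, and TAA ends the chain, in agreement with the table.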

Page 52: Bda2015 tutorial-part1-intro


Our life is maintained by molecular network systems

Molecular network system in a cell

(From ExPASy Biochemical Pathways; http://www.expasy.org/cgi-bin/show_thumbnails.pl?2)


Page 53: Bda2015 tutorial-part1-intro


So how can we meaningfully integrate the data?


Page 54: Bda2015 tutorial-part1-intro

Cellular networks:

• GENOME: genes; protein-gene interactions

• PROTEOME: protein-protein interactions

• METABOLISM: bio-chemical reactions (Citrate Cycle)

Page 55: Bda2015 tutorial-part1-intro


A Real-life System - Reactome


Page 56: Bda2015 tutorial-part1-intro


End of Part I & II

InterpretOmics

Office: Shezan Lavelle, 5th Floor, #15 Walton Road, Bengaluru 560001

Lab: #329, 7th Main, HAL 2nd Stage, Indiranagar, Bengaluru 560008

Phone: +91(80)46623800