
Post on 20-Dec-2015


Page 1: Integrated Computational Approach for Translational Biomedical Research

Seungchan Kim, Ph.D.

CSE, Arizona State University and MDTV/GenSIP, Translational Genomics Research Institute

AI @ ASU Lunch Bunch, Oct. 25, 2005

BY 510

Page 2: Biomedical Problems

AI@ASU, BY510, Oct. 25, 2005

• Can we recognize disease subtypes?
• Can we identify molecular markers for a certain type of disease?
• Can we learn the regulatory mechanism governing a cellular phenotype, e.g., disease?
• Can we find a new therapeutic target for the treatment of disease?
• Etc.

Page 3: Cells: Basic Features

• All living things are made of cells.
• All cells share the same machinery for their most basic functions.
• All cells store their hereditary information in the same linear chemical code, stored in a double-stranded molecule, deoxyribonucleic acid (DNA).
• All cells replicate their hereditary information by templated polymerization.

Page 4: Cells: Basic Features

• All cells transcribe portions of their hereditary information into single-stranded molecules known as ribonucleic acids (RNA).
• All cells translate RNA into protein (long polymer chains) in the same way.
• All cells use proteins to catalyze most chemical reactions.
• All cells function as biochemical factories dealing with the same basic molecular building blocks.

Page 5: Prokaryotic vs. Eukaryotic

• Living organisms can be classified on the basis of cell structure into two groups:
– Eukaryotes (plants, fungi, and animals)
– Prokaryotes (bacteria)
• Eukaryotes keep their DNA in a distinct membrane-bounded intracellular compartment called the nucleus.
• Prokaryotes have no distinct nuclear compartment to house their DNA.

Page 6: A Typical Prokaryotic Cell

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 7: A Typical Eukaryotic Cell

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 8: A “Simplified” Cell

• The membrane is the lipid bilayer, with its associated proteins, that encloses all cells.
• The nucleus is a prominent membrane-bounded organelle in a eukaryotic cell, containing DNA organized into chromosomes.
• The nuclear envelope is a double membrane surrounding the nucleus. It consists of an outer and an inner membrane and is perforated by nuclear pores.
• Chromatin is the complex of DNA and various proteins found in the nucleus of a eukaryotic cell. It is the material that chromosomes are made of.
• The cytoplasm is the contents of the cell contained within its plasma membrane but, in the case of eukaryotic cells, outside the nucleus.
• Ribosomes are particles composed of ribosomal RNAs and ribosomal proteins that associate with messenger RNAs and catalyze the synthesis of protein.

[Figure labels: nucleus, chromatin, ribosomes, membrane, nuclear envelope]

Page 9: DNA and its Building Blocks

• DNA is made from simple subunits, called nucleotides, each consisting of a sugar-phosphate molecule with a nitrogen-containing side group, or base, attached to it.
• The bases are of four types:
– Adenine (A)
– Guanine (G)
– Cytosine (C)
– Thymine (T)

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 10: DNA and its Building Blocks

• A single strand of DNA consists of nucleotides joined together by sugar-phosphate linkages.
• The individual sugar-phosphate units are asymmetric, giving the backbone of the strand a definite directionality, or polarity.
• This directionality guides the molecular processes by which the information in DNA is interpreted and copied in cells.

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 11: DNA and its Building Blocks

• Through templated polymerization, the sequence of nucleotides in an existing DNA strand controls the sequence in which nucleotides are joined together in a new DNA strand.
• Rules: {A ↔ T} | {C ↔ G}
• The new strand has a nucleotide sequence that is complementary to that of the old strand, and a backbone with opposite directionality.

© Garland Science, Molecular Biology of The Cell, 4th Edition
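
The pairing rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `complementary_strand` is hypothetical, and the sketch assumes the template is written 5'→3' using only the letters A, C, G, and T.

```python
# Base-pairing rules from the slide: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(template: str) -> str:
    """Return the new strand, read 5'->3', for a template given 5'->3'.

    Each base is replaced by its partner, and the result is reversed
    because the new strand's backbone runs in the opposite direction.
    """
    return "".join(PAIRING[base] for base in reversed(template.upper()))


print(complementary_strand("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which mirrors the fact that each strand can serve as the template for the other.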

Page 12: DNA and its Building Blocks

• A normal DNA molecule consists of two complementary strands.
• The nucleotides within each strand are linked by strong (covalent) chemical bonds.
• The complementary nucleotides on opposing strands are held together more weakly, by hydrogen bonds.

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 13: DNA and its Building Blocks

• The two strands twist around each other to form a double helix.
• This is a robust structure that can accommodate any sequence of nucleotides without altering its basic structure.

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 14: DNA Replication

• During the process of DNA replication, the two strands of the DNA double helix are pulled apart.
• Each strand serves as a template for the synthesis of a new complementary strand by means of templated polymerization.

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 15: DNA Transcription

• Each cell contains a fixed set of DNA molecules.
• A given segment of DNA serves to guide the synthesis of many identical RNA transcripts.
• These transcripts serve as working copies of the information stored in the DNA archive.
• Many different sets of RNA molecules can be made by transcribing selected parts of a long DNA sequence, allowing each cell to use its stored information differently.

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 16: DNA Transcription

• All RNA in a cell is made by the process of DNA transcription.
• DNA transcription is similar to DNA replication.
• It produces a single-stranded RNA molecule that is complementary to one strand of DNA.

© Garland Science, Molecular Biology of The Cell, 4th Edition
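
As a rough illustration of the last point, transcription can be sketched as complementing a DNA template strand into RNA, where uracil (U) takes the place of thymine (T). The function name `transcribe` is hypothetical, and the sketch assumes the template strand is given 5'→3' and ignores promoters, splicing, and every other biological detail.

```python
# RNA pairs with DNA like a DNA strand would, except that U replaces T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Return the RNA (5'->3') complementary to a DNA template strand (5'->3')."""
    return "".join(DNA_TO_RNA[base] for base in reversed(template_strand.upper()))


print(transcribe("TTACCAGGCCAT"))  # AUGGCCUGGUAA
```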

Page 17: Translation

• During translation, the RNA molecules produced by transcription are used to guide the synthesis of protein molecules.
• Proteins are long polymer chains formed by stringing together monomeric building blocks (amino acids) drawn from a standard repertoire that is the same for all living cells.

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 18: Translation

• There are only four different nucleotides in mRNA and twenty different types of amino acids in a protein.
• Therefore, translation cannot be accounted for by a direct one-to-one correspondence between a nucleotide in RNA and an amino acid in protein.
• The nucleotide sequence in mRNA is read in sets of 3 nucleotides, called codons.
• Each codon corresponds to one amino acid.
• This mapping is determined by rules known as the genetic code.

Page 19: Genetic Code

Name            3L    1L   Codons
Alanine         Ala   A    GCA GCC GCG GCU
Arginine        Arg   R    AGA AGG CGA CGC CGG CGU
Asparagine      Asn   N    AAC AAU
Aspartic acid   Asp   D    GAC GAU
Cysteine        Cys   C    UGC UGU
Glutamic acid   Glu   E    GAA GAG
Glutamine       Gln   Q    CAA CAG
Glycine         Gly   G    GGA GGC GGG GGU
Histidine       His   H    CAC CAU
Isoleucine      Ile   I    AUA AUC AUU
Leucine         Leu   L    UUA UUG CUA CUC CUG CUU
Lysine          Lys   K    AAA AAG
Methionine      Met   M    AUG
Phenylalanine   Phe   F    UUC UUU
Proline         Pro   P    CCA CCC CCG CCU
Serine          Ser   S    AGC AGU UCA UCC UCG UCU
Threonine       Thr   T    ACA ACC ACG ACU
Tryptophan      Trp   W    UGG
Tyrosine        Tyr   Y    UAC UAU
Valine          Val   V    GUA GUC GUG GUU
STOP                       UAA UAG UGA

(3L = three-letter abbreviation, 1L = one-letter abbreviation)

* Only 20 different amino acids + STOP codes

• AUG acts as both the initiation codon and the codon for methionine
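
The codon table above can be turned into a small translation routine. The following Python sketch is illustrative only: the names `CODON_TABLE` and `translate` are hypothetical, the table is the standard genetic code written compactly in U, C, A, G order, and starting at the first AUG is a simplifying assumption rather than a model of real initiation.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order; "*" marks STOP.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA from its first AUG until a STOP codon or the end."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    # Read the sequence three nucleotides (one codon) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUGGUAAGC"))  # MAW
```

In the example, reading starts at AUG (Met), continues through GCC (Ala) and UGG (Trp), and stops at UAA.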

Page 20: Mechanisms of Translation: Initiation

© Jones and Bartlett Publishers, Essential Genetics: A Genomics Perspective, 3rd Edition

Page 21: Mechanisms of Translation: Elongation

© Jones and Bartlett Publishers, Essential Genetics: A Genomics Perspective, 3rd Edition

Page 22: Mechanisms of Translation: Termination

© Jones and Bartlett Publishers, Essential Genetics: A Genomics Perspective, 3rd Edition

Page 23: From Gene to Protein

© Garland Science, Molecular Biology of The Cell, 4th Edition

Page 24: Genes and Genome

• The fragment of DNA that corresponds to one protein (by means of transcription and translation) is known as a gene.
• DNA molecules are usually very large, containing thousands of genes, and thus specify thousands of proteins.
• In all cells, the expression of individual genes is regulated: instead of manufacturing a full repertoire of all possible proteins at full tilt all the time, the cell adjusts the rate of transcription and translation of different genes independently, according to need.
• The entire genetic information encoded in an organism is called the genome.

Page 25: Genotypes and Phenotypes

• The genome of one organism differs from the genome of another, although many similarities may exist.
• The genetic constitution (i.e., the genome) of an organism is called the genotype of that organism.
• The different cell types in a multicellular organism differ dramatically in both structure and function.
• This is because different cell types synthesize and accumulate different sets of RNA and protein molecules, without altering their genotype.
• The observable character of a cell or an organism is called the phenotype of that cell or organism.

Page 26: Systems’ View

• Biology is an informational science
– Systematically perturbing and monitoring biological systems utilizing powerful new high-throughput tools
– Creation of new computational methods for modeling and analysis
– The integration of discovery science (data mining) and hypothesis-driven science (modeling & simulation)

Page 27: Molecular Circuitry of Cancer

Hahn et al., Nature Reviews Cancer 2 (2002)

Page 28: Wnt5a Signaling Pathway

A.T.Weeraratna et al., Cancer Cell 1 (2002)

Page 29: Genome Dynamics

[Diagram: DNA, RNA, and protein linked by transcription and translation. Labels under "Measurements": Reference DNA Sequence, Sequence Variants, Gene Copy Number, CpG Methylation, RNA Abundance, RNA Half-life, Protein Interactions, Protein Modification, Protein Half-life. Other labels, including those under "Perturbation": Ectopic Expression, RNA Interference, Protein/DNA Interactions, Increased Expression, Decreased Expression, Protein/RNA Interactions.]

Page 30: Biological Data

• Genomic data
– Sequences
– SNPs
– Gene expression microarrays
– CGH arrays
• Proteomic data
– MALDI (spectral data)
– Protein arrays
• Clinical data
– Patients
– Drug treatment
• Physiological data
– Diet
– Exercise

Page 31: Gene Expression Microarrays

• A gene expression microarray measures the transcriptional activity of tens of thousands of genes simultaneously, resulting in individual snapshots of a cell’s transcriptional state at a given time.
• While it reflects one of the central dynamic processes of a biological system, it does not provide an accurate picture of other important dynamic aspects, such as the current levels of protein abundance, or the activation state or modification state of extant proteins.
• To compensate for this, other measurement technologies, e.g., protein abundance and interaction arrays, can be combined with expression data to obtain a comprehensive transcription, translation, and modification profile.

Page 32: Single Nucleotide Polymorphisms (SNPs)

• Genome projects: multiple genomic sequences provide a reference estimate of normality.
• Single nucleotide polymorphisms (SNPs), small genetic changes or variations that can occur within a person's DNA sequence, serve as possible markers of aberration from this reference that might indicate a disease cause or a susceptibility to disease.
• Long runs of SNPs also serve to mark haplotypes, groups of closely linked alleles that tend to be inherited together, which can be useful for following specific chromosomal areas inherited by affected individuals in familial genetic studies.
• Several commercial platforms are currently available that survey genomes for SNPs at intervals approaching 20 kb and smaller.
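
The idea of a single-base variation relative to a reference can be illustrated with a toy comparison. This Python sketch is not how commercial SNP platforms work: it assumes two sequences that are already aligned and of equal length, and the function name `find_single_base_differences` and the example sequences are made up for illustration.

```python
def find_single_base_differences(reference: str, sample: str):
    """List (position, reference_base, sample_base) where two aligned sequences differ.

    Positions are 1-based. Both sequences must have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Toy sequences, invented for this example.
print(find_single_base_differences("ACGTTAGC", "ACGTCAGC"))  # [(5, 'T', 'C')]
```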

Page 33: Comparative Genomic Hybridization (CGH)

• Array-based CGH (aCGH), which builds on the CGH method first introduced by Kallioniemi (Science, 1992), has proven to be a high-throughput and sensitive genomic screening tool that detects DNA gains and losses with a resolution of 1.0 to 1.5 Mb using BAC arrays.
• CGH data are read as the number of copies of a chromosomal region. Array CGH provides a list of genes and genomic elements that are overrepresented (gain) in the cell when an amplification event occurs or underrepresented (loss) when deletions occur.
• Currently, chip-based technology with highly annotated DNA targets of 20-mer or 60-mer oligomer length permits whole-genome surveys in clinical specimens.
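
Reading CGH data as gain or loss can be sketched as a thresholded log2 ratio of test signal to reference signal. This is a minimal illustration, not the method used in the talk: the function name `call_copy_number`, the threshold of 0.3, and the region names and intensities are all assumptions made for the example.

```python
import math


def call_copy_number(test_signal: float, reference_signal: float,
                     threshold: float = 0.3) -> str:
    """Classify a region as 'gain', 'loss', or 'normal' from its log2 ratio.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    log_ratio = math.log2(test_signal / reference_signal)
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "normal"


# Toy intensities (test, reference), invented for this example.
regions = {"region_1": (300.0, 100.0), "region_2": (50.0, 100.0), "region_3": (100.0, 100.0)}
for name, (test, reference) in regions.items():
    print(name, call_copy_number(test, reference))
```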

Page 34: Computational Systems Biology

[Diagram: overview of computational systems biology in three panels.

Data Mining & Pattern Recognition (automated & systematic; algorithmic & computational). Labels: Biological Data (DNA, mRNA/cDNA, CGH, SNP); Clinical and Pathological Information (treatment history, age, gender, race, survival, and so on); Candidate Biological Components (genes, proteins); Derived Biological Context (biological process, subtype of disease); Biological Context as prior knowledge (biological process, subtype of disease); Association studies; Clustering; Integration; Pathways discovery. Outcomes: better diagnostic markers, better drug development, more efficient drug treatment.

Network Modeling & Systems Biology. Labels: Modeling; Mathematical and Computational Biological Process Models (discrete vs. continuous, deterministic vs. stochastic); Biological Process; Biological operations; In-silico Biological Process; In-silico Biological operations; "Comp" (truncated); Phenotype observation; Prediction; Hypothetical observation; Model refinement; Perturbation; Integration; Measurements. Outcomes: better treatment strategy, new drug targets.

Knowledge Representation & Mining. Labels: Knowledge; Computable Knowledge (gene-to-gene relationships); gene ontology; chemical database; genomic database; proteomic database; Literature (PubMed); Clinical chart/report; Chemistry (cooperative binding); Text-mining; Databasing; Knowledge Mining.]

Page 35: Data Mining & Pattern Recognition

[Diagram: the Data Mining & Pattern Recognition panel, repeated from Page 34.]

• Unsupervised analysis: exploratory
– Subtype recognition
– Clustering analysis
– Multi-Dimensional Scaling (MDS) plot
– Contextual pattern recognition
• Supervised analysis: discriminatory
– Classification of diseases
– Rank genes according to their impact on minimizing cluster volume and maximizing center-to-center inter-cluster distance
– t-test, SAM, TNoM, SVM, Gene@Work, Strong-feature
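
Of the supervised methods listed, the t-test is the simplest to sketch. The Python example below ranks genes by the absolute value of Welch's t-statistic between two sample classes. It is an illustration only and does not implement the cluster-volume criterion or the other listed tools; the function names and the expression values are invented for the example, and each gene is assumed to have non-zero variance in at least one class.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    spread_a = variance(group_a) / len(group_a)
    spread_b = variance(group_b) / len(group_b)
    # Division by zero occurs if both groups have zero variance.
    return (mean(group_a) - mean(group_b)) / math.sqrt(spread_a + spread_b)


def rank_genes(expression, labels):
    """Return gene names sorted from most to least discriminating by |t|."""
    scores = {}
    for gene, values in expression.items():
        class_0 = [v for v, label in zip(values, labels) if label == 0]
        class_1 = [v for v, label in zip(values, labels) if label == 1]
        scores[gene] = welch_t(class_0, class_1)
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Toy data, invented for this example: three samples per class.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_A": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],
    "gene_B": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],
    "gene_C": [5.0, 4.0, 6.0, 3.0, 2.0, 4.0],
}
print(rank_genes(expression, labels))  # ['gene_A', 'gene_C', 'gene_B']
```

With real microarray data the ranking would normally be followed by a correction for testing many genes at once, which this sketch omits.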

Page 36: Clustering & MDS: melanoma

Page 37

RNA interference

RNAi triggered by synthetic siRNA: a powerful new tool for gene knockdowns in mammalian cells

D. Azorsa

Page 38

RNAi Synthetic Lethal Phenotype Profiling of >10,000 siRNA

Context: BxPC3 pancreatic cancer isogenic cell lines: DPC4 negative vs. DPC4 positive

[Figure: survival scatter plot, color scale from low to high]

Highlighted circles: gene-targeting events that preferentially affect the survival of the BxPC3 DPC4/SMAD4-minus cell line
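The selection behind the highlighted circles can be sketched as a simple filter on paired survival values. This is an illustration only: the thresholds, the function name, and the survival fractions below are hypothetical, and the talk does not state how hits were actually called.

```python
def synthetic_lethal_hits(survival_neg, survival_pos, min_gap=0.3, max_neg=0.5):
    """Flag siRNA targets that reduce survival mainly in the DPC4-negative line.

    survival_neg / survival_pos: dicts mapping siRNA target -> fraction of
    cells surviving (relative to control) in the DPC4-negative and
    DPC4-positive lines. Both thresholds are illustrative assumptions.
    """
    hits = []
    for target, neg in survival_neg.items():
        pos = survival_pos[target]
        # Keep targets that are lethal in the negative line but spare the positive one.
        if neg <= max_neg and (pos - neg) >= min_gap:
            hits.append((target, pos - neg))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical survival fractions for four made-up targets.
neg = {"target1": 0.20, "target2": 0.90, "target3": 0.25, "target4": 0.45}
pos = {"target1": 0.85, "target2": 0.95, "target3": 0.30, "target4": 0.80}
hits = synthetic_lethal_hits(neg, pos)
```

A target such as `target3`, which lowers survival in both lines, is excluded: it is generally toxic rather than selectively lethal.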

Page 39

Network Modeling and Systems Biology

• Boolean networks
– S. A. Kauffman, 1969
– On/Off representation of the state of genes
– Boolean networks qualitatively capture typical genetic behavior

• Probabilistic Boolean networks
– Shmulevich et al., 2002
– Stochastic extension of Boolean network

• Others
– Differential Equations, Linear Model, Bayesian network …
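A minimal sketch of the two network types listed above. It assumes synchronous updates and, for the probabilistic variant, an independent random choice of rule per gene at every step. The three-gene wiring, the probabilities, and all names are hypothetical, chosen only to show the mechanics.

```python
import random
from itertools import product

# Hypothetical three-gene ring; the rules are made up for illustration.
# Each rule maps the current state (A, B, C) to the next value of one gene.
RULES = [
    lambda s: not s[2],  # gene A is on when gene C is off
    lambda s: s[0],      # gene B copies gene A
    lambda s: s[1],      # gene C copies gene B
]

def step(state, rules=RULES):
    """Synchronously update every gene from the current on/off state."""
    return tuple(int(bool(rule(state))) for rule in rules)

def find_attractor(state, rules=RULES):
    """Iterate from `state` until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return seen[seen.index(state):]

def canonical(cycle):
    """Rotate a cycle so it starts at its smallest state."""
    k = cycle.index(min(cycle))
    return tuple(cycle[k:] + cycle[:k])

def pbn_step(state, rule_choices, rng):
    """One step of a probabilistic Boolean network.

    rule_choices[i] is a list of (probability, rule) pairs for gene i;
    one rule per gene is drawn at random at every step.
    """
    next_state = []
    for choices in rule_choices:
        weights = [p for p, _ in choices]
        rule = rng.choices([r for _, r in choices], weights=weights)[0]
        next_state.append(int(bool(rule(state))))
    return tuple(next_state)

# Enumerate all 8 start states and collect the distinct attractors.
attractors = {canonical(find_attractor(s)) for s in product((0, 1), repeat=3)}

# Probabilistic variant: gene A usually follows its rule, sometimes keeps its value.
PBN_RULES = [
    [(0.9, RULES[0]), (0.1, lambda s: s[0])],
    [(1.0, RULES[1])],
    [(1.0, RULES[2])],
]
rng = random.Random(0)
trajectory = [(0, 0, 0)]
for _ in range(5):
    trajectory.append(pbn_step(trajectory[-1], PBN_RULES, rng))
```

The deterministic network settles into fixed cycles (its attractors), while the probabilistic version can occasionally leave a cycle when the alternative rule is drawn.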

[Diagram: Network Modeling & Systems Biology]
· Modeling: mathematical and computational biological process models (discrete vs. continuous; deterministic vs. stochastic)
· Biological process; biological operations; phenotype observation
· In-silico biological process; in-silico biological operations; prediction, hypothetical observation
· Comp
· Model refinement; Perturbation; Integration; Modeling
· Better treatment strategy; new drug targets

Measurements

Page 40

WNT5a

pirin

S100P

RET1

Knowledge Repository: GO, GenMAPP, KEGG, PubMed

Page 41

Knowledge Integration

• Biological database
– Genomic Sequence
– Protein
– Biochemical database

• Knowledgebase
– Pathways
– Ontology
– Protein-Protein Interaction
– Gene-Gene Interaction

• Knowledge Mining
– Literature
– Clinical records

• BioLog
– PubMed literature access logger, archiver, and analyzer

• Text- and Context-mining
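One simple form of the text-mining item above is counting which genes are mentioned together. The sketch below is not BioLog and not the speaker's method; it assumes plain co-occurrence within an abstract, and the gene names and sentences are made-up placeholders rather than real literature.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(abstracts, gene_names):
    """Count how often each pair of genes is mentioned in the same abstract."""
    patterns = {
        g: re.compile(r"\b" + re.escape(g) + r"\b", re.IGNORECASE)
        for g in gene_names
    }
    counts = Counter()
    for text in abstracts:
        mentioned = sorted(g for g, p in patterns.items() if p.search(text))
        counts.update(combinations(mentioned, 2))
    return counts

# Made-up sentences standing in for abstracts; gene names are placeholders.
abstracts = [
    "GeneA expression was measured together with GeneB.",
    "GeneC levels were compared against GENEA in a second cohort.",
    "GeneB and GeneA were both detected; GeneC was not assessed.",
]
pairs = cooccurrence(abstracts, ["GeneA", "GeneB", "GeneC"])
```

Co-occurrence is deliberately naive: the third sentence still counts GeneC even though it is only mentioned in a negation, which is the kind of case context-mining would need to handle.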

[Diagram: Knowledge Representation & Mining]
· Knowledge; Computable Knowledge
· gene-to-gene relationships
· gene ontology, chemical database, genomic database, proteomic database
· Literature (PubMed); Clinical chart/report
· Chemistry; cooperative binding
· Text-mining; Databasing; Knowledge Mining

Page 42

Knowledge Mining: Extracting Biological Information from Global RNAi Phenotype Data

· Statistically processed gene list
· Acquire current gene identifiers and information
· Gene Ontology analysis; network analysis; canonical pathway analysis
· PathwayAssist™
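The Gene Ontology analysis step can be sketched as an over-representation test on the processed gene list. The hypergeometric test used here is a common choice, but it is an assumption: the talk does not say which statistic was applied, and the term names and gene identifiers below are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the annotation."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def enrich(gene_list, annotations, background):
    """Return (term, overlap, p-value) tuples sorted by p-value.

    annotations: dict mapping term -> set of annotated genes.
    background: set of all genes that could have been selected.
    """
    hits = set(gene_list) & background
    results = []
    for term, members in annotations.items():
        members = members & background
        overlap = len(hits & members)
        p = hypergeom_pvalue(overlap, len(hits), len(members), len(background))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Hypothetical background of 20 genes and two made-up annotation terms.
background = {f"g{i}" for i in range(1, 21)}
annotations = {
    "term_X": {f"g{i}" for i in range(1, 6)},    # 5 annotated genes
    "term_Y": {f"g{i}" for i in range(6, 16)},   # 10 annotated genes
}
gene_list = ["g1", "g2", "g3", "g4", "g6"]
results = enrich(gene_list, annotations, background)
```

With many terms tested at once, the p-values would normally be corrected for multiple testing before any term is reported as enriched.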

Page 43

Knowledge Mining: Building Regulatory Networks from Global RNAi Phenotypes

Figure 2. Doxorubicin and Drug Resistance Molecular Interaction Network. Doxorubicin Drug Resistance Pathway

Page 44

Page 45

To be continued …