Sequence Design for DNA Computing

2004. 10. 16
Advanced AI

Soo-Yong Shin and Byoung-Tak Zhang
Biointelligence Laboratory

© 2004, SNU Biointelligence Lab, http://bi.snu.ac.kr/

DNA

A single-stranded DNA molecule is a sequence over four possible nucleotides

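Hydrogen bonds between complementary bases (A with T, G with C) hold two strands together; this pairing is called hybridization.

Watson-Crick complement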
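As a minimal sketch of the Watson-Crick complement, the snippet below maps each base to its partner. The function names `complement` and `reverse_complement` are illustrative choices, not names from the slides.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner strand written in the same left-to-right order (so 3'->5')."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand_5to3: str) -> str:
    """Partner strand written 5'->3' (the two strands are antiparallel)."""
    return complement(strand_5to3)[::-1]


top = "ATGCATGCAT"                # read 5'->3'
print(complement(top))            # partner read 3'->5'
print(reverse_complement(top))    # same partner read 5'->3'
```

Writing the partner 3'→5' keeps it aligned under the original strand, which is how duplexes are drawn later in these slides.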

Glossary Terms (in these slides)

Duplex
♦ Double-stranded DNA strands

Library
♦ A set of DNA strands for DNA computing

Role of DNA Sequences

Short DNA strands are the units of information storage and manipulation in a computation process.
♦ Just like a computer memory

Usually, in DNA computing, the solution of the given problem is a long strand, typically a concatenation of short DNA strands.

DNA bases → DNA strands, each representing a city ⇒ Information

Concatenation of DNA strands ⇒ Computing process

We have to design DNA strands very carefully.

Sequence Design

Design a set of sequences that correctly assemble into the desired longer molecules.
♦ Each strand forms a stable duplex only with its complement.
♦ Two distinct strands are non-interacting:
- between pairs of strands
- between a strand and the Watson-Crick complement of another
♦ Any duplex formed by such a pair is relatively unstable compared with any perfectly matched duplex formed from a DNA strand and its complement.

5’-ATGCATGCAT-3’

3’-TACGTACGTA-5’

5’-AACCTTGGAC-3’

3’-TAGGATCAGA-5’

Desired output
5’-ATGCATGCAT-3’
3’-TACGTACGTA-5’

Unexpected outputs
5’-ATGCATGCAT-3’
3’-TAGGATCAGA-5’

ΔG (unexpected duplex) > ΔG (desired duplex)

Sequence Design Example

P ∧ Q → R, S ∧ T → Q, S, T, P ⊢ R ?

15mer for each variable

[Figure: clauses and variables drawn as strands, illustrated with 4-mers]
S: CTCA
P: ACGT
T: GTTA
¬R: TGAC
¬Q ¬P R: GACT TGCA ACGT
¬S ¬T Q: GAGT CAAT CTGA

¬Q ∨ ¬P ∨ R (5’→3’): CGT ACG TAC GCT GAA CTG CCT TGC GTT GAC TGC GTT CAT TGT ATG
Q ∨ ¬T ∨ ¬S (3’→5’): TTC AGC GTA CGT ACG TCA ATT TGC GTC AAT TGG TCG CTA CTG CTT
S (5’→3’): AAG CAG TAG CGA CCA
T (3’→5’): ATT GAC GCA AAT TGA
P (5’→3’): GTC AAC GCA AGG CAG
R (5’→3’): TGC GTT CAT TGT ATG
¬R (3’→5’): CAT ACA ATG AAC GCA
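The 15-mers on this slide can be checked for the complementarity the design relies on. The sketch below splits the 45-mer beginning CGT ACG into 15-base blocks and reports which block a short strand can pair with. The sequences are copied from the slide; the helper names `blocks` and `pairing_blocks` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Watson-Crick partner of a strand, written in the same 5'->3' sense."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def blocks(strand: str, size: int = 15) -> list[str]:
    """Split a strand (spaces ignored) into consecutive blocks of `size` bases."""
    bases = strand.replace(" ", "")
    return [bases[i:i + size] for i in range(0, len(bases), size)]


def pairing_blocks(clause: str, short_strand: str) -> list[int]:
    """Indices of the clause blocks that are exact partners of short_strand."""
    target = reverse_complement(short_strand.replace(" ", ""))
    return [i for i, block in enumerate(blocks(clause)) if block == target]


clause = "CGT ACG TAC GCT GAA CTG CCT TGC GTT GAC TGC GTT CAT TGT ATG"
print(pairing_blocks(clause, "GTC AAC GCA AGG CAG"))  # second block (index 1)
print(pairing_blocks(clause, "CAT ACA ATG AAC GCA"))  # third block (index 2)
```

Each short strand matches exactly one block of the clause, which is what lets a variable strand hybridize only at its planned position.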

Sequence Design Example: Reaction in a Test Tube

[Figure: the strands R ∨ ¬P ∨ ¬Q, Q ∨ ¬T ∨ ¬S, S, T, P, and ¬R mixed in a test tube]

Sequence Design Example: Hybridization and Ligation

[Figure: three panels of the same strands, illustrating hybridization and ligation]

R ∨ ¬P ∨ ¬Q: GTA TGT TAC TTG CGT CAG TTG CGT TCC GTC AAG TCG CAT GCA TGC
¬R: CAT ACA ATG AAC GCA
S: ACC AGC GAT GAC GAA
Q ∨ ¬T ∨ ¬S: TTC AGC GTA CGT ACG TCA ATT TGC GTC AAT TGG TCG CTA CTG CTT
T: AGT TAA ACG CAG TTA
P: GTC AAC GCA AGG CAG

Research Directions

Theoretical models, to study general properties of libraries
Theoretical models, to estimate bounds on the size of a library
Algorithms to design the libraries
Software tools

Theoretical Models

Models related to the coding problem can be derived from the classical theory of codes.
♦ e.g., error-correcting codes

Watson-Crick complementarity is a new feature to be considered.

H-system or splicing system

Strand Design Criteria

Preventing undesired reactions
Controlling the secondary structures
Controlling the chemical characteristics of library
Restricting DNA sequences

Preventing Undesired Reactions

Force the library to form duplexes only between a given DNA strand and its complement.
♦ Hamming distance
♦ Reverse complement Hamming distance
♦ Similarity
♦ H-measure
♦ 3’-end H-measure

Similarity
♦ Simple Hamming distance with (or without) position shifts
♦ Compares a sequence with the other sequences in the same direction

5’-ATGCATGC-3’
5’-ACCAATCG-3’
Similarity = 3

5’-ATGCATGC-3’
5’-ACCAATCG-3’ (with a position shift)
Similarity = 2
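A minimal sketch of how such a count could be computed, assuming similarity is simply the number of identical bases in the overlap of two same-direction strands at a given shift. The published measure may add thresholds or penalties, and the function names here are hypothetical.

```python
def matches_at_shift(x: str, y: str, shift: int = 0) -> int:
    """Count positions i where y[i] equals x[i + shift], inside the overlap."""
    count = 0
    for i, base in enumerate(y):
        j = i + shift
        if 0 <= j < len(x) and x[j] == base:
            count += 1
    return count


def similarity(x: str, y: str, max_shift: int = 0) -> int:
    """Largest match count over all shifts in [-max_shift, max_shift]."""
    return max(matches_at_shift(x, y, s)
               for s in range(-max_shift, max_shift + 1))


x, y = "ATGCATGC", "ACCAATCG"      # both read 5'->3'
print(matches_at_shift(x, y))      # unshifted alignment from the slide
print(matches_at_shift(x, y, -3))  # one shifted alignment
```

Taking the maximum over shifts is a conservative choice: a pair of strands is judged by its most similar alignment.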

H-measure
♦ Simple Hamming distance with (or without) position shifts
♦ Compares a sequence with the other sequences in the opposite direction
♦ To make duplexes form only at the planned positions

5’-ATGCATGC-3’
3’-GCTAACCA-5’
H-measure = 1
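The count on this slide can be reproduced with a short sketch, assuming the H-measure is the number of Watson-Crick pairs between a strand written 5'→3' and another written 3'→5' at a given shift. The published measure may weight contiguous pairs differently, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pairs_at_shift(x_5to3: str, y_3to5: str, shift: int = 0) -> int:
    """Count positions where y_3to5[i] is the complement of x_5to3[i + shift]."""
    count = 0
    for i, base in enumerate(y_3to5):
        j = i + shift
        if 0 <= j < len(x_5to3) and COMPLEMENT[x_5to3[j]] == base:
            count += 1
    return count


def h_measure(x_5to3: str, y_3to5: str, max_shift: int = 0) -> int:
    """Largest number of complementary pairs over the allowed shifts."""
    return max(pairs_at_shift(x_5to3, y_3to5, s)
               for s in range(-max_shift, max_shift + 1))


print(pairs_at_shift("ATGCATGC", "GCTAACCA"))  # unshifted alignment from the slide
```

A low value for every pair of distinct strands means they have few places to hybridize by accident.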

Controlling the Secondary Structures

Secondary structures are usually formed by the interaction of single-stranded DNA.
♦ Internal loop, hairpin loop, bulge loop, and so on.

Prediction methods
♦ Thermodynamic approach
♦ Continuity

Secondary structure can be encouraged or prohibited, depending on the target problem.

DNA Structures

[Figure] SantaLucia Jr., Annu. Rev. Biophys. Biomol. Struct. 2004, 33:415-440, Fig. 1

DNA Structures for DNA Computing

Self-assembly computation
♦ Winfree et al., Nature, 394: 539-544

Whiplash PCR

Controlling the Secondary Structures

Thermodynamic approach
♦ Based on nearest neighbor parameters and dynamic programming
♦ Mfold
♦ Vienna RNA package

Continuity
♦ Avoid runs of the same base that are longer than a threshold.
♦ If the same base appears continuously, a reaction is not well controllable since the structure of DNA will become unstable.

5’-ATGGGGGCATGC-3’

Continuity = 5
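The continuity value above is the length of the longest run of one base. A minimal sketch, with hypothetical function names and a threshold chosen only for illustration:

```python
from itertools import groupby


def longest_run(strand: str) -> int:
    """Length of the longest stretch of one repeated base (0 for an empty strand)."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def violates_continuity(strand: str, threshold: int) -> bool:
    """True when some base repeats more than `threshold` times in a row."""
    return longest_run(strand) > threshold


print(longest_run("ATGGGGGCATGC"))             # the run of G in the slide's example
print(violates_continuity("ATGGGGGCATGC", 3))  # threshold 3 is an assumed value
```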

Controlling the Chemical Characteristics of Library

It is desirable for the strands in a library to have similar chemical characteristics so that the DNA operations succeed.
♦ Free energy
♦ Melting temperature
♦ GC ratio

Free energy
♦ The energy to make a duplex
♦ Actually, it is defined as the energy required to break a duplex
♦ The most reliable measure for the relative stability of a DNA duplex
♦ Easily calculated by the nearest neighbor model

Nearest neighbor parameters

[Table] SantaLucia Jr., Annu. Rev. Biophys. Biomol. Struct. 2004, 33:415-440, Table 1

Melting temperature
♦ The temperature at which 50% of the DNA strands and their perfect complements are in duplex.

$$T_m = \frac{\Delta H^\circ}{\Delta S^\circ + R \ln(|C_T|/4)}$$

R: gas constant (1.987 cal/(K·mol))
|C_T|: total molar strand concentration
T_m: in Kelvin

$$\Delta H^\circ = \Delta H^\circ_{\text{ends}} + \Delta H^\circ_{\text{init}} + \sum_{k \in \{\text{stacks}\}} \Delta H^\circ_k$$

$$\Delta S^\circ = \Delta S^\circ_{\text{ends}} + \Delta S^\circ_{\text{init}} + \sum_{k \in \{\text{stacks}\}} \Delta S^\circ_k$$

ΔH: enthalpy
ΔS: entropy
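The melting-temperature relation on this slide, T_m = ΔH° / (ΔS° + R ln(C_T/4)), can be evaluated directly once the duplex totals ΔH° and ΔS° are known. The sketch below assumes those totals are supplied by the caller, for example summed from a nearest neighbor table. The numeric values are hypothetical placeholders, not parameters from the cited table.

```python
import math

R = 1.987  # gas constant in cal/(K*mol)


def melting_temperature(delta_h: float, delta_s: float, total_conc: float) -> float:
    """T_m in Kelvin from delta_h (cal/mol), delta_s (cal/(K*mol)) and the
    total molar strand concentration (mol/L)."""
    return delta_h / (delta_s + R * math.log(total_conc / 4.0))


def free_energy(delta_h: float, delta_s: float, temperature_k: float) -> float:
    """Delta G = Delta H - T * Delta S, in cal/mol."""
    return delta_h - temperature_k * delta_s


# Hypothetical duplex totals, chosen only to illustrate the arithmetic.
dh, ds, conc = -70_000.0, -195.0, 1e-6
tm = melting_temperature(dh, ds, conc)
print(f"Tm = {tm:.1f} K ({tm - 273.15:.1f} degrees C)")
print(f"dG at 310.15 K = {free_energy(dh, ds, 310.15):.0f} cal/mol")
```

Because the concentration enters through a logarithm, diluting the strands lowers T_m only slowly.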

GC ratio
♦ The percentage of G or C in a whole DNA sequence
♦ The simplest method, but unreliable

5’-ATGGTTGCATGC-3’

GC ratio = 50%
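The GC ratio of the example strand can be computed in one line. A minimal sketch, with `gc_ratio` as a hypothetical name:

```python
def gc_ratio(strand: str) -> float:
    """Percentage of bases in the strand that are G or C."""
    if not strand:
        raise ValueError("strand must not be empty")
    return 100.0 * sum(base in "GC" for base in strand) / len(strand)


print(gc_ratio("ATGGTTGCATGC"))  # 6 of the 12 bases are G or C
```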

Restricting DNA Sequences

Restriction of the composition (DNA base or subsequence) of a DNA sequence.
♦ One of the four DNA bases is reserved for special purposes.
♦ Special DNA sequences such as restriction enzyme sites

Sequence Design Algorithm

Exhaustive search
Random search
Template-map strategy
Graph method
Stochastic methods
Dynamic programming
Evolutionary algorithms
Biological-inspired methods

Exhaustive search
♦ Hartemink et al. DNA4, pp. 227-235, 1998.

Random search
♦ Penchovsky and Ackermann, Journal of Comput. Biology, 10(2): 215-229, 2003.
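To illustrate the general idea of random search, the sketch below draws random strands and keeps each one that stays at least a minimum Hamming distance away from the strands already accepted and from their reverse complements. This is a generic generate-and-test loop under assumed constraints, not the procedure of the cited papers; every name and parameter value is hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strands differ."""
    return sum(a != b for a, b in zip(x, y))


def reverse_complement(strand: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def random_search(n_strands: int, length: int, min_distance: int,
                  max_tries: int = 100_000, seed: int = 0) -> list[str]:
    """Collect up to n_strands random strands that satisfy the distance rules."""
    rng = random.Random(seed)
    library: list[str] = []
    for _ in range(max_tries):
        if len(library) == n_strands:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        # Reject strands that are too close to their own reverse complement.
        if hamming(candidate, reverse_complement(candidate)) < min_distance:
            continue
        # Reject strands too close to an accepted strand or its reverse complement.
        if all(hamming(candidate, s) >= min_distance
               and hamming(candidate, reverse_complement(s)) >= min_distance
               for s in library):
            library.append(candidate)
    return library


print(random_search(n_strands=5, length=10, min_distance=4))
```

Exhaustive search would replace the random draw with an enumeration of all 4^length strands, which is feasible only for short strands.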

Template-map strategy
♦ Wenbin Liu et al. J. Chem. Inf. Comput. Sci. 2003, 43, 2014-2018

Template 10010011

Map 10100101

Sequence TCGACGAT
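The example above is consistent with a position-by-position lookup: the template bit selects a base pair class and the map bit selects one base within it. The table below is inferred from this single example (1 selects A/T, 0 selects G/C) and may not match the convention in the cited paper; the names are hypothetical.

```python
# (template bit, map bit) -> base, read off the slide's one worked example.
BASE_FOR_BITS = {
    ("1", "1"): "T",
    ("1", "0"): "A",
    ("0", "0"): "C",
    ("0", "1"): "G",
}


def combine(template: str, map_word: str) -> str:
    """Build a DNA sequence from a binary template and a binary map word."""
    if len(template) != len(map_word):
        raise ValueError("template and map must have the same length")
    return "".join(BASE_FOR_BITS[(t, m)] for t, m in zip(template, map_word))


print(combine("10010011", "10100101"))  # reproduces the slide's sequence
```

With a fixed template, every generated sequence shares the same pattern of A/T and G/C positions, so the GC ratio is identical across the library.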

Graph method
♦ Feldkamp et al. GPEM 4: 153-171, 2003.

Stochastic methods
♦ Simulated annealing
♦ Tanaka et al., DNA7, pp. 179-188, 2001.

Dynamic programming
♦ Marathe et al., DNA5, pp. 75-89, 1999.

Evolutionary algorithm
♦ Deaton et al., Physical Review Letters, 80(2): 417-420, 1998.
♦ Shin et al., IEEE Trans. Evolutionary Computation.

Biological-inspired methods
♦ Deaton et al. DNA8, pp. 196-204, 2002.

Requirements of DNA Sequence Design Systems

Sequence reliability
User friendliness
Analysis capability
Sequence reusability

NACST/Seq

Sequence Generator
♦ Based on a multi-objective evolutionary algorithm (MOEA)
♦ Using 6 objectives
- GC ratio, Tm, Continuity, Hairpin, H-measure, Similarity

Each run of MOEA

Selected Pareto optimal
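Selecting the Pareto-optimal candidates from a run can be sketched as a dominance filter. The code assumes every objective is to be minimized and uses made-up two-objective scores; it illustrates the concept only and is not NACST/Seq's implementation.

```python
def dominates(a: tuple, b: tuple) -> bool:
    """a dominates b: no worse in every objective and strictly better in one
    (all objectives are assumed to be minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: list[tuple]) -> list[tuple]:
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical (H-measure, similarity) scores for four candidate pools.
scores = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(scores))  # (3, 3) is dominated by (2, 2)
```

With several objectives there is usually no single best pool, so the generator returns a set of trade-offs for the user to choose from.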

Sequence Generation

① Generation Options
② Sequence Structure
③ Sequence Options
④ Fitness Options
⑤ Fitness Parameter Setting
⑥ Genetic Algorithm Options

NACST/Report

① Show Fitness Result
- Shows each objective value of the generated pools.
- User can get fitness information of each pool.

② Compare two pools
- Visualizes which pool is superior on each fitness measure by comparing the sequences of the two pools.
- User can select pools arbitrarily.

③ Analyze a pool
- Shows each nucleotide.
- Finds the given subsequence.
- Finds the given complementary sequence.
- Finds continuous occurrence of each nucleotide.
- User can choose a pool arbitrarily.

NACST/Plot

① Project Plot
- Plots fitness results.
- Plots the comparison result of two pools.
- User can browse plotting history.
- Plotted graphs can be saved as PostScript files.

② Data Plot
- Plots arbitrary data from a given file.

[Figures: comparison graph, fitness results graph, data plot]