

Page 1: Sequence Design for DNA Computing - Seoul National University Design for... · 2015-11-24 · Role of DNA Sequences zShort DNA strands are the units of information storage and manipulation

Sequence Design for DNA Computing

2004. 10. 16
Advanced AI

Soo-Yong Shin and Byoung-Tak Zhang
Biointelligence Laboratory


© 2004, SNU Biointelligence Lab, http://bi.snu.ac.kr/

DNA

A single-stranded DNA molecule is a sequence over four possible nucleotides (A, C, G, T).

Hydrogen bonds

Hybridization

Watson-Crick Complement
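As a small illustration of Watson-Crick complementarity, the sketch below computes the complement and the reverse complement of a strand written 5'→3'. The function names are illustrative and do not come from the slides.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-wise complement; if the input is read 5'->3', the result reads 3'->5'."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The complementary strand rewritten in the 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("AACCTTGGAC"))          # TTGGAACCTG (read 3'->5')
print(reverse_complement("AACCTTGGAC"))  # GTCCAAGGTT (read 5'->3')
```

A strand hybridizes perfectly with its reverse complement, which is why the two directions have to be tracked carefully when designing a library.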

Glossary Terms (in these slides)

Duplex
♦ Double-stranded DNA

Library
♦ A set of DNA strands for DNA computing

Role of DNA Sequences

Short DNA strands are the units of information storage and manipulation in a computation process.
♦ Just like computer memory

In DNA computing, the solution to the given problem is usually a long strand, typically a concatenation of short DNA strands.

Role of DNA Sequences

DNA bases → DNA strands: each strand represents a city ⇒ information

Concatenation of DNA strands ⇒ computing process

We have to design DNA strands very carefully.

Sequence Design

Design a set of sequences that correctly assemble into the desired longer molecules.
♦ Strands form stable duplexes only with their complements.
♦ Two distinct strands are non-interacting:
- between pairs of strands
- between a strand and the Watson-Crick complement of another

Any such unintended duplex should be relatively unstable compared with any perfectly matched duplex formed from a DNA strand and its complement.

Sequence Design

5’-ATGCATGCAT-3’
3’-TACGTACGTA-5’
5’-AACCTTGGAC-3’
3’-TAGGATCAGA-5’

Desired output
5’-ATGCATGCAT-3’
3’-TACGTACGTA-5’

Unexpected outputs
5’-ATGCATGCAT-3’
3’-TAGGATCAGA-5’

ΔG (5’-ATGCATGCAT-3’ / 3’-TAGGATCAGA-5’) > ΔG (5’-ATGCATGCAT-3’ / 3’-TACGTACGTA-5’)

Sequence Design Example

P ∧ Q → R, S ∧ T → Q, S, T, P ⊢ R ?

[Figure: resolution by hybridization, drawn with 4-base codes]
S: CTCA
P: ACGT
T: GTTA
¬R: TGAC
¬Q ¬P R: GACT TGCA ACGT
¬S ¬T Q: GAGT CAAT CTGA

15-mer for each variable

Sequence Design Example

¬Q ∨ ¬P ∨ R : 5’ → 3’

Q ∨ ¬T ∨ ¬S : 3’ → 5’

S : 5’ → 3’

T : 3’ → 5’

P : 5’ → 3’

R : 5’ → 3’

¬R : 3’ → 5’

CGT ACG TAC GCT GAA CTG CCT TGC GTT GAC TGC GTT CAT TGT ATG

GTC AAC GCA AGG CAG

TTC AGC GTA CGT ACG TCA ATT TGC GTC AAT TGG TCG CTA CTG CTT

AAG CAG TAG CGA CCA

ATT GAC GCA AAT TGA

CAT ACA ATG AAC GCA

TGC GTT CAT TGT ATG

Sequence Design Example - Reaction in a Test Tube

[Figure: the strands R ∨ ¬P ∨ ¬Q, Q ∨ ¬T ∨ ¬S, S, T, P, and ¬R in a test tube]

Sequence Design Example - Hybridization and Ligation

R ∨ ¬P ∨ ¬Q: GTA TGT TAC TTG CGT CAG TTG CGT TCC GTC AAG TCG CAT GCA TGC

¬R: CAT ACA ATG AAC GCA

S: ACC AGC GAT GAC GAA

Q ∨ ¬T ∨ ¬S: TTC AGC GTA CGT ACG TCA ATT TGC GTC AAT TGG TCG CTA CTG CTT

T: AGT TAA ACG CAG TTA

P: GTC AAC GCA AGG CAG
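The hybridization step can be checked mechanically. The sketch below assumes each clause strand consists of three 15-mer blocks in the order of its label, and that each single-literal strand is drawn base-opposite-base against the block it binds. The helper `pairs_with` is a hypothetical name, not part of the slides.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pairs_with(block: str, literal: str) -> bool:
    """True if each base of `literal` is the Watson-Crick partner of the base
    drawn opposite it in `block` (spaces between triplets are ignored)."""
    block, literal = block.replace(" ", ""), literal.replace(" ", "")
    return len(block) == len(literal) and all(
        COMPLEMENT[b] == l for b, l in zip(block, literal)
    )


# Clause strands as listed on the slide, split into three 15-mer blocks each.
clause_1 = "GTA TGT TAC TTG CGT CAG TTG CGT TCC GTC AAG TCG CAT GCA TGC".replace(" ", "")
clause_2 = "TTC AGC GTA CGT ACG TCA ATT TGC GTC AAT TGG TCG CTA CTG CTT".replace(" ", "")
r, not_p, not_q = clause_1[:15], clause_1[15:30], clause_1[30:]  # R ∨ ¬P ∨ ¬Q
q, not_t, not_s = clause_2[:15], clause_2[15:30], clause_2[30:]  # Q ∨ ¬T ∨ ¬S

checks = {
    "¬R on R": pairs_with(r, "CAT ACA ATG AAC GCA"),
    "P on ¬P": pairs_with(not_p, "GTC AAC GCA AGG CAG"),
    "Q on ¬Q": pairs_with(not_q, q),
    "T on ¬T": pairs_with(not_t, "AGT TAA ACG CAG TTA"),
    "S on ¬S": pairs_with(not_s, "ACC AGC GAT GAC GAA"),
}
print(checks)
```

Under these assumptions every literal pairs perfectly with its negated block, so the two clauses and the unit strands can assemble into one ligated complex.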

Research Directions

Theoretical models, to study general properties of libraries
Theoretical models, to estimate bounds on the size of a library
Algorithms to design the libraries
Software tools

Theoretical Models

Results related to the coding problem can be derived from the classical theory of codes.
♦ e.g., error-correcting codes

Watson-Crick complementarity is a new feature to be considered.

H-system or splicing system

Research Directions

Theoretical models, to study general properties of libraries
Theoretical models, to estimate bounds on the size of a library
Algorithms to design the libraries
Software tools

Strand Design Criteria

Preventing undesired reactions
Controlling the secondary structures
Controlling the chemical characteristics of the library
Restricting DNA sequences

Preventing Undesired Reactions

Force the library to form duplexes only between a given DNA strand and its complement.
♦ Hamming distance
♦ Reverse complement Hamming distance
♦ Similarity
♦ H-measure
♦ 3’-end H-measure

Preventing Undesired Reactions

Similarity
♦ Simple Hamming-style comparison (number of matching bases), with or without position shifts
♦ Compares a sequence with the other sequences in the same direction

Without a position shift:
5’-ATGCATGC-3’
5’-ACCAATCG-3’
Similarity = 3

With a position shift:
5’-ATGCATGC-3’
5’-ACCAATCG-3’
Similarity = 2
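A minimal sketch of the similarity measure, assuming it counts positions at which two strands read in the same direction carry the same base. The `shift` parameter and the function names are illustrative. The slide does not say which shift produces the value 2; in this sketch a shift of 3 does.

```python
def similarity(x: str, y: str, shift: int = 0) -> int:
    """Count positions i where x[i] == y[i + shift]; both strands read 5'->3'."""
    return sum(
        1
        for i, base in enumerate(x)
        if 0 <= i + shift < len(y) and base == y[i + shift]
    )


def max_similarity(x: str, y: str) -> int:
    """Worst case over every shift that leaves at least one overlapping position."""
    return max(similarity(x, y, s) for s in range(-(len(x) - 1), len(y)))


print(similarity("ATGCATGC", "ACCAATCG"))           # 3
print(similarity("ATGCATGC", "ACCAATCG", shift=3))  # 2
print(max_similarity("ATGCATGC", "ACCAATCG"))       # 3
```

A design tool would typically keep the maximum over shifts low for every pair of strands in the library.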

Preventing Undesired Reactions

H-measure
♦ Simple Hamming-style comparison (number of complementary base pairs), with or without position shifts
♦ Compares a sequence with the other sequences in the opposite direction
♦ The goal is to form duplexes only at the planned positions

5’-ATGCATGC-3’
3’-GCTAACCA-5’
H-measure = 1
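The H-measure example can be reproduced with a short sketch, assuming the measure counts the positions at which the two antiparallel strands can form a Watson-Crick pair. The second strand is passed exactly as drawn, in the 3'→5' direction; the names and the `shift` parameter are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def h_measure(x_5to3: str, y_3to5: str, shift: int = 0) -> int:
    """Count positions i where x[i] can base-pair with y[i + shift].

    x is written 5'->3' and y is written 3'->5', so the strands are
    antiparallel and are compared base-opposite-base.
    """
    return sum(
        1
        for i, base in enumerate(x_5to3)
        if 0 <= i + shift < len(y_3to5) and COMPLEMENT[base] == y_3to5[i + shift]
    )


print(h_measure("ATGCATGC", "GCTAACCA"))      # 1
print(h_measure("ATGCATGCAT", "TACGTACGTA"))  # 10
```

The second call is the perfectly matched duplex from the earlier Sequence Design slide, where every position pairs.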

Controlling the Secondary Structures

Secondary structures are usually formed by the interaction of single-stranded DNA.
♦ Internal loop, hairpin loop, bulge loop, and so on.

Prediction methods
♦ Thermodynamic approach
♦ Continuity

Depending on the target problem, secondary structure may be encouraged or prohibited.

DNA Structures

[Figure] SantaLucia, Jr., Annu. Rev. Biophys. Biomol. Struct. 2004, 33:415-440, Fig. 1

DNA Structures for DNA Computing

Self-assembly computation
♦ Winfree et al., Nature, 394: 539-544

DNA Structures for DNA Computing

Whiplash PCR

Controlling the Secondary Structures

Thermodynamic approach
♦ Based on nearest neighbor parameters and dynamic programming
♦ Mfold
♦ Vienna RNA package

Controlling the Secondary Structures

Continuity
♦ Limits runs of the same base to at most a given threshold.
♦ If the same base appears continuously, the reaction is hard to control because the structure of the DNA becomes unstable.

5’-ATGGGGGCATGC-3’
Continuity = 5
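The continuity value in the example matches the length of the longest run of one base. The sketch below computes exactly that. Treating continuity as the longest run is an assumption based on this one example; a real design tool may instead penalize every run above the threshold.

```python
from itertools import groupby


def continuity(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def violates_continuity(seq: str, threshold: int) -> bool:
    """True if some base repeats more than `threshold` times in a row."""
    return continuity(seq) > threshold


print(continuity("ATGGGGGCATGC"))              # 5
print(violates_continuity("ATGGGGGCATGC", 3))  # True
```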

Controlling the Chemical Characteristics of Library

For DNA operations to succeed, the strands in the library should have similar chemical characteristics.
♦ Free energy
♦ Melting temperature
♦ GC ratio

Controlling the Chemical Characteristics of Library

Free energy
♦ The energy to make a duplex
♦ Actually, it is defined as the energy required to break a duplex
♦ The most reliable measure for the relative stability of a DNA duplex
♦ Easily calculated by the nearest neighbor model

Controlling the Chemical Characteristics of Library

Nearest neighbor parameters

[Table] SantaLucia, Jr., Annu. Rev. Biophys. Biomol. Struct. 2004, 33:415-440, Table 1

Controlling the Chemical Characteristics of Library

Melting temperature
♦ The temperature at which 50% of the DNA strands and their perfect complements are in duplex.

\[ T_m = \frac{\Delta H^\circ}{\Delta S^\circ + R \ln(|C_T|/4)} \]

R: gas constant (1.987 cal/(K·mol))
|C_T|: total molar strand concentration
T_m: in Kelvin

\[ \Delta H^\circ = \Delta H^\circ_{ends} + \Delta H^\circ_{init} + \sum_{k \in \{stacks\}} \Delta H^\circ_k \]

\[ \Delta S^\circ = \Delta S^\circ_{ends} + \Delta S^\circ_{init} + \sum_{k \in \{stacks\}} \Delta S^\circ_k \]

ΔH: enthalpy
ΔS: entropy
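The melting temperature formula can be evaluated directly once the enthalpy and entropy totals are known. In the sketch below the per-stack values are made-up placeholders chosen only to exercise the arithmetic; real values would come from a published nearest neighbor table. Units are assumed to be cal/mol for ΔH°, cal/(K·mol) for ΔS°, and mol/L for the strand concentration.

```python
import math
from typing import Iterable

R_GAS = 1.987  # cal/(K*mol), the value given on the slide


def total(ends: float, init: float, stacks: Iterable[float]) -> float:
    """Sum the end, initiation, and per-stack contributions."""
    return ends + init + sum(stacks)


def melting_temperature(delta_h: float, delta_s: float, c_total: float) -> float:
    """T_m in Kelvin: delta_h / (delta_s + R * ln(c_total / 4))."""
    return delta_h / (delta_s + R_GAS * math.log(c_total / 4.0))


# Placeholder contributions (NOT real nearest neighbor parameters).
delta_h = total(ends=0.0, init=0.0, stacks=[-7000.0] * 10)  # cal/mol
delta_s = total(ends=0.0, init=0.0, stacks=[-19.0] * 10)    # cal/(K*mol)

tm = melting_temperature(delta_h, delta_s, c_total=1e-6)
print(f"T_m = {tm:.1f} K = {tm - 273.15:.1f} degrees C")
```

Because the concentration term sits inside a logarithm, a tenfold change in strand concentration shifts T_m by only a few degrees, while changes in the stack sums dominate.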

Controlling the Chemical Characteristics of Library

GC ratio
♦ The percentage of G or C in a whole DNA sequence
♦ The simplest method, but unreliable

5’-ATGGTTGCATGC-3’
GC ratio = 50%
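The GC ratio is simple enough to state as a one-line computation. The sketch below reproduces the 50% value for the example strand; the function name is illustrative.

```python
def gc_ratio(seq: str) -> float:
    """Percentage of bases in `seq` that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


print(gc_ratio("ATGGTTGCATGC"))  # 50.0
```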

Restricting DNA Sequences

Restriction of the composition (DNA base or subsequence) of a DNA sequence.
♦ One of the four DNA bases is reserved for special purposes.
♦ Special DNA sequences such as restriction enzyme sites
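Both kinds of restriction can be expressed as a simple filter. The sketch below is illustrative only: the three-letter alphabet and the choice of the EcoRI recognition site GAATTC as the reserved subsequence are assumptions made for the example, not settings taken from the slides.

```python
def satisfies_restrictions(seq, allowed_bases="ACGT", forbidden_sites=()):
    """True if `seq` uses only `allowed_bases` and contains no forbidden site."""
    uses_allowed_bases = set(seq) <= set(allowed_bases)
    has_forbidden_site = any(site in seq for site in forbidden_sites)
    return uses_allowed_bases and not has_forbidden_site


# Reserve G for special purposes: ordinary strands may use only A, C, T.
print(satisfies_restrictions("ACTTCA", allowed_bases="ACT"))  # True
print(satisfies_restrictions("ACGTCA", allowed_bases="ACT"))  # False

# Keep a restriction enzyme site (here EcoRI, GAATTC) out of ordinary strands.
print(satisfies_restrictions("ATGAATTCAT", forbidden_sites=("GAATTC",)))  # False
```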

Research Directions

Theoretical models, to study general properties of libraries
Theoretical models, to estimate bounds on the size of a library
Algorithms to design the libraries
Software tools

Sequence Design Algorithm

Exhaustive search
Random search
Template-map strategy
Graph method
Stochastic methods
Dynamic programming
Evolutionary algorithms
Biologically inspired methods

Sequence Design Algorithm

Exhaustive search
♦ Hartemink et al. DNA4, pp. 227-235, 1998.

Random search
♦ Penchovsky and Ackermann, Journal of Comput. Biology, 10(2): 215-229, 2003.
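To make the idea of random search concrete, here is a generic generate-and-test sketch. It is not the procedure of either cited paper: it draws random strands and keeps a candidate only if its Hamming distance to every strand already accepted is at least `min_distance`. All names and parameters are hypothetical, and a realistic tool would test several criteria at once.

```python
import random


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strands differ."""
    return sum(a != b for a, b in zip(x, y))


def random_library(size, length, min_distance, seed=0, max_tries=100_000):
    """Collect up to `size` random strands that are pairwise far apart."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    library = []
    for _ in range(max_tries):
        if len(library) == size:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, kept) >= min_distance for kept in library):
            library.append(candidate)
    return library


library = random_library(size=5, length=12, min_distance=6)
for strand in library:
    print(strand)
```

Exhaustive search would replace the random draw with an enumeration of all 4^length strands, which is only feasible for short strands.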

Sequence Design Algorithm

Template-map strategy
♦ Wenbin Liu et al. J. Chem. Inf. Comput. Sci. 2003, 43, 2014-2018

Template 10010011
Map 10100101
Sequence TCGACGAT
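The example above fixes how a template bit and a map bit combine into a base. Reading it position by position gives (1,1) → T, (1,0) → A, (0,1) → G, (0,0) → C, so the template bit chooses between the A/T and G/C classes and the map bit picks the base within the class. The sketch below encodes that table; the table is inferred from this single example and the function name is illustrative.

```python
# (template bit, map bit) -> base, inferred from the example on the slide.
BASE_FOR_BITS = {
    ("1", "1"): "T",
    ("1", "0"): "A",
    ("0", "1"): "G",
    ("0", "0"): "C",
}


def combine(template: str, map_word: str) -> str:
    """Combine a binary template and a binary map word into a DNA sequence."""
    if len(template) != len(map_word):
        raise ValueError("template and map must have the same length")
    return "".join(BASE_FOR_BITS[(t, m)] for t, m in zip(template, map_word))


print(combine("10010011", "10100101"))  # TCGACGAT
```

Since the template alone determines where G/C and A/T occur, every sequence built from one template has the same GC ratio.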

Sequence Design Algorithm

Graph method
♦ Feldkamp et al. GPEM 4: 153-171, 2003.

Sequence Design Algorithm

Stochastic methods
♦ Simulated annealing
♦ Tanaka et al., DNA7, pp. 179-188, 2001.

Dynamic programming
♦ Marathe et al., DNA5, pp. 75-89, 1999.

Sequence Design Algorithm

Evolutionary algorithm
♦ Deaton et al., Physical Review Letters, 80(2): 417-420, 1998.
♦ Shin et al., IEEE Trans. Evolutionary Computation.

Sequence Design Algorithm

Biologically inspired methods
♦ Deaton et al. DNA8, pp. 196-204, 2002.

Sequence Design Algorithm

Biologically inspired methods

Research Directions

Theoretical models, to study general properties of libraries
Theoretical models, to estimate bounds on the size of a library
Algorithms to design the libraries
Software tools

Requirements of DNA Sequence Design Systems

Sequence reliability
User friendliness
Analysis capability
Sequence reusability

NACST/Seq

Sequence Generator
♦ Based on MOEA (multi-objective evolutionary algorithm)
♦ Using 6 objectives
- GC ratio, Tm, Continuity, Hairpin, H-measure, Similarity

Each run of MOEA

Selected Pareto optimal
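Selecting Pareto-optimal candidates means keeping every candidate that no other candidate beats on all objectives at once. The sketch below shows a plain non-dominated filter over objective vectors, assuming every objective is to be minimized. It illustrates the general idea only and is not taken from NACST/Seq; the two-objective data is made up.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


# Hypothetical (H-measure, similarity) scores for four candidate pools.
scores = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(scores))  # [(1, 5), (2, 2), (5, 1)]
```

The pool scoring (3, 3) is dropped because (2, 2) is better on both objectives; the other three represent different trade-offs.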

Sequence Generation

① Generation Options
② Sequence Structure
③ Sequence Options
④ Fitness Options
⑤ Fitness Parameter Setting
⑥ Genetic Algorithm Options

NACST/Report

① Show Fitness Result
- Shows each objective value of the generated pools.
- User can get fitness information for each pool.

② Compare two pools
- Visualizes which pool is superior on each fitness measure by comparing the sequences of the two pools.
- User can select pools arbitrarily.

③ Analyze a pool
- Shows each nucleotide.
- Finds the given subsequence.
- Finds the given complementary sequence.
- Finds continuous occurrences of each nucleotide.
- User can choose a pool arbitrarily.

NACST/Plot

① Project Plot
- Plots fitness results.
- Plots the comparison result of two pools.
- User can browse the plotting history.
- Plotted graphs can be saved as PostScript files.

② Data Plot
- Plots arbitrary data from a given file.

[Figures: comparison graph, fitness results graph, data plot]