![Page 1: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/1.jpg)
An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design
Won-Hyong Chung and Seong-Bae Park
Dept. of Computer Engineering, Kyungpook National University, South Korea
![Page 2: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/2.jpg)
Motivation
• Issues in designing oligonucleotides
– Minimizing cross-hybridization
– Minimizing computing time
• Seeding (or indexing) has been widely used to address these issues by pre-screening unreliable sequence regions before cross-hybridization is calculated.
• Although many seeding methods have been proposed, no measure has yet been proposed for evaluating how adequate and efficient a seed is for oligonucleotide design.
![Page 3: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/3.jpg)
Difference between alignment and oligonucleotide design
• Alignment
– To find all possible alignments that score highly enough.
– Sensitivity is important, while specificity is usually guaranteed by the seed's own specificity.
• Oligonucleotide design
– To find optimal oligonucleotides that differentiate target sequences from the others.
– Specificity should be considered as well as sensitivity when checking cross-hybridization.
![Page 4: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/4.jpg)
Objectives
• We propose novel measures for evaluating seeds, based on discriminability and efficiency.
• We examine five seeding methods in oligonucleotide design.
– Continuous, spaced, transition-constrained, BLAT, and Vector seeds
• We provide a software package, SeedChooser, which enables users to obtain adequate seeds for their own experimental conditions.
![Page 5: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/5.jpg)
What is a Seed?
• Seeding process
– Filtering step: short fixed-length common words found in both the query and target sequences are selected.
– Extension step: the selected words are extended to the size of the oligonucleotide and checked for cross-hybridization.
Seed = the filtering template of the fixed-length words
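The filtering step can be illustrated with a minimal Python sketch. It assumes the simplest template, a contiguous word of length k, and the function names `index_words` and `filter_hits` are hypothetical, not taken from SeedChooser. The extension step is not shown.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every k-length word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def filter_hits(query, target, k):
    """Filtering step: list (query_pos, target_pos) pairs that share a k-length word."""
    target_index = index_words(target, k)
    hits = []
    for i in range(len(query) - k + 1):
        # .get avoids creating empty entries in the defaultdict
        for j in target_index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


# Toy sequences, invented for illustration only
hits = filter_hits("ACGTACGGA", "TTACGGATC", 5)
```

Only the positions returned in `hits` would then be extended to oligonucleotide size and checked for cross-hybridization, which is where the saving in computing time comes from.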
![Page 6: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/6.jpg)
Seeding methods (1/2)
• Continuous seed: a seed that finds exact matches of length k
– BLAST employs the 11-bp seed 11111111111.
• Spaced seed: allows don't-care positions, labeled '0', in the seed
– PatternHunter uses the 18-bp seed 101101100111001011, which contains 11 match positions.
• Transition-constrained seed: adds transition (A <-> G, C <-> T) positions, labeled '@', to the seed
– YASS uses the seed 1110@10010@1010111, which is 18 bp long with 10 match positions and 2 transition positions.
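How the three template alphabets act on a pair of words can be sketched as follows. This is an illustrative reading of the slide's notation ('1' must match, '0' don't care, '@' match or transition); the function name `seed_hit` is hypothetical and not part of BLAST, PatternHunter, or YASS.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hit(seed, a, b):
    """Return True if words a and b match under the seed template."""
    if not (len(seed) == len(a) == len(b)):
        raise ValueError("seed and both words must have the same length")
    for s, x, y in zip(seed, a, b):
        if s == "1" and x != y:
            return False  # required match position differs
        if s == "@" and x != y and (x, y) not in TRANSITIONS:
            return False  # only an exact match or a transition is allowed here
        # s == "0": don't care, any pair of letters is accepted
    return True


PATTERNHUNTER_SEED = "101101100111001011"
YASS_SEED = "1110@10010@1010111"
```

A continuous seed is the special case where the template contains only '1', so `seed_hit("111", a, b)` is plain equality of two 3-letter words.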
![Page 7: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/7.jpg)
Seeding methods (2/2)
• BLAT seed: a continuous seed allowing one or two mismatches at any position of the seed.
• Vector seed: a generalized seed that combines the ideas of the BLAT seed and the spaced seed.
• BLAT and Vector seeds allow some mismatches at any position.
– They greatly increase sensitivity but spend much more computing time than the previous seeds.
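Mismatch-tolerant matching can be sketched in a few lines. This is a simplification made for illustration: an all-'1' template with a mismatch budget stands in for a BLAT-style seed, and a template with '0' positions plus a mismatch budget stands in for a Vector-style seed. The function name `hit_with_mismatches` is hypothetical, and the published definitions of these seeds may differ in detail.

```python
def hit_with_mismatches(seed, a, b, max_mismatches):
    """Return True if a and b differ in at most max_mismatches of the '1' positions."""
    if not (len(seed) == len(a) == len(b)):
        raise ValueError("seed and both words must have the same length")
    # Count mismatches only at positions the template cares about
    mismatches = sum(1 for s, x, y in zip(seed, a, b) if s == "1" and x != y)
    return mismatches <= max_mismatches
```

Because a mismatch may fall at any position, an exact-word lookup is no longer enough, which is one way to see why these seeds cost more computing time than the previous ones.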
![Page 8: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/8.jpg)
The issues of seeds for oligo design
• An ideal seed should filter out, as fast as possible, all regions that have no possibility of being chosen as an oligo.
– A seed should find as many oligos as possible.
– A seed should avoid finding non-oligo regions.
– A seed should minimize the cost of indexing to find oligos.
![Page 9: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/9.jpg)
Discriminability
The discriminability is a balance between precision and recall to minimize both false positives and false negatives.
Precision: $P = \frac{\#\text{ of seed indices that hit oligos}}{\#\text{ of seed indices}}$
Recall: $R = \frac{\#\text{ of oligos containing seed hit(s)}}{\#\text{ of oligos}}$
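The two ratios on this slide, precision P (seed indices that hit oligos over all seed indices) and recall R (oligos containing seed hits over all oligos), can be computed directly from counts. The transcript does not preserve the formula that combines them into the discriminability, so the `weighted_balance` function below is an assumption: it uses the familiar weighted harmonic mean (F-measure form) with a weight `alpha`, purely to illustrate what "a balance between precision and recall" could look like. All function names are hypothetical.

```python
def precision(indices_hitting_oligos, total_indices):
    """P = # of seed indices that hit oligos / # of seed indices."""
    return indices_hitting_oligos / total_indices if total_indices else 0.0


def recall(oligos_with_hits, total_oligos):
    """R = # of oligos containing seed hit(s) / # of oligos."""
    return oligos_with_hits / total_oligos if total_oligos else 0.0


def weighted_balance(p, r, alpha=1.0):
    """ASSUMPTION: F-measure-style weighted harmonic mean of p and r.

    The slides do not show the authors' combining formula; this form is a stand-in.
    """
    a2 = alpha * alpha
    denom = a2 * p + r
    return (1 + a2) * p * r / denom if denom else 0.0
```

With this assumed form, `alpha = 1` weights precision and recall equally, and larger values of `alpha` shift the weight toward recall.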
![Page 10: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/10.jpg)
Efficiency
The efficiency is the proportion of useful regions filtered by a seed.
– the duplication ratio of generated indices (D)
– the average number of indices in each oligo (A)
$D = \frac{\#\text{ of generated seed indices}}{\#\text{ of unique seed indices}}$
$A = \frac{\#\text{ of seed indices in oligos}}{\#\text{ of oligos}}$
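The two quantities behind the efficiency can be computed from a list of generated seed indices. The sketch below reads the duplication ratio D as generated indices over unique indices, and the average A as seed indices in oligos over the number of oligos. How D and A are combined into the final efficiency (with the parameters β and γ) is not preserved in the transcript, so it is deliberately left out. The function names are hypothetical.

```python
def duplication_ratio(generated_indices):
    """D = # of generated seed indices / # of unique seed indices (1.0 means no duplicates)."""
    if not generated_indices:
        return 0.0
    return len(generated_indices) / len(set(generated_indices))


def average_indices_per_oligo(indices_in_oligos, num_oligos):
    """A = # of seed indices in oligos / # of oligos."""
    return indices_in_oligos / num_oligos if num_oligos else 0.0
```

For example, the index list `["ACG", "ACG", "CGT", "GTA"]` has four generated and three unique indices, giving D = 4/3.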
![Page 11: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/11.jpg)
Efficient discriminability
The efficient discriminative seed is the seed that has the maximum efficient discriminability value for the given
![Page 12: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/12.jpg)
Experiments
• Empirically chosen seeds were evaluated by three measures: discriminability, efficiency, and efficient discriminability.
• We tested the seeds for designing 50mer oligos.
– The parameters were set to 1 for evaluation.
• Simulated data set
– A set of random sequences generated by OligoGenerator in SeedChooser.
• Biological data set
– Ecologically important genes involved in the nitrogen and carbon cycles.
– nirS: nitrite reductase gene set
– pmoA: methane monooxygenase gene set
![Page 13: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/13.jpg)
Discriminability of the five seeding methods
[Figure: discriminability (y-axis, 0.2 to 1.0) versus seed weight (x-axis, 5 to 30) for the Continuous, Spaced, Transition, BLAT, and Vector seeds]
![Page 14: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/14.jpg)
Efficiency of the five seeding methods
[Figure: efficiency (y-axis, 0.06 to 0.18) versus seed weight (x-axis, 5 to 30) for the Continuous, Spaced, Transition, BLAT, and Vector seeds]
![Page 15: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/15.jpg)
Efficient discriminability of the five seeding methods
[Figure: efficient discriminability (y-axis, 0.02 to 0.12) versus seed weight (x-axis, 5 to 30) for the Continuous, Spaced, Transition, BLAT, and Vector seeds]
![Page 16: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/16.jpg)
Evaluation results of pmoA data set
![Page 17: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/17.jpg)
Evaluation results of nirS data set
![Page 18: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/18.jpg)
SeedChooser: Seed Evaluation and Recommendation Tools
• SeedChooser: recommends the best seeds according to the evaluation parameters. It uses a genetic algorithm to find the best seeds.
• SeedEvaluator: evaluates a set of seeds according to the parameters.
• OligoGenerator: generates a set of oligos for the desired experimental conditions.
• SeedChooser homepage: http://ml.knu.ac.kr/~whchung/seedchooser.html
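The slides say only that SeedChooser uses a genetic algorithm; its operators and settings are not given. As a rough illustration of the idea, the sketch below runs a mutation-only evolutionary search (a simplified relative of a genetic algorithm, with no crossover) over spaced seeds of fixed length and weight. Everything here is an assumption: the function names, the population settings, and especially `toy_fitness`, which is a hypothetical stand-in for a real score such as the efficient discriminability measured on a data set.

```python
import random


def random_seed(length, weight, rng):
    """Random spaced seed with '1' at both ends and `weight` ones in total (weight >= 2)."""
    inner = ["1"] * (weight - 2) + ["0"] * (length - weight)
    rng.shuffle(inner)
    return "1" + "".join(inner) + "1"


def mutate(seed, rng):
    """Swap one inner '1' with one inner '0', keeping length and weight fixed."""
    ones = [i for i in range(1, len(seed) - 1) if seed[i] == "1"]
    zeros = [i for i in range(1, len(seed) - 1) if seed[i] == "0"]
    if not ones or not zeros:
        return seed
    chars = list(seed)
    i, j = rng.choice(ones), rng.choice(zeros)
    chars[i], chars[j] = "0", "1"
    return "".join(chars)


def evolve(fitness, length, weight, generations=50, pop_size=20, rng=None):
    """Keep the better half of the population each generation and refill it with mutants."""
    rng = rng or random.Random(0)
    population = [random_seed(length, weight, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)


def toy_fitness(seed):
    """HYPOTHETICAL score: number of switches between '1' and '0' along the seed."""
    return sum(1 for a, b in zip(seed, seed[1:]) if a != b)


best = evolve(toy_fitness, length=12, weight=8)
```

In a real setting the fitness function would evaluate each candidate seed on the user's own sequences, which is far more expensive than this toy score and is the reason a heuristic search is attractive.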
![Page 19: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/19.jpg)
CONCLUSION
• We proposed novel measures for evaluating seeds in oligo design, based on discriminability and efficiency.
• The spaced seed was generally preferable to the other seeding methods.
• Our study can be applied to oligo design programs to improve their performance by suggesting experiment-specific seeds.
• We expect that our study will also be helpful for other genomic tasks.
![Page 20: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/20.jpg)
Supplementary materials
![Page 21: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/21.jpg)
• T1, T2, T3: the target sequences.
• P1 and P2 are the matched oligos for an oligo P0.
• S1, S2 and S3 are the seed indices for S0 by a seed.
[Figure: diagram showing sequences T0 to T3, oligos P0 to P2, and seed indices S0 to S3]
![Page 22: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/22.jpg)
Relations of precision, recall and discriminability
[Figure: precision, recall, and discriminability (y-axis, 0.2 to 1.2) versus seed weight (x-axis, 6 to 26)]
![Page 23: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/23.jpg)
Discriminability according to values of α
[Figure: discriminability (y-axis, 0.2 to 1.2) versus seed weight (x-axis, 6 to 26), one curve per value of α: 8, 4, 2, 1, 1/2, 1/4, 1/8]
![Page 24: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/24.jpg)
Efficiency according to values of β and γ
[Figure: surface plot of efficiency as a function of Beta and Gamma]
![Page 25: An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design](https://reader037.vdocuments.net/reader037/viewer/2022102809/56814645550346895db35117/html5/thumbnails/25.jpg)
Efficient Discriminability for 70mer Oligos
[Figure: efficient discriminability (y-axis, 0.00 to 0.10) versus seed weight (x-axis, 5 to 30) for the Continuous, Spaced, Transition, BLAT, and Vector seeds]