Predicting Expression Levels Using Codon Usage Bias

Doug Raiford Lesson 19

Upload: ekram

Post on 15-Jan-2016


DESCRIPTION

Doug Raiford, Lesson 19. Predicting Expression Levels Using Codon Usage Bias. It would be nice to be able to predict expression levels. Very expensive experiments already do this; a sequence-only method would be nice. An example: worked on a project that predicted metabolic efficiency.

TRANSCRIPT

Page 1: Predicting Expression Levels Using Codon Usage Bias

Doug Raiford, Lesson 19

Page 2: Predicting Expression Levels Using Codon Usage Bias

Very expensive experiments already measure expression levels

A sequence-only prediction would be nice

04/21/23 2Expression Prediction with CUB

Page 3: Predicting Expression Levels Using Codon Usage Bias

Worked on a project that predicted metabolic efficiency

Metabolic efficiency: the tendency for organisms to utilize, where possible, less expensive amino acids

Tested by looking at expression vs. protein biosynthetic cost

[Figure: biosynthetic cost (y-axis) vs. protein production rate, i.e. expressivity (x-axis)]

Page 4: Predicting Expression Levels Using Codon Usage Bias

In highly expressed genes, usage of certain codons is extremely biased

Leucine codons: CTA, CTC, CTG, CTT, TTA, TTG

One of the most highly expressed genes in Escherichia coli K12 has 9 CTG codons and zero of all other codons that code for leucine
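The kind of tally behind that observation can be sketched in a few lines. This is a minimal illustration, assuming an in-frame coding sequence; the function name `leucine_codon_counts` and the toy sequence are made up for the example and are not a real E. coli gene.

```python
from collections import Counter

# The six codons that encode leucine in the standard genetic code.
LEUCINE_CODONS = ("CTA", "CTC", "CTG", "CTT", "TTA", "TTG")


def leucine_codon_counts(cds):
    """Count each leucine codon in an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    # Split into consecutive triplets, dropping any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(codons)
    return {codon: counts.get(codon, 0) for codon in LEUCINE_CODONS}


# Hypothetical toy sequence: start codon, nine CTG leucines, a lysine, a stop.
toy_cds = "ATG" + "CTG" * 9 + "AAA" + "TAA"
print(leucine_codon_counts(toy_cds))
```

A gene with the extreme bias described on the slide would show all of its leucine counts in a single entry, as the toy sequence does.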

Page 5: Predicting Expression Levels Using Codon Usage Bias

Translational efficiency

[Figure: a ribosome translating an mRNA at a gc? (ala) codon while the protein strand grows. The tRNA pool carries alanine for gcu (drawn most often), gcc, gca and gcg. Labels: Ribosome, mRNA, tRNAs, Protein Strand]

Page 6: Predicting Expression Levels Using Codon Usage Bias

How would you use this biased usage to predict expression?

Frequency of preferred codons (FOP)

Just look at the most highly expressed genes

Either experimentally determined or genes known to be highly expressed

Calculate usage for all genes

Usage is predictive of expressivity

[Figure: Expression of Genes, genes numbered 1, 2, 3, ..., N]
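A minimal sketch of a frequency-of-preferred-codons calculation, under stated assumptions: the preferred set is supplied by the caller, only amino acids that have at least one preferred codon are counted, and the standard genetic code applies. The names `split_codons` and `fop`, the preferred set and the toy genes are all hypothetical.

```python
from itertools import product

# Standard genetic code, built in the usual TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame coding sequence into codons (whitespace ignored)."""
    seq = "".join(cds.upper().split()).replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def fop(cds, preferred):
    """Fraction of codons that are preferred, counting only amino acids
    for which a preferred codon has been defined."""
    informative = {CODON_TABLE[c] for c in preferred}
    n_preferred = n_total = 0
    for codon in split_codons(cds):
        if CODON_TABLE.get(codon) in informative:
            n_total += 1
            if codon in preferred:
                n_preferred += 1
    return n_preferred / n_total if n_total else 0.0


# Hypothetical preferred codons and toy genes, for illustration only.
preferred = {"CTG", "GCT"}
gene_a = "ATG CTG CTG GCT CTG TAA"
gene_b = "ATG CTA TTG GCT CTG TAA"
print(fop(gene_a, preferred), fop(gene_b, preferred))
```

Ranking every gene in a genome by this value gives the sequence-only expression prediction the slide describes.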

Page 7: Predicting Expression Levels Using Codon Usage Bias

That is, given sequence data only, can we determine probable expression levels?


  1 atgggttggt caatcatctg atttaatggg caaattttta aagatgcaca ttatatcagc
 61 aaaaaatcga acctgttggg tcttgcgcag ggtgccggac ttggcctagt tttgggcctc
121 aagatgacga tcaaatgacg aaagcttgcc tggtcgaggg ttttttcaac cgtcgattgc
181 gggagcgggg ttgtgcggcc gtatggcgga aatcgctatt cggttgagct gggacgatgg
241 caggacgggg agcggtgcgc ttggacacgc aaacttggca ggaacagggg ctcgaaaccc
301 ggtctccggg acgcacgcgc ggtgaaatca gccaggatga actggcgcac cagtggagcc
361 gtgttcgcgg ccgacttcag gaagaaatcg gcgaggtcga gtaccgcaac tggttgcggc
421 aagccgtgct gcatgggctc gacggcgatg aagtgactgt catgctgccg acccgcttcc
481 tgcgtgactg ggtgaacaag gaatatggca acctgctgac cgcgttctgg caggccgaga
541 acccggcggt acggcgcgtg gatatccgga cccggccggc cggcaccagc gagcgcgcgc
601 ccgacctcgc cgaggtggag ccgaagaccg cgatcgcgcg gcccgccgcc gcggcgcgcc
661 gcgaggccga ggaacgcccg gacatgagcg cgccgctcga cccgcgcttc acctttgata
721 cattcgtggt cggcaagccg aacgaattcg cctatgcctg cgcgcgccgc gtcgccgacg

Page 8: Predicting Expression Levels Using Codon Usage Bias

Look at the data in a matrix

If we assume that the major force driving variance in codon usage is translational efficiency

If highly expressed genes have high usage of preferred codons and low usage of non-preferred codons, while weakly expressed genes have more balanced usage (or even avoid preferred codons)

What does this sound like?

Page 9: Predicting Expression Levels Using Codon Usage Bias

Can find the axis of greatest variance (the first principal component)

Genes are projected onto this axis

Highly expressed genes fall at one end and weakly expressed genes at the other
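Finding that axis and projecting the genes onto it can be sketched without any libraries. This is an illustrative sketch only: it uses power iteration on the covariance matrix rather than a full PCA routine, and the function name, the start vector and the toy usage matrix (rows are genes, columns are codon usage fractions) are assumptions made for the example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (axis, scores): the direction of greatest variance and the
    projection of each mean-centered row onto it. Needs at least two rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    # A non-uniform start is used because usage fractions sum to one per gene,
    # which makes the all-ones vector orthogonal to the variance.
    axis = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        axis = [v / norm for v in nxt]
    scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
    return axis, scores


# Hypothetical usage fractions for three synonymous codons in four genes.
toy_usage = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.40, 0.30, 0.30],
    [0.30, 0.35, 0.35],
]
axis, scores = first_principal_component(toy_usage)
print(scores)
```

The sign of a principal component is arbitrary, so which end holds the highly expressed genes has to be settled separately, for example from genes already known to be highly expressed.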


Page 10: Predicting Expression Levels Using Codon Usage Bias

…finding which codons are preferred

If a codon’s usage is correlated with location on the PC…

That is, if genes at one end exhibit low usage and genes at the other exhibit high usage

Probably a preferred codon

[Figure: a codon’s usage plotted against the projection of genes on the first principal component. Correlated?]
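One way to test "correlated with location on the PC" is a Pearson correlation between each codon's usage and the genes' positions on the first principal component. The sketch below assumes Pearson's r is an acceptable measure (the slides do not name one); the function names and the toy numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no trend
    return sxy / math.sqrt(sxx * syy)


def codon_pc_correlations(usage_by_codon, pc_scores):
    """Correlate each codon's per-gene usage with the genes' PC positions."""
    return {codon: pearson(usage, pc_scores)
            for codon, usage in usage_by_codon.items()}


# Hypothetical data: four genes, their positions on the first PC, and the
# usage fraction of two leucine codons in each gene.
pc_scores = [-2.0, -1.0, 1.0, 2.0]
usage_by_codon = {
    "CTG": [0.1, 0.2, 0.6, 0.8],
    "CTA": [0.5, 0.4, 0.2, 0.1],
}
correlations = codon_pc_correlations(usage_by_codon, pc_scores)
print(correlations)
```

A codon whose usage rises toward the highly expressed end of the axis is a candidate preferred codon; one whose usage falls is a candidate non-preferred codon.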

Page 11: Predicting Expression Levels Using Codon Usage Bias

Region in middle

It is really more accurate to look at distance from the cluster

Page 12: Predicting Expression Levels Using Codon Usage Bias

SCCI (Carbone, et al.)

Looks for the most self-consistent set of genes

Search for these genes

Page 13: Predicting Expression Levels Using Codon Usage Bias

Looking for subset of genes (reference set) that define a bias to which they themselves adhere more strongly than the rest of the genes

Algorithm

Start with all genes as the reference set
Loop until the reference set size is 1% of all genes
    Determine which codons are preferred
    Determine average usage for all genes
    Sort by adherence
    Take the genes in the top half to be the new reference set
Repeat
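The loop above can be sketched as follows. This is not Carbone et al.'s published implementation: it assumes adherence is scored with CAI-style weights (each codon's count divided by the count of the most used synonymous codon in the reference set, geometric mean over a gene), adds a 0.5 pseudocount to avoid log(0), and reads "top half" as halving the reference set size each round. All function names, the `fraction` parameter and the toy genes are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_weights(reference):
    """CAI-style weights from the codon lists of the reference genes."""
    totals = Counter()
    for codons in reference:
        totals.update(codons)
    best = Counter()  # highest codon count seen for each amino acid
    for codon, n in totals.items():
        aa = CODON_TABLE.get(codon)
        if aa is not None:
            best[aa] = max(best[aa], n)
    # The 0.5 pseudocount keeps unseen codons from producing log(0).
    return {c: (totals[c] + 0.5) / (best[aa] + 0.5)
            for c, aa in CODON_TABLE.items() if aa != "*"}


def adherence(codons, weights):
    """Geometric mean of the weights of a gene's sense codons."""
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


def find_reference_set(genes, fraction=0.01):
    """Halve the reference set until it holds `fraction` of all genes."""
    target = max(1, math.ceil(fraction * len(genes)))
    reference = list(genes)
    while len(reference) > target:
        weights = codon_weights([genes[name] for name in reference])
        ranked = sorted(genes, key=lambda name: adherence(genes[name], weights),
                        reverse=True)
        reference = ranked[:max(target, len(reference) // 2)]
    return reference


# Hypothetical toy genome: each gene is a list of leucine codons.
toy_genes = {
    "g1": ["CTG"] * 10,
    "g2": ["CTG"] * 9 + ["CTA"],
    "g3": ["CTG"] * 6 + ["CTA"] * 4,
    "g4": ["CTG"] * 5 + ["CTA"] * 5,
}
print(find_reference_set(toy_genes, fraction=0.5))
```

Because every round re-derives the weights from the current reference set, the search is greedy: it follows whichever bias is strongest in the genome, whether or not that bias reflects expression.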

Page 14: Predicting Expression Levels Using Codon Usage Bias

Do you think all organisms have translational efficiency bias?

How would you expect metabolic efficiency trends to look in organisms that do not?

[Figure: biosynthetic cost (y-axis) vs. protein production rate, i.e. expressivity (x-axis), with the trend shown as a question mark]

Some actually exhibited significant and positive trends

Page 15: Predicting Expression Levels Using Codon Usage Bias

What could cause a positive trend?

Organisms preferentially utilize the most expensive aa’s in the most highly expressed genes?

We decided the problem must be in our prediction of expressivity

Somehow we got it wrong—in fact, it seems we got it exactly opposite

[Figure: biosynthetic cost (y-axis) vs. protein production rate, i.e. expressivity (x-axis), with the trend shown as a question mark]

Page 16: Predicting Expression Levels Using Codon Usage Bias

The misbehaving organisms all had either high or low GC-content

But how would this cause a positive trend?

Breakthrough came with Nostoc

The greedy algorithm was finding high AT-content genes that were on the opposite side of the 2D PCA codon usage space from the highly expressed genes
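GC-content (and its complement, AT-content) is simple to compute from sequence alone. The slides do not say how it was measured, so this sketch assumes the plain fraction of G and C among unambiguous bases; the function name is hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among the A, C, G and T bases of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


# AT-content is the complement of GC-content for unambiguous sequences.
print(gc_content("ATGCGC"), 1.0 - gc_content("ATGCGC"))
```

Applied per gene, a value like this shows whether a candidate reference set is simply the most AT-rich or GC-rich corner of the genome.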

[Figure: PCA on Nostoc, algorithm-identified reference set vs. highly expressed genes. x-axis: First Principal Component (-8 to 6); y-axis: Second Principal Component (-4 to 8)]

Page 17: Predicting Expression Levels Using Codon Usage Bias

Algorithm is a search for self-consistent genes

What does the search space look like, and why did the algorithm get fooled?

Our lab was heavy into genetic algorithms (GAs)

Think of all optimization problems in terms of being a search

Fitness landscape

Carbone’s algorithm found the reference set associated with the dominant bias. What about the next most dominant?

Page 18: Predicting Expression Levels Using Codon Usage Bias

How to arrange solutions along two axes (with fitness in a third)?

How to reduce the number of solutions?

Number of possible solutions

Page 19: Predicting Expression Levels Using Codon Usage Bias

Reference sets tend to be proximal

If we choose each gene's nearest neighbors as its reference set, we only have to calculate fitness once per gene

We already have a method for viewing gene placement in a 2D space: PCA

Elevated regions: highly self-consistent

[Figure: PCA on Nostoc, algorithm-identified reference set vs. highly expressed genes. x-axis: First Principal Component (-8 to 6); y-axis: Second Principal Component (-4 to 8). Annotation: AT-content ridge dominates search space]

Page 20: Predicting Expression Levels Using Codon Usage Bias

How to fix the algorithm?

I modified the SCCI algorithm to avoid unbalanced GC-content regions

Push down

Page 21: Predicting Expression Levels Using Codon Usage Bias

Greedy algorithm gets perfect self-consistency scores

Modified algorithm does not

Decided to try using a GA to improve

We can rebuild him. We have the technology. We have the capability to build the world's first bionic man. Steve Austin will be that man. Better than he was before. Better, stronger, faster.

Parent One: g1 g2 g3 g4 g5 … gN

Parent Two: g1 g2 g3 g4 g5 … gN

Child: g1 g2 g3 g4 g5 … gN

Mutate
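The parent, child and mutate steps in the diagram can be sketched as below. The slides do not specify the encoding or operators, so this assumes each individual is a list of N booleans (gene i is in or out of the candidate reference set), uniform crossover, and independent bit-flip mutation; every name and rate here is hypothetical.

```python
import random


def crossover(parent_one, parent_two, rng):
    """Uniform crossover: each position is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b
            for a, b in zip(parent_one, parent_two)]


def mutate(child, rate, rng):
    """Flip each membership flag independently with probability `rate`."""
    return [(not flag) if rng.random() < rate else flag for flag in child]


# Hypothetical individuals over six genes: True means "in the reference set".
rng = random.Random(42)
parent_one = [True, True, False, False, True, False]
parent_two = [False, True, True, False, False, False]
child = mutate(crossover(parent_one, parent_two, rng), rate=0.05, rng=rng)
print(child)
```

Positions where both parents agree survive crossover unchanged, so only mutation can introduce a gene that neither parent's reference set contained.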

Page 22: Predicting Expression Levels Using Codon Usage Bias

Two Objectives

Searched for a set of genes that were both

Self-consistent

And that identified a bias to which known highly expressed genes (HEGs) strongly adhered

[Figure: solutions plotted by self-consistency (x-axis) vs. ranking of HEGs (y-axis)]

Page 23: Predicting Expression Levels Using Codon Usage Bias

For each solution, count the number of other solutions that dominate it (are better in both dimensions)

Solutions on the Pareto front: no other solution is better in both dimensions

The fewer dominating solutions there are, the higher the fitness

Solutions on the front are given the highest fitness
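The domination count can be sketched directly from that description. The sketch assumes each solution is a pair of scores (self-consistency, ranking of HEGs) where higher is better in both; the function name and the toy scores are hypothetical.

```python
def domination_counts(solutions):
    """For each (objective_one, objective_two) pair, count how many other
    solutions are strictly better in both objectives."""
    counts = []
    for i, (a1, a2) in enumerate(solutions):
        dominated_by = sum(
            1 for j, (b1, b2) in enumerate(solutions)
            if j != i and b1 > a1 and b2 > a2
        )
        counts.append(dominated_by)
    return counts


# Hypothetical scores: (self-consistency, ranking of known HEGs).
toy_solutions = [(0.9, 0.2), (0.7, 0.7), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
counts = domination_counts(toy_solutions)
pareto_front = [i for i, count in enumerate(counts) if count == 0]
print(counts, pareto_front)
```

Any decreasing function of the count, such as 1 / (1 + count), then gives solutions on the front the highest fitness.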

[Figure: solutions plotted by self-consistency (x-axis) vs. ranking of HEGs (y-axis)]

Page 24: Predicting Expression Levels Using Codon Usage Bias

The solutions that identified a bias to which known highly expressed genes strongly adhered were by far the best

But the genes in the reference set we identified were not among the most highly expressed, yet the bias that set defined (the codon preferences it identified) yielded much better predictions of actual expressivity

[Figure: self-consistency (x-axis) vs. ranking of HEGs (y-axis). Callout: Best Solutions]

Page 25: Predicting Expression Levels Using Codon Usage Bias

We just found a better set of codon preferences

Why not directly search for codon preferences?

Reframe the problem

Instead of “given a set of known highly expressed genes, determine which codons they seem to prefer and use these preferences to rank the whole genome”

We asked “given a set of known highly expressed genes, which set of codon preferences (weights associated with each codon) yields a gene ranking with known highly expressed genes at the top?”

Page 26: Predicting Expression Levels Using Codon Usage Bias

Given a set of known highly expressed genes, which set of codon preferences (weights associated with each codon) yields a gene ranking with known highly expressed genes at the top?

Parent One: w1 w2 w3 w4 w5 … w59

Parent Two: w1 w2 w3 w4 w5 … w59

Child: w1 w2 w3 w4 w5 … w59

Mutate
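A fitness function for this reframed search can be sketched as follows. The slides do not give the scoring formula, so this assumes a gene's score is the usage-weighted average of its codon weights and that fitness is the mean rank of the known highly expressed genes (lower is better). The 59 weights presumably correspond to the 59 sense codons that have a synonym (64 minus three stops, ATG and TGG); the toy data uses only two codons, and all names are hypothetical.

```python
def gene_score(codon_counts, weights):
    """Usage-weighted average of codon weights for one gene."""
    total = sum(codon_counts.values())
    if total == 0:
        return 0.0
    return sum(weights.get(codon, 0.0) * n
               for codon, n in codon_counts.items()) / total


def mean_heg_rank(genes, weights, known_hegs):
    """Rank all genes by score (1 = top) and average the ranks of the
    known highly expressed genes. Lower values mean fitter weights."""
    ranked = sorted(genes, key=lambda name: gene_score(genes[name], weights),
                    reverse=True)
    position = {name: i + 1 for i, name in enumerate(ranked)}
    return sum(position[name] for name in known_hegs) / len(known_hegs)


# Hypothetical toy genome: codon counts per gene, two of them "known HEGs".
toy_genes = {
    "heg1": {"CTG": 9, "CTA": 1},
    "heg2": {"CTG": 8, "CTA": 2},
    "other1": {"CTG": 3, "CTA": 7},
    "other2": {"CTG": 2, "CTA": 8},
}
known_hegs = ["heg1", "heg2"]
good_weights = {"CTG": 1.0, "CTA": 0.0}
bad_weights = {"CTG": 0.0, "CTA": 1.0}
print(mean_heg_rank(toy_genes, good_weights, known_hegs),
      mean_heg_rank(toy_genes, bad_weights, known_hegs))
```

A GA would evolve the weight vector itself, keeping the vectors whose ranking places the known highly expressed genes nearest the top.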

Page 27: Predicting Expression Levels Using Codon Usage Bias
