Phase Bias in Overlapping Genes
Niv Sabath, University of Houston
Introduction
• Overlapping genes are ubiquitous, particularly in bacteria and viruses
• Genes can overlap
  - On the same strand (→ →)
  - On opposite strands (→ ← or ← →)
• In bacteria
  - ~30% of the genes overlap
  - ~70% of the overlaps are on the same strand
Same-Strand Overlaps
[Figure: count plotted against overlap length, with short overlaps and long overlaps marked. The example sequence 5’ T G A T A A T A G 3’ is shown in phase 0, phase 1 and phase 2. Labels shown in successive builds of the slide: 1, 2, 4, 5, 7, 8, 10, 11.]
[Figure: count plotted against overlap length, with short overlaps and long overlaps marked. The example sequence 5’ T A G C A T G T A T C A T G G 3’ is shown in phase 0, phase 1 and phase 2.]
“…the phase bias must be a property of gene locations.”
“We propose that through some mechanism yet to be determined, the creation of unidirectional gene overlaps of phase +1 confers some advantage.” (Cock and Whitworth 2007)
170 bacterial genomes
15,298 long overlaps in phase 1
4,153 long overlaps in phase 2
Composition?
Scenarios of overlap creation
[Figure: three diagrams of overlap creation, each drawn 5’ to 3’.]
Hypothesis
The phase bias in overlap frequency results from a difference between the frequencies of initiation/termination codons in the phase 1 and phase 2 reading frames
We examined the frequencies of ATG, the most common start codon, and stop codons in phase 1 and phase 2 reading frames
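The count described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the talk: the function name `count_in_phase` is hypothetical, and the convention that "phase k" means reading codons starting k nucleotides after the gene's own codon boundary is an assumption.

```python
from collections import Counter

# The three standard stop codons (also listed on the stop-codon usage slide).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_in_phase(cds, phase):
    """Count ATG and stop codons in a shifted reading frame.

    `cds` is a coding sequence whose own frame (phase 0) starts at index 0.
    `phase` is the assumed shift (1 or 2 nucleotides) of the frame examined.
    """
    counts = Counter()
    # Step through the shifted frame one codon at a time.
    for i in range(phase, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon == "ATG":
            counts["ATG"] += 1
        elif codon in STOP_CODONS:
            counts["stop"] += 1
    return counts


# The example sequence shown on the slides.
seq = "TAGCATGTATCATGG"
for phase in (1, 2):
    print(phase, dict(count_in_phase(seq, phase)))
```

On the slide sequence, each shifted frame contains exactly one ATG and no stop codon. Dividing such counts by the number of codons scanned, across all genes of a genome, would give per-phase frequencies.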
[Figure: the sequence 5’ T A G C A T G T A T C A T G G 3’ read in phase 0, phase 1 and phase 2, with the two out-of-frame ATG (Met) codons marked.]
NAT codons: TAT (Tyr), CAT (His), AAT (Asn), GAT (Asp)
TGN codons: TGT (Cys), TGC (Cys), TGA (Stop), TGG (Trp)
What causes the difference in ATG frequencies?
Relative abundance of amino acids
[Figure: Phase 1 and Phase 2 panels. Amino acids shown: Phe, Leu, Ile, Met, Val, Ser, Pro, Thr, Ala, Gln, Lys, Glu, Arg, Gly, together with the NAT codons (Tyr, His, Asn, Asp) and the TGN codons (Cys, Cys, Stop, Trp).]
The frequencies of ATG are correlated with the expected frequencies
$$f^{0}_{\mathrm{NAT}} \, f^{0}_{\mathrm{GNN}}$$
$$f^{0}_{\mathrm{NNA}} \, f^{0}_{\mathrm{TGN}}$$
r² = 0.93
r² = 0.80
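A small sketch of how such expected frequencies could be computed, assuming the two expressions on this slide are products of phase-0 codon-class frequencies (NAT with GNN, and NNA with TGN). The helper names are hypothetical. A NAT codon followed by a GNN codon places an ATG one nucleotide after a codon boundary; an NNA codon followed by a TGN codon places it two nucleotides after. How these offsets map onto the talk's "phase 1" and "phase 2" labels is not assumed here.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each in-frame (phase 0) codon in `cds`."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    total = len(codons)
    return {codon: n / total for codon, n in Counter(codons).items()}


def class_frequency(freqs, pattern):
    """Sum the frequencies of codons matching `pattern` ('N' = any base)."""
    return sum(
        f for codon, f in freqs.items()
        if all(p == "N" or p == b for p, b in zip(pattern, codon))
    )


def expected_atg(freqs):
    """Expected out-of-frame ATG frequency, treating adjacent codons as independent."""
    return {
        # ATG starting one nucleotide after a codon boundary: N-A-T | G-N-N
        "offset_1": class_frequency(freqs, "NAT") * class_frequency(freqs, "GNN"),
        # ATG starting two nucleotides after a codon boundary: N-N-A | T-G-N
        "offset_2": class_frequency(freqs, "NNA") * class_frequency(freqs, "TGN"),
    }


# Toy input: the slide sequence; a real analysis would pool all genes of a genome.
freqs = codon_frequencies("TAGCATGTATCATGG")
print(expected_atg(freqs))
```

The independence of neighbouring codons is the simplifying assumption that makes the expected value a plain product; the reported r² values compare expectations of this kind with the observed ATG frequencies.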
Amino-acid frequency
ATG frequency
Phase bias
Acknowledgments
Dr. Dan Graur
Dr. Giddy Landan
NSF
© Marko Posavec
Why was the correlation between overlap frequency and GC content unnoticed?
Long + short overlaps:
Phase 1 + phase 2
Phase 1
Phase 2 (64% ATGA)
Stop codon usage:
TGA TAA TAG
What is the proportion (P) of overlaps that are created by each scenario, after accounting for ATG frequencies?
$$R_{\mathrm{genes}} = \frac{\text{phase 1 overlap frequency}}{\text{phase 2 overlap frequency}}$$
$$R_{\mathrm{ATG}} = \frac{\text{phase 1 ATG frequency}}{\text{phase 2 ATG frequency}}$$
$$R_{\mathrm{stop}} = \frac{\text{phase 1 stop codon frequency}}{\text{phase 2 stop codon frequency}}$$
$$R_{\mathrm{genes}} = P \, R_{\mathrm{ATG}} + (1 - P) \, R_{\mathrm{stop}}$$
$$3.7 = P \times 6.4 + (1 - P) \times 1 \quad\Rightarrow\quad P = 0.5$$
The expected value under no bias for overlaps in the two scenarios
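As a worked check, the proportion can be recovered numerically. This sketch assumes the relation on this slide is the linear mixture R_genes = P·R_ATG + (1 − P)·R_stop, so that P = (R_genes − R_stop) / (R_ATG − R_stop). The function name is hypothetical; the values 6.4 and 1 are read from the slide, and R_genes is taken from the long-overlap counts given earlier (15,298 in phase 1 and 4,153 in phase 2).

```python
def scenario_proportion(r_genes, r_atg, r_stop):
    """Solve r_genes = P * r_atg + (1 - P) * r_stop for P."""
    if r_atg == r_stop:
        raise ValueError("P is undetermined when R_ATG equals R_stop")
    return (r_genes - r_stop) / (r_atg - r_stop)


# Ratio of long overlaps in phase 1 to phase 2 across the 170 genomes.
r_genes = 15298 / 4153

# R_ATG and R_stop as read from the slide (assumed values).
p = scenario_proportion(r_genes, r_atg=6.4, r_stop=1.0)
print(round(r_genes, 1), round(p, 2))
```

With these inputs the overlap ratio rounds to 3.7 and P comes out close to one half, matching the figure on the slide.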