![Page 1: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/1.jpg)
Comparative genomics tools for biological discovery
Inna Dubchak, Ph.D.
Staff scientist
Lawrence Berkeley National Laboratory
![Page 2: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/2.jpg)
Outline
What is comparative genomics?
VISTA tools developed for comparative genomics.
Large scale VISTA applications including aligning whole genome assemblies
Related biological stories
![Page 3: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/3.jpg)
1-2% Coding
Coding: Gene A → mRNA → Protein A; Gene B → mRNA → Protein B
Non-Coding
![Page 4: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/4.jpg)
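Distant Non-Coding Sequences Causing Disease
[Figure: β-globin locus (labels β, γB, ε, δ, γA) with the LCR; β-Thalassemia, ~50kb]

| Disease | Gene | Distance |
| --- | --- | --- |
| Campomelic dysplasia | SOX9 | 850kb |
| Aniridia | PAX6 | 125kb |
| X-Linked Deafness | POU3F4 | 900kb |
| Saethre-Chotzen syndrome | TWIST | 250kb |
| Rieger syndrome | PITX2 | 90kb |
| Split hand/split foot malformation | SHFM1 | 450kb |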
![Page 5: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/5.jpg)
Background
Evolution can help!
In general, functionally important sequences are conserved
Conserved sequences are functionally important
Raw sequence can help in finding biological function
![Page 6: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/6.jpg)
Comparing sequences of different organisms
• Helps in gene predictions
• Helps in understanding evolution
• Non-coding sequences conserved between species are reliable guides to regulatory elements
• Differences between evolutionarily closely related sequences help to discover gene functions
![Page 7: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/7.jpg)
Challenges
Sequence at different stages of completion, difficult to compare:
• Whole genome shotgun
• Finished BACs
• Partial assemblies
Fast and accurate analysis
Scaling up to the size of whole genomes
![Page 8: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/8.jpg)
Annotate reference sequence
- Genic sequences
- Repetitive elements
- CpG islands
Identify evolutionarily related genomic sequences
Homologs
- Orthologs
- Paralogs
Align genomic sequences
- Global alignment program
- Local alignment program
Identify conserved sequences
- Percent identity and length thresholds
Visualize conserved sequences
- Moving average point plot (VISTA)
- Gap-free segment plot (PipMaker)
![Page 9: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/9.jpg)
http://www-gsd.lbl.gov/vista
Processed ~16,000 queries online; distributed more than 700 copies of the program in 35 countries
![Page 10: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/10.jpg)
Modules of VISTA:
• Program for global alignment of DNA fragments of any length
• Visualization of alignment and various sequence features for any number of species
• Evaluation and retrieval of all regions with predefined levels of conservation
![Page 11: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/11.jpg)
Global Alignment
Local Alignment
Local vs global alignment
![Page 12: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/12.jpg)
AVID: the alignment engine behind VISTA
Very fast global alignment of megabases of sequence.
Provides details about ordered and oriented contigs, and accurate placement in the finished sequence.
Full integration with repeat masking.
• ORDER and ORIENT
• FIND all common k-long words (k-mers)
• ALIGN k-mers scoring by local homology
• FIX k-mers with good local homology
• RECURSE with smaller k (shorter words)
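The FIND and RECURSE steps listed above can be illustrated with a small Python sketch. This is not AVID's implementation: the function names `common_kmers` and `first_matching_k` are hypothetical, a plain dictionary index stands in for whatever data structure AVID actually uses, and the scoring, fixing, and ordering steps are left out.

```python
from collections import defaultdict

def common_kmers(seq_a, seq_b, k):
    """Return sorted (pos_a, pos_b, word) for every k-long word shared by both sequences."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)          # word -> positions in seq_a
    hits = []
    for j in range(len(seq_b) - k + 1):
        word = seq_b[j:j + k]
        for i in index.get(word, ()):            # .get avoids growing the index
            hits.append((i, j, word))
    return sorted(hits)

def first_matching_k(seq_a, seq_b, k_start, k_min):
    """Try shorter and shorter words until some common k-mer is found (illustrative only)."""
    for k in range(k_start, k_min - 1, -1):
        hits = common_kmers(seq_a, seq_b, k)
        if hits:
            return k, hits
    return None, []

if __name__ == "__main__":
    print(common_kmers("ACGTAC", "TACGT", 3))
    print(first_matching_k("ACGT", "TTGT", 3, 2))
```

In a real aligner the shared words would next be scored by the similarity of their flanking sequence and only the best ones kept as anchors; the sketch stops at listing the candidates.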
![Page 13: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/13.jpg)
Visualization
tggtaacattcaaattatg-----ttctcaaagtgagcatgaca-acttttttccatgg
|| | ||||  |  |  ||     || | | |    |||||| |  ||   |   | ||
tgatgacatctatttgctgtttcctttttagaaactgcatgagagcctggctagtaggg
Window of length L is centered at a particular nucleotide in the base sequence
Percent of identical nucleotides in L positions of the alignment is calculated and plotted
Move to the next nucleotide
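A minimal Python sketch of the moving-window calculation described above. It assumes the two sequences are given as equal-length alignment rows with `-` for gaps, that the window is measured in alignment columns, and that windows are simply truncated at the ends of the alignment; the function name `window_identity` is hypothetical and VISTA's own handling of these details may differ.

```python
def window_identity(aln_base, aln_other, window):
    """Percent identity in a window centred on each nucleotide of the base sequence.

    aln_base, aln_other: aligned rows of equal length, '-' marks a gap.
    window: window length L in alignment columns (assumed odd for a true centre).
    Returns one score per base-sequence nucleotide, in order.
    """
    if len(aln_base) != len(aln_other):
        raise ValueError("aligned rows must have the same length")
    # A column counts as identical only if both rows carry the same nucleotide.
    match = [a == b and a != "-" for a, b in zip(aln_base, aln_other)]
    half = window // 2
    scores = []
    for col, ch in enumerate(aln_base):
        if ch == "-":
            continue                      # gap in the base sequence: no plotted point
        lo = max(0, col - half)
        hi = min(len(match), col + half + 1)
        span = match[lo:hi]
        scores.append(100.0 * sum(span) / len(span))
    return scores

if __name__ == "__main__":
    for score in window_identity("AC-GT", "ACTGT", 3):
        print(round(score, 1))
```

Plotting these scores against the base-sequence coordinate gives the kind of curve shown in a VISTA plot.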
![Page 14: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/14.jpg)
> 12877 289557 ST7b/a
+ 13076 282515
12877 13226
159297 159379
179096 179255
189328 189382
190026 190141
191420 191495
193659 193727 b only
195616 195770
197970 198067
230397 230511
248856 248928
250369 250471
269322 269472
278619 278711
281458 281597
282396 283253 3'b
289297 289557 3'a
Exons file
> Cow ST7 gene
CTGAATGGCTCGTAGAAATAATGCATTCCCCTGCTGGACATGCTGAATAGCAATCGACTACAGT. . . .
Sequences in FastA format
> Human ST7 gene
CTGAATGGCTCGTAGAAATATTGCATTAACCTGCTGGACATGCTGAATAGCAATCGACTACAGT. . . .
VISTA
Alignment files
VISTA file
Repeat Masker
185140 185150 185160 185170 185180
GACATTGGAAAAGTAAAGGAAGTGGTTTAT---CTTGCTC------TTTTTGCAACAGTA
|||| |||||||| | |||||||||||||| | ||| | ||||| |||||
GACACTGGAAAAGCAGAGGAAGTGGTTTATTGACCTGCCCCCCCCTTTTTTATAACAGTG
80078 (149626) to 80171 (149724) = 99bp at 63.6% noncoding
159297 (158141) to 159379 (158223) = 83bp at 80.7% exon
179096 (159067) to 179253 (159224) = 158bp at 75.9% exon
189328 (159566) to 189382 (159620) = 55bp at 81.8% exon
190026 (159996) to 190139 (160109) = 114bp at 80.7% exon
191420 (160192) to 191495 (160267) = 76bp at 73.7% exon
Conservation files
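As an illustration of working with conservation records like the ones above, here is a hedged Python sketch that parses one record per line and filters by length and percent identity. The field names are assumptions: the slide does not say what the parenthesised coordinates mean, so they are stored as `other_start` and `other_end` without interpretation. The default thresholds (100 bp, 75%) are taken from the conservation criterion quoted later in the talk, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Region(NamedTuple):
    start: int          # first coordinate on the line
    other_start: int    # parenthesised coordinate (meaning assumed, not documented here)
    end: int
    other_end: int
    length: int         # reported length in bp
    identity: float     # reported percent identity
    kind: str           # annotation label, e.g. "exon" or "noncoding"

LINE_RE = re.compile(
    r"(\d+) \((\d+)\) to (\d+) \((\d+)\) = (\d+)bp at ([\d.]+)% (\w+)"
)

def parse_region(line: str) -> Region:
    """Parse one conservation record; raise ValueError if the line does not match."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised record: {line!r}")
    s, os_, e, oe, length, ident, kind = m.groups()
    return Region(int(s), int(os_), int(e), int(oe), int(length), float(ident), kind)

def filter_regions(regions, min_length=100, min_identity=75.0, kind: Optional[str] = None):
    """Keep regions meeting both thresholds, optionally restricted to one annotation label."""
    return [
        r for r in regions
        if r.length >= min_length
        and r.identity >= min_identity
        and (kind is None or r.kind == kind)
    ]

if __name__ == "__main__":
    records = [
        "80078 (149626) to 80171 (149724) = 99bp at 63.6% noncoding",
        "179096 (159067) to 179253 (159224) = 158bp at 75.9% exon",
        "190026 (159996) to 190139 (160109) = 114bp at 80.7% exon",
    ]
    for region in filter_regions(parse_region(r) for r in records):
        print(region)
```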
![Page 15: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/15.jpg)
VISTA plot
Human Sequence (horizontal axis)
% Identity Between Human and Mouse (vertical axis)
KIF Gene
0kb 10kb
Conserved Non-Coding Sequences
![Page 16: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/16.jpg)
Nuclear Hormone Receptor: LXR-Alpha
Human/Mouse
Human/Rabbit
Human/Opossum
![Page 17: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/17.jpg)
Low-Density Lipoprotein Receptor (LDLR)
Human/Mouse
Human/Lemur
![Page 18: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/18.jpg)
Apolipoprotein AI gene: Multi-Species Comparative Analysis (VISTA)
[Figure: VISTA plots for human/mouse, human/rabbit, human/chicken, human/rat, human/pig, and human/macaque; vertical axis 50% to 100% identity with a 75% line; annotation: Liver enhancer]
![Page 19: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/19.jpg)
[Figure: three sets of VISTA plots, each with Human/Mouse, Human/Dog, and Mouse/Dog panels]
Example: Dubchak et al., 2000, Genome Research, 10: 1304-1306.
![Page 20: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/20.jpg)
Active conservation of noncoding sequences: present in more than two mammals
% Cutoff: sum of three pairwise Intersection/Union values is maximal
Over 120 base pairs: H/D > 92%, H/M > 80%, D/M > 77%
[Figure: diagram with counts 4, 1, 2, 14]
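One way to read the cutoff criterion above is as a search for the set of percent-identity cutoffs that maximises the sum of the three pairwise Intersection/Union ratios. The Python sketch below makes several assumptions that the slide does not confirm: conserved regions from each pairwise comparison are inclusive `(start, end)` intervals projected onto one shared coordinate system, Intersection/Union is measured in base pairs, and the candidate cutoffs are supplied by the caller. All function names are hypothetical.

```python
def covered(intervals):
    """Set of positions covered by inclusive (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions

def intersection_over_union(a, b):
    """Shared base pairs divided by base pairs covered by either interval set."""
    pa, pb = covered(a), covered(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0

def pairwise_sum(hd, hm, dm):
    """Sum of the three pairwise Intersection/Union values."""
    return (
        intersection_over_union(hd, hm)
        + intersection_over_union(hd, dm)
        + intersection_over_union(hm, dm)
    )

def best_cutoffs(candidates):
    """candidates: dict mapping a cutoff label to (hd, hm, dm) interval lists.

    Returns the label whose conserved regions agree best across the three comparisons.
    """
    return max(candidates, key=lambda label: pairwise_sum(*candidates[label]))

if __name__ == "__main__":
    toy = {
        "loose": ([(1, 100)], [(1, 40)], [(60, 100)]),
        "strict": ([(10, 30)], [(10, 32)], [(12, 30)]),
    }
    print(best_cutoffs(toy))
```

Using position sets keeps the sketch short; for megabase-scale data an interval-merging approach would be the more practical choice.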
![Page 21: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/21.jpg)
VISTA flavors
• VISTA – comparing DNA of multiple organisms
• for 3 species - analyzing cutoffs to define actively conserved non-coding sequences
• cVISTA - comparing two closely related species
• rVISTA – regulatory VISTA
![Page 22: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/22.jpg)
Main features of VISTA
• Clear, configurable output
• Ability to visualize several global alignments on the same scale
• Alignments up to several megabases
• Working with finished and draft sequences
• Source code and web site available
![Page 23: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/23.jpg)
Godzilla - automatic computational system for comparative analysis of genomes
http://pipeline.lbl.gov http://www-gsd.lbl.gov/vista
Database
Human Genome: Golden Path Assembly
Mouse assemblies:
• Arachne, October 2001
• Phusion, November 2001
• MGSC v3, April 2002
![Page 24: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/24.jpg)
Main modules of the system
• Mapping and alignment of mouse contigs against the human genome
• Visualization
• Analysis of conservation
![Page 25: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/25.jpg)
Linux cluster of 15 1.2 GHz PCs, 750 MB of RAM
Three days to align the entire mouse genome against the human genome
![Page 26: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/26.jpg)
Chromosome Comparison
Human
Mouse
Base pair alignment
247 GGTGAGGTCGAGGACCCTGCA CGGAGCTGTATGGAGGGCA AGAGC
|: || ||||: |||| --:|| ||| |::| |||---||||
368 GAGTCGGGGGAGGGGGCTGCTGTTGGCTCTGGACAGCTTGCATTGAGAGG
![Page 27: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/27.jpg)
Tandem Local/Global Alignment Approach
• Sequence fragment anchoring (DNA and/or translated BLAT)
• Multi-step verification of potential regions using global alignment (AVID)
![Page 28: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/28.jpg)
Advantages of the tandem approach:
• better sensitivity/specificity trade-off
• fill-in effect
• scoring longer alignments
AVID: Global alignment
NT_002606 at Chr.17:2909457-29116113
BLAT: Local alignment
![Page 29: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/29.jpg)
Alignment strategies for different types of assemblies.

| Method | Scheme of alignment | Examples |
| --- | --- | --- |
| Contigs | Individual contigs | Finished BACs |
| Scaffold | Contigs can be reoriented and reordered | Arachne October 2001; Phusion November 2001 |
| Chopped pieces | Mouse chromosomes are chopped into 250 kb pieces and aligned to the Human Genome | Celera chromosome 16; MGSC v3 |
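The "chopped pieces" strategy can be sketched in a few lines of Python. The slide gives only the piece size (250 kb); whether consecutive pieces overlap is not stated, so this sketch assumes non-overlapping, half-open coordinates, and the function name `chop` is hypothetical.

```python
def chop(length, piece=250_000):
    """Yield (start, end) half-open coordinates of consecutive pieces of a sequence.

    length: total sequence length in bp.
    piece: piece size in bp (250 kb by default); the last piece may be shorter.
    """
    if piece <= 0:
        raise ValueError("piece size must be positive")
    for start in range(0, length, piece):
        yield start, min(start + piece, length)

if __name__ == "__main__":
    # A toy 600 kb "chromosome" yields two full pieces and one 100 kb remainder.
    for start, end in chop(600_000):
        print(start, end)
```

Each piece would then be submitted to the anchoring and global-alignment steps independently, which is what makes the approach easy to spread across a cluster.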
![Page 30: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/30.jpg)
Visualization – VistaBrowser & VistaTrack
Comparison combined with the human genome annotation on the UCSC Human Genome Browser
Stand-alone Java applet for detailed comparison
![Page 31: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/31.jpg)
VistaBrowser
![Page 32: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/32.jpg)
VistaTrack
Vista Plots
Human Sequence
Annotations
![Page 33: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/33.jpg)
http://pipeline.lbl.gov/
MyGodzilla is an interactive web tool for comparing your favorite sequence against the human genome
![Page 34: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/34.jpg)
MyGodzilla Tool
Submit a DNA sequence of ANY organism...
![Page 35: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/35.jpg)
Query against the human genome assembly – June 2002
![Page 36: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/36.jpg)
Query against the mouse genome assembly – Feb. 2002
![Page 37: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/37.jpg)
Examples of Results
• Understanding the structure of conservation
• Identification of putative functional sites
• Discovery of new genes
• Detection of contamination and misassemblies
![Page 38: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/38.jpg)
Biological stories
ONE: Wayward Discovery of a New Apolipoprotein Gene
TWO: Interleukin Expression Switch
![Page 39: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/39.jpg)
Identification of a New Apo Gene on Human 11q23
Godzilla
Zoom In
Gene Name
Highly Conserved Region
ApoA4 ApoC3 ApoA1
![Page 40: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/40.jpg)
Identification of a New Apo Gene on Human 11q23
Godzilla
New Gene (ApoA5)
Pennacchio LA et al. Science. 2001, 294:169-73.
![Page 41: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/41.jpg)
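Human/Mouse Apolipoprotein Gene Cluster Sequence Comparison
[Figure: human/mouse VISTA plot in three rows (0kb to 20kb, 20kb to 40kb, 40kb to 60kb); labels: ApoAIV, ApoCIII, ApoAI, Liver Enhancer, and an unannotated conserved region marked "????"]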
![Page 42: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/42.jpg)
Predicted Protein Sequence Has Homology to ApoAIV
[Figure: gene map with scale marks -35kb, -10kb, 0kb, 25kb; labels: ZNF259, “Apo AV”, Apo AIV, Apo CIII, Apo AI]
Alignment of predicted protein with human apoAIV
Identity: 26%
Similarity: 45%

predicted ---MAAVLTWALALLS----AFSATQARKGFWDYFSQTSG-DKGRVEQIH
apoAIV    MFLKAVVLTLALVAVAGARAEVSADQVATVMWDYFSQLSNNAKEAVEHLQ

predicted QQKMAREP-ATLKDSLEQDLNNMNKFLEKLRPLSGSEAPRLPQDPVGMRR
apoAIV    KSELTQQLNALFQDKLGEVNTYAGDLQKKLVPFATELHERLAKDSEKLKE

predicted QLQEELEEVKARLQPYMAEAHELVGWNLEGLRQQLKPYTMDLMEQVALRV
apoAIV    EIGKELEELRARLLPHANEVSQKIGDNLRELQQRLEPYADQLRTQVNTQA

predicted QELQEQLRVVGEDTKAQLLGGVDEAWALLQG----LQSRVVHHTGRFKEL
apoAIV    EQLRRQLDPLAQRMERVLRENADSLQASLRPHADELKAKIDQNVEELKGR

predicted FHPYAESLVSGIGRHVQELHRSVAPHAPASPARLSRCVQVLSRKLTLKAK
apoAIV    LTPYADEFKVKIDQTVEELRRSLAPYAQDTQEKLNHQLEGLTFQMKKNAE

predicted ALHARIQQNLDQLREELSRAFAGT-----GTEEGAGPDPQMLSEEVRQRL
apoAIV    ELKARISASAEELRQRLAPLAEDVRGNLKGNTEGLQKSLAELGGHLDQQV

predicted QAFRQDTYLQIAAFTRAIDQETEEVQQQLAPPPPGHSAFAPEFQQTDSGK
apoAIV    EEFRRRVEPYGENFNKALVQQMEQLRQKLGPHAGDVEGHLSFLEKDLRDK

predicted VLSKLQARLDDLWEDITHSLHDQGHSHLGDP---------------
apoAIV    VNSFFSTFKEKESQDKTLSLPELEQQQEQQQEQQQEQVQMLAPLES
![Page 43: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/43.jpg)
Summary: ApoAV
• A new apolipoprotein belonging to the ApoAI/CIII/AIV gene cluster.
• Expressed in the liver & associates with HDL/VLDL.
• An important modulator of triglycerides (TG) in mice.
[Diagram: ApoAV and TG levels]
Is ApoAV involved in human biology/disease?
![Page 44: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/44.jpg)
Importance of ApoAV on Triglyceride Metabolism
Mouse studies
KO
Trans
ApoAIV ApoCIII ApoAI
ApoAV
Human studies
![Page 45: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/45.jpg)
[Figure: comparison of the interleukin (IL) gene cluster on mouse chromosome 11 and human 5q31]
Coding Exons 3%
Conserved Non-Coding 2.6% (>100bp, >75%)
IL Cluster MU Ch 11: LACs2, FMR2-homolog, Cyclin I-homolog, IL-3, Ubiquitone-BP, GDF-9, Hs.151472, Hs.13308, Septin2, KIF3, IL-4, IL-13, RAD-50, IL-5, IRF-1, Hs.70932, OCTN1, P4-hydroxylase alpha, OCTN2, GM-CSF
IL Cluster HUM 5q31: IL-3, FMR2-homolog, Ubiquitone-BP, GDF-9, Hs.13308, Septin2, KIF3, IL-4, IL-13, RAD-50, IL-5, IRF-1, OCTN1, P4-hydroxylase alpha, OCTN2, GM-CSF, Cyclin I-homolog, LACs2, HS-51082
![Page 46: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/46.jpg)
A Filtering Strategy
![Page 47: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/47.jpg)
Present in other species: Cow (86%), Dog (81%), Rabbit (73%)
Genomic position conserved in human, mouse, dog, baboon
Single copy in the human genome. Two hypersensitive sites mapped.
CNS-1
![Page 48: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/48.jpg)
Functional Analysis of CNS1
KIF3 IL4 IL13 RAD50 IL5 IRF1 E3 E2 OCTN2
CNS1
LoxP CNS1 LoxP
Generate Human 5q31 YAC Transgenic Mice
IL 4 IL 13
![Page 49: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/49.jpg)
Human IL-4 Production in YAC Transgenics Containing and Lacking CNS1
[Bar chart: vertical axis in pg/ml, 0 to 200; groups: 3 day, 5 day; series: CNS-1 wt, CNS-1 del]
IL-5 & IL-13 expression is also reduced in CNS-1 del mice
![Page 50: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/50.jpg)
KIF3
IL4
IL13
RAD50
CNS-1
IL5
Science. 2001 Oct 5;294(5540):169-73.
![Page 51: Comparative genomics tools for biological discovery Inna …sandrine/Teaching/PH296.F02/Sem/... · Comparative genomics tools for biological discovery Inna Dubchak, Ph.D. Staff scientist](https://reader030.vdocuments.net/reader030/viewer/2022020302/5abc82d07f8b9ad1768e0a94/html5/thumbnails/51.jpg)
Thanks
Biology: Kelly Frazer, Gaby Loots, Len Pennacchio
Bioinformatics: Michael Brudno, Olivier Couronne, Brian Klock, Chris Mayor, Ivan Ovcharenko, Alexander Poliakov, Jody Schwartz
Eddy Rubin, Lior Pachter (UCB)
Funding – Programs for Genomic Applications (PGA) by NHLBI