Introduction to genomics
CM226: Machine Learning for Bioinformatics
Fall 2018 Sriram Sankararaman
What is this course about?
Bioinformatics: Answering biological questions using tools from computer science, statistics and mathematics.
Machine Learning: Learning from data
Personalized medicine
A biological question
Cystic fibrosis
Caused by mutations in the CFTR gene
Ivacaftor is a drug approved for patients with a mutation affecting the protein encoded by CFTR
Cancer relapse
Acute Lymphoblastic Leukemia in children
Higher risk of relapse among children with Native American ancestry
Yang et al. Nature Genetics 2011
[Figure axis label: Risk of relapse]
Path to personalized medicine
Basic biology
What are the genetic and environmental factors?
Disease prediction
Predict disease risk for an individual from genetic data
Problems in machine learning
What is this course about?
Genomic revolution in biology
2001: Human Genome Project
2010: 1000 Genomes Project
Stephens et al. PLoS Biology 2015
[Figure axis labels: Number of human genomes; Number of bases]
What is this course about?
Personal genomics
What does this mean for us?
Statistics and computing are important in obvious and subtle ways
Administrivia
Course goals
Identify biological questions.
Translate these questions into a statistical model.
Perform inference within the model.
Apply the model to data.
Interpret the results.
Statistically sound? Biologically meaningful?
Course goals
For CS/Stats/EE students, identify important quantitative problems in Bioinformatics.
For Bioinformatics/Human Genetics students, learn a new set of tools and understand their strengths and limitations.
Enrolment/PTEs
Course over-subscribed.
Students on the waitlist will be enrolled in case someone drops.
I am not giving out PTEs until after problem set 0 submission date.
Expect several students will drop the course.
Last year, several students dropped late in the quarter.
Does require mathematical maturity (more later).
Enrolment/PTEs
Hope that all qualified students will be able to enroll.
Problem set 0 intended to mitigate late drops.
This is a course requirement.
Graded to assess your background but not part of final grade.
Be honest/realistic with yourself about your background.
Prerequisites
Some programming experience (R strongly encouraged)
Prerequisites
Machine learning builds upon
Probability and Statistics
Linear algebra
Optimization
Math background review
Problem set 0 is intended to evaluate whether you have the necessary background.
If you can solve the problems with little difficulty, you are in good shape to take this class.
If you have trouble with a few questions, you should expect to fill in the background needed (we will also cover this material in class).
If you find most of the questions difficult, you should consider other classes to obtain the necessary background.
Course format
Introduce important problems in Bioinformatics.
Use these problems to motivate the study of learning algorithms
Will study different aspects of learning algorithms
Statistical
Computational
Issues that arise in practice
Lectures will switch between studying the methodology and the applications
Course format
Problem sets (worth 50% total).
Will include programming and data analyses.
Homework is due at 11:59 pm on the due date.
Late submissions not accepted.
Will use Gradescope to manage submissions and grading. Code for the class: MV527B
All solutions must be clearly written or typed. Unreadable answers will not be graded. We encourage using LaTeX to typeset answers.
For some assignments, it might be the case that only a subset of the problems will be graded in detail. Students may or may not be told which problems will be graded in advance.
Solutions will be graded on both correctness and clarity.
You are free to discuss homework problems. However, you must write up your own solutions. You must also acknowledge all collaborators.
Course format
Exams
In-class midterm (20%) Date: November 5
In-class final (30%) Date: December 5
Exams are in class, closed-book and closed-notes and will cover material from the lectures and the problem sets. You will be allowed to carry a one-page cheat sheet.
No alternate or make-up exams will be administered, except for disability/medical reasons documented and communicated to the instructor prior to the exam date. In particular, exam dates and times cannot be changed to accommodate scheduling conflicts with other classes.
Problem set 0 will be graded but will not count towards the final grade. We will not grade any other homework or exams unless you submit problem set 0.
Forums
Piazza
You should have already received an email.
Otherwise sign up at: https://piazza.com/ucla/fall2018/cm226/home
We strongly encourage students to post here rather than email course staff directly (you will get a faster response this way).
If you do need to contact the staff privately, Piazza allows you to do this.
If you want to contact us over email, please include CM226 in the subject line.
Forums
Gradescope
Manage problem sets and exam submissions.
For problem sets, you will need to upload PDFs of your submission to Gradescope.
Regrade requests must be made through Gradescope within one week after the graded homeworks have been handed out, regardless of your attendance on that day and regardless of any intervening holidays such as Memorial Day.
We reserve the right to regrade all problems for a given regrade request.
Tentative class syllabus
See class website:
http://web.cs.ucla.edu/~sriram/courses/cm226.fall-2018/html/index.html
Office hours
My office hours: Tuesday 1-2 pm or by appointment, at Engineering VI, 296B
TA: Tevfik Dincer. Office hours: Thursday 2-4 pm, location TBD
Questions?
Goals for this class and the next
Introduction to genomics
Introduction to statistics
Crash-course in genomics
How does the genome code for function?
How is the genome passed on from parent to child?
How does the genome change when it is passed on?
What can we learn from genetic variation?
How do we read the genome?
Trait/Phenotype
Trait/phenotype: Any observable that is inherited
Height, eye color, disease status, cellular measurements, IQ
Instructions that modulate traits are found in the genome
Galton 1886
How does the genome code for function?
Genetics and inheritance
A typical human cell has 46 chromosomes
22 pairs of autosomes
1 pair of sex chromosomes
The chromosome painting collective
Genetics and inheritance
One member of each pair comes from the father (paternal) and the other from the mother (maternal)
The chromosome painting collective
Causes of genetic variation
DNA not always inherited accurately
Mutations: changes in DNA
Changes at a single base (nucleotide)
Can have more complex changes
More definitions
SNP: single nucleotide change across individuals
Allele: set of bases at a SNP
Genotype: sequence of alleles along the SNPs of an individual
Genotype 1: (1,CT), (2,GG)
Genotype 2: (1,TT), (2,GA)
Individual 1, maternal: A T C C T T A G G A
Individual 1, paternal: A T C T T T C A G A
Individual 2, maternal: A T C T T T C A G A
Individual 2, paternal: A T C T T T C A A A
SNP 1 is the 4th base shown and SNP 2 is the 9th; each letter is a base (nucleotide).
Single Nucleotide Polymorphism (SNP)
Form the basis of most genetic analyses
Easy to study in high throughput (millions at a time)
Common
In humans, a SNP every ~1000 bases
Single Nucleotide Polymorphism (SNP)
Encoding at SNP 1: 0=C, 1=T
Encoding at SNP 2: 0=G, 1=A
Genotype 1 (SNP 1, SNP 2): maternal 0 0, paternal 1 0
Genotype 2 (SNP 1, SNP 2): maternal 1 0, paternal 1 1
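The 0/1 encoding on this slide can be sketched in a few lines of Python. This is an illustration, not course code: the names `ENCODING`, `encode_haplotype` and `allele_counts` are hypothetical, and summing the two copies into a 0/1/2 count is a common convention that the slides do not state.

```python
# Encoding table from the slide: SNP 1 has 0=C, 1=T; SNP 2 has 0=G, 1=A.
ENCODING = {1: {"C": 0, "T": 1}, 2: {"G": 0, "A": 1}}

def encode_haplotype(bases):
    """Map the bases on one chromosome copy, e.g. {1: "C", 2: "G"}, to 0/1 codes."""
    return tuple(ENCODING[snp][base] for snp, base in sorted(bases.items()))

def allele_counts(maternal, paternal):
    """Count copies of allele 1 at each SNP (0, 1 or 2); an assumed convention."""
    return tuple(m + p for m, p in zip(maternal, paternal))

# Genotype 1: the maternal copy carries C and G, the paternal copy carries T and G.
g1_mat = encode_haplotype({1: "C", 2: "G"})
g1_pat = encode_haplotype({1: "T", 2: "G"})
# Genotype 2: the maternal copy carries T and G, the paternal copy carries T and A.
g2_mat = encode_haplotype({1: "T", 2: "G"})
g2_pat = encode_haplotype({1: "T", 2: "A"})

print(g1_mat, g1_pat, allele_counts(g1_mat, g1_pat))
print(g2_mat, g2_pat, allele_counts(g2_mat, g2_pat))
```

Keeping the maternal and paternal copies separate preserves which alleles sit on the same chromosome; the summed counts discard that information.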
Genetic inheritance
Mendel’s first law
Father 11: P(transmitting allele 1) = 1
Mother 00: P(1) = 0
Child 10: P(1) = 0.5
Genetic inheritance
Mendel’s first law
Father x Mother: child genotype (probability)
11 x 00: 10 (1.0)
11 x 10: 10 (0.5), 11 (0.5)
10 x 10: 10 (0.25), 11 (0.25), 01 (0.25), 00 (0.25)
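The crosses on this slide can be reproduced by enumerating which allele each parent passes on. The sketch below assumes each parent transmits either of its two alleles with probability 1/2 and writes the child genotype with the father's allele first; the function name `child_genotype_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def child_genotype_distribution(father, mother):
    """Return P(child genotype | parents) at one SNP under Mendel's first law.

    Each parent is a 2-character string such as "10". Every pairing of one
    paternal allele with one maternal allele has probability 1/4.
    """
    dist = Counter()
    for f_allele, m_allele in product(father, mother):
        dist[f_allele + m_allele] += Fraction(1, 4)
    return dict(dist)

for father, mother in [("11", "00"), ("11", "10"), ("10", "10")]:
    print(father, "x", mother, child_genotype_distribution(father, mother))
```

Using `Fraction` keeps the probabilities exact, so the three crosses can be compared directly with the values on the slide.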
Genetic inheritance
Mendel’s second law
Genotype at (SNP 1, SNP 2): maternal 0 0, paternal 1 1
What are the probabilities of each transmitted combination?
Combination: 00, 11, 01, 10
If the two SNPs are always passed on together: 0.50, 0.50, 0.00, 0.00
If the two SNPs are passed on independently (Mendel’s second law): 0.25, 0.25, 0.25, 0.25
Not quite: 0.45, 0.45, 0.05, 0.05
Genetic inheritance
Mendel’s second law
Not quite. Recombination
[Figure: recombination between a chromosome of 0s and a chromosome of 1s]
Back to genetic inheritance
Mendel’s second law
Nearby positions tend to be inherited together.
The chromosome painting collective
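The three sets of probabilities on the Mendel's second law slides can be tied together with a single parameter. The sketch below assumes a recombination fraction `r` between the two SNPs, a quantity the slides do not name; `transmitted_haplotype_probs` is a hypothetical helper, and `r = 0.1` is chosen only because it reproduces 0.45 / 0.05.

```python
def transmitted_haplotype_probs(maternal, paternal, r):
    """P(haplotype passed on) at two SNPs for a parent carrying two haplotypes.

    maternal and paternal are pairs such as (0, 0) and (1, 1).
    r is the assumed recombination fraction between the SNPs (0 <= r <= 0.5).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    candidates = [
        (tuple(maternal), (1 - r) / 2),           # no recombination, maternal copy
        (tuple(paternal), (1 - r) / 2),           # no recombination, paternal copy
        ((maternal[0], paternal[1]), r / 2),      # recombinant
        ((paternal[0], maternal[1]), r / 2),      # recombinant
    ]
    probs = {}
    for hap, p in candidates:
        probs[hap] = probs.get(hap, 0.0) + p
    return probs

for r in (0.0, 0.5, 0.1):
    print(r, transmitted_haplotype_probs((0, 0), (1, 1), r))
```

With `r = 0` the SNPs always travel together, with `r = 0.5` they behave independently as in Mendel's second law, and small values of `r` correspond to nearby positions that are mostly inherited together.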
Back to genetic inheritance
Mutation and recombination produce genetic variation
Mutation produces differences
Recombination shuffles these differences
Genetic inheritance
Probabilistic model
Can write probability P(Genotype of child | Genotype of parents)
Can use this to write more complex models
P(Genotype of individual | Genotype of sibling)
P(Genotype of individual | Genotype of grandparent)
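One way to write the single-SNP version of this model is sketched below. The notation is an assumption rather than the course's: $g_f, g_m, g_c \in \{0, 1, 2\}$ count the copies of allele 1 carried by the father, mother and child, and a parent with $g$ copies transmits allele 1 with probability $g/2$ (Mendel's first law).

```latex
\begin{align*}
P(g_c = 2 \mid g_f, g_m) &= \frac{g_f}{2} \cdot \frac{g_m}{2} \\
P(g_c = 1 \mid g_f, g_m) &= \frac{g_f}{2}\left(1 - \frac{g_m}{2}\right)
                            + \left(1 - \frac{g_f}{2}\right)\frac{g_m}{2} \\
P(g_c = 0 \mid g_f, g_m) &= \left(1 - \frac{g_f}{2}\right)\left(1 - \frac{g_m}{2}\right)
\end{align*}
```

For two heterozygous parents ($g_f = g_m = 1$) this gives 0.25, 0.5 and 0.25. Models for more distant relatives follow by summing over the unobserved genotypes in between, which additionally requires an assumed distribution for the genotypes of the unobserved ancestors.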
What can we learn from genetic variation?
Evolution and history
Biological function and disease
How are people related?
History of human populations
https://blog.23andme.com/23andme-and-you/genetics-101/a-beautiful-ancestry-painting/
Genome-wide Association Studies (GWAS)
Reading our genomes
Goal: Determine the sequence of bases along each chromosome
Many technologies
Break up the genome into smaller fragments
Read each fragment
Put back together
Computationally hard
The Human Genome Project
Goals
Sequence an accurate reference human genome
Draft published in 2001
High-quality version completed in 2003
Cost: ~$3 billion.
Time: ~13 years.
Other outcome: Power of computing
Reading the genome
Human genome provides a reference
Humans share most of their genome (~99.9%)
Can focus on reading the differences
Simpler problem
Reading the genome
High-throughput genotyping array
Idea: DNA molecules bind (hybridize)
Nucleotides bind to their complementary bases
A and T hybridize, C and G hybridize
Can be used to get the genotype at a chosen set of SNPs
T A A G T A C G G A
T A A T T A C A G A
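The base-pairing rule behind the array can be sketched as a lookup table. This is a simplified illustration: it pairs bases position by position and ignores strand orientation, the names `COMPLEMENT`, `complement` and `hybridizes` are hypothetical, and the two sequences from the slide are used only as example strings.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq):
    """Return the sequence of bases that pairs with seq, position by position."""
    return "".join(COMPLEMENT[base] for base in seq)

def hybridizes(probe, target):
    """True if every probe base pairs with the target base at the same position."""
    return len(probe) == len(target) and all(
        COMPLEMENT[p] == t for p, t in zip(probe, target)
    )

target_a = "TAAGTACGGA"
target_b = "TAATTACAGA"
probe = complement(target_a)  # a probe designed to bind target_a

print(probe)
print(hybridizes(probe, target_a), hybridizes(probe, target_b))
```

A probe that matches one allele perfectly fails to pair at the SNP position of the other allele, which is what lets an array distinguish the two.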
Maps of genetic variation
International HapMap Project
Goals: Describe common patterns of genetic variation in human populations
Phase 1: Genotyped ~1 million SNPs from 270 individuals in 4 populations. Captured most SNPs with a frequency of >5%.
Phase 3: 7 additional populations included
All data publicly available.
Reading the genome
Limitations of genotyping
Can only read SNPs that are on the array
Biased by how these SNPs are chosen
Reading the genome
High-throughput (or next-gen) sequencing
Technologies: Illumina, Ion Torrent, PacBio
Can read only small pieces of the genome (~100 bases)
But can read hundreds of thousands of pieces in parallel
Use the reference human genome to find the locations of the pieces (and to infer mutations)
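The idea of locating a short read on the reference and reading off differences can be sketched with a brute-force scan. This is a toy illustration under stated assumptions: it tries every position and counts mismatches, whereas real read mappers rely on indexed data structures and also handle insertions and deletions. The function name `map_read` and the example strings are hypothetical.

```python
def map_read(reference, read):
    """Place the read where it has the fewest mismatches against the reference.

    Returns (position, mismatches), where mismatches is a list of
    (reference_index, reference_base, read_base) tuples.
    """
    best_pos, best_mismatches = None, None
    for pos in range(len(reference) - len(read) + 1):
        mismatches = [
            (pos + i, reference[pos + i], base)
            for i, base in enumerate(read)
            if reference[pos + i] != base
        ]
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches

reference = "ATCTTTCAGA"  # toy reference sequence
read = "TTCAA"            # toy read carrying one difference from the reference

position, differences = map_read(reference, read)
print(position, differences)
```

Each reported mismatch is a candidate variant, but in practice many overlapping reads are needed to separate real differences from sequencing errors.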
Reading the genome
Cost of genome sequencing
Maps of genetic variation
1000 Genomes Project
~2,500 individuals from 26 populations
Discovered ~90 million SNPs
Includes >99% of SNPs with frequency >1%
All data publicly available
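The frequency thresholds quoted for these projects (>5%, >1%) refer to how common an allele is in the sample. A minimal sketch of that calculation follows, assuming genotypes are coded as 0, 1, or 2 copies of an allele per diploid individual; the SNP names and genotype values are invented for illustration.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele at one SNP.

    genotypes: list of 0/1/2 values, one per diploid individual,
    giving the number of copies of the allele that individual carries.
    """
    return sum(genotypes) / (2 * len(genotypes))

# Hypothetical toy data: SNP name -> genotypes of 10 individuals.
snps = {
    "snp_a": [0, 1, 0, 2, 1, 0, 0, 1, 0, 0],  # 5 copies out of 20 chromosomes
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],  # 1 copy out of 20 chromosomes
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # not variable in this sample
}
freqs = {name: allele_frequency(g) for name, g in snps.items()}

# Keep SNPs whose minor allele frequency is strictly above 5%.
common = [name for name, f in freqs.items() if min(f, 1 - f) > 0.05]
```

With only 10 individuals the smallest nonzero frequency is 5%, which hints at why discovering variants at 1% frequency requires sampling thousands of genomes.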
66
Maps of genetic variation
Many more such efforts underway
Example: Simons Genome Diversity Project: 260 genomes from 127 populations
Also publicly available
67
Other interesting data
ExAC data: Exomes from ~60,000 individuals
Also publicly available
UK Biobank: 500,000 individuals with 200 phenotypes
Not publicly available
68
Computational and statistical problems
Recurring theme:
How to model a biological problem?
69
Computational and statistical problems
Recurring theme:
How to scale up algorithms to massive datasets?
70
Computational and statistical problems
Recurring theme:
Tradeoffs
Statistical accuracy
Computational efficiency
Constraints from technology
Constraints such as privacy
71
Computational and statistical problems
Learning the mapping from genotype to phenotype:
Association tests
Correcting for confounding
Multiple hypothesis testing
Risk prediction
Causality
Population genomics:
Inference of human history
Admixture inference
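To make the "association tests" and "multiple hypothesis testing" items concrete, here is a minimal sketch: a Pearson chi-square test on allele counts in cases versus controls, followed by a Bonferroni correction across SNPs. The counts, SNP names, and the choice of this particular test are assumptions for illustration, not necessarily the methods used later in the course, and the sketch does nothing to correct for confounding.

```python
import math

def allelic_chi2_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected under no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts per SNP: (case_alt, case_ref, ctrl_alt, ctrl_ref)
counts = {
    "snp_1": (60, 140, 30, 170),
    "snp_2": (50, 150, 48, 152),
    "snp_3": (45, 155, 40, 160),
}
alpha = 0.05
threshold = alpha / len(counts)  # Bonferroni: divide alpha by the number of tests
pvals = {snp: allelic_chi2_test(*c)[1] for snp, c in counts.items()}
significant = [snp for snp, p in pvals.items() if p < threshold]
```

With roughly a million SNPs tested, the same correction gives a per-SNP threshold of about 5e-8, which is why very large samples are needed to detect modest effects.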
72
Computational and statistical problems
Regression: Linear, Logistic, Ridge
Latent variable models: Clustering, Mixture models
PCA
Admixture models
Directed Graphical Models
Hidden Markov Models
Kernels
Deep Learning
73
This class
Overview of the biological questions
Basic concepts in genetics
Genomes are inherited according to well-known probabilistic models
Genomes change
Genetic variation forms the starting point for inference.
Possible inferences: history, disease risk
Advances in technology allow us to sequence large numbers of genomes
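As a small worked example of the "well-known probabilistic models" of inheritance, the sketch below computes the exact genotype distribution of a child at one site under Mendelian transmission, where each parent passes on one of their two alleles with probability 1/2. The allele labels and the function name `offspring_distribution` are illustrative choices, and the sketch ignores mutation.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(mother, father):
    """Exact genotype distribution of a child at one biallelic site.

    Parents are given as pairs of alleles, e.g. ("A", "a"). Each parent
    transmits one of their two alleles with probability 1/2, independently.
    """
    dist = {}
    for m_allele, f_allele in product(mother, father):
        # Genotypes are unordered, so "aA" and "Aa" are the same outcome.
        genotype = "".join(sorted((m_allele, f_allele)))
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist

# Two heterozygous parents give the classic 1/4 : 1/2 : 1/4 ratio.
dist = offspring_distribution(("A", "a"), ("A", "a"))
```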
74
Next class
Basic concepts in statistics
Given data, how do we perform inference?
What are the types of inference?
How do we derive procedures to perform inference?
How do we evaluate these procedures?
75
Thank you!
76