CS598 Machine Learning in Computational Biology (Lecture 1: Introduction)
Professor: Jian Peng
Teaching Assistant: Rongda Zhu
Introduction
Instructor:
• Jian Peng
  My office location: 2118 SC
  Office hour: Thursday, 3:15pm-4:45pm
  Email: [email protected]
• My own research: Computational Biology and Graphical Models
Teaching Assistant:
• Rongda Zhu, PhD student ([email protected]) (Department of Computer Science)
• Rongda’s research: Machine Learning and Probabilistic Inference
Course website: http://web.engr.illinois.edu/~jianpeng/teaching/CS598_Fall15/index.htm
Course Information
Schedule (tentative)
• Introductory lectures (Aug 25 to Sep 8)
  • Biology data analysis
  • Probabilistic models
• Student presentations (Sep 8 to Dec 3)
  • Research survey
  • Research article
• Course projects
  • Proposal presentation (Oct 6 & 8)
  • Final presentation (Dec 8 & 10)
Objectives
Introduction to computational biology
• Important problems in computational biology
• Machine learning techniques for data analysis
• Understand how methods work
Learning to do research
• Paper presentation
  • Ability to present key ideas to other people
  • Ability to ask critical questions
• Course project experience
  • Hands-on practice with real datasets
  • Propose and perform independent research
  • Active participation in the field
Prerequisites
Biology:
• Basic concepts in molecular biology
• Reference: Molecular Biology for Computer Scientists by Lawrence Hunter
Machine Learning:
• Probability and statistics
• Optimization
• Textbook: Pattern Recognition and Machine Learning by Christopher Bishop
Grading
• Class attendance: 10%
• Presentation: 30%
• Course Project: 60%
  • Proposal
  • Report
  • Presentation
Presentation
• Discuss papers you would like to present with me at least one week before your presentation
• Research survey (at least five papers)
  • Methodology: applications to different problems
  • Research problem: the state-of-the-art methods
• Research article (preferred)
  • Background: what is the problem? why important?
  • Methodology: how does it work?
  • Results: what are the findings? any conclusions?
• Open-ended Q & A and debate
Questions about the presentation?
Course Project
Computational techniques
• Novel machine learning algorithms
• Efficient algorithms that scale to large datasets
• New probabilistic models for biological data
Biological problems
• New biological findings
• Improvements over existing methods
• New computational biological problems
The goal is to have something publishable or presentable in conferences or journals.
Course Project
• Proposal presentation (Oct 6 & 8)
  • written proposal due by Oct 4
  • at least four pages
  • discuss your project with me in September
  • 15-min presentation in class
  • I will also give you a list of potential projects if you don’t have one by Sep 20.
• Final presentation (Dec 8 & 10)
  • Report due by Dec 12
  • at least eight pages
  • 15-min oral presentation and poster
Course Project
• Team size
  • one or two
  • make your contribution clear in the project report
• Implementation
  • put your code/data on GitHub
  • get your hands dirty and work on real-world datasets
  • your contribution should be original
Questions about the course project?
Introduce yourself
Why is computational biology hard?
• High-dimensional
• Noisy
• Huge
• Sparse
Biological Data
Sequence data
• Protein/DNA sequence
• Generative and discriminative models for sequences
• Deep learning
Matrix data
• Gene expression
• Dimensionality reduction and feature selection
• Low-rank approximation
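To make the matrix-data bullets concrete, here is a minimal sketch of dimensionality reduction via a rank-one approximation, using power iteration to find the leading principal component. The expression values, the function names, and the choice of power iteration are illustrative assumptions, not material from the lecture.

```python
import math
import random


def center_columns(matrix):
    """Subtract each column (gene) mean so the component describes variation."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def top_component(matrix, n_iter=200, seed=0):
    """Power iteration on X^T X: approximates the leading right singular vector."""
    rng = random.Random(seed)
    n_cols = len(matrix[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(n_iter):
        # scores = X v, one value per sample
        scores = [sum(x * w for x, w in zip(row, v)) for row in matrix]
        # v <- X^T scores, then rescale to unit length
        v = [sum(s * row[j] for s, row in zip(scores, matrix)) for j in range(n_cols)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v


def rank_one_approximation(matrix, v):
    """Project each sample onto v and map back: X_hat = (X v) v^T."""
    scores = [sum(x * w for x, w in zip(row, v)) for row in matrix]
    return [[s * w for w in v] for s in scores]


# Hypothetical expression matrix: 4 samples x 3 genes (made-up numbers).
expression = [
    [2.0, 4.1, 0.9],
    [3.0, 6.0, 1.1],
    [4.0, 7.9, 1.0],
    [5.0, 10.0, 0.9],
]
centered = center_columns(expression)
pc1 = top_component(centered)
approx = rank_one_approximation(centered, pc1)

# Fraction of total variation captured by the rank-one approximation.
residual = sum((a - b) ** 2 for ra, rb in zip(centered, approx) for a, b in zip(ra, rb))
total = sum(a * a for row in centered for a in row)
explained = 1.0 - residual / total
print(pc1, explained)
```

In this toy matrix the first two genes move together, so a single component captures almost all of the variation; real expression data would normally need several components and a library routine such as an SVD.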
Biological Data
Network data
• Molecular network
• Random walk algorithms
• Graphical models and approximate inference
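One common random walk algorithm on molecular networks is random walk with restart, sketched below under stated assumptions: the five-node network, the node names, and the restart probability of 0.5 are hypothetical, and the slides do not say which random walk variant the course covers.

```python
def random_walk_with_restart(adjacency, seed_node, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * e_seed + (1 - restart) * W p until p stops changing.

    adjacency maps each node to a list of its neighbours (undirected graph).
    W moves the walker from a node to one of its neighbours uniformly at random.
    """
    nodes = sorted(adjacency)
    p = {n: (1.0 if n == seed_node else 0.0) for n in nodes}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed_node else 0.0) for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: return its mass to the seed so p still sums to 1.
                nxt[seed_node] += (1.0 - restart) * p[n]
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical molecular network with edges A-B, A-C, B-C, C-D, D-E.
network = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = random_walk_with_restart(network, seed_node="A")
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

The stationary scores rank nodes by proximity to the seed, which is how such walks are typically used to prioritise genes or proteins near a known one.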
Heterogeneous data
• Dimensionality reduction
• Probabilistic models for data integration
• Network-based data integration
Machine Learning
Supervised learning
• Prediction:
  • classification: SVM, logistic regression, random forest
  • structured output: CRF, structured SVM
• Feature finding:
  • Sparse learning: LASSO and elastic nets
Unsupervised learning
• Dimensionality reduction and embedding:
  • manifold learning: Isomap, LLE, t-SNE
  • component analysis: PCA, ICA
• Probabilistic modeling:
  • graphical model: HMM, Bayesian networks, RBM
  • methodology: variational inference, sampling
TODO after this class
• Please read “Molecular Biology for Computer Scientists” by Lawrence Hunter
Examples of my research projects
Protein sequence, structure and function
ACDEEEFGHIKL----MPQRSTVWY
ACDE--FGHIKLRMQP----STVWY
[Figure labels: sequence, structure, function]
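The two gapped sequences on this slide form a pairwise alignment. As a small sketch of how to read one, the snippet below counts matching, mismatching, and gapped columns. The function name is hypothetical, and defining identity as matches divided by alignment length is only one of several conventions in use.

```python
def alignment_stats(row_a, row_b, gap="-"):
    """Summarise a pairwise alignment given as two equal-length gapped strings."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            gaps += 1          # column where one sequence has an insertion/deletion
        elif a == b:
            matches += 1       # identical residues
        else:
            mismatches += 1    # substitution
    return {
        "length": len(row_a),
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        # Assumption: identity is measured over all alignment columns.
        "identity": matches / len(row_a),
    }


# The two aligned rows shown on the slide.
stats = alignment_stats("ACDEEEFGHIKL----MPQRSTVWY", "ACDE--FGHIKLRMQP----STVWY")
print(stats)
```

For the slide's alignment this gives 15 matching columns and 10 gapped columns out of 25.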
Network analysis for disease modeling
[Figure labels: human disease network, network analysis, new disease biology (potential drug targets)]
Pharmacogenomics and cancer genomics
Figure from the DREAM challenge website
Integration of heterogeneous data
“Search” engine for drug discovery
[Figure: heterogeneous network with node types Drug, Protein, Disease, Side effect, Pathway, Cell type, and Mutation; edge labels: perturbation, association, membership, on/off, interaction]
Diffusion Component Analysis
Network embedding
Variational inference
Approximate inference
• Discriminance sampling for partition function estimation
• Combining variational inference and sampling approaches
[Figure labels: Restricted Boltzmann Machine, Deep Boltzmann Machine, Sampling, Classification]
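To show what quantity partition function estimation targets, here is a minimal sketch that computes the partition function of a tiny binary Restricted Boltzmann Machine exactly. It assumes the standard energy E(v, h) = -a·v - b·h - vᵀWh. The weights, biases, and function names are hypothetical, and this is plain enumeration, not the discriminance sampling method named on the slide.

```python
import itertools
import math


def rbm_energy(v, h, W, a, b):
    """E(v, h) = -a.v - b.h - v^T W h for binary visible v and hidden h."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible_term + hidden_term + interaction)


def partition_function_brute_force(W, a, b):
    """Z = sum over all 2^(n_v + n_h) joint states of exp(-E(v, h))."""
    n_v, n_h = len(a), len(b)
    return sum(
        math.exp(-rbm_energy(v, h, W, a, b))
        for v in itertools.product((0, 1), repeat=n_v)
        for h in itertools.product((0, 1), repeat=n_h)
    )


def partition_function_sum_out_hidden(W, a, b):
    """Hidden units are independent given v, so they sum out in closed form:
    Z = sum_v exp(a.v) * prod_j (1 + exp(b_j + sum_i v_i W_ij)).
    This still enumerates all 2^n_v visible states."""
    n_v, n_h = len(a), len(b)
    total = 0.0
    for v in itertools.product((0, 1), repeat=n_v):
        log_term = sum(ai * vi for ai, vi in zip(a, v))
        for j in range(n_h):
            activation = b[j] + sum(v[i] * W[i][j] for i in range(n_v))
            log_term += math.log1p(math.exp(activation))
        total += math.exp(log_term)
    return total


# Hypothetical RBM with 3 visible and 2 hidden units (made-up parameters).
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
a = [0.1, -0.2, 0.0]
b = [0.3, -0.4]
z_exact = partition_function_brute_force(W, a, b)
z_fast = partition_function_sum_out_hidden(W, a, b)
print(z_exact, z_fast)
```

Both routines grow exponentially with the number of units, which is why sampling and variational approximations are needed for models of realistic size.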