machine learning in biomedical informatics
Post on 13-Jan-2016
1
Machine Learning in BioMedical Informatics
SCE 5095: Special Topics Course
Instructor: Jinbo Bi
Computer Science and Engineering Dept.
2
Course Information
Instructor: Dr. Jinbo Bi
– Office: ITEB 233
– Phone: 860-486-1458
– Email: jinbo@engr.uconn.edu
– Web: http://www.engr.uconn.edu/~jinbo/
– Time: Mon. / Wed. 2:00pm – 3:15pm
– Location: CAST 204
– Office hours: Mon. 3:30-4:30pm
HuskyCT
– http://learn.uconn.edu
– Log in with your NetID and password
– Illustration
3
Introduction of the instructor
Ph.D. in Mathematics
Previous work experience:
– Siemens Medical Solutions Inc.
– Department of Defense, Bioanalysis
– Massachusetts General Hospital
Research Interests
[Figure labels: subtyping, GWAS, color of flowers, cancer, psychiatric disorders, …]
http://labhealthinfo.uconn.edu/EasyBreathing
4
Course Information
Prerequisite: Basics of linear algebra, calculus, and basics of programming
Course textbooks (not required):
– Introduction to Data Mining (2005) by Pang-Ning Tan, Michael Steinbach, Vipin Kumar
– Pattern Recognition and Machine Learning (2006) by Christopher M. Bishop
– Pattern Classification (2nd edition, 2000) by Richard O. Duda, Peter E. Hart and David G. Stork
Additional class notes and copied materials will be handed out
Links to reading materials will be provided
5
Objectives:
– Introduce students to the basic concepts of machine learning and the state-of-the-art literature in data mining/machine learning
– Get to know some general topics in medical informatics
– Focus on some high-demand medical informatics problems, with hands-on experience applying data mining techniques
Format:
– Lectures, Labs, Paper reviews, A term project
Course Information
6
Survey
Why are you taking this course? What would you like to gain from this course? What topics from this course are you most interested in learning about? Any other suggestions?
(Please respond before NEXT Thursday. You can also log in to HuskyCT, download the MS Word file, fill it in, and email it to me.)
7
Grading
In-Class Lab Assignments (3): 30%
Paper review (1): 10%
Term Project (1): 50%
Participation (1): 10%
8
Policy
Computers
Assignments must be submitted electronically via HuskyCT
Make-up policy
– If a lab assignment or a paper review assignment is missed, a final take-home exam will be used to make it up.
– If two of these assignments are missed, an additional lab assignment and a final take-home exam will be used to make them up.
9
Three In-class Lab Assignments
On a day when an in-class lab assignment is given, the class meets in a computer lab and there is no lecture
The computer lab is ITEB 138 (reserved by the TA)
The assignment is due at the beginning of class one week after it is given
If the assignment is handed in one to two days late, 10 points will be deducted for each additional day
Assignments will be graded by our teaching assistant
10
Paper review
Topics of papers for review will be discussed
Each student selects 1 paper in each assignment, prepares slides, and presents the paper in class in 8-15 minutes
The goal is to take a look at state-of-the-art research work in the related field
The paper review assignment is on state-of-the-art data mining techniques
11
Term Project
Possible project topics will be provided as links; students are encouraged to propose their own
Teams of 1-2 students can be formed
Each team needs to give a presentation (10-15 min) in the last 1-2 weeks of the class
Each team needs to submit a project report covering
– Definition of the problem
– Data mining approaches used to solve the problem
– Computational results
– Conclusion (success or failure)
12
Final Exam
If you need a make-up final exam, it will be provided on May 1st (Wed.)
Take-home exam due on May 9th (Thur.)
13
Three In-class Lab Assignments
BioMedical Informatics Topics
– So many
– Cardiac ultrasound image categorization
– Computerized decision support for trauma patient care
– Computer-assisted diagnostic coding
14
Cardiac ultrasound view separation
15
Cardiac ultrasound view separation
Classification (or clustering)
Apical 4 chamber view
Parasternal long axis view
Parasternal short axis view
16
25 min of transport time/patient
High-frequency vital-sign waveforms (3 waveforms)
– ECG, SpO2, respiratory
Low-frequency vital-sign time series (9 variables)
Derived variables
– ECG heart rate
– SpO2 heart rate
– SaO2 arterial O2 saturation
– Respiratory rate
Measured variables
► NIBP (systolic, diastolic, MAP)
► NIBP heart rate
► End-tidal CO2
Discrete patient attribute data (100 variables)
– Demographics, injury description, prehospital interventions, etc.
Vital signs used in decision-support algorithms: HR, RR, SaO2, SBP, DBP (Propaq)
Trauma Patient Care
17
Trauma Patient Care
18
Inputs: Heart Rate, Respiratory Rate, Saturation of Oxygen, Blood Pressure
Make a prediction: Major Bleeding
Trauma Patient Care
19
[Figure: a hospital document DB of patient notes (Patient 1: notes A-E; Patient 2: notes F, G; …) is mapped, by looking up ICD-9 codes in a code database, to a diagnostic code DB of patient diagnoses (e.g. 428 heart failure, 250 diabetes, 414, 429) and criteria (AMI, SCIP), which is used for statistics, reimbursement, and insurance]
19SIEMENS /38
Diagnostic coding
20
RWP/CC1 DICT. XXXXXXXXXXX P TRANS. XXXXXXXXXX P DOC.# 1554360 JOB # XXXXXXXXXX CC XXXXXXXXXX FILE CV XXXXXXXXXXXXXXXXXX. XXXXXXXXXXXXXXXXXX ORDXXXXXXX, XXXX L ADM DIAGNOSIS: BRADYCARDIA ANEMIA CHF ORD #: XXXXXXX DX XXXXXXX 14:10 PROCEDURE: CHEST - PA & LATERAL ACCXXXXXX REPORT: CLINICAL HISTORY: CHEST PAIN. CHF. AP ERECT AND LATERAL VIEWS OF THE CHEST WERE OBTAINED. THERE ARE NO PRIOR STUDIES AVAILABLE FOR COMPARISON. THE TRACHEA IS NORMAL IN POSITION. HEART IS MODERATELY ENLARGED. HEMIDIAPHRAGMS ARE SMOOTH. THERE ARE SMALL BILATERAL PLEURAL EFFUSIONS. THERE IS ENGORGEMENT OF THE PULMONARY VASCULARITY. IMPRESSION: 1. CONGESTIVE HEART FAILURE WITH CARDIOMEGALY AND SMALL BILATERAL PLEURAL EFFUSIONS. 2. INCREASING OPACITY AT THE LEFT LUNG BASE LIKELY REPRESENTING PASSIVE ATELECTASIS.
…
Diagnostic coding
21
FAMILY HISTORY: IS NONCONTRIBUTORY IN A PATIENT OF THIS AGE GROUP. SOCIAL HISTORY: SHE IS DIVORCED. THE PATIENT CURRENTLY LIVES AT BERKS HEIM. SHE IS ACCOMPANIED TODAY ON THIS VISIT BY HER DAUGHTER. SHE DOES NOT SMOKE OR ABUSE ALCOHOLIC BEVERAGES. PHYSICAL EXAMINATION: GENERAL: THIS IS AN ELDERLY, VERY PALE-APPEARING FEMALE WHO IS SITTING IN A WHEELCHAIR AND WAS EXAMINED IN HER WHEELCHAIR. HEENT: SHE IS WEARING GLASSES. SITTING UPRIGHT IN A WHEELCHAIR. NECK: NECK VEINS WERE NONDISTENDED. I COULD NOT HEAR A LOUD CAROTID BRUIT. LUNGS: HAVE DIMINISHED BREATH SOUNDS AT THE BASES WITH NO LOUD WHEEZES, RALES OR RHONCHI. HEART: HEART TONES WERE BRADYCARDIC, REGULAR AND RATHER DISTANT WITH A SYSTOLIC MURMUR HEARD AT THE LEFT LOWER STERNAL BORDER. I COULD NOT HEAR A LOUD GALLOP RHYTHM WITH HER SITTING UPRIGHT OR A LOUD DIASTOLIC MURMUR. ABDOMEN: WAS SOFT AND NONTENDER. EXTREMITIES: ARE REMARKABLE FOR THE FACT THAT SHE HAS A BRACE ON HER LEFT LOWER EXTREMITY. THERE DID NOT APPEAR TO BE SIGNIFICANT PERIPHERAL EDEMA. NEUROLOGIC: SHE CLEARLY HAD RESIDUAL HEMIPARESIS FROM HER PREVIOUS STROKE, BUT SHE WAS AWAKE AND ALERT AND ANSWERING QUESTIONS APPROPRIATELY.
…
Diagnostic coding
22
Machine Learning / Data Mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information
The ultimate goal of machine learning is the creation and understanding of machine intelligence
The main goal of statistical learning theory is to provide a framework for studying the problem of inference, that is, of gaining knowledge, making predictions, and making decisions from a set of data.
23
Traditional Topics in Data Mining /AI
Fuzzy set and fuzzy logic
– Fuzzy if-then rules
Evolutionary computation
– Genetic algorithms
– Evolutionary strategies
Artificial neural networks
– Back propagation network (supervised learning)
– Self-organization network (unsupervised learning, will not be covered)
24
Next Class
Continue with data mining topics
Review of some basics of linear algebra and probability
25
Last Class
Described the syllabus of this course
Talked about the HuskyCT website (Illustration)
Briefly introduced 3 medical informatics topics
– Medical images: cardiac echo view recognition
– Numerical: trauma patient care
– Free text: ICD-9 diagnostic coding
Briefly introduced the definitions of data mining, machine learning, and statistical learning theory
26
Challenges in traditional techniques
Lack of theoretical analysis of the behavior of the algorithms
Traditional techniques may be unsuitable due to
– Enormity of data
– High dimensionality of data
– Heterogeneous, distributed nature of data
[Figure labels: Machine Learning/Pattern Recognition, Statistics/AI, Soft Computing]
27
Recent Topics in Data Mining
Supervised learning such as classification and regression
– Support vector machines
– Regularized least squares
– Fisher discriminant analysis (LDA)
– Graphical models (Bayesian nets)
– Others
Drawn from machine learning domains
28
Recent Topics in Data Mining
Unsupervised learning such as clustering
– K-means
– Gaussian mixture models
– Hierarchical clustering
– Graph-based clustering (spectral clustering)
Dimension reduction
– Feature selection
– Compressing the feature space into a low-dimensional space (principal component analysis)
29
Statistical Behavior
Many perspectives from which to analyze how an algorithm handles uncertainty
Simple examples:
– Consistency analysis
– Learning bounds (an upper bound on the test error of the constructed model or solution)
"Statistical", not "deterministic"
– With probability at least p, the upper bound holds: $P(\text{test error} \le \text{Upper\_bound}) \ge p$
30
Tasks in Data Mining
Prediction tasks (supervised problems)
– Use some variables to predict unknown or future values of other variables.
Description tasks (unsupervised problems)
– Find human-interpretable patterns that describe the data.
From [Fayyad, et.al.] Advances in Knowledge Discovery and Data Mining, 1996
31
Problems in Data Mining
Inference Classification [Predictive]
Regression [Predictive]
Clustering [Descriptive]
Deviation Detection [Predictive]
32
Classification: Definition
Given a collection of examples (training set)
– Each example contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen examples should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
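The training/test protocol described above can be sketched in a few lines. This is a minimal illustration, not code from the course: `holdout_split` and `accuracy` are hypothetical helper names (not scikit-learn functions), the 30% test fraction and the fixed random seed are assumptions, and the toy data set (integers labeled by parity) is made up purely to exercise the functions.

```python
import random


def holdout_split(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and split them into (training set, test set)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(classify, test_set):
    """Fraction of (x, y) pairs in the test set for which classify(x) == y."""
    correct = sum(1 for x, y in test_set if classify(x) == y)
    return correct / len(test_set)


# Made-up toy data: attribute x is an integer, the class is its parity.
data = [(x, x % 2) for x in range(10)]
train, test = holdout_split(data)
print(len(train), len(test))            # 7 3
print(accuracy(lambda x: x % 2, test))  # a perfect rule scores 1.0
```

The model would be built from `train` only; `test` is held back so that the accuracy estimate reflects previously unseen examples.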
33
Classification Example
Training Set
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
(Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class.)

Test Set
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

Training Set → Learn Model → Classifier, which is then applied to the Test Set
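As one concrete way to go from the training set to a classifier, here is a sketch of a Naive Bayes classifier (one of the algorithms listed later in this deck) fitted to the table. The choice of Naive Bayes, the Gaussian model for Taxable Income, the use of the sample variance, and the absence of smoothing are all assumptions for illustration; the slides do not prescribe a model for this example.

```python
import math
import statistics
from collections import Counter, defaultdict

# Training set from the slide: (Refund, Marital Status, Taxable Income in K, Cheat)
TRAIN = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
# Test set from the slide: class unknown
TEST = [
    ("No", "Single", 75), ("Yes", "Married", 50), ("No", "Married", 150),
    ("Yes", "Divorced", 90), ("No", "Single", 40), ("No", "Married", 80),
]


def gaussian_pdf(x, mean, var):
    """Normal density, used here to model the continuous attribute Taxable Income."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit(rows):
    """Per class: prior, counts of each categorical value, and income mean/variance."""
    groups = defaultdict(list)
    for refund, marital, income, label in rows:
        groups[label].append((refund, marital, income))
    model = {}
    for label, recs in groups.items():
        incomes = [income for _, _, income in recs]
        model[label] = {
            "n": len(recs),
            "prior": len(recs) / len(rows),
            "refund": Counter(r for r, _, _ in recs),
            "marital": Counter(m for _, m, _ in recs),
            "mean": statistics.mean(incomes),
            "var": statistics.variance(incomes),  # sample variance (n - 1)
        }
    return model


def predict(model, refund, marital, income):
    """Pick the class maximizing P(c) * P(refund|c) * P(marital|c) * p(income|c)."""
    def score(m):
        return (m["prior"]
                * m["refund"][refund] / m["n"]
                * m["marital"][marital] / m["n"]
                * gaussian_pdf(income, m["mean"], m["var"]))
    return max(model, key=lambda label: score(model[label]))


model = fit(TRAIN)
predictions = [predict(model, *record) for record in TEST]
print(predictions)  # every test record is predicted "No"
```

Because no "Cheat = Yes" record has Refund = Yes or Marital Status = Married, those counts are zero and veto the "Yes" class outright for such test records; Laplace smoothing of the counts is a common remedy.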
34
Classification: Application 1
High-Risk Patient Detection
– Goal: Predict whether a patient will suffer a major complication after a surgical procedure
– Approach:
Use patients' vital signs before and after the surgical operation
– Heart rate, respiratory rate, etc.
Have expert medical professionals monitor the patients to label which patients have complications and which do not.
Learn a model for the class of after-surgery risk.
Use this model to detect potential high-risk patients for a particular surgical procedure.
35
Classification: Application 2
Face recognition
– Goal: Predict the identity of a face image
– Approach: Align all images to derive the features. Model the class (identity) based on these features.
36
Classification: Application 3
Cancer Detection
– Goal: To predict the class (cancer or normal) of a sample (person) based on microarray gene expression data
– Approach: Use the expression levels of all genes as the features. Label each example as cancer or normal. Learn a model for the class of all samples.
37
Classification: Application 4
Alzheimer's Disease Detection
– Goal: To predict the class (AD or normal) of a sample (person) based on neuroimaging data such as MRI and PET
– Approach: Extract features from neuroimages. Label each example as AD or normal. Learn a model for the class of all samples.
Reduced gray matter volume (colored areas) detected by MRI voxel-based morphometry in AD patients compared to normal healthy controls.
38
Regression
Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in the statistics and neural network fields. Examples:
– Predicting sales amounts of new product based on advertising expenditure.
– Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
– Time series prediction of stock market indices.
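For the simplest linear model of dependency, y = a + b·x with one input variable, the least-squares fit has a closed form. The sketch below assumes ordinary least squares and a single predictor; `fit_line` is a hypothetical helper name, and the five points are made up so that they lie exactly on y = 1 + 2x.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b).

    Requires at least two distinct x values (otherwise the slope is undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up points, e.g. advertising expenditure vs. sales, lying exactly on y = 1 + 2x
a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(a, b)  # 1.0 2.0
```

Nonlinear dependencies are often handled the same way after transforming the inputs (for example, fitting on x² or log x), which keeps the estimation problem linear in the parameters.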
39
Classification algorithms
K-Nearest-Neighbor classifiers
Naïve Bayes classifier
Neural Networks
Linear Discriminant Analysis (LDA)
Support Vector Machines (SVM)
Decision Tree
Logistic Regression
Graphical models
40
Clustering Definition
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
– Data points in one cluster are more similar to one another.
– Data points in separate clusters are less similar to one another.
Similarity measures:
– Euclidean distance if attributes are continuous.
– Other problem-specific measures
41
Illustrating Clustering
Euclidean distance based clustering in 3-D space.
Intracluster distances are minimized
Intercluster distances are maximized
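The two criteria can be measured directly with Euclidean distance. This sketch averages distances over all point pairs, which is only one of several ways to summarize intra- and intercluster distance (centroid-based or worst-case versions are also common); the function names and the two tiny 3-D clusters are hypothetical.

```python
import math
from itertools import combinations


def mean_intracluster_distance(clusters):
    """Average Euclidean distance over all pairs of points in the same cluster."""
    pairs = [(p, q) for cluster in clusters for p, q in combinations(cluster, 2)]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_intercluster_distance(clusters):
    """Average Euclidean distance over all pairs of points in different clusters."""
    pairs = [(p, q) for a, b in combinations(clusters, 2) for p in a for q in b]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Two made-up clusters of 3-D points
clusters = [
    [(0, 0, 0), (1, 0, 0)],
    [(10, 0, 0), (11, 0, 0)],
]
print(mean_intracluster_distance(clusters))  # 1.0
print(mean_intercluster_distance(clusters))  # 10.0
```

A good clustering keeps the first number small relative to the second.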
42
Clustering: Application 1
High-Risk Patient Detection
– Goal: Predict whether a patient will suffer a major complication after a surgical procedure
– Approach:
Use patients' vital signs before and after the surgical operation
– Heart rate, respiratory rate, etc.
Find patients whose symptoms are dissimilar to those of most other patients.
43
Clustering: Application 2
Document Clustering:
– Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
– Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
– Gain: Information retrieval can utilize the clusters to relate a new document or search term to clustered documents.
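One common similarity measure built from term frequencies is the cosine of the angle between two documents' term-count vectors. The slide does not name a specific measure, so cosine similarity is an assumption here, as are the tiny stop-word list and the example sentences; any clustering algorithm that accepts a similarity function could then use `cosine_similarity`.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # hypothetical, tiny filter list


def term_frequencies(text):
    """Count lower-cased words, dropping stop words (a crude form of word filtering)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(tf1, tf2):
    """Cosine of the angle between two term-frequency vectors (0 = no shared terms)."""
    dot = sum(tf1[t] * tf2[t] for t in tf1.keys() & tf2.keys())
    norms = math.sqrt(sum(v * v for v in tf1.values()) * sum(v * v for v in tf2.values()))
    return dot / norms if norms else 0.0


d1 = term_frequencies("Stocks rise in the market")
d2 = term_frequencies("The market sees stocks fall")
d3 = term_frequencies("Lakers win the game")
print(cosine_similarity(d1, d2), cosine_similarity(d1, d3))
```

The two finance sentences share "stocks" and "market" and get a positive score, while the sports sentence shares no terms with the first and scores 0.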
44
Illustrating Document Clustering
Clustering points: 3204 articles of the Los Angeles Times.
Similarity measure: how many words are common in these documents (after some word filtering).

Category        Total Articles  Correctly Placed
Financial       555             364
Foreign         341             260
National        273             36
Metro           943             746
Sports          738             573
Entertainment   354             278
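The table's counts can be turned into per-category placement rates with a few lines of arithmetic. This sketch only re-uses the numbers in the table; reading "correctly placed / total articles" as a per-category accuracy is an interpretation.

```python
# (total articles, correctly placed) per category, copied from the table
results = {
    "Financial": (555, 364),
    "Foreign": (341, 260),
    "National": (273, 36),
    "Metro": (943, 746),
    "Sports": (738, 573),
    "Entertainment": (354, 278),
}

for category, (total, correct) in results.items():
    print(f"{category:<14} {correct / total:6.1%}")

total_articles = sum(total for total, _ in results.values())
total_correct = sum(correct for _, correct in results.values())
print(f"{'Overall':<14} {total_correct / total_articles:6.1%}")
```

The totals add up to the 3204 articles stated on the slide, 2257 of which are correctly placed (about 70.4%); National is by far the hardest category at roughly 13.2%.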
45
Clustering algorithms
K-Means
Hierarchical clustering
Graph-based clustering (spectral clustering)
Semi-supervised clustering
Others
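K-means is usually computed with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a minimal version under stated assumptions: random initialization from the data points with a fixed seed, Euclidean distance, and made-up 2-D points; it is not the implementation used in the course.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]  # made-up data with two obvious groups
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # [(0.0, 0.5), (10.0, 10.5)]
```

K-means only finds a local optimum that depends on the initial centroids, so in practice it is often restarted with several seeds and the result with the smallest within-cluster distances is kept.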
46
Basics of probability
An experiment (random variable) is a well-defined process with observable outcomes.
The set or collection of all outcomes of an experiment is called the sample space, S.
An event E is any subset of outcomes from S.
Probability of an event: if all outcomes in S are equally likely, P(E) = number of outcomes in E / number of outcomes in S.
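The counting definition can be checked with exact fractions. The fair six-sided die is a hypothetical example chosen because its outcomes are equally likely, which is the condition under which P(E) = |E| / |S| applies.

```python
from fractions import Fraction


def probability(event, sample_space):
    """P(E) = |E| / |S|; valid only when all outcomes in S are equally likely."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


S = {1, 2, 3, 4, 5, 6}            # experiment: one roll of a fair die
E = {s for s in S if s % 2 == 0}  # event: the roll is even
print(probability(E, S))  # 1/2
```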
47
Probability Theory
Apples and Oranges
Assume P(Y=r) = 40%, P(Y=b) = 60% (prior)
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
X: identity of the fruit; Y: identity of the box
Marginal: P(X=a) = 11/20, P(X=o) = 9/20
Posterior: P(Y=r|X=o) = 2/3, P(Y=b|X=o) = 1/3
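The marginal and posterior values on this slide follow from the sum rule, the product rule, and Bayes' rule. The sketch reproduces them exactly with `fractions.Fraction`; the dictionary layout and function names are illustrative choices, while the probabilities are the ones stated on the slide.

```python
from fractions import Fraction

# Prior over boxes and likelihood of each fruit given the box (numbers from the slide)
prior = {"r": Fraction(2, 5), "b": Fraction(3, 5)}
likelihood = {
    "r": {"a": Fraction(2, 8), "o": Fraction(6, 8)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}


def marginal(fruit):
    """Sum and product rules: P(X=fruit) = sum over boxes of P(X=fruit|Y) P(Y)."""
    return sum(likelihood[box][fruit] * prior[box] for box in prior)


def posterior(box, fruit):
    """Bayes' rule: P(Y=box|X=fruit) = P(X=fruit|Y=box) P(Y=box) / P(X=fruit)."""
    return likelihood[box][fruit] * prior[box] / marginal(fruit)


print(marginal("a"), marginal("o"))              # 11/20 9/20
print(posterior("r", "o"), posterior("b", "o"))  # 2/3 1/3
```

Observing an orange raises the probability of the r box from the prior 2/5 to 2/3, because oranges are three times as likely to come from it.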
48
Probability Theory
Marginal Probability
Conditional Probability
Joint Probability
49
Probability Theory
Sum Rule: $p(X) = \sum_{Y} p(X, Y)$
Product Rule: $p(X, Y) = p(Y \mid X)\, p(X)$
The marginal probability of X equals the sum of the joint probability of X and Y over Y.
The joint probability of X and Y equals the product of the conditional probability of Y given X and the probability of X.
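When probabilities are estimated from a table of joint counts, both rules can be checked cell by cell. The 2×2 count table below is hypothetical; joint, marginal, and conditional probabilities are estimated as count ratios, and the product rule is asserted for every cell.

```python
from fractions import Fraction

# Hypothetical counts n[x][y]: how often (X=x, Y=y) was observed
counts = {
    "x1": {"y1": 3, "y2": 1},
    "x2": {"y1": 2, "y2": 4},
}
N = sum(n for row in counts.values() for n in row.values())


def joint(x, y):
    """p(X=x, Y=y) estimated as n_xy / N."""
    return Fraction(counts[x][y], N)


def marginal_x(x):
    """Sum rule: p(X=x) = sum over y of p(X=x, Y=y)."""
    return sum(joint(x, y) for y in counts[x])


def conditional_y_given_x(y, x):
    """p(Y=y | X=x) = n_xy / (number of observations with X=x)."""
    return Fraction(counts[x][y], sum(counts[x].values()))


# Product rule: p(X, Y) = p(Y|X) p(X) holds for every cell of the table.
for x in counts:
    for y in counts[x]:
        assert joint(x, y) == conditional_y_given_x(y, x) * marginal_x(x)
print(marginal_x("x1"), marginal_x("x2"))  # 2/5 3/5
```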
50
Illustration
[Figure: histograms of the joint distribution p(X,Y) for Y=1 and Y=2, the marginals p(X) and p(Y), and the conditional p(X|Y=1)]
51
The Rules of Probability
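Sum Rule: $p(X) = \sum_{Y} p(X, Y)$
Product Rule: $p(X, Y) = p(Y \mid X)\, p(X)$
Bayes' Rule: $p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$, where $p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$
posterior ∝ likelihood × prior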
52
Mean and Variance
The mean (expected value) of a random variable X is the average value X takes, weighted by probability: $E[X] = \sum_x x\, P(X=x)$.
The variance of X measures how dispersed the values that X takes are: $\mathrm{var}(X) = E[(X - E[X])^2]$.
The standard deviation is simply the square root of the variance.
53
Simple Example
X ∈ {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2
Mean: 0.8 × 1 + 0.2 × 2 = 1.2
Variance: 0.8 × (1 − 1.2)² + 0.2 × (2 − 1.2)² = 0.032 + 0.128 = 0.16
Standard deviation: √0.16 = 0.4
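The same computation for any discrete distribution, written with exact fractions so that 0.8 and 0.2 introduce no rounding. The dictionary representation {value: probability} is an assumption of this sketch.

```python
import math
from fractions import Fraction


def mean(dist):
    """E[X] = sum of x * P(X=x) over the values of a discrete distribution {x: P(X=x)}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """var(X) = E[(X - E[X])^2]."""
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


dist = {1: Fraction(4, 5), 2: Fraction(1, 5)}  # P(X=1) = 0.8, P(X=2) = 0.2
print(mean(dist), variance(dist), math.sqrt(variance(dist)))  # 6/5 4/25 0.4
```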
54
References
SC_prob_basics1.pdf (necessary)
SC_prob_basic2.pdf
Loaded to HuskyCT
55
Basics of Linear Algebra
56
Matrix Multiplication
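The product of two matrices: $C = AB$, with $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ for $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$
Special cases: vector-vector product, matrix-vector product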
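The definition $c_{ij} = \sum_k a_{ik} b_{kj}$ translates directly into nested loops over lists of rows. This is a teaching sketch (a library such as NumPy would be used in practice); the example matrices are arbitrary, and the vector cases treat a row vector as a 1×n matrix and a column vector as an n×1 matrix.

```python
def matmul(A, B):
    """C = AB with c_ij = sum_k a_ik * b_kj, for A (m x n) and B (n x p) given as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)] for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))                          # [[19, 22], [43, 50]]
print(matmul(B, A))                          # [[23, 34], [31, 46]]: in general AB != BA
print(matmul([[1, 2, 3]], [[4], [5], [6]]))  # vector-vector (inner) product: [[32]]
print(matmul(A, [[1], [1]]))                 # matrix-vector product: [[3], [7]]
```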
57
Matrix Multiplication
58
Rules of Matrix Multiplication
$C = AB$
59
Orthogonal Matrix
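$U \in \mathbb{R}^{m \times m}$ is orthogonal if and only if $U^T U = U U^T = I$ ($I$ is the identity matrix), i.e. $U^{-1} = U^T$; equivalently, the columns $u_1, \dots, u_m$ of $U$ are orthonormal.
The columns of $V \in \mathbb{R}^{m \times n}$ are orthonormal if and only if $V^T V = I$.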
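Orthonormality of the columns can be tested by forming $V^T V$ and comparing it with the identity. The tolerance of 1e-12 is an assumption to absorb floating-point rounding; the rotation matrix and the 3×2 matrix are illustrative examples. For a square matrix, $U^T U = I$ already implies $U U^T = I$, so one check suffices.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def has_orthonormal_columns(V, tol=1e-12):
    """True if V^T V is the identity matrix, up to floating-point tolerance."""
    G = matmul(transpose(V), V)
    return all(
        abs(G[i][j] - (1.0 if i == j else 0.0)) <= tol
        for i in range(len(G))
        for j in range(len(G))
    )


theta = math.pi / 6
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]  # 2-D rotation: square and orthogonal
V = [[1, 0], [0, 1], [0, 0]]              # 3 x 2 with orthonormal columns
print(has_orthonormal_columns(U), has_orthonormal_columns(V))  # True True
print(has_orthonormal_columns([[1, 1], [0, 1]]))               # False
```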
60
Square Matrix – EigenValue, EigenVector
$(\lambda, x)$ is an eigen-pair of $A \in \mathbb{R}^{n \times n}$ if and only if $Ax = \lambda x$ with $x \neq 0$,
where $\lambda$ is the eigenvalue and $x$ is the eigenvector.
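One simple way to find an eigen-pair numerically is power iteration: repeatedly multiply a vector by $A$ and renormalize, estimating $\lambda$ with the Rayleigh quotient $x^T A x$. This sketch assumes $A$ has a single eigenvalue of largest magnitude and that the start vector $(1, 0, \dots, 0)$ is not orthogonal to its eigenvector; the iteration limit and tolerance are arbitrary choices, and the symmetric 2×2 example (eigenvalues 3 and 1) is illustrative.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def power_iteration(A, iterations=1000, tol=1e-12):
    """Approximate the dominant eigen-pair (lambda, x) of a square matrix A."""
    x = [1.0] + [0.0] * (len(A) - 1)
    lam = 0.0
    for _ in range(iterations):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # keep x at unit length
        # Rayleigh quotient x^T A x estimates the eigenvalue.
        new_lam = sum(xi * yi for xi, yi in zip(x, matvec(A, x)))
        if abs(new_lam - lam) < tol:
            return new_lam, x
        lam = new_lam
    return lam, x


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
lam, x = power_iteration(A)
residual = [axi - lam * xi for axi, xi in zip(matvec(A, x), x)]
print(round(lam, 6), max(abs(r) for r in residual) < 1e-4)  # 3.0 True
```

The residual $Ax - \lambda x$ confirms the definition on this slide: it is close to zero for the returned pair.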
61
Symmetric Matrix – EigenValue EigenVector
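$A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$.
$A$ is symmetric and positive semi-definite if $x^T A x \ge 0$ for any $x \in \mathbb{R}^n$; equivalently, its eigenvalues satisfy $\lambda_i \ge 0$, $i = 1, \dots, n$.
$A$ is symmetric and positive definite if $x^T A x > 0$ for any nonzero $x \in \mathbb{R}^n$; equivalently, $\lambda_i > 0$, $i = 1, \dots, n$.
Eigen-decomposition of a symmetric $A$: $A = X \Lambda X^T = \sum_{i=1}^{n} \lambda_i x_i x_i^T$, where the columns $x_i$ of $X$ are orthonormal eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$.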
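A practical test for positive definiteness that avoids computing eigenvalues is to attempt a Cholesky factorization $A = L L^T$: for a symmetric matrix it succeeds with a strictly positive diagonal exactly when $A$ is positive definite. This sketch uses exact symmetry checking and no tolerance, which is fine for the small integer examples but would need care for noisy floating-point data; it does not distinguish semi-definite from indefinite matrices.

```python
import math


def is_positive_definite(A):
    """Cholesky-based test: True iff the symmetric matrix A is positive definite."""
    n = len(A)
    if any(A[i][j] != A[j][i] for i in range(n) for j in range(n)):
        raise ValueError("matrix must be symmetric")
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of a_jj after removing earlier columns of L.
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0:  # a non-positive pivot means some lambda_i <= 0
            return False
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True


print(is_positive_definite([[2, 1], [1, 2]]))  # True  (eigenvalues 3, 1)
print(is_positive_definite([[1, 2], [2, 1]]))  # False (eigenvalues 3, -1)
print(is_positive_definite([[1, 1], [1, 1]]))  # False (eigenvalues 2, 0: only semi-definite)
```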
62
Matrix Norms and Trace
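Matrix norms:
– 1-norm: $\|A\|_1 = \sum_{i,j} |a_{ij}|$
– 2-norm: the square root of the largest eigenvalue of $A^T A$, i.e. $\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)}$
– F-norm (Frobenius norm): $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$
$\mathrm{trace}(A) = \sum_{i=1}^{m} a_{ii}$ for a square matrix $A$ of size $m$ by $m$.
$\|A\|_F^2 = \mathrm{trace}(A^T A) = \mathrm{trace}(A A^T)$, and $\mathrm{trace}(AB) = \mathrm{trace}(BA)$.
$\|QA\|_F = \|A\|_F$ if $Q$ has orthonormal columns.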
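The trace and Frobenius-norm identities above can be checked numerically on small matrices. The matrices are arbitrary examples, and Q is a 90-degree rotation chosen because its entries are exact integers.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def frobenius_norm(A):
    """||A||_F = sqrt(sum of squared entries)."""
    return math.sqrt(sum(a * a for row in A for a in row))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
Q = [[0, -1], [1, 0]]  # rotation by 90 degrees: orthonormal columns

print(frobenius_norm(A) ** 2, trace(matmul(transpose(A), A)))  # both about 30
print(trace(matmul(A, B)), trace(matmul(B, A)))                # 5 5
print(frobenius_norm(matmul(Q, A)) == frobenius_norm(A))       # True
```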
63
Singular Value Decomposition
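Singular Value Decomposition (SVD): $A = U \Sigma V^T$, where $A \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal, and $\Sigma$ is diagonal with diagonal entries $\sigma_1, \dots, \sigma_r$, $r = \min(m, n)$.
$A A^T = U \Sigma \Sigma^T U^T$: $U$ forms the eigenvectors of $A A^T$.
$A^T A = V \Sigma^T \Sigma V^T$: $V$ forms the eigenvectors of $A^T A$.
[Figure: $A = U \Sigma V^T$ with $U$ orthogonal, $\Sigma$ diagonal, $V^T$ orthogonal]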
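Since $A^T A = V \Sigma^T \Sigma V^T$, the singular values are the square roots of the eigenvalues of $A^T A$. For a 2×2 matrix those eigenvalues have a closed form, which gives a small self-contained check. Restricting to 2×2 is an assumption of this sketch; general SVD routines (for example in LAPACK-backed libraries) use more robust algorithms.

```python
import math


def singular_values_2x2(A):
    """Singular values of a 2 x 2 matrix: square roots of the eigenvalues of A^T A."""
    (p, q), (r, s) = A
    # A^T A = [[a, b], [b, c]] is symmetric positive semi-definite.
    a = p * p + r * r
    b = p * q + r * s
    c = q * q + s * s
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    # max(..., 0.0) guards against a tiny negative value caused by rounding.
    return math.sqrt(mid + radius), math.sqrt(max(mid - radius, 0.0))


s1, s2 = singular_values_2x2([[3, 0], [4, 5]])
print(s1, s2)             # sqrt(45) and sqrt(5)
print(s1 ** 2 + s2 ** 2)  # about 50 = squared Frobenius norm
print(s1 * s2)            # about 15 = |det A|
```

The largest singular value is also the 2-norm from the previous slide, the square root of the largest eigenvalue of $A^T A$.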
64
References
SC_linearAlg_basics.pdf (necessary)
SVD_basics.pdf
Loaded to HuskyCT
65
Summary
This is the end of the FIRST chapter of this course
Next Class
Cluster analysis
– General topics
– K-means
The slides after this one are backup slides; you can also check them to learn more
66
Neural Networks
Motivated by the biological brain; the neuron model was introduced by McCulloch and Pitts in 1943
A neural network consists of
– Nodes (mimic neurons)
– Links between nodes (pass messages around, represent causal relationships)
All parts of an NN are adaptive (modifiable parameters)
Learning rules specify these parameters to finalize the NN
[Figure: biological neuron, labeled soma, dendrite, nucleus, axon, myelin sheath, node of Ranvier, Schwann cell, axon terminal]
67
Illustration of NN
[Figure: inputs x1 and x2, weighted by w11 and w12, feed an activation function that produces the output y]
68
Many Types of NN
Adaptive NN
Single-layer NN (perceptrons)
Multi-layer NN
Self-organizing NN
Different activation functions
Types of problems:
– Supervised learning
– Unsupervised learning
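A single-layer perceptron with two inputs matches the x1, x2 → y illustration: a weighted sum passed through an activation function, with a learning rule that adjusts the weights. This sketch assumes a step activation, the classic perceptron update rule, a bias term, and the logical AND truth table as supervised training data; `w1` and `w2` play the role of the slide's w11 and w12.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted input is non-negative."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron learning rule: w <- w + lr * (target - output) * x."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            delta = target - step(w1 * x1 + w2 * x2 + b)
            if delta:
                w1 += lr * delta * x1
                w2 += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return w1, w2, b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, b = train_perceptron(AND)
print([step(w1 * x1 + w2 * x2 + b) for (x1, x2), _ in AND])  # [0, 0, 0, 1]
```

The rule converges for linearly separable data such as AND; a single-layer perceptron cannot represent XOR, which is one motivation for multi-layer networks.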
69
Classification: Additional Application
Sky Survey Cataloging
– Goal: To predict the class (star or galaxy) of sky objects, especially visually faint ones, based on the telescopic survey images (from Palomar Observatory).
– 3000 images with 23,040 x 23,040 pixels per image.
– Approach: Segment the image. Measure image attributes (features), 40 of them per object. Model the class based on these features.
– Success story: Could find 16 new high red-shift quasars, some of the farthest objects, which are difficult to find!
From [Fayyad, et.al.] Advances in Knowledge Discovery and Data Mining, 1996
70
Classifying Galaxies
Class: stages of formation (Early, Intermediate, Late)
Attributes: image features, characteristics of light waves received, etc.
Data size: 72 million stars, 20 million galaxies; object catalog: 9 GB; image database: 150 GB
Courtesy: http://aps.umn.edu
71
Challenges of Data Mining
Scalability
Dimensionality
Complex and Heterogeneous Data
Data Quality
Data Ownership and Distribution
Privacy Preservation
72
Application of Prob Rules
p(X=a) = p(X=a,Y=r) + p(X=a,Y=b) = p(X=a|Y=r)p(Y=r) + p(X=a|Y=b)p(Y=b) = 0.25*0.4 + 0.75*0.6 = 11/20
p(X=o) = 9/20
p(Y=r|X=o) = p(Y=r,X=o)/p(X=o) = p(X=o|Y=r)p(Y=r)/p(X=o) = 0.75*0.4 / (9/20) = 2/3
Assume P(Y=r) = 40%, P(Y=b) = 60%
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
73
The Gaussian Distribution
74
Gaussian Mean and Variance
75
The Multivariate Gaussian
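For reference, the standard forms of the densities these three slides cover can be written as follows. The symbols μ (mean), σ² (variance), Σ (covariance matrix) and D (dimension of x) are the usual textbook notation, assumed here rather than taken from the slides.

```latex
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

E[x] = \mu, \qquad \mathrm{var}[x] = \sigma^2

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma)
  = \frac{1}{(2\pi)^{D/2} \, |\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```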