machine learning tutorial part 2
8/3/2019 Machine Learning Tutorial Part 2
Machine Learning for... OKKAMoids Part II
Delving Deeper: Models, Parameter Estimation, and ML in Practice
George Giannakopoulos
April 13, 2010
George Giannakopoulos Machine Learning for... OKKAMoids Part I I
Introduction
Classifying sequences
Parameter Estimation
In Practice
Closing
Connection to Part I
Definitions
In the previous episode...
Learning from Experience
Various tasks
Differentiation based on input (classification, clustering, ...)
Differentiation based on inductive bias (strategy)
Several ways to evaluate performance
Purpose
Part II: Delving Deeper
We will touch on the following subjects:
How can I classify sequences?
Aspects of modeling and parameter estimation
Good Practices for Using Machine Learning Techniques
Matching Problems to Algorithms
Available Tools
Pattern Recognition
Pattern Recognition is the scientific discipline whose goal is the classification of objects into a number of categories or classes. [Theodoridis and Koutroumbas, 2003]
Sequence
A sequence is an ordered list of terms:
$S$ is a set, $f : I \to S$, where $I$ is an ordered index set
Strings vs. time-series
abcdacde: A string or symbol sequence
(0.2, 0.4, 0.3, 0.1, 2.1): A time series
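The definition can be sketched in Python. This is only an illustration: it assumes the index set I is the positions 0..n-1, and the helper name `as_function` is hypothetical. It shows that a string and a time series are both functions from positions to a set S; only S differs.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def as_function(terms: Sequence[T]) -> Callable[[int], T]:
    """Return f: I -> S for a finite sequence, assuming I = {0, ..., n - 1}."""
    def f(i: int) -> T:
        if not 0 <= i < len(terms):
            raise IndexError(f"{i} is outside the index set")
        return terms[i]
    return f


# Symbol sequence: S is a finite alphabet.
string_seq = as_function("abcdacde")
# Time series: S is the set of real numbers.
time_series = as_function((0.2, 0.4, 0.3, 0.1, 2.1))

print(string_seq(3))   # symbol at position 3
print(time_series(4))  # measurement at position 4
```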
Classifying sequences: The models
Deterministic Grammars and Automata [Cicchello and Kremer, 2003]
Probabilistic State Machines and Bayesian Networks [Heckerman, 2008]
Constraint Satisfaction (e.g. N-gram Graphs [Giannakopoulos, 2009])
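As a rough illustration of the deterministic family, a hand-written finite automaton can act as a two-class sequence classifier (accept or reject). The sketch below is not taken from the cited survey: the alphabet, the state names and the rule "the string contains the substring cd" are all assumptions chosen for the example.

```python
from typing import Dict, Set, Tuple

ALPHABET = "abcde"  # assumed alphabet for this toy example


class DFA:
    """Deterministic finite automaton used as an accept/reject classifier."""

    def __init__(self, transitions: Dict[Tuple[str, str], str],
                 start: str, accepting: Set[str]) -> None:
        self.transitions = transitions
        self.start = start
        self.accepting = accepting

    def accepts(self, sequence: str) -> bool:
        state = self.start
        for symbol in sequence:
            key = (state, symbol)
            if key not in self.transitions:
                return False  # unknown symbol: reject
            state = self.transitions[key]
        return state in self.accepting


def build_contains_cd() -> DFA:
    """q0: nothing matched, q1: last symbol was 'c', q2: 'cd' was seen."""
    transitions: Dict[Tuple[str, str], str] = {}
    for s in ALPHABET:
        transitions[("q0", s)] = "q1" if s == "c" else "q0"
        if s == "c":
            transitions[("q1", s)] = "q1"
        elif s == "d":
            transitions[("q1", s)] = "q2"
        else:
            transitions[("q1", s)] = "q0"
        transitions[("q2", s)] = "q2"  # accepting state is absorbing
    return DFA(transitions, start="q0", accepting={"q2"})


dfa = build_contains_cd()
print(dfa.accepts("abcdacde"), dfa.accepts("abcacec"))
```

Grammar induction, as surveyed in the cited work, aims to learn such a machine from examples rather than writing it by hand.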
Classifying sequences: What is this all about?
Observations, i.e. what we see, the output
Optionally (Hidden) States (also termed labels), i.e. what causes the observations
Parameters, i.e. the details of the cause that determines the output or of the model that explains it
An example Hidden Markov Model
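The slide's figure is not reproduced in this transcript. As a stand-in, here is a minimal sketch of a two-state Hidden Markov Model and the forward algorithm, which computes the probability of an observation sequence by summing over all hidden state paths. The states, observations and every probability below are illustrative assumptions, not the values from the original figure.

```python
from typing import Dict, Sequence

STATES = ("Rainy", "Sunny")  # hidden states (assumed)

# Parameters of the model (all values are assumptions for illustration).
START: Dict[str, float] = {"Rainy": 0.6, "Sunny": 0.4}
TRANS: Dict[str, Dict[str, float]] = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
EMIT: Dict[str, Dict[str, float]] = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def forward_likelihood(observations: Sequence[str]) -> float:
    """P(observations) under the model, via the forward algorithm."""
    if not observations:
        return 1.0
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return sum(alpha.values())


likelihood = forward_likelihood(["walk", "shop", "clean"])
print(f"{likelihood:.6f}")
```

To classify a sequence, one option is to train one such model per class and pick the class whose model gives the sequence the highest likelihood.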
Modeling: Discriminative vs. Generative Models (1)
Discriminative model: does not take into account previous observations
Generative model: takes into account previous observations
Modeling: Discriminative vs. Generative Models (2)
Discriminative models: No assumptions about the observations (e.g. Conditional Random Fields: see [Wallach, 2004] for an introduction)
Generative models: Some assumptions about the observations (e.g. Hidden Markov Models: see [Rabiner, 1989] for an introductory tutorial)
Which is better? See: [Long and Servedio, 2006], [Jebara and Meila, 2006]. See also http://tinyurl.com/y7yat5q for a related discussion.
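To make the generative side concrete, here is a small sketch of a generative sequence classifier. It assumes each class generates its symbols with a first-order Markov chain, which is a much simpler model than an HMM. The class names, training strings, add-one smoothing and equal class priors are all assumptions made for the example.

```python
import math
from typing import Dict, Iterable


class MarkovChainModel:
    """Per-class generative model: P(next symbol | previous symbol)."""

    def __init__(self, alphabet: str) -> None:
        self.alphabet = alphabet
        self.counts: Dict[str, Dict[str, int]] = {}

    def fit(self, sequences: Iterable[str]) -> "MarkovChainModel":
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                row = self.counts.setdefault(prev, {})
                row[cur] = row.get(cur, 0) + 1
        return self

    def log_likelihood(self, seq: str) -> float:
        """Log-probability of the transitions (first symbol is ignored)."""
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            row = self.counts.get(prev, {})
            # Add-one smoothing so unseen transitions keep a small probability.
            numerator = row.get(cur, 0) + 1
            denominator = sum(row.values()) + len(self.alphabet)
            total += math.log(numerator / denominator)
        return total


def classify(seq: str, models: Dict[str, MarkovChainModel]) -> str:
    """Pick the class whose model explains the sequence best (equal priors)."""
    return max(models, key=lambda label: models[label].log_likelihood(seq))


models = {
    "alternating": MarkovChainModel("ab").fit(["ababab", "babab", "abab"]),
    "repeating": MarkovChainModel("ab").fit(["aabb", "aaabbb", "bbaa"]),
}
print(classify("abab", models), classify("aaab", models))
```

A discriminative approach would instead model the class directly given the sequence, without describing how the observations themselves are generated.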
Example of Model with Parameters
[Figure: Bayesian network with nodes SPRINKLER, RAIN and GRASS WET, and their probability tables]

RAIN
  T      F
  0.2    0.8

SPRINKLER
  RAIN   T      F
  F      0.4    0.6
  T      0.01   0.99

GRASS WET
  SPRINKLER  RAIN   T      F
  F          F      0.0    1.0
  F          T      0.8    0.2
  T          F      0.9    0.1
  T          T      0.99   0.01
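Once the parameters are fixed, the model can answer questions. The sketch below assumes the tables are read as the familiar rain/sprinkler/wet-grass network, in which RAIN influences SPRINKLER and both influence GRASS WET. It computes P(RAIN = T | GRASS WET = T) by enumerating the joint distribution. The variable and function names are illustrative.

```python
from itertools import product
from typing import Dict, Tuple

P_RAIN: Dict[bool, float] = {True: 0.2, False: 0.8}

# P(SPRINKLER | RAIN): outer key is RAIN, inner key is SPRINKLER.
P_SPRINKLER: Dict[bool, Dict[bool, float]] = {
    False: {True: 0.4, False: 0.6},
    True: {True: 0.01, False: 0.99},
}

# P(GRASS WET = T | SPRINKLER, RAIN), keyed by (sprinkler, rain).
P_WET_TRUE: Dict[Tuple[bool, bool], float] = {
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """P(rain, sprinkler, wet) as the product of the three tables."""
    p_wet = P_WET_TRUE[(sprinkler, rain)]
    return (P_RAIN[rain] * P_SPRINKLER[rain][sprinkler]
            * (p_wet if wet else 1.0 - p_wet))


def prob_rain_given_wet() -> float:
    """P(RAIN = T | GRASS WET = T), summing out SPRINKLER."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(joint(r, s, True)
                      for r, s in product((True, False), repeat=2))
    return numerator / denominator


posterior = prob_rain_given_wet()
print(f"P(rain | grass wet) = {posterior:.4f}")
```

Enumeration is only practical for tiny networks like this one, because the number of terms grows exponentially with the number of variables.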
The Question
What is the best model to describe a set of observations/instances?
Searching for a Model
Assumptions (Independence, Underlying Probability Distributions, etc.)
A priori knowledge (Previous Studies, Expert Knowledge, etc.)
Parametric vs. Non-parametric Approaches
Parametric approaches: A fixed set of unknown parameters, known before learning (e.g. Power-law parameters)
Non-parametric approaches: The set of unknown parameters is itself determined during learning, based on the data (e.g. histogram)
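The contrast can be sketched in a few lines. A Gaussian fit always has two parameters, whereas a histogram's number of bins (its parameters) depends on the data. The square-root rule for choosing the bin count and both function names are assumptions made for this illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fit_gaussian(values: Sequence[float]) -> Tuple[float, float]:
    """Parametric: always exactly two parameters (mean, standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


def fit_histogram(values: Sequence[float]) -> List[float]:
    """Non-parametric: the number of bins is chosen from the data itself."""
    n_bins = max(1, math.ceil(math.sqrt(len(values))))  # square-root rule
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


small = [0.1, 0.4, 0.35, 0.8]
large = [i / 25 for i in range(25)]

print(len(fit_gaussian(small)), len(fit_gaussian(large)))
print(len(fit_histogram(small)), len(fit_histogram(large)))
```

With more data the histogram gains bins, while the Gaussian keeps the same two numbers and only estimates them more precisely.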
Refining a Model (1)
Example
For a Gaussian distribution, find the best parameters ($\mu$, $\sigma$) that describe the following values:
0.7168090 0.6515225 0.6213850 -0.6626706 -1.1918936
0.7711588 -3.1388009 0.2561228 1.1569174 0.6771980
Refining a Model (2)
Best parameters
Mean: -0.01422517, St. Dev: 1.311977
Usually, we have to search in the parameter space.
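The Gaussian example is a special case in which no search is needed, because the estimates have a closed form. The sketch below recomputes them with Python's standard `statistics` module. It assumes the reported standard deviation is the sample estimate (dividing by n - 1) and also prints the maximum-likelihood estimate (dividing by n) for comparison.

```python
import statistics

values = [0.7168090, 0.6515225, 0.6213850, -0.6626706, -1.1918936,
          0.7711588, -3.1388009, 0.2561228, 1.1569174, 0.6771980]

mean = statistics.mean(values)
sample_sd = statistics.stdev(values)   # divides by n - 1
mle_sd = statistics.pstdev(values)     # divides by n (maximum likelihood)

print(f"mean={mean:.8f} sample_sd={sample_sd:.6f} mle_sd={mle_sd:.6f}")
```

On these ten values the sample mean is slightly negative (about -0.0142), and the sample standard deviation comes out at about 1.312.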
Searching for Optimal Parameters
Small search space: Brute force methods
High-speed approximation: Greedy techniques
No local maxima: Gradient descent
Small plateau: Simulated annealing
Little known: Evolutionary (genetic) algorithms
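Two of these strategies can be sketched on the Gaussian example values: a brute-force grid search and a plain gradient descent, both minimizing the negative log-likelihood. The grid bounds, step size, learning rate, iteration count and function names are assumptions picked for this toy problem, not recommendations.

```python
import math
from typing import Sequence, Tuple

VALUES = [0.7168090, 0.6515225, 0.6213850, -0.6626706, -1.1918936,
          0.7711588, -3.1388009, 0.2561228, 1.1569174, 0.6771980]


def neg_log_likelihood(mu: float, sigma: float, values: Sequence[float]) -> float:
    """Negative Gaussian log-likelihood; lower is better."""
    n = len(values)
    squared_error = sum((v - mu) ** 2 for v in values)
    return (n * math.log(sigma) + squared_error / (2 * sigma ** 2)
            + 0.5 * n * math.log(2 * math.pi))


def brute_force(values: Sequence[float]) -> Tuple[float, float]:
    """Try every grid point: mu in [-2, 2], sigma in [0.5, 3], step 0.01."""
    best_score, best_mu, best_sigma = math.inf, 0.0, 1.0
    for i in range(-200, 201):
        mu = i * 0.01
        for j in range(50, 301):
            sigma = j * 0.01
            score = neg_log_likelihood(mu, sigma, values)
            if score < best_score:
                best_score, best_mu, best_sigma = score, mu, sigma
    return best_mu, best_sigma


def gradient_descent(values: Sequence[float], rate: float = 0.01,
                     steps: int = 5000) -> Tuple[float, float]:
    """Descend in (mu, log sigma); the log keeps sigma positive."""
    n = len(values)
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        sigma_sq = math.exp(2 * log_sigma)
        squared_error = sum((v - mu) ** 2 for v in values)
        grad_mu = -sum(v - mu for v in values) / sigma_sq
        grad_log_sigma = n - squared_error / sigma_sq
        mu -= rate * grad_mu
        log_sigma -= rate * grad_log_sigma
    return mu, math.exp(log_sigma)


grid_mu, grid_sigma = brute_force(VALUES)
gd_mu, gd_sigma = gradient_descent(VALUES)
print(grid_mu, grid_sigma)
print(gd_mu, gd_sigma)
```

The grid search can only be as precise as its step and its cost grows quickly with the number of parameters. Gradient descent is much cheaper, but it relies on the objective having no misleading local optima, as is the case here.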
ML tools
WEKA: Many algorithms
HMM: JaHMM, Hidden Markov Model Toolbox
CRF: CRF for Java
SVM: LibSVM, SVMLite
Time Series: Gnu Regression, Econometrics and Time-series Library (Gretl), Rapid-I (former YALE)
Constraint-based: JINSECT (see http://sourceforge.net/projects/jinsect/)
MLOSS (see http://mloss.org/software/ for many tools)
Questions
What data do I have/need?
What is the data type: sequence or
What do I want to learn?
What do I know beforehand?
Considerations
What are the features?
What do they mean?
Is there an obvious connection to the class?
Feature vector: what does every dimension represent in simple words?
Can I describe in a sentence what my instance is?
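One way to keep these questions honest is to give every dimension a name and a plain-words description in code. The sketch below is a hypothetical example for symbol sequences. The three features and the helper names are illustrative assumptions, not a recommended feature set.

```python
from collections import Counter
from typing import NamedTuple


class SequenceFeatures(NamedTuple):
    """A feature vector whose dimensions can each be explained in simple words."""
    length: int              # how many symbols the sequence has
    distinct_symbols: int    # how many different symbols it uses
    max_symbol_share: float  # share of the most frequent symbol (0 to 1)


def extract(seq: str) -> SequenceFeatures:
    if not seq:
        raise ValueError("cannot extract features from an empty sequence")
    counts = Counter(seq)
    return SequenceFeatures(
        length=len(seq),
        distinct_symbols=len(counts),
        max_symbol_share=max(counts.values()) / len(seq),
    )


def describe(features: SequenceFeatures) -> str:
    """One sentence that says what the instance is."""
    return (f"A sequence of {features.length} symbols using "
            f"{features.distinct_symbols} distinct symbols; the most frequent "
            f"one covers {features.max_symbol_share:.0%} of it.")


features = extract("abcdacde")
print(features)
print(describe(features))
```

If such a sentence is hard to write for a feature vector, that is often a sign that the features are not yet well understood.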
Recapitulation
Sequences
Models
Parameters
Tools
Practices
Yes, we are done! If you need more, see the references in Section 6.
Thank you! Please check the feedback form (http://tinyurl.com/ycommj3) to help me improve.
References
(2010). Wikimedia commons.
Cicchello, O. and Kremer, S. C. (2003). Inducing grammars from sparse data sets: a survey of algorithms and results. J. Mach. Learn. Res., 4:603-632.
Colton, S. (March 30, 2010). Artificial intelligence course v231.
Giannakopoulos, G. (2009). Automatic summarization from multiple documents. PhD thesis, Department of Information and Communication Systems Engineering, University of the Aegean, Samos, Greece. http://www.iit.demokritos.gr/ggianna/thesis.pdf
Heckerman, D. (2008).