ONE Talk: Machine Learning
TRANSCRIPT
© 2005, IT - Instituto de Telecomunicações. All rights reserved.
André Lourenço
Instituto Superior de Engenharia de Lisboa,
Instituto de Telecomunicações,
Instituto Superior Técnico, Lisbon, Portugal
Machine Learning
Learning with Data
10/11/2011 - ONE Talks
2
Outline
• Introduction
• Examples
• What does it mean to learn?
• Supervised and Unsupervised Learning
• Types of Learning
• Classification Problem
• Text Mining Example
• Conclusions (and further reading)
3
Introduction
4
What is Machine Learning?
• A branch of artificial
intelligence (AI)
• Arthur Samuel (1959)
Field of study that gives
computers the ability to
learn without being explicitly
programmed
09-11-2011
From: Andrew Ng – Stanford Machine Learning Classes
http://www.youtube.com/watch?v=UzxYlbK2c7E
5
What is Machine Learning?
• Tom Mitchell (1998) Well-posed Learning
Problem:
A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.
• Mark Dredze
Teaching a computer about the world
6
What is Machine Learning?
• Goal:
Design and development of algorithms that allow
computers to evolve behaviors based on
empirical data, such as from sensor data or
databases
• How to apply machine learning?
• Observe the world
• Develop models that match observations
• Teach computer to learn these models
• Computer applies learned model to the world
7
Example 1:
Prediction of House Price
From: Andrew Ng – Stanford Machine Learning Classes
http://www.youtube.com/watch?v=UzxYlbK2c7E
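The house-price example above can be sketched as a one-feature regression. The numbers below are invented for illustration; the point is only the workflow of fitting a model to observations and applying it to a new house.

```python
# Minimal sketch of the house-price example: fit price = a * size + b
# to made-up (size, price) observations. All numbers are illustrative.
import numpy as np

sizes = np.array([50.0, 70.0, 100.0, 120.0, 150.0])    # m^2
prices = np.array([120.0, 160.0, 230.0, 260.0, 330.0])  # k EUR

# Least-squares fit of a degree-1 polynomial (supervised regression).
a, b = np.polyfit(sizes, prices, 1)

def predict_price(size_m2):
    """Apply the learned model to a new, unseen house."""
    return a * size_m2 + b

print(round(predict_price(90.0), 1))
```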
8
Example 2:
Learning to automatically classify text documents
From: http://www.xcellerateit.com/
9
Example 3:
Face Detection and Tracking
http://www.micc.unifi.it/projects/optimal-face-detection-and-tracking/
10
Example 4:
Social Network Mining
[Diagram: users U1–U5 connected by friendship links, with groups/networks and users’ profiles — what hidden information can be mined?]
From: Exploit of Online Social Networks with Community-Based
Graph Semi-Supervised Learning, Mingzhen Mo and Irwin King
ICONIP 2010, Sydney, Australia
11
Example 5:
Biometric Systems
1. Physical
2. Behavioral
12
WHAT DOES IT MEAN TO LEARN?
13
What does it mean to learn?
• Learn patterns in data
z → Decision System → ẋ
z : observed signal
ẋ : estimated output
14
Unsupervised Learning
• Look for patterns in data
• No training data (no examples of the output)
• Pro:
• No labeling of examples is required
• Con:
• Cannot demonstrate specific types of output
• Applications:
• Data mining
• Finds interesting patterns in data
From: Mark Dredze
Machine Learning - Finding Patterns in the World
15
Supervised Learning
• Learn patterns to simulate given output
• Pro:
• Can learn complex patterns
• Good performance
• Con:
• Requires many labeled examples of the desired output
• Applications:
• Classification
• Sorts data into predefined groups
From: Mark Dredze
Machine Learning - Finding Patterns in the World
16
Types of Learning: Output
• Classification
• Binary, multi‐class, multi‐label, hierarchical, etc.
• Classify email as spam
• Loss: accuracy
• Ranking
• Order examples by preference
• Rank results of web search
• Loss: Swapped pairs
• Regression
• Real‐valued output
• Predict tomorrow’s stock price
• Loss: Squared loss
• Structured prediction
• Sequences, trees, segmentation
• Find faces in an image
• Loss: Precision/Recall of faces
From: Mark Dredze
Machine Learning - Finding Patterns in the World
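The loss functions named above can be illustrated on toy data. All numbers below are made up; each loss is computed in its simplest textbook form.

```python
# Toy illustration of the losses listed above; all numbers are invented.

# Classification: accuracy (fraction of correctly predicted labels).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Regression: squared loss between real-valued targets and predictions.
r_true = [2.0, 3.0, 5.0]
r_pred = [2.5, 2.0, 5.0]
squared_loss = sum((t - p) ** 2 for t, p in zip(r_true, r_pred))

# Ranking: number of swapped pairs (pairs ordered incorrectly).
ranking = [3, 1, 2]  # predicted positions; the true order is 1, 2, 3
swapped = sum(1 for i in range(len(ranking))
              for j in range(i + 1, len(ranking))
              if ranking[i] > ranking[j])

print(accuracy, squared_loss, swapped)  # 0.8 1.25 2
```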
17
Classification Problem
• Classical Architecture
z → Feature Extraction → y → Classification → ẋ
z : observed signal
y : feature vector (pattern), y ∈ S
ẋ : estimated output (class), ẋ ∈ {1, 2, …, c}
18
Classification Problem
• Example with 1 feature
• Problem: classify people as non-obese or obese by observing their weight (only 1 feature)
• Is it possible to classify without making any mistakes?
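A quick sketch of why one feature is not enough: with invented weights for the two classes that overlap, no single threshold classifies everyone correctly.

```python
# Single-feature case: classify by a weight threshold alone.
# The weights are invented; the point is that the classes overlap,
# so no threshold separates them without mistakes.
non_obese = [55, 62, 70, 78, 85]    # kg
obese     = [80, 88, 95, 102, 110]  # kg (note the overlap around 80-85)

def errors(threshold):
    """Count misclassifications when predicting 'obese' above the threshold."""
    return (sum(1 for w in non_obese if w > threshold) +
            sum(1 for w in obese if w <= threshold))

best = min(errors(t) for t in range(50, 120))
print(best)  # the best threshold still makes at least one mistake
```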
18
19
Classification Problem
• Example with 2 features
z → Feature Extraction → y = {weight, height} → Classification → ẋ = non-obese or obese
z : observed signal
y : feature vector (pattern), y ∈ S
ẋ : estimated output (class), ẋ ∈ {1: non-obese, 2: obese}
20
Classification Problem
• Example with 2 features
• Problem: classify people as non-obese or obese by observing their weight and height
• Now the decision appears much simpler!
20
21
Classification Problem
• Example with 2 features
• Problem: classify people as non-obese or obese by observing their weight and height
• Decision regions: R1 : non-obese; R2 : obese
21
22
Classification Problem
• Decision Regions
• Goal of the classifier: define a partition of the feature space into c disjoint regions, called decision regions: R1, R2, …, Rc
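One simple way to induce such decision regions is a nearest-class-mean rule: each class mean defines a region of the feature space. The training points below are invented for illustration.

```python
# Sketch of decision regions with two features (weight, height), using a
# nearest-class-mean rule; the training points are invented.
import math

train = {
    "non-obese": [(55, 1.70), (62, 1.75), (70, 1.80), (78, 1.85)],
    "obese":     [(88, 1.65), (95, 1.70), (102, 1.75), (110, 1.80)],
}

# Each class mean induces a decision region: a point belongs to the
# region (R1 or R2) of the nearest mean.
means = {c: (sum(w for w, h in pts) / len(pts),
             sum(h for w, h in pts) / len(pts))
         for c, pts in train.items()}

def classify(weight, height):
    return min(means, key=lambda c: math.dist((weight, height), means[c]))

print(classify(60, 1.78), classify(105, 1.70))
```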
22
23
TEXT MINING EXAMPLE
24
Text Mining Process
Adapted from: Introduction to Text Mining,
Yair Even-Zohar, University of Illinois
25
Text Mining Process
• Text preprocessing: syntactic/semantic text analysis
• Feature generation: bag of words
• Feature selection: simple counting; statistics
• Text/data mining: classification (supervised learning); clustering (unsupervised learning)
• Analyzing results
26
Syntactic / Semantic text analysis
• Part-Of-Speech (POS) tagging
• Find the corresponding POS tag for each word
e.g., John (noun) gave (verb) the (det) ball (noun)
• Word sense disambiguation
• Context based or proximity based
• Parsing
• Generates a parse tree (graph) for each sentence
• Each sentence is a stand-alone graph
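The POS tagging idea can be sketched with a tiny hand-written lexicon, using the slide's own example sentence. Real taggers learn these assignments from annotated corpora rather than a fixed table.

```python
# Toy POS-tagging sketch with a tiny hand-written lexicon; real taggers
# learn tag assignments from annotated corpora.
LEXICON = {"john": "noun", "gave": "verb", "the": "det", "ball": "noun"}

def tag(sentence):
    """Return (word, tag) pairs, falling back to 'unknown'."""
    return [(w, LEXICON.get(w.lower(), "unknown")) for w in sentence.split()]

print(tag("John gave the ball"))
```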
27
Feature Generation: Bag of words
• Text document is represented by the words it
contains (and their occurrences)
• e.g., “Lord of the rings” → {“the”, “Lord”, “rings”, “of”}
• Highly efficient
• Makes learning far simpler and easier
• Order of words is not that important for certain applications
• Stemming: identifies a word by its root
• e.g., flying, flew → fly
• Reduce dimensionality
• Stop words: The most common words are unlikely
to help text mining
• e.g., “the”, “a”, “an”, “you” …
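The bag-of-words representation with stop-word removal can be sketched in a few lines; the stop-word list and whitespace tokenizer below are deliberately tiny.

```python
# Minimal bag-of-words sketch: tokenize, drop stop words, count occurrences.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "you", "is", "and"}

def bag_of_words(text):
    tokens = text.lower().split()
    return Counter(t for t in tokens if t not in STOP_WORDS)

bow = bag_of_words("the Lord of the rings")
print(bow)  # word order is lost; only counts remain
```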
28
Example
Hi,
Here is your weekly update (that unfortunately hasn't gone out in about a month). Not much action here right now.
1) Due to the unwavering insistence of a member of the group, the ncsa.d2k.modules.core.datatype package is now completely independent of the d2k application.
2) Transformations are now handled differently in Tables. Previously, transformations were done using a TransformationModule. That module could then be added to a list that an ExampleTable kept. Now, there is an interface called Transformation and a sub-interface called ReversibleTransformation.
hi, weekly update (that unfortunately gone out month). much action here right now. 1) due unwavering insistence member group, ncsa.d2k.modules.core.datatype package now completely independent d2k application. 2) transformations now handled differently tables. previously, transformations done using transformationmodule. module added list exampletable kept. now, interface called transformation sub-interface called reversibletransformation.
hi week update unfortunate go out month much action here right now 1 due unwaver insistence member group ncsa d2k modules core datatype package now complete independence d2k application 2 transformation now handle different table previous transformation do use transformationmodule module add list exampletable keep now interface call transformation sub-interface call reversibletransformation
29
Feature Generation: Weighting
• Term Frequency: tf(ti, dj) — number of occurrences of term ti in document dj
• Inverse Document Frequency: idf(ti) = log(N / df(ti)), where N is the number of documents and df(ti) the number of documents containing ti
• TF-IDF: tfidf(ti, dj) = tf(ti, dj) × idf(ti)
Lorem ipsum dolor sit
amet, consectetuer
adipiscing elit. Praesent
et quam sit amet diam
porttitor iaculis.
Vestibulum ante ipsum
primis in faucibus orci
luctus et ultrices posuere
cubilia Curae;
Bag of Words
Lorem 1
dolor 1
Praesent 1
iaculis 1
Vestibulum 1
ipsum 2
consectetuer 2
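TF-IDF weighting can be computed directly from word counts. The toy corpus below is invented; the formulas in the comments are the standard textbook definitions.

```python
# Sketch of TF-IDF weighting over a toy, invented corpus.
#   tf(t, d)     = count of term t in document d
#   idf(t)       = log(N / df(t)), N documents, df(t) = docs containing t
#   tf-idf(t, d) = tf(t, d) * idf(t)
import math
from collections import Counter

docs = [
    "machine learning learns patterns in data".split(),
    "text mining finds patterns in text".split(),
    "biometric systems use machine learning".split(),
]

N = len(docs)
df = Counter(t for d in docs for t in set(d))

def tf_idf(term, doc):
    return doc.count(term) * math.log(N / df[term])

# "patterns" occurs in 2 of 3 documents -> low idf;
# "biometric" occurs in 1 of 3 -> higher idf.
print(tf_idf("patterns", docs[0]), tf_idf("biometric", docs[2]))
```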
30
Feature Generation: Vector Space Model
Documents as vectors
31
Feature Selection
• Reduce dimensionality
• Learners have difficulty addressing tasks with high
dimensionality
• Irrelevant features
• Not all features help!
• e.g., the existence of a noun in a news article is unlikely to help classify it as “politics” or “sport”
• Stop Words Removal
32
Example
hi week update unfortunate go out month much action here right now 1 due unwaver insistence member group ncsa d2k modules do
core datatype package complete independence application 2 transformation handle different table previous use transformationmodule add list exampletable keep interface call sub-interface reversibletransformation
hi week update unfortunate go out month much action here right now due insistence member group ncsa d2k modules
do core datatype package complete independence application transformation handle different table previous use add list keep interface call sub-interface
hi week update unfortunate month action right due insistence member group ncsa d2k modules core
datatype package complete independence application transformation handle different table previous add list interface call sub-interface
33
Document Similarity
• Dot Product – cosine
similarity
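Cosine similarity between two bag-of-words vectors is just their dot product divided by the product of their norms. The two toy documents below are invented.

```python
# Cosine similarity of two bag-of-words vectors: dot product over the
# product of the norms; 1.0 means the vectors point the same way.
import math
from collections import Counter

def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b))

d1 = Counter("machine learning finds patterns".split())
d2 = Counter("machine learning classifies text".split())
print(round(cosine_similarity(d1, d2), 3))  # 0.5: two of four words shared
```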
34
Text Mining: Classification definition
• Given: a collection of labeled records
(training set)
• Each record contains a set of features (attributes), and
the true class (label)
• Find: a model for the class as a function
of the values of the features
• Goal: previously unseen records should be
assigned a class as accurately as possible
• A test set is used to determine the accuracy of the
model. Usually, the given data set is divided into training
and test sets, with training set used to build the model
and test set used to validate it
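The train/test protocol described above can be sketched with made-up labeled records and a deliberately trivial classifier (predict the majority class) — the point is the split and the held-out evaluation, not the model.

```python
# Sketch of the train/test protocol with invented labeled records and a
# trivial majority-class "model"; the point is the evaluation protocol.
import random

random.seed(0)
records = [({"len": i}, "spam" if i % 3 == 0 else "ham") for i in range(30)]
random.shuffle(records)

split = int(0.8 * len(records))
train, test = records[:split], records[split:]

# "Learn" from the training set only: pick the most frequent label.
labels = [y for _, y in train]
majority = max(set(labels), key=labels.count)

# Measure accuracy on the held-out test set.
accuracy = sum(1 for _, y in test if y == majority) / len(test)
print(majority, accuracy)
```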
35
Text Mining: Clustering definition
• Given: a set of documents and a similarity
measure among documents
• Find: clusters such that:
• Documents in one cluster are more similar to one another
• Documents in separate clusters are less similar to one another
• Goal:
• A correct grouping of the documents into clusters
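The slide does not prescribe an algorithm; a tiny 2-means loop (one standard clustering method) on invented 2-D points illustrates the idea that similar items end up in the same cluster, with no labels used anywhere.

```python
# Clustering sketch: a tiny 2-means loop on invented 2-D points.
# No labels are used anywhere - this is unsupervised.
import math

points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),   # one natural group
          (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]   # another

centers = [points[0], points[3]]                 # crude initialization
for _ in range(10):
    clusters = [[], []]
    for p in points:
        k = min((0, 1), key=lambda i: math.dist(p, centers[i]))
        clusters[k].append(p)
    # Recompute each center as the mean of its cluster.
    centers = [tuple(sum(c) / len(pts) for c in zip(*pts))
               for pts in clusters]

print(sorted(len(c) for c in clusters))
```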
36
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
• Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
• New data is classified based on the training set
• Unsupervised learning (clustering)
• The class labels of the training data are unknown
• Given a set of measurements, observations, etc. with the
aim of establishing the existence of classes or clusters in
the data
37
CONCLUDING REMARKS
38
Readings
• Survey Books in Machine Learning
• The Elements of Statistical Learning
• Hastie, Tibshirani, Friedman
• Pattern Recognition and Machine Learning
• Bishop
• Machine Learning
• Mitchell
• Questions?
39
ACKNOWLEDGEMENTS
• ISEL – DEETC
• Supervised final-year and MSc students (Tony Tam, ...)
• Students of Digital Signal Processing
• Artur Ferreira
• Instituto de Telecomunicações (IT)
David Coutinho, Hugo Silva, Ana Fred, Mário Figueiredo
• Fundação para a Ciência e Tecnologia (FCT)