Machine Learning Applied in Product Classification

Jianfu Chen
Computer Science Department
Stony Brook University


Posted on 07-Jan-2016




Machine learning learns an idealized model of the real world.

[Figure: two pictured sums, the idealized model 1 + 1 = 2, and a new pictured sum whose result is marked "?"]


Prod1 -> class1
Prod2 -> class2
...

f(x) -> y

Prod3 -> ?

x: Kindle Fire HD 8.9" 4G LTE Wireless, represented as a binary feature vector (0 ... 1 1 ... 1 ... 1 ... 0 ...)
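The vector x above reads like a binary bag-of-words encoding of the product title: each position corresponds to a vocabulary word and is 1 if that word occurs in the title. A minimal sketch under that reading, assuming lowercase whitespace tokenization and a vocabulary built from a few hypothetical training titles (the slides do not specify the actual feature extraction):

```python
def tokenize(title):
    # Assumption: lowercase + whitespace split; a real pipeline may normalize differently.
    return title.lower().split()

def build_vocabulary(titles):
    """Map each distinct token in the training titles to a feature index."""
    vocab = {}
    for title in titles:
        for token in tokenize(title):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def binary_features(title, vocab):
    """Return x with x[j] = 1 if vocabulary word j occurs in the title, else 0."""
    x = [0] * len(vocab)
    for token in tokenize(title):
        j = vocab.get(token)
        if j is not None:  # words never seen in training are ignored
            x[j] = 1
    return x

# Hypothetical training titles, for illustration only.
train_titles = ['Kindle Fire HD 8.9" 4G LTE Wireless', "Nokia 3720 Classic", "Wireless mouse"]
vocab = build_vocabulary(train_titles)
x = binary_features('Kindle Fire HD 8.9" 4G LTE Wireless', vocab)
print(x)  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

Real product vocabularies are large and each title uses only a few words, so a sparse representation (only the indices of the 1s) is the usual choice in practice.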


Components of the magic box f(x)

Representation

• Give a score to each class: $s(y; x)$

Inference

• Predict the class with the highest score: $\hat{y} = \arg\max_{y} s(y; x)$

Learning

• Estimate the parameters from data
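The three components can be kept separate in code. This sketch shows only inference, written so that it works with any score function s(y; x); the function name `predict` and the toy scores are hypothetical:

```python
def predict(score, x, classes):
    """Inference: return the class y with the highest score s(y; x)."""
    return max(classes, key=lambda y: score(y, x))

# A hypothetical score function that ignores x; any model that scores
# (class, example) pairs could be plugged in instead.
toy_scores = {"phone": 2.5, "tablet": 3.1, "laptop": -0.4}

def toy_score(y, x):
    return toy_scores[y]

print(predict(toy_score, None, ["phone", "tablet", "laptop"]))  # tablet
```

Because inference only needs scores, the same `predict` works whether the scores come from a linear, probabilistic, or algorithmic model.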


Representation

Given an example, a model gives a score to each class.

Linear Model

• $s(y; x) = w_y^\top x$

Probabilistic Model

• $P(x, y)$: Naive Bayes
• $P(y \mid x)$: Logistic Regression

Algorithmic Model

• Decision Tree
• Neural Networks


Linear Model

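• The score is a linear combination of the feature values: $s(y; x) = w_y^\top x = \sum_j w_{y,j}\, x_j$.
• Geometrically, each weight vector defines a hyperplane ($w_y^\top x = 0$).
• One weight vector is used to score each class.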

[Figure: one weight vector per class, $w_1$, $w_2$, $w_3$]


Example

• Suppose we have 3 classes and 2 features.
• Weight vectors: $w_1, w_2, w_3 \in \mathbb{R}^2$, one per class
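A sketch of such an example with 3 classes and 2 features. The weight values on the original slide are not recoverable, so the ones below are made up for illustration; each class is scored with its own weight vector and the highest score wins:

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

# Hypothetical weight vectors, one per class (not the slide's actual values).
W = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [-1.0, -1.0],
}

def scores(x):
    """Linear model: s(y; x) = w_y . x for every class y."""
    return {y: dot(w, x) for y, w in W.items()}

def predict(x):
    """Inference: the class with the highest linear score."""
    s = scores(x)
    return max(s, key=s.get)

print(scores([2.0, 1.0]))     # {1: 2.0, 2: 1.0, 3: -3.0}
print(predict([2.0, 1.0]))    # 1
print(predict([-1.0, -2.0]))  # 3
```

Each class "wins" in the region of feature space where its score exceeds the others; the boundaries between those regions are where two scores are equal, i.e. $(w_y - w_{y'})^\top x = 0$, which are again hyperplanes.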


Probabilistic Model

• Gives a probability to class y given example x: $s(y; x) = P(y \mid x)$

• Two ways to do this:
– Generative model: $P(x, y)$ (e.g., Naive Bayes)
– Discriminative model: $P(y \mid x)$ (e.g., Logistic Regression)
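The two kinds of probabilistic model can be sketched as follows, assuming multinomial logistic regression (one weight vector per class, softmax over the linear scores) for the discriminative case and Bernoulli Naive Bayes over binary features for the generative case. All parameter values are hypothetical; for the generative model, P(y | x) is obtained by normalizing P(x, y) over the classes:

```python
import math

def linear_scores(W, x):
    return {y: sum(wj * xj for wj, xj in zip(w, x)) for y, w in W.items()}

def logistic_regression_probs(W, x):
    """Discriminative: P(y | x) = exp(w_y . x) / sum over y' of exp(w_y' . x)."""
    s = linear_scores(W, x)
    m = max(s.values())  # subtracting the max score avoids overflow in exp
    e = {y: math.exp(v - m) for y, v in s.items()}
    z = sum(e.values())
    return {y: v / z for y, v in e.items()}

def naive_bayes_joint(prior, theta, x):
    """Generative (Bernoulli Naive Bayes): P(x, y) = P(y) * prod_j P(x_j | y),
    with theta[y][j] = P(x_j = 1 | y) and binary features x_j."""
    joint = {}
    for y, p in prior.items():
        for t, xj in zip(theta[y], x):
            p *= t if xj == 1 else 1.0 - t
        joint[y] = p
    return joint

def posterior(joint):
    """P(y | x) = P(x, y) / sum over y' of P(x, y')."""
    z = sum(joint.values())
    return {y: p / z for y, p in joint.items()}

# Hypothetical parameters, for illustration only.
W = {"phone": [1.5, -0.5], "tablet": [-0.5, 1.0]}
prior = {"phone": 0.5, "tablet": 0.5}
theta = {"phone": [0.9, 0.1], "tablet": [0.2, 0.8]}
x = [1, 0]

print(logistic_regression_probs(W, x))
print(posterior(naive_bayes_joint(prior, theta, x)))
```

Both functions return a distribution over classes, so inference can still pick the class with the highest probability.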



Learning

• Parameter estimation:
– the weights $w_y$ in a linear model
– the parameters of a probabilistic model

• Learning is usually formulated as an optimization problem.


Define an optimization objective: average misclassification cost

• The misclassification cost of classifying a single example x from class y into class y': $L(x, y, y')$
– formally called the loss function

• The average misclassification cost on the training set $\{(x_i, y_i)\}_{i=1}^{N}$: $\frac{1}{N}\sum_{i=1}^{N} L(x_i, y_i, f(x_i))$
– formally called the empirical risk
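The empirical risk is an average of the loss over labeled examples. A small sketch, with a hypothetical classifier and a made-up cost table standing in for L(x, y, y'):

```python
def empirical_risk(loss, f, data):
    """Average loss of classifier f over labeled examples data = [(x, y), ...]."""
    return sum(loss(x, y, f(x)) for x, y in data) / len(data)

# Hypothetical examples, predictions and costs, for illustration only.
data = [("p1", "a"), ("p2", "b"), ("p3", "b"), ("p4", "c")]
predicted = {"p1": "a", "p2": "b", "p3": "c", "p4": "a"}
cost = {("b", "c"): 0.5, ("c", "a"): 2.0}  # cost of predicting y' when the true class is y

def f(x):
    return predicted[x]

def loss(x, y, y_pred):
    """L(x, y, y'): zero for a correct prediction, otherwise looked up in the cost table."""
    return 0.0 if y == y_pred else cost[(y, y_pred)]

print(empirical_risk(loss, f, data))  # 0.625
```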


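Define misclassification cost

• 0-1 loss: $L(x, y, y') = \mathbb{1}[y' \neq y]$, i.e., 1 for any misclassification and 0 otherwise
– The average 0-1 loss is the error rate, which equals 1 – accuracy: $\frac{1}{N}\sum_{i=1}^{N} \mathbb{1}[f(x_i) \neq y_i] = 1 - \text{accuracy}$

• Revenue loss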
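For the 0-1 loss, the average over a labeled set is exactly the error rate. A quick check on hypothetical labels and predictions:

```python
def zero_one_loss(y, y_pred):
    """0-1 loss: cost 1 for any misclassification, 0 for a correct prediction."""
    return 0 if y == y_pred else 1

def error_rate(y_true, y_pred):
    """Average 0-1 loss over the examples."""
    return sum(zero_one_loss(y, p) for y, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    return sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)

# Hypothetical labels and predictions for 5 products.
y_true = ["phone", "phone", "tablet", "mouse", "keyboard"]
y_pred = ["phone", "tablet", "tablet", "keyboard", "keyboard"]
print(error_rate(y_true, y_pred))  # 0.4
print(1 - accuracy(y_true, y_pred))
```

Note that the 0-1 loss charges mistaking a mouse for a keyboard exactly as much as mistaking it for a truck; that uniformity is what a revenue-based cost changes.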


Do the optimization: minimize a convex upper bound of the average misclassification cost

• Directly minimizing the average misclassification cost is intractable in general: the objective is non-convex (for the 0-1 loss it is piecewise constant in the parameters).

• Minimize a convex upper bound instead.
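The slide does not name the upper bound at this point; a standard example is the hinge loss used by SVMs. For a binary problem with labels $y \in \{-1, +1\}$ and margin $m = y\, w^\top x$, the hinge loss $\max(0, 1 - m)$ is convex in $w$ (the margin is linear in $w$) and is never below the 0-1 loss. The sketch below checks that numerically, counting a margin of exactly 0 as an error (a convention chosen here):

```python
def zero_one(margin):
    """0-1 loss as a function of the margin m = y * (w . x): an error when m <= 0."""
    return 1.0 if margin <= 0 else 0.0

def hinge(margin):
    """Hinge loss max(0, 1 - m): convex and piecewise linear in the margin."""
    return max(0.0, 1.0 - margin)

margins = [k / 4 for k in range(-12, 13)]  # -3.0, -2.75, ..., 3.0
assert all(hinge(m) >= zero_one(m) for m in margins)

for m in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(m, zero_one(m), hinge(m))
```

The hinge loss also penalizes correct predictions with a small margin ($0 < m < 1$), which is what pushes an SVM toward a confident separation.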


A taste of SVM

• Minimizes a convex upper bound of the 0-1 loss (the hinge loss)

• C is a hyperparameter, the regularization parameter, which trades off the training loss against the size of the weights.
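The slide's objective is not recoverable here. One common form it could take is the binary soft-margin SVM, $\min_{w} \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \max(0,\, 1 - y_i\, w^\top x_i)$, in which C weights the hinge loss against the regularizer. The sketch below minimizes that objective by plain subgradient descent on hypothetical toy data (no bias term, fixed step size, labels in {-1, +1}); it is for illustration only and is not how LIBLINEAR solves the problem:

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def svm_objective(w, X, Y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses."""
    reg = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in zip(X, Y))
    return reg + C * hinge

def train_svm(X, Y, C=1.0, lr=0.1, epochs=200):
    """Subgradient descent on the primal SVM objective (labels must be -1 or +1)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = list(w)  # subgradient of the regularizer 0.5 * ||w||^2
        for x, y in zip(X, Y):
            if y * dot(w, x) < 1.0:  # hinge term is active for this example
                for j, xj in enumerate(x):
                    grad[j] -= C * y * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical, linearly separable toy data.
X = [[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]]
Y = [1, 1, -1, -1]
w = train_svm(X, Y, C=1.0)
predictions = [1 if dot(w, x) > 0 else -1 for x in X]
print(predictions)  # [1, 1, -1, -1]
print(svm_objective(w, X, Y, 1.0) < svm_objective([0.0, 0.0], X, Y, 1.0))
```

A larger C puts more weight on fitting the training examples; a smaller C favors smaller weights, which is why it is tuned on held-out data.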


Machine learning in practice

• Feature extraction: $\{(x, y)\}$

• Select a model/classifier: SVM

• Set up the experiment: training : development : test = 4 : 2 : 4

• Call a package to do experiments:
– LIBLINEAR: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
– Find the best C on the development set
– Test the final performance on the test set
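The workflow on this slide can be sketched in a few lines: shuffle and split the data 4 : 2 : 4, train one model per candidate C on the training split, and keep the C with the lowest development error. The `train_fn` and `error_fn` arguments are hypothetical placeholders; in practice they would wrap calls to a package such as LIBLINEAR:

```python
import random

def split_4_2_4(examples, seed=0):
    """Shuffle and split into training : development : test = 4 : 2 : 4."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    n_train = len(data) * 4 // 10
    n_dev = len(data) * 2 // 10
    return (data[:n_train],
            data[n_train:n_train + n_dev],
            data[n_train + n_dev:])

def select_C(train_fn, error_fn, train, dev, candidate_Cs):
    """Train one model per C on the training split; keep the C with the lowest dev error.

    train_fn(train, C) -> model and error_fn(model, examples) -> error rate are
    hypothetical hooks, e.g. thin wrappers around a LIBLINEAR train/predict call.
    """
    best_C, best_err = None, float("inf")
    for C in candidate_Cs:
        model = train_fn(train, C)
        err = error_fn(model, dev)
        if err < best_err:
            best_C, best_err = C, err
    return best_C, best_err
```

After C is chosen, the test split is used once to report the final performance, so that the reported number is not tuned on the same data it measures. Candidate values of C are commonly spaced on a log scale (e.g. 0.01, 0.1, 1, 10).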


Cost-sensitive learning

• By default, standard classifier learning optimizes the error rate, which assumes that every misclassification has the same cost.

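• In product taxonomy classification, however, some misclassifications are more costly than others.

[Figure: product categories keyboard, mouse, truck, and car, with example products iPhone 5 and Nokia 3720 Classic]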


Minimize average revenue loss

$\frac{1}{N}\sum_{i=1}^{N} v(x_i)\, r(y_i, f(x_i))$

where $v(x)$ is the potential annual revenue of product x if it is correctly classified, and $r(y, y')$ is the fraction of that revenue lost by misclassifying a product from class y into class y'.
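A small sketch of this objective on made-up data. The taxonomy, revenues, predictions, and the rule for $r(y, y')$ (no loss if correct, a partial loss for a sibling category, full loss otherwise) are all hypothetical choices for illustration:

```python
def average_revenue_loss(examples, f, revenue, loss_ratio):
    """(1/N) * sum_i v(x_i) * r(y_i, f(x_i)) over labeled examples [(x, y), ...]."""
    return sum(revenue(x) * loss_ratio(y, f(x)) for x, y in examples) / len(examples)

# Hypothetical taxonomy, revenues, predictions and loss ratios, for illustration only.
parent = {"keyboard": "computer accessories", "mouse": "computer accessories",
          "truck": "vehicles", "car": "vehicles"}
annual_revenue = {"p1": 1000.0, "p2": 500.0, "p3": 20000.0}  # v(x)
predicted = {"p1": "mouse", "p2": "mouse", "p3": "keyboard"}
examples = [("p1", "keyboard"), ("p2", "mouse"), ("p3", "truck")]

def loss_ratio(y, y_pred):
    """r(y, y'): 0 if correct, 0.25 for a sibling category, 1.0 otherwise."""
    if y == y_pred:
        return 0.0
    return 0.25 if parent[y] == parent[y_pred] else 1.0

def f(x):
    return predicted[x]

print(average_revenue_loss(examples, f, annual_revenue.get, loss_ratio))  # 6750.0
```

Under the 0-1 loss both mistakes count equally (error rate 2/3), while the revenue loss is dominated by the high-revenue product sent to a distant category.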


Conclusion

• Machine learning learns an idealized model of the real world.

• The model can be applied to predict unseen data.

• Classifier learning minimizes average misclassification cost.

• It is important to define an appropriate misclassification cost.