General Information

Course Id: COSC6342 Machine Learning

Time: MO/WE 2:30-4p

Instructor: Christoph F. Eick

Classroom: SEC 201

E-mail: ceick@aol.com

Homepage: http://www2.cs.uh.edu/~ceick/

What is Machine Learning?

Machine Learning is the
• study of algorithms that
• improve their performance
• at some task
• with experience

Role of Statistics: Inference from a sample

Role of Computer Science: Efficient algorithms to
• Solve optimization problems
• Learn, represent, and evaluate models for inference

Example of a Decision Tree Model

Training Data

Tid  Refund         Marital Status  Taxable Income  Cheat
     (categorical)  (categorical)   (continuous)    (class)
1    Yes            Single          125K            No
2    No             Married         100K            No
3    No             Single          70K             No
4    Yes            Married         120K            No
5    No             Divorced        95K             Yes
6    No             Married         60K             No
7    Yes            Divorced        220K            No
8    No             Single          85K             Yes
9    No             Married         75K             No
10   No             Single          90K             Yes

Decision Tree Model (splitting attributes: Refund, MarSt, TaxInc)

Refund
  Yes → NO
  No  → MarSt
          Married          → NO
          Single, Divorced → TaxInc
                               < 80K → NO
                               > 80K → YES

Classification Model in General:

f: {yes,no} × {married,single,divorced} × ℝ+ → {yes,no}
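The tree on this slide can be read as a chain of nested tests. The following Python sketch is a hand-written transcription of that tree, not the output of a tree-learning algorithm. The names `Record`, `TRAINING_DATA` and `predict_cheat` are hypothetical, and the handling of an income of exactly 80K is an assumption, since the slide only labels the branches "< 80K" and "> 80K".

```python
from typing import NamedTuple


class Record(NamedTuple):
    tid: int
    refund: str           # "Yes" or "No"
    marital_status: str   # "Single", "Married" or "Divorced"
    taxable_income: int   # in thousands of dollars
    cheat: str            # class label: "Yes" or "No"


# The ten training records from the slide.
TRAINING_DATA = [
    Record(1, "Yes", "Single", 125, "No"),
    Record(2, "No", "Married", 100, "No"),
    Record(3, "No", "Single", 70, "No"),
    Record(4, "Yes", "Married", 120, "No"),
    Record(5, "No", "Divorced", 95, "Yes"),
    Record(6, "No", "Married", 60, "No"),
    Record(7, "Yes", "Divorced", 220, "No"),
    Record(8, "No", "Single", 85, "Yes"),
    Record(9, "No", "Married", 75, "No"),
    Record(10, "No", "Single", 90, "Yes"),
]


def predict_cheat(refund: str, marital_status: str, taxable_income: int) -> str:
    """Classify one record by walking the tree from the root (Refund) down."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: an income of exactly 80K is sent down the "> 80K" branch.
    return "No" if taxable_income < 80 else "Yes"


# Fraction of training records whose predicted label matches the Cheat column.
training_accuracy = sum(
    predict_cheat(r.refund, r.marital_status, r.taxable_income) == r.cheat
    for r in TRAINING_DATA
) / len(TRAINING_DATA)

print(f"training accuracy: {training_accuracy:.2f}")
```

The function has the shape of the general classification model f: it maps a (refund, marital status, income) triple to a yes/no label. Agreement with the training records says nothing about accuracy on unseen records.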

Machine Learning Tasks

Supervised Learning
• Classification
• Prediction

Unsupervised Learning and Summarization of Data
• Association Analysis
• Clustering

Preprocessing

Reinforcement Learning and Adaptation

Activities Related to Models
• Learning parameters of models
• Choosing/Comparing models
• Evaluating models (e.g. predicting their accuracy)
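Two of the model-related activities listed above, comparing models and estimating their accuracy, can be illustrated with k-fold cross-validation. This is a minimal sketch under assumptions: cross-validation is one common evaluation technique and not necessarily the one used in the course, and the one-dimensional toy data and the helper names (`train_majority`, `train_stump`, `k_fold_accuracy`) are made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: (x, label) pairs with label 1 exactly when x > 7.
EXAMPLES = [(x, int(x > 7)) for x in range(1, 11)]


def train_majority(examples):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(y for _, y in examples).most_common(1)[0][0]
    return lambda x: label


def train_stump(examples):
    """Learn one threshold t and predict 1 when x > t (a one-level tree)."""
    xs = sorted(x for x, _ in examples)
    # Candidate thresholds: midpoints between consecutive training values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def training_errors(t):
        return sum((x > t) != bool(y) for x, y in examples)

    best = min(candidates, key=training_errors)
    return lambda x: int(x > best)


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


def k_fold_accuracy(train_fn, examples, k):
    """Average held-out accuracy over k folds (fold i holds out every k-th item)."""
    scores = []
    for i in range(k):
        test = examples[i::k]
        train = [ex for j, ex in enumerate(examples) if j % k != i]
        scores.append(accuracy(train_fn(train), test))
    return sum(scores) / k


majority_score = k_fold_accuracy(train_majority, EXAMPLES, k=5)
stump_score = k_fold_accuracy(train_stump, EXAMPLES, k=5)
print(f"majority baseline: {majority_score:.2f}")
print(f"threshold stump:   {stump_score:.2f}")
```

Each model is scored only on examples it did not see during training, so the averages are estimates of accuracy on new data rather than measures of fit to the training set.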

Prerequisites

Background
Probabilities
• Distributions, densities, marginalization…
Basic statistics
• Moments, typical distributions, regression
Basic knowledge of optimization techniques
Algorithms
• Basic data structures, complexity…
Programming skills
We provide some background, but the class will be fast paced
Ability to deal with “abstract mathematical concepts”

Textbooks

Textbook: Ethem Alpaydin, Introduction to Machine Learning, MIT Press, Second Edition, 2010.

Mildly Recommended Textbooks:

1. Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006.

2. Tom Mitchell, Machine Learning, McGraw-Hill, 1997.

Grading Spring 2014

2 Exams: 58-62%
3 Projects and 2 HW: 38-41%
Attendance: 1%

NOTE: PLAGIARISM IS NOT TOLERATED.

Remark: Weights are subject to change

Topics Covered in 2014 (Based on Alpaydin)

Topic 1: Introduction to Machine Learning
Topic 18: Reinforcement Learning
Topic 2: Supervised Learning
Topic 3: Bayesian Decision Theory (excluding Belief Networks)
Topic 5: Parametric Model Estimation
Topic 6: Dimensionality Reduction Centering on PCA
Topic 7: Clustering1: Mixture Models, K-Means and EM
Topic 8: Non-Parametric Methods Centering on kNN and density estimation
Topic 9: Clustering2: Density-based Approaches
Topic 10: Decision Trees
Topic 11: Comparing Classifiers
Topic 12: Combining Multiple Learners
Topic 13: Linear Discrimination Centering on Support Vector Machines
Topic 14: More on Kernel Methods
Topic 15: Graphical Models Centering on Belief Networks
Topic 16: Success Stories of Machine Learning
Topic 17: Hidden Markov Models
Topic 19: Neural Networks
Topic 20: Computational Learning Theory

Remark: Topics 17, 19, and 20 will likely be covered only briefly or skipped due to lack of time. For Topic 16, your input is appreciated!

Course Elements

Total: 26-27 classes
• 18-19 lectures
• 3 course projects
• 2-3 classes for review and discussing course projects
• 1-2 classes will be allocated for student presentations
• 3 40-minute reviews
• 2 exams
• Graded and ungraded paper and pencil home problems
• Course Webpage: http://www2.cs.uh.edu/~ceick/ML/ML.html

2014 Plan of Course Activities

1. Through March 15: Homework1; Individual Project1 (Reinforcement Learning and Adaptation: Learn how to act intelligently in an unknown/changing environment); Homework2.

2. We., March 5: Midterm Exam

3. March 16-April 5: Group Project2 (TBDL).

4. April 6-April 26: Homework3, Project3 (TBDL)

5. Mo., May 5, 2p: Final Exam

Schedule ML Spring 2013

Week      Topic
Jan. 14   Introduction
Jan. 16   Introduction / Supervised Learning
Jan. 21   Bayesian Decision Theory, Parametric Approaches
Jan. 23   Multivariate Methods, Homework1
Jan. 28   Multivariate Methods, Dim. Reduction, Project1
Jan. 30   Clustering1
Feb. 5    Non-parametric Methods, Review1
…         Decision Trees, Review2, Project2, Midterm Exam
…         Decision Trees, Clustering2, Reinforcement Learning
…         Reinforcement Learning
…         Ensembles, SVM
…         SVM, Project 3, Project2 SP
…         Project2 SP, More on Kernels, Project3, Comparing Learners
…         Review3, Graphical Models, Kaelbling Article, TEPost Analysis Project1, Review 4

Remark: Schedule is the same as in 2013, except that reinforcement learning will be covered after the introduction.

Green: will use other teaching material

Dates to Remember

Date                               Event
March 5 (or March 17) + May 5, 2p  Exams
April 6+8??                        Project2 Student Project Presentations
Jan. 20, March 10/12               No class (Spring Break)
Feb. 23, April 3/5, April 26       Submit Project Report/Software/Deliverable

Exams

• Will be open notes/textbook
• Will get a review list before the exam
• Exams will center (80% or more) on material that was covered in the lecture
• Exam scores will be immediately converted into number grades
• We only have 2009, 2011 and 2013 sample exams; I taught this course only three times recently.

Other UH-CS Courses with Overlapping Contents

1. COSC 6368: Artificial Intelligence
   • Strong Overlap: Decision Trees, Bayesian Belief Networks
   • Medium Overlap: Reinforcement Learning

2. COSC 6335: Data Mining
   • Strong Overlap: Decision Trees, SVM, kNN, Density-based Clustering
   • Medium Overlap: K-means, Decision Trees, Preprocessing/Exploratory DA, AdaBoost

3. COSC 6343: Pattern Classification?!?
   • Medium Overlap: all classification algorithms, feature selection (discusses those topics taking a different perspective)

Purpose of COSC 6342

Machine Learning is the study of how to build computer systems that learn from experience. It intersects with statistics, cognitive science, information theory, artificial intelligence, pattern recognition and probability theory, among others. The course will explain how to build systems that learn and adapt using real-world applications. Its main themes include:
• Learning how to create models from examples that classify or predict
• Learning in unknown and changing environments
• Theory of machine learning
• Preprocessing
• Unsupervised learning and other learning paradigms

Course Objectives COSC 6342

Upon completion of this course, students
• will know what the goals and objectives of machine learning are
• will have a basic understanding of how to use machine learning to build real-world systems
• will have sound knowledge of popular classification and prediction techniques, such as decision trees, support vector machines, nearest-neighbor approaches and regression
• will learn how to build systems that explore unknown and changing environments
• will get some exposure to machine learning theory, in particular how to learn models that exhibit high accuracies
• will have some exposure to more advanced topics, such as ensemble approaches, kernel methods, unsupervised learning, feature selection and generation, and density estimation