macadamia: master's programme in machine learning and data mining

Macadamia: Master’s Programme in Machine Learning and Data Mining May 6, 2008 Tapani Raiko , Kai Puolamäki, Juha Karhunen, Jaakko Hollmén, Antti Honkela, Samuel Kaski, Heikki Mannila, Erkki Oja, and Olli Simula Teaching Machine Learning: Workshop on open problems and new directions. Saint-Étienne, France

Page 1: Macadamia: Master's Programme in Machine Learning and Data Mining

Macadamia: Master’s Programme in Machine Learning and Data Mining

May 6, 2008

Tapani Raiko, Kai Puolamäki, Juha Karhunen, Jaakko Hollmén, Antti Honkela, Samuel Kaski, Heikki Mannila, Erkki Oja, and Olli Simula

Teaching Machine Learning: Workshop on open problems and new directions. Saint-Étienne, France

Page 2: Macadamia: Master's Programme in Machine Learning and Data Mining

Macadamia = Machine learning and Data mining

Macadamia is a Master's programme in Machine learning and Data mining at Helsinki University of Technology, Finland.

The programme is given by the Department of Information and Computer Science known for its pioneering research and education in this field. The Master of Science degree obtained in this programme during a span of two years enables the graduates to enter the IT industry in Finland or world-wide. The degree also has seamless continuation to doctoral studies for those interested in deeper research and development in machine learning and data mining.

Page 3: Macadamia: Master's Programme in Machine Learning and Data Mining

Machine learning and Data mining

• Machine learning researchers often use probabilistic methods

• Data mining research is often algorithmic (and works in combination with probabilistic methods!)

• Active research in both topics

• Interaction useful: interesting things happen at the intersection

Page 4: Macadamia: Master's Programme in Machine Learning and Data Mining

HELSINKI UNIVERSITY OF TECHNOLOGY

Department of Information and Computer Science

T3060, http://www.ics.tkk.fi/

Finnish: Tietojenkäsittelytieteen laitos

Constituent laboratories (pre-2008):

Laboratory of Computer and Information Science

Laboratory for Theoretical Computer Science

Department of Information and Computer Science – 1/6

Page 5: Macadamia: Master's Programme in Machine Learning and Data Mining

Resources (averages 2004–06)

Professors: 9

Other personnel: 108 py/yr

budget funding 52 py/yr

external funding 56 py/yr

Expenditures: 5.1 M€/yr

budget funding 2.7 M€/yr (incl. overhead transfers from external funding)

external funding 2.4 M€/yr

Page 6: Macadamia: Master's Programme in Machine Learning and Data Mining

Degrees and teaching

Numbers (averages 2004–06)

M.Sc. (Tech.): 29/yr

Dr.Sc. (Tech.): 9/yr

ocr: 6800/yr

Four majors: computer and information science, theoretical computer science, computational and cognitive biosciences, language technology

Three international Master’s Programmes: Bioinformatics (MBI), Foundations of Advanced Computing (FAdCo), Machine Learning and Data Mining (Macadamia)

Graduate schools (positions in 2006)

Helsinki GS in Computer Science and Engineering (8)

GS in Comput. Biology, Bioinformatics, and Biometry (1)

GS of Language Technology in Finland (1)

GS in Comput. Methods of Information Technology (3)

Page 7: Macadamia: Master's Programme in Machine Learning and Data Mining

Research areas

Algorithms and methods for adaptive informatics

Multimodal interfaces

Bioinformatics and neuroinformatics

Computational cognitive systems

Adaptive informatics applications

Computational logic

Combinatorial algorithms and computational complexity

Cryptographic techniques and secure protocols

Computer-aided software quality control (verification)

Page 8: Macadamia: Master's Programme in Machine Learning and Data Mining

People

Teaching and supervision for Macadamia students is given by an enthusiastic and experienced group headed by world leaders in this research field. They belong to two national Centres of Excellence, the Adaptive Informatics Research Centre and the From Data to Knowledge Research Centre. The host laboratory is a partner in several Finnish graduate schools.

The professors responsible for Macadamia are:

Page 9: Macadamia: Master's Programme in Machine Learning and Data Mining

Table 1. Courses given in the programme. The sizes of the courses are given in credit points (ECTS). The total for the two-year programme is 120 ECTS. Note that the Special Courses have a varying topic (5–6 topics per year), so many of them may be included in the curriculum.

Obligatory courses                                       ECTS
IT-Services at TKK                                       2
English language tests / course                          3
Machine Learning: Basic Principles                       5
Machine Learning and Neural Networks                     5
Machine Learning: Advanced Probabilistic Methods         5
Algorithmic methods of data mining                       5
Information Visualization                                5
Research Project in Computer and Information Science     5–10
Master’s thesis                                          30

Relevant courses                                         ECTS
Computer Vision                                          5
Statistical Natural Language Processing                  5
High-Throughput Bioinformatics                           5
Signal Processing in Neuroinformatics                    5
Image Analysis in Neuroinformatics                       5
Special Course in Computer and Information Science I–VI  3–7
Introduction to Bayesian Modelling                       5
Combinatorial Models and Stochastic Algorithms           6
Search problems and algorithms                           4
Parallel and distributed systems                         4
Cryptography and data security                           4
Computational Complexity Theory                          5
Finnish 1A                                               2
Finnish 1B                                               2
Finnish 2A                                               2
Finnish 2B                                               2

Topics of Special Courses during 2006–2008               ECTS
Gaussian Processes for Machine Learning                  6
Popular Algorithms in Data Mining and Machine Learning   5
Reinforcement Learning — Theory and Applications         6
Multimedia Retrieval                                     5
Introductory Elements of Functional Data Analysis        7
Independent Component Analysis                           6
Information Networks                                     6
Variable Selection for Regression                        6
Nonlinear Dimensionality Reduction                       6
Modeling and Simulating Social Web                       4
Decision support with data analysis                      5
Data analysis and environmental informatics              5
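As a rough check of the credit arithmetic in Table 1, the sketch below sums the obligatory courses and reports how many of the 120 ECTS are left over. The course sizes are taken from the table; the names `OBLIGATORY`, `obligatory_range` and `remaining_range` are illustrative, and the reading that the remainder is filled with relevant and special courses is an assumption, not something the table states.

```python
# Obligatory courses from Table 1 as (name, minimum ECTS, maximum ECTS).
# Only the Research Project has a range (5-10); the others are fixed.
OBLIGATORY = [
    ("IT-Services at TKK", 2, 2),
    ("English language tests / course", 3, 3),
    ("Machine Learning: Basic Principles", 5, 5),
    ("Machine Learning and Neural Networks", 5, 5),
    ("Machine Learning: Advanced Probabilistic Methods", 5, 5),
    ("Algorithmic methods of data mining", 5, 5),
    ("Information Visualization", 5, 5),
    ("Research Project in Computer and Information Science", 5, 10),
    ("Master's thesis", 30, 30),
]

PROGRAMME_TOTAL = 120  # total for the two-year programme, from the caption


def obligatory_range(courses):
    """Return (minimum, maximum) ECTS covered by the obligatory courses."""
    low = sum(lo for _, lo, _ in courses)
    high = sum(hi for _, _, hi in courses)
    return low, high


def remaining_range(courses, total=PROGRAMME_TOTAL):
    """Return (minimum, maximum) ECTS left after the obligatory courses.

    Assumption: the remainder is taken from other courses in the table.
    """
    low, high = obligatory_range(courses)
    return total - high, total - low


if __name__ == "__main__":
    print("Obligatory ECTS (min, max):", obligatory_range(OBLIGATORY))
    print("Remaining ECTS (min, max):", remaining_range(OBLIGATORY))
```

Under these figures the obligatory part covers 65 to 70 ECTS, leaving 50 to 55 ECTS for other courses.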

Page 12: Macadamia: Master's Programme in Machine Learning and Data Mining

Machine Learning Course Reform

• Three courses were completely reformed last autumn: increasing the weight of machine learning at the cost of neural computing

• All of these courses are lectured every year

Relation to the Old Courses

Old course (before Autumn 2007)                New course
T-61.3030 Principles of Neural Computing       T-61.3050 Machine Learning: Basic Principles
T-61.5030 Advanced Course in Neural Computing  T-61.5130 Machine Learning and Neural Networks
T-61.5040 Learning Models and Methods          T-61.5140 Machine Learning: Advanced Probabilistic Methods

Table: Correspondences in degree requirements.

Old course (before Autumn 2007)                New course
T-61.5040 Learning Models and Methods          T-61.3050 Machine Learning: Basic Principles
                                               T-61.5140 Machine Learning: Advanced Probabilistic Methods
T-61.3030 Principles of Neural Computing       T-61.5130 Machine Learning and Neural Networks
T-61.5030 Advanced Course in Neural Computing

Table: Approximate topical correspondences.

See http://www.cis.hut.fi/Opinnot/T-61.3050/oldcourses

Kai Puolamäki T-61.3050

Page 13: Macadamia: Master's Programme in Machine Learning and Data Mining

T-61.3050 Machine Learning: Basic Principles

Introduction

Kai Puolamäki

Laboratory of Computer and Information Science (CIS)

Department of Computer Science and Engineering

Helsinki University of Technology (TKK)

Autumn 2007

Page 14: Macadamia: Master's Programme in Machine Learning and Data Mining

How to Pass the Course

You will get 5 cr for passing this course.

Requirements for passing the course:

• Pass the exercise work. The exercise work should be submitted by 2 January 2008. More instructions will appear in a few weeks’ time.

• Pass the examination. You can participate in the examination after passing the exercise work (exception: you can participate in the December examination before passing the exercise work; you’ll then pass the course if you pass the exercise work).

Optional, but useful:

• Lectures.

• Problem sessions.

• Reading the book and other material.
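The passing rules on this slide, including the December exception, can be written out as two small checks. This is only a sketch of the logic as stated; the function and parameter names are hypothetical, and it assumes the exam and exercise work are each simply passed or not.

```python
def may_sit_exam(exercise_passed: bool, is_december_exam: bool) -> bool:
    """Exam eligibility as described on the slide.

    Normally the exercise work must be passed first; the December
    examination is the stated exception.
    """
    return exercise_passed or is_december_exam


def passes_course(exercise_passed: bool, exam_passed: bool) -> bool:
    """Both the exercise work and the examination must be passed."""
    return exercise_passed and exam_passed


if __name__ == "__main__":
    # A student sits the December exam before finishing the exercise work.
    print(may_sit_exam(exercise_passed=False, is_december_exam=True))
    # The course is passed only once the exercise work is also passed.
    print(passes_course(exercise_passed=False, exam_passed=True))
    print(passes_course(exercise_passed=True, exam_passed=True))
```

Lectures, problem sessions and reading are optional, so they do not appear in either check.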

Page 15: Macadamia: Master's Programme in Machine Learning and Data Mining

Literature

The course follows a subset of the book: Alpaydin, 2004. Introduction to Machine Learning. The MIT Press.

Additionally, there will be a PDF chapter on algorithmics (complexity of problems, local minima etc.) to be distributed from the course web site.

The lecture slides are available for download from the course web site. I have also given Edita permission to print them on request.

You might also find the material on Alpaydin’s web site useful, especially the errata and slides (see the link at the course web site).

Page 16: Macadamia: Master's Programme in Machine Learning and Data Mining

Very Preliminary Plan of the Topics

Supervised learning, Bayesian decision theory, probability distributions and parametric methods, multivariate methods, clustering (mostly Alpaydin’s chapters 1–7 and appendix A)

Algorithmic issues in machine learning, such as hardness of problems, approximation techniques and their features (such as local minima), time and memory complexity in data analysis (separate PDF chapter to be distributed from the course web site)

Nonparametric methods (Alpaydin 8.1–8.2), linear discrimination (Alpaydin 10.1–10.8), assessing and comparing classification algorithms (Alpaydin’s chapter 14)

I’ll try to keep Alpaydin’s ordering of topics, and emphasize principles rather than going through all possible algorithms and methods.

Page 17: Macadamia: Master's Programme in Machine Learning and Data Mining

Prof. J. Karhunen T-61.5130 Machine Learning and Neural Networks

T-61.5130 Machine Learning and Neural Networks (5 cr)

General information on the course

Autumn 2007

Prof. Juha Karhunen

http://www.cis.hut.fi/Opinnot/T-61.5130/

Helsinki University of Technology, Espoo, Finland 1

Page 18: Macadamia: Master's Programme in Machine Learning and Data Mining

• Details are still open.

• To pass the course, one must acceptably complete the computer assignment allocated to him or her.

• And of course pass the examination, too.

Course materials

• All the course materials will be in English.

• There is no satisfactory single book suitable for this course.

• However, a large portion of the course is based on the book:

• F. Ham and I. Kostanic, Principles of Neurocomputing for Science and Engineering, McGraw-Hill, 2001.

• This book will be complemented by some material from the book S. Haykin, “Neural Networks: A Comprehensive Foundation”, 2nd ed., Prentice-Hall, 1998.

• That previously used book is too extensive for our course.

• Furthermore, independent component analysis is covered from a separate review article.

• Ham’s and Kostanic’s book is quite expensive (some 160 USD).

• And we shall cover only parts of Chapters 1-5 from it.

• Therefore you can copy the written materials needed in this course yourself from the ’masters’ provided.

• Available in secretary Tarja Pihamaa’s room B326 in the Computer Science and Eng. Building.

• Please return the masters ASAP after copying them!

• The exercises, their solutions, examination requirements, and lecture slides will be copied to the participants via Edita Prima Oy.

• We need a contact person for that task.

Page 19: Macadamia: Master's Programme in Machine Learning and Data Mining

Notices and enrollment

• Announcements concerning the course are given at the lectures and exercises, and on the Web page of the course.

• The notice board of the course is on the 3rd floor of the T-building.

• The examination results appear there.

• Please enroll in the course using wwwtopi.

• If this is not possible, enroll by email to the lecturer.

• Put your name on the enrollment list circulating during the first lecture(s), too.

Planned contents of the course

The following matters will be discussed in this course according to current plans:

• Introduction to neural networks.

• Models and learning algorithms for a single neuron.

• Data preprocessing, Hebbian learning, and principal component analysis.

• Multilayer perceptron networks and their learning algorithms.

• Model assessment and selection: generalization, validation, and regularization.

• Radial-basis function networks.

• Support vector machines.

• Independent component analysis.

• Self-organizing maps and learning vector quantization.

• Processing of temporal information using feedforward and recurrent networks.

Page 20: Macadamia: Master's Programme in Machine Learning and Data Mining

T-61.5140 Machine Learning: Advanced Probabilistic Methods

Jaakko Hollmén

Department of Information and Computer Science

Helsinki University of Technology, Finland

e-mail: [email protected]

http://www.cis.hut.fi/Opinnot/T-61.5140/

January 17, 2008

Page 21: Macadamia: Master's Programme in Machine Learning and Data Mining

Course Material

Lecture slides and lectures

• Lecture notes (aid the presentation on the lectures)

• Lecture notes (contain extra material)

Course book

• Christopher M. Bishop: Pattern Recognition and Machine Learning, Springer, 2006

• Chapters 8, 9, 10, 11, and 13 covered during the course

Problem sessions

• Problems and solutions

• Demonstrations

Page 22: Macadamia: Master's Programme in Machine Learning and Data Mining

Passing the Course (5 ECTS credit points)

• Attend the lectures and the exercise sessions for the best learning experience :-)

• Browse the material before attending the lectures and complete the exercises

• Complete the term project, which requires solving a machine learning problem by programming

• Pass the examination; next exam scheduled: Thursday, 15th of May, morning

• Requirements: a passed exam and an acceptable term project; bonus for active participation and an excellent term project (+1)

Note: Jaakko Hollmén will give a presentation on the term project tomorrow

Page 23: Macadamia: Master's Programme in Machine Learning and Data Mining

Topics covered on the course

Central topics

• Random variables

• Independence and conditional independence

• Bayes’s rule

• Naive Bayes classifier, finite mixture models, k-means clustering

• Expectation Maximization algorithm for inference and learning

• Computational algorithms for exact inference

• Computational algorithms for approximate inference

• Sampling techniques

• Bayesian modeling
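To give a flavour of one of the listed topics, here is a minimal sketch of Bayes's rule for a discrete set of classes, P(c | x) = P(x | c) P(c) / Σ_c' P(x | c') P(c'). It is not taken from the course material; the function name `posterior` and the example numbers are made up for illustration.

```python
def posterior(prior, likelihood):
    """Apply Bayes's rule over a discrete set of classes.

    prior:      dict mapping class -> P(class)
    likelihood: dict mapping class -> P(observation | class)
    Returns a dict mapping class -> P(class | observation).
    """
    # Joint probability P(observation, class) for every class.
    joint = {c: prior[c] * likelihood[c] for c in prior}
    # Evidence P(observation), obtained by summing out the class.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every class")
    return {c: p / evidence for c, p in joint.items()}


if __name__ == "__main__":
    # Hypothetical numbers: two classes and one observed feature value.
    prior = {"spam": 0.2, "ham": 0.8}
    likelihood = {"spam": 0.5, "ham": 0.1}
    for label, prob in posterior(prior, likelihood).items():
        print(f"P({label} | observation) = {prob:.3f}")
```

A naive Bayes classifier applies the same rule with the likelihood factorised over features that are assumed conditionally independent given the class.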

Page 24: Macadamia: Master's Programme in Machine Learning and Data Mining

CLUSTER Dual Degrees

• Macadamia has agreements for a dual degree currently with three other Master’s programmes in the CLUSTER network

• The students will spend 1 year in both

• Universitat Politècnica de Catalunya (UPC)

• Universidade Técnica de Lisboa, Instituto Superior Técnico (IST)

Page 25: Macadamia: Master's Programme in Machine Learning and Data Mining

Feedback from the First Students

Credit points: ?, 20, 25, 27 out of 30

“All goes well, the courses are all very interesting”

“in general, everything is OK”

“interest to work at the lab”

“the layer between theory and running matlab toolboxes is missing” (Nikolaj’s course!)

“some courses have more maths that I can handle at the moment, but this isn’t a bad thing”

“some courses had overlapping schedules”

Page 26: Macadamia: Master's Programme in Machine Learning and Data Mining

More Information

http://www.cis.hut.fi/macadamia/

Coordinator Tapani Raiko

Page 27: Macadamia: Master's Programme in Machine Learning and Data Mining

See you in Helsinki!

• Mining and Learning with Graphs (MLG) workshop, July 4-5, 2008

• International Conference on Machine Learning (ICML), July 5-9, 2008

• Uncertainty in Artificial Intelligence (UAI), July 9-12, 2008