Theme Introduction:
Learning from Data
Dr Gavin Brown, Machine Learning and Optimization Research Group
Learning from Data
Where does all this fit?
[Diagram: Learning from Data placed among Artificial Intelligence, Statistics / Mathematics, Computer Vision, Data Mining and Robotics]
(No definition of a field is perfect; the diagram above is just one interpretation, mine ;-)
Learning from Data
The world is drowning in data.
• Book sales: Amazon makes 250,000 sales/deliveries per day
• Genetics: 100,000 genes sequenced while-u-wait (almost)
• Search: ~10 billion Google Images; 48 hours of video uploaded to YouTube per minute
• Health records: the NHS plans to have 60m electronic records in place by 2015
This theme studies algorithms that enable us to extract meaning from data.
Learning from Data
Data is recorded from some real-world phenomenon. What might we want to do with that data?
• Prediction: what can we predict about this phenomenon?
• Description: how can we describe/understand this phenomenon in a new way?
[Diagram: the two aims, Prediction and Description, shown alongside the two modules, COMP61011 Foundations of Machine Learning and COMP61021 Modeling & Visualization of High Dimensional Data, across Period 1 (Oct/Nov) and Period 2 (Nov/Dec)]
Lecturer: Dr Gavin Brown
Machine Learning and Data Mining
Spam emails: how can we predict whether an email is spam or genuine?
Medical records / novel drugs: what characteristics of a patient indicate they may react well or badly to a new drug? How can we predict whether it will potentially hurt rather than help them?
Building “Models” of the Data
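[Diagram: HISTORICAL HEALTH RECORDS → Learning Algorithm → Model; a new record is then given to the Model, which outputs a Predicted Health Status]
HISTORICAL HEALTH RECORDS
x1     x2     Label
98.7   157.6  1
93.6   138.8  0
42.8   171.9  0
92.8   154.5  1
New record: x1 = 85.2, x2 = 160.3
Predicted Health Status: 1 (healthy)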
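The slide leaves the Learning Algorithm unspecified. As one illustration (an assumption, not necessarily the method the course uses here), a k-nearest-neighbour rule, one of the Nearest Neighbour Methods in the COMP61011 syllabus, can be sketched on the four historical records. The function name `knn_predict` is hypothetical, and reading label 0 as "not healthy" is an assumption.

```python
import math
from collections import Counter

# The four historical health records from the slide: ((x1, x2), label).
# Label 1 is "healthy" as on the slide; reading 0 as "not healthy" is an assumption.
TRAIN = [
    ((98.7, 157.6), 1),
    ((93.6, 138.8), 0),
    ((42.8, 171.9), 0),
    ((92.8, 154.5), 1),
]

def knn_predict(train, query, k=1):
    """Hypothetical helper: predict a label for `query` by majority vote
    among the k training records closest to it in Euclidean distance."""
    # "Learning" is just storing the records; all the work happens at query time.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

new_record = (85.2, 160.3)
print(knn_predict(TRAIN, new_record, k=1))  # prints 1
print(knn_predict(TRAIN, new_record, k=3))  # prints 1
```

With k = 1 the closest record is (92.8, 154.5), labelled 1, so the sketch agrees with the slide's prediction of 1 (healthy); with k = 3 the vote is two 1s against one 0.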
(Week 1, 9am)
(Weeks 3-4)
Lecturer: Dr Ke Chen
Modeling and Visualization of High Dimensional Data
Gene maps: the human body has about 24,000 active genes, and soon you will be able to buy your own gene map for a few hundred pounds. How can we visualize this?
Image processing: gesture recognition. How can we represent the motion of a human with so many complex joints and angles?
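One standard answer to "how can we visualize this?" is principal component analysis (PCA), listed in the COMP61021 syllabus. The sketch below uses Python with NumPy rather than the course's Matlab, and projects high-dimensional rows onto the two directions of greatest variance so they can be drawn as a 2-D scatter plot. The function name `pca_project` is hypothetical and the data is synthetic, not real gene or motion data.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Hypothetical helper: project the rows of X (n_samples x n_features)
    onto the top principal components, i.e. directions of greatest variance."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)           # n_features x n_features covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: eigenvalues ascending
    order = np.argsort(eigvals)[::-1]        # re-sort by variance, largest first
    W = eigvecs[:, order[:n_components]]     # n_features x n_components
    return Xc @ W, eigvals[order]

# Synthetic data: 200 samples with 50 features, generated from only 2
# underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

Z, variances = pca_project(X, n_components=2)
print(Z.shape)                                # (200, 2): ready for a 2-D scatter plot
print(variances[:2].sum() / variances.sum())  # fraction of variance kept by 2 components
```

Because the synthetic data really has only two underlying factors, two components keep almost all of the variance; real gene or gesture data will usually need more thought about how many components to keep.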
Prerequisite knowledge
(week 1, 9am)
• Vectors
• Matrix properties, e.g. determinant, rank, inverse
• Vector space properties, e.g. orthonormal basis
• Eigenvectors and eigenvalues
• Matrix calculus, e.g. derivatives in matrix form
• Optimisation basics, e.g. Lagrange multipliers
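A quick way to check your comfort with these prerequisites is to compute each one numerically and confirm it matches the hand calculation. The sketch below assumes Python with NumPy (the course itself supports Matlab) and a small hypothetical symmetric matrix `A`; the final step shows why a Lagrange-multiplier problem leads to an eigenvector.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])  # small symmetric example matrix (hypothetical)

# Matrix properties: determinant, rank, inverse
det_A = np.linalg.det(A)           # 4*3 - 1*1 = 11
rank_A = np.linalg.matrix_rank(A)  # 2: full rank, so A is invertible
A_inv = np.linalg.inv(A)
assert np.allclose(A @ A_inv, np.eye(2))

# Eigenvectors and eigenvalues: A v = lambda v
eigvals, eigvecs = np.linalg.eigh(A)  # eigh is for symmetric matrices
for lam, v in zip(eigvals, eigvecs.T):  # columns of eigvecs are the eigenvectors
    assert np.allclose(A @ v, lam * v)

# Orthonormal basis: eigenvectors of a symmetric matrix satisfy V^T V = I
assert np.allclose(eigvecs.T @ eigvecs, np.eye(2))

# Matrix calculus: the gradient of f(x) = x^T A x is (A + A^T) x,
# checked here against central finite differences.
x = np.array([1.0, -2.0])
analytic = (A + A.T) @ x
eps = 1e-6
numeric = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
assert np.allclose(analytic, numeric, atol=1e-4)

# Lagrange multipliers: maximising x^T A x subject to x^T x = 1 gives
# 2 A x = 2 lambda x, so the maximum value is the largest eigenvalue.
u = eigvecs[:, np.argmax(eigvals)]
assert np.isclose(u @ A @ u, eigvals.max())
print(det_A, rank_A, eigvals.max())
```

The same constrained-maximisation argument is the core of PCA: the direction of greatest variance is the top eigenvector of the covariance matrix.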
Learning from Data: Prerequisites
MATHEMATICS
This is a mathematical subject. You must be comfortable with probabilities and algebra.
PROGRAMMING
You must be able to program, and to pick up a new language relatively easily. We provide support for Matlab.
http://studentnet.cs.manchester.ac.uk/pgt/COMP61011
http://studentnet.cs.manchester.ac.uk/pgt/COMP61021
Matlab
MATrix LABoratory
• Interactive scripting language
• Interpreted (i.e. no compiling)
• Objects possible, not compulsory
• Dynamically typed
• Flexible GUI / plotting framework
• Large libraries of tools
• Highly optimized for maths
Available free from the University, but usable only when connected to our network (e.g. via VPN).
Module-specific software is supported on school machines only.
Learning from Data: Why NOT to do this!
1. If you don’t like maths.
61011 is reasonably challenging. 61021 is HARD. Another valid name for machine learning is “Computational Statistics”.
2. If you are not a confident programmer.
This is an MSc in computer science. You HAVE to be able to code well. You are highly likely to fail this unit if you cannot. People did last year.
3. If you have the “I want to use machine learning to do X” syndrome.
This is a real technical subject. It’s not magic.
BTW… You will learn nothing about “Big Data”, or how to deal with it.
Syllabus
COMP61011 (Foundations of Machine Learning)
• Linear Models
• Support Vector Machines
• Nearest Neighbour Methods
• Decision Trees
• Combining Models: ensemble methods, mixtures of experts, boosting
• Feature Selection
• Probabilistic Classifiers and Bayes Theorem
• Algorithm assessment: overfitting, generalisation, comparing two algorithms
COMP61021 (Modeling and Visualizing High Dimensional Data)
• Background/introduction
• Mathematics Basics
• Principal component analysis (PCA)
• Linear discriminant analysis (LDA)
• Self-organising map (SOM)
• Multi-dimensional scaling (MDS)
• Isometric feature mapping (ISOMAP)
• Locally linear embedding (LLE)
Textbooks
Not a compulsory purchase. Notes will be provided in class.
“Introduction to Machine Learning” by Ethem Alpaydin