
Theme Introduction, University of Manchester: syllabus.cs.manchester.ac.uk/pgt/2017/COMP61011/materials/ThemeSpotlight.pdf

Theme Introduction:

Learning from Data

Dr Gavin Brown, Machine Learning and Optimization Research Group


Learning  from  Data  

Where does all this fit?

[Diagram: Learning from Data positioned among Artificial Intelligence, Statistics / Mathematics, Computer Vision, Data Mining and Robotics]

(No definition of a field is perfect; the diagram above is just one interpretation, mine ;-)


Learning from Data

The world is drowning in data.

• Book sales: Amazon makes 250,000 sales/deliveries per day
• Genetics: 100,000 genes sequenced while-u-wait (almost)
• Search: ~10 billion Google Images; 48 hours of video uploaded to YouTube every minute
• Health records: NHS plan to have 60m electronic records in place by 2015

   

This  theme  studies  algorithms  that  enable  us  to  extract  meaning  from  data.    

   

 

 


Learning  from  Data  

Data is recorded from some real-world phenomenon. What might we want to do with that data?

Prediction - what can we predict about this phenomenon?
Description - how can we describe/understand this phenomenon in a new way?

 

 

 


Prediction        Description

Period 1  Oct/Nov
Period 2  Nov/Dec

COMP61021  Modeling & Visualization of High Dimensional Data
COMP61011  Foundations of Machine Learning

Lecturer: Dr Gavin Brown


Machine  Learning  and  Data  Mining  

Spam emails: how can we predict whether an email is spam or genuine?

     

       

   

 

 


Machine  Learning  and  Data  Mining  

Medical Records / Novel Drugs: What characteristics of a patient indicate they may react well/badly to a new drug? How can we predict whether it will potentially hurt rather than help them?

     

         

 

 

 


Building  “Models”  of  the  Data  

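[Diagram: HISTORICAL HEALTH RECORDS → Learning Algorithm → Model; new record (x1, x2) → Model → Predicted Health Status]

HISTORICAL HEALTH RECORDS

x1      x2      Label
98.7    157.6   1
93.6    138.8   0
42.8    171.9   0
92.8    154.5   1

New record: x1 = 85.2, x2 = 160.3
Predicted Health Status: 1 (healthy)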
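The slide does not say which learning algorithm builds the model. As a minimal sketch, assuming a k-nearest-neighbour classifier (one of the COMP61011 syllabus topics) and Python rather than the Matlab supported on the course, the four historical records above can be turned into a predictor. The function names `learn` and `predict` are hypothetical, and label 0 is read as "not healthy" because the slide only names label 1.

```python
import math

# The four historical records from the slide: ((x1, x2), label), where 1 = healthy.
TRAINING = [
    ((98.7, 157.6), 1),
    ((93.6, 138.8), 0),
    ((42.8, 171.9), 0),
    ((92.8, 154.5), 1),
]


def learn(records):
    """For nearest neighbour, 'learning' just stores the labelled examples (hypothetical helper)."""
    return list(records)


def predict(model, x, k=1):
    """Return the majority label among the k stored examples closest to x (Euclidean distance)."""
    by_distance = sorted(model, key=lambda rec: math.dist(rec[0], x))
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)


model = learn(TRAINING)
print(predict(model, (85.2, 160.3)))  # prints 1
```

For the new record (85.2, 160.3) the closest stored record is (92.8, 154.5), labelled 1, so with k = 1 (and also with k = 3) this sketch agrees with the slide's prediction of 1 (healthy). An odd k avoids tied votes between the two labels.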


Building  “Models”  of  the  Data  

Model   (Week  1,  9am)  

(Weeks 3-4)


Lecturer:  Dr  Ke  Chen  

Prediction        Description

Period 1  Oct/Nov
Period 2  Nov/Dec

COMP61021  Modeling & Visualization of High Dimensional Data
COMP61011  Foundations of Machine Learning


Modeling and Visualization of High Dimensional Data

Gene Maps: The human body has about 24,000 active genes; soon you will be able to buy your own gene map for a few hundred pounds. How can we visualize this?

     

       

       


Modeling and Visualization of High Dimensional Data

Image processing: Gesture recognition. How can we represent the motion of a human with so many complex joints and angles?
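Both questions above (gene maps, joint angles in gesture recognition) ask how to view data with very many dimensions. Below is a minimal sketch of principal component analysis (PCA), one of the COMP61021 topics, offered as an illustration under stated assumptions rather than course material: it is written in Python instead of Matlab, uses only the standard library, and finds eigenvectors of the covariance matrix by power iteration with deflation instead of a full eigendecomposition. All helper names (`mean_center`, `covariance`, `top_eigenvector`, `deflate`, `pca_project`) are hypothetical, and the toy data is made up for illustration.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-dimension mean from every row (each row is one sample)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(S, iters=500, seed=0):
    """Power iteration: repeatedly multiply by S and renormalise.

    Returns (unit vector v, Rayleigh quotient v^T S v). Assumes the largest
    eigenvalue is strictly larger than the rest, otherwise convergence can fail.
    """
    d = len(S)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in any direction
            return v, 0.0
        v = [x / norm for x in w]
    Sv = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
    return v, sum(v[i] * Sv[i] for i in range(d))


def deflate(S, v, lam):
    """Hotelling deflation: remove lam * v v^T so the next component can be found."""
    d = len(S)
    return [[S[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]


def pca_project(rows, k=2):
    """Project rows onto (up to) k principal components.

    Stops early if the data has fewer than k directions with non-zero variance.
    """
    centered = mean_center(rows)
    S = covariance(centered)
    components = []
    for _ in range(k):
        v, lam = top_eigenvector(S)
        if lam <= 1e-12:
            break
        components.append(v)
        S = deflate(S, v, lam)
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in components]
            for r in centered]


# Toy 3-D data lying on a line: all its variance is in one direction.
rows = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0], [4.0, 4.0, 0.0]]
print(pca_project(rows, k=2))
```

On high-dimensional data the same steps apply with k = 2 or 3 to produce coordinates that can be plotted. The sign of each component is arbitrary, so plots may come out mirrored.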

     

       

       


Prerequisite knowledge

(week 1, 9am)

• Vectors
• Matrix properties, e.g. determinant, rank, inverse
• Vector Space properties, e.g. orthonormal basis
• Eigenvectors and Eigenvalues
• Matrix Calculus, e.g. derivatives in matrix form
• Optimisation basics, e.g. Lagrange multipliers
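As one illustration of how several of these prerequisites combine, chosen here rather than taken from the slides: in PCA (a COMP61021 topic) the first principal component is the unit vector w that maximises the variance of the projected data, where S is the sample covariance matrix of n data points. Matrix calculus and a Lagrange multiplier turn this into an eigenvector problem, using the fact that S is symmetric, so the gradient of w^T S w is 2Sw.

```latex
\begin{aligned}
&\max_{\mathbf{w}}\; \mathbf{w}^\top S\,\mathbf{w}
  \quad\text{subject to}\quad \mathbf{w}^\top\mathbf{w} = 1,
  \qquad S = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^\top \\
&\mathcal{L}(\mathbf{w},\lambda) = \mathbf{w}^\top S\,\mathbf{w} - \lambda\,(\mathbf{w}^\top\mathbf{w} - 1) \\
&\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2S\mathbf{w} - 2\lambda\mathbf{w} = \mathbf{0}
  \;\Longrightarrow\; S\mathbf{w} = \lambda\mathbf{w} \\
&\mathbf{w}^\top S\,\mathbf{w} = \lambda\,\mathbf{w}^\top\mathbf{w} = \lambda
\end{aligned}
```

So the optimal direction is an eigenvector of S, and the variance it captures equals its eigenvalue. Choosing the eigenvector with the largest eigenvalue gives the first principal component.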


Learning from Data … Prerequisites

MATHEMATICS
This is a mathematical subject. You must be comfortable with probabilities and algebra.

PROGRAMMING
You must be able to program, and pick up a new language relatively easily. We provide support for Matlab.

http://studentnet.cs.manchester.ac.uk/pgt/COMP61011
http://studentnet.cs.manchester.ac.uk/pgt/COMP61021


Matlab  

MATrix  LABoratory  

• Interactive scripting language
• Interpreted (i.e. no compiling)
• Objects possible, not compulsory
• Dynamically typed
• Flexible GUI / plotting framework
• Large libraries of tools
• Highly optimized for maths

Available free from the Uni, but usable only when connected to our network (e.g. via VPN).
Module-specific software is supported on school machines only.


Learning from Data … Why NOT to do this!

1. If you don’t like maths.
61011 is reasonably challenging. 61021 is HARD. Another valid name for machine learning is “Computational Statistics”.

2. If you are not a confident programmer.
This is an MSc in computer science. You HAVE to be able to code well. You are highly likely to fail this unit if you cannot. People did last year.

3. If you have the “I want to use machine learning to do X” syndrome.
This is a real technical subject. It’s not magic.

BTW… You will learn nothing about “Big Data”, or how to deal with it.


Syllabus

COMP61011 (Foundations of Machine Learning)
• Linear Models
• Support Vector Machines
• Nearest Neighbour Methods
• Decision Trees
• Combining Models - ensemble methods, mixtures of experts, boosting
• Feature Selection
• Probabilistic Classifiers and Bayes’ Theorem
• Algorithm assessment - overfitting, generalisation, comparing two algorithms

COMP61021 (Modeling and Visualizing High Dimensional Data)
• Background/introduction
• Mathematics Basics
• Principal component analysis (PCA)
• Linear discriminant analysis (LDA)
• Self-organising map (SOM)
• Multi-dimensional scaling (MDS)
• Isometric feature mapping (ISOMAP)
• Locally linear embedding (LLE)


Textbooks
Not a compulsory purchase. Notes will be provided in class.

“Introduction to Machine Learning” by Ethem Alpaydin