Machine Learning for Mathematicians · danielmckenzie.github.io/Grad_Student_Seminar_Spring_2018.pdf

Machine Learning for Mathematicians

Daniel Mckenzie

Tuesday March 6th, 2018

Why should we care about Machine Learning?

1 Necessary for non-academic jobs.

2 Can be useful in your research.

3 Your (future) students will need to know about it.

An outline of this talk

1 Disambiguation of buzzwords.

2 Simple (yet effective) approaches.

3 Deep approaches.

4 Survey of applications.

5 Current research trends.

What do we mean by Data?

1 Could be images, audio signals, stock prices, results of surveys, etc.

2 Can always vectorize

Example (Greyscale Images)

Suppose each $I^a$ is a 28 × 28 array of pixel values. Each pixel value is a number between 0 and 255, with 0 = black and 255 = white. Can think of each $I^a$ as a 28 × 28 matrix $A^a_{ij}$. Can make this a vector by stacking: $x^a = [A^a_{1,1}, \ldots, A^a_{1,28}, A^a_{2,1}, \ldots, A^a_{2,28}, \ldots, A^a_{28,28}]$.

More sophisticated approaches use the Fourier Transform or Wavelets (Applied Harmonic Analysis). Each entry of the vector is called a feature.

3 Three V’s of Big Data: Variety, Volume and Velocity.
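
The stacking step in the greyscale example can be sketched in a few lines. This is only an illustration: the pixel values below are made up, and `vectorize` is a hypothetical helper name.

```python
def vectorize(image):
    """Stack the rows of a 2-D array (a list of lists) into one flat list."""
    return [pixel for row in image for pixel in row]

# Made-up 28 x 28 greyscale image: pixel (i, j) holds (i + j) % 256.
image = [[(i + j) % 256 for j in range(28)] for i in range(28)]

x = vectorize(image)

# x has 28 * 28 = 784 features; entry 28 is the first pixel of the second row.
n_features = len(x)
```

Each entry of `x` is one feature, in row-by-row order.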

Data Science, Machine Learning, and Artificial Intelligence¹

1 Data Science: Produce insights from data for humans.

2 Machine Learning: Find a function $f$ that predicts $y$ from input $x$. E.g. $f(\text{Image}) = \text{cat}$. How $f$ is doing this is often unclear.

3 Artificial Intelligence: Produce or recommend an action from data. E.g. AlphaGo, self-driving cars.

Caution: Distinction between ‘general’ AI (long way off/impossible) and ‘single purpose’ AI (AlphaGo, self-driving cars).

¹ Robinson 2018.

Data Science

Data Scientists use

1 Statistical know-how to ‘wrangle’ complex data in a variety of formats into a clean, usable (vectorized) data set $X$.

2 Algorithms (Regression, Data Clustering, Neural Networks, etc.) to extract insights from $X$. E.g.: identify a trend/correlation, find outliers (fraud prevention), compute quantities of interest (likelihood of a certain type of consumer to renew a cable contract).

3 Domain-specific knowledge to evaluate appropriateness of the above.

to produce easily interpretable summaries (pie charts, reports, visualizations) to inform decision-making of other parties (management, sales team, R&D, government).

(Supervised) Machine Learning

1 Model Problem: Identify people from pictures.

2 Key assumption: Let $D$ be the domain of interest (e.g. all possible 28 × 28 pictures). Let $C$ be the codomain of interest (e.g. the names of people we wish to identify). We assume there exists a continuous function $f^* : D \to C$ mapping all photos of Dan to ‘Dan’ $\in C$.

3 Goal of Machine Learning: function approximation. Find an approximation $f^\#$ to $f^*$.

4 Learning $f^\#$: Given training set $X = \{x_1, \ldots, x_n\} \subset D$ and known labels $Y = \{y_1, \ldots, y_n\} \subset C$. Find function $f^\#$ such that $f^\#(x_i) \approx y_i$ for all $i$.

Caution: Generalizability very important. Need to be confident that, given $x \notin X$, $f^\#(x) \approx f^*(x)$.
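
A toy illustration of the caution about generalizability, under invented assumptions: the 'true' function `f_star` labels a real number by its sign, and two hypothetical candidates are compared. Both fit the training set perfectly, but only one does well on unseen points.

```python
def f_star(x):
    """Hypothetical 'true' labelling: the sign of x."""
    return "pos" if x >= 0 else "neg"

# Training set X with known labels Y.
X_train = [-2.0, -1.0, 1.0, 2.0]
Y_train = [f_star(x) for x in X_train]

# Candidate 1: memorise the training pairs, guess "neg" everywhere else.
lookup = dict(zip(X_train, Y_train))

def f_memorise(x):
    return lookup.get(x, "neg")

# Candidate 2: threshold halfway between the two classes seen in training.
neg_max = max(x for x, y in zip(X_train, Y_train) if y == "neg")
pos_min = min(x for x, y in zip(X_train, Y_train) if y == "pos")
threshold = (neg_max + pos_min) / 2

def f_threshold(x):
    return "pos" if x >= threshold else "neg"

def accuracy(f, xs):
    """Fraction of points in xs where f agrees with f_star."""
    return sum(f(x) == f_star(x) for x in xs) / len(xs)

X_unseen = [-1.5, -0.5, 0.5, 1.5]
train_acc_memorise = accuracy(f_memorise, X_train)
train_acc_threshold = accuracy(f_threshold, X_train)
unseen_acc_memorise = accuracy(f_memorise, X_unseen)
unseen_acc_threshold = accuracy(f_threshold, X_unseen)
```

Agreement on the training set alone says little; the held-out points are what separate the two candidates.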

Artificial Intelligence

This slide intentionally left blank

Simple Approaches to Machine Learning²

1 Let $P$ be a class of ‘easy functions’ (e.g. piecewise polynomial). Find $f^\#$ as:

$$f^\# = \operatorname{argmin}\{L(f, X) : f \in P\}$$

Think $L(f, X) = \sum_{i=1}^{n} \|f(x_i) - y_i\|^2$. Regression, Splines, Finite Elements.

2 $K$-Nearest Neighbours. Let $N_K(x) = \{x_{i_1}, \ldots, x_{i_K} \text{ nearest to } x\}$. Compute

$$f^\#(x) = \frac{1}{K} \sum_{j=1}^{K} y_{i_j}.$$

3 Support Vector Machine.

4 Decision trees.

² Goodfellow et al. 2016.
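
A minimal sketch of the first two approaches for scalar inputs and labels. It assumes the ‘easy functions’ are straight lines $f(x) = ax + b$ (so the argmin has a closed form), and the data are made up; `fit_line` and `knn_predict` are hypothetical names.

```python
def fit_line(xs, ys):
    """Least-squares fit of f(x) = a*x + b, minimising sum (f(x_i) - y_i)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def knn_predict(x, xs, ys, k):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k

# Made-up training data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)                 # recovers slope 2 and intercept 1
y_line = a * 2.5 + b                    # regression prediction at x = 2.5
y_knn = knn_predict(2.5, xs, ys, k=2)   # mean of the labels at x = 2 and x = 3
```

On this data both predictions at $x = 2.5$ coincide; on noisy or non-linear data they generally differ.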

Simple Approaches: Logistic Regression

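Suppose $|C| = 2$, e.g. $C = \{\text{‘Dan’}, \text{‘Not Dan’}\}$. Define the sigmoid/logistic function $g(u) = 1/(1 + e^{-u})$. Look for $f^\#$ of the form:

$$f_w(x) = \begin{cases} \text{‘Dan’} & \text{if } g(w^\top x) \approx 1 \\ \text{‘Not Dan’} & \text{if } g(w^\top x) \approx 0 \end{cases}$$

That is, $f^\# = \operatorname{argmin}\{L(f_w, X) : w \in \mathbb{R}^n\}$. Can think of $z = g(w^\top x)$ as the probability that the image contains Dan.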

Figure: Schematic depiction of Perceptron
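
A sketch of the prediction step only, assuming the weights $w$ have already been found. The weights and inputs are made up, and the 0.5 cutoff used to turn $z$ into a label is an assumption (the slide only says $\approx 1$ or $\approx 0$).

```python
import math

def g(u):
    """Sigmoid / logistic function g(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def f_w(w, x, cutoff=0.5):
    """Return (label, z), where z = g(w^T x) and the label thresholds z."""
    z = g(sum(wi * xi for wi, xi in zip(w, x)))
    label = "Dan" if z >= cutoff else "Not Dan"
    return label, z

w = [2.0, -1.0]                          # made-up weights
label, z = f_w(w, [3.0, 1.0])            # w^T x = 5, so z is close to 1
other_label, _ = f_w(w, [-3.0, -1.0])    # w^T x = -5, so z is close to 0
```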

(Shallow) Neural Networks

Essentially iterated Logistic Regression:

Figure: Schematic depiction of 2-layer Neural Network

(Shallow) Neural Networks cont.

Notation: $f_W$ denotes the Neural Network with weights $W = \{w_1, \ldots, w_5\}$. $f_W(x) = z$.

Typically, $z_1$ = probability $x$ is in class 1, $z_2$ = probability $x$ is in class 2.

Architecture: Choice of number of layers and neurons per layer.

Activation function: $g$. Many other choices, but must be non-linear!

These layers are fully connected.

Need to find good $W$. Will vectorize: $w = [w_1, w_2, \ldots, w_5]$.

Need to solve $f^\# = \operatorname{argmin}\{L(f_w, X) : w \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \cdots \times \mathbb{R}^{n_5}\}$.
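
A forward pass through a small fully connected network, as a sketch. The architecture (2 inputs, 3 hidden neurons, 2 outputs), the weight values, and the omission of bias terms are all assumptions made for illustration; the figure's actual layout is not recoverable from the text.

```python
import math

def g(u):
    """Sigmoid activation; any other non-linear choice could be swapped in."""
    return 1.0 / (1.0 + math.exp(-u))

def layer(weights, inputs):
    """One fully connected layer: apply g to each neuron's weighted sum."""
    return [g(sum(w * x for w, x in zip(neuron, inputs))) for neuron in weights]

def f_W(W, x):
    """Feed x through each layer in turn; the last layer's output is z."""
    for weights in W:
        x = layer(weights, x)
    return x

# Hypothetical network: one weight vector per neuron, five neurons in total.
W = [
    [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]],   # hidden layer: 3 neurons
    [[1.0, -1.0, 0.5], [-0.5, 0.2, 0.3]],     # output layer: 2 neurons
]

z = f_W(W, [1.0, 2.0])
```

Each entry of `z` lies strictly between 0 and 1, but nothing here forces the entries to sum to one; that would need an extra normalisation step not shown.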

Gradient Descent

1 The problem: Find the minimum of $F : \mathbb{R}^m \to \mathbb{R}$. Can assume that $F$ is differentiable.

2 Know that $-\nabla F(w) \in \mathbb{R}^m$ points in the direction of steepest decrease of $F$ at $w$.

3 Gradient Descent Algorithm: $w_{k+1} = w_k - \varepsilon \nabla F(w_k)$.

4 For Neural Networks: $F(w) = L(f_w, X)$ (Think: $L(f_w, X) = \sum_{i=1}^{n} \|f_w(x_i) - y_i\|^2$). Randomly initialize $w_0$. Compute $w_{k+1}$ using gradient descent until ‘good enough’.

5 Issue 1: Computing $\nabla L$ can be costly (typically use Stochastic Gradient Descent).

6 Issue 2: $L$ is usually (highly) non-convex. No guarantee that Gradient Descent will converge to a global minimum.
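
The update rule can be sketched on a problem where the answer is known. The quadratic $F$ below is a made-up convex example, so neither of the two issues arises here; the step size and step count are arbitrary choices.

```python
def gradient_descent(grad_F, w0, eps, steps):
    """Iterate w_{k+1} = w_k - eps * grad_F(w_k) for a fixed number of steps."""
    w = list(w0)
    for _ in range(steps):
        grad = grad_F(w)
        w = [wi - eps * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical test problem: F(w) = (w_1 - 3)^2 + (w_2 + 1)^2,
# with minimiser (3, -1) and gradient (2(w_1 - 3), 2(w_2 + 1)).
def grad_F(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_min = gradient_descent(grad_F, w0=[0.0, 0.0], eps=0.1, steps=200)
```

With `eps = 0.1` the distance to the minimiser shrinks by a factor of 0.8 per step on this particular $F$; for a neural network loss no such simple rate is available.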

Skills necessary for ML

For Undergrads

1 Coursework: Multivariable calculus, Linear Algebra, Numerical Analysis, Probability.

2 Online resources: http://cs229.stanford.edu/, https://www.coursera.org/learn/machine-learning

Additional resources for Grads

1 Coursework: Harmonic Analysis, Image Processing, Statistics.

2 Some programming.

3 Deep learning book: http://www.deeplearningbook.org/

4 18.657: Graduate Course on Mathematics of Machine Learning taught at MIT (all materials/lecture notes available online)

5 Blogs: http://nuit-blanche.blogspot.com/

Deep Neural Networks

1 Key Insight: Vectorizing/feature extraction is the most important step.

2 Many techniques from Applied Harmonic Analysis (e.g. Wavelets, Curvelets, ...) could be used.

3 Deep Learning: Use many convolutional layers to extract good, problem-specific features. Then use a few fully connected layers to classify.

4 Hinton, Osindero, and Teh 2006³ were the first to show this was feasible.

5 Krizhevsky, Sutskever, and Hinton 2012 presented a Deep NN halving the previous error rate for image classification.

6 Key Drivers of DL: Increased processing power (GPUs). Large training sets (sourced from the internet).

³ Geoff Hinton is the great-great-grandson of George Boole, inventor of Boolean logic.
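
What a single convolutional layer computes can be sketched as follows. The image and kernel are made up; in a deep network the kernel entries are learned rather than picked by hand, and an activation function would follow.

```python
def conv2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products.

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Made-up 4 x 4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]

# Hand-picked kernel that responds to left-to-right increases in brightness.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)   # 3 x 3 feature map, non-zero only at the edge
```

The feature map is non-zero only in the column where the edge sits, which is the sense in which a convolutional layer extracts a local feature.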

Deep Neural Networks⁴

⁴ Figure from: https://developingideas.me/deepneuralnetworkoverview/

Prototypical Applications of Machine Learning

1 Handwritten Digit Classification: State-of-the-art algorithms are > 99.75% accurate.

2 Automated Captioning: Given an image, algorithm should output a brief sentence describing what is going on.

3 Natural Language Processing: Alexa, Siri et al. Sentiment Analysis.

Figure: First two pictures from Karpathy and Fei-Fei 2015

Current Research Trends

1 Dealing with Data scarcity.

2 Regularization and priors.

3 Transfer Learning: Getting a neural network trained to do one thing (e.g. play ‘Pong’) to learn to do another thing quickly (e.g. play ‘Seaquest’) (see Fernando et al. 2017).

4 What the hell is actually going on here? Still not clear how deep neural networks do what they do. This leaves them susceptible to manipulation (adversarial attacks).

An Adversarial Patch⁵

⁵ Brown et al. 2017.

Applications to Mathematics: Data Driven Dynamical Systems⁶

1 For many physical/biological systems of interest: $\dot{x}(t) = f(x(t))$.

2 Can usually collect historical data via observation: $X = \{x(t_1), x(t_2), \ldots, x(t_n)\}$ and $Y = \{\dot{x}(t_1), \dot{x}(t_2), \ldots, \dot{x}(t_n)\}$.

3 Model: Assume that $f(x(t))$ is a sparse linear combination of elementary functions $\varphi_1, \ldots, \varphi_N$ (e.g. polynomials, trig. functions, etc.).

4 Use Machine Learning to find an optimal $f^\# = \sum_{i=1}^{N} a_i \varphi_i$. (Strong connections with Compressive Sensing.)

⁶ Brunton, Proctor, and Kutz 2016.
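
One way to look for sparse coefficients $a_i$ is to alternate least squares with discarding small coefficients. The sketch below is a simplified loop of that kind and is not claimed to match the cited paper in detail. The system ($\dot{x} = -2x$), the library $\{1, x, x^2\}$, the threshold, and all function names are assumptions, and the derivative data are taken to be exact.

```python
def solve(A, b):
    """Solve the square system A a = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (M[r][n] - tail) / M[r][r]
    return a

def least_squares(Phi, y, active):
    """Least-squares coefficients using only the columns listed in `active`."""
    A = [[sum(row[i] * row[j] for row in Phi) for j in active] for i in active]
    b = [sum(row[i] * yi for row, yi in zip(Phi, y)) for i in active]
    return solve(A, b)

def sparse_fit(Phi, y, threshold, rounds=5):
    """Alternate between least squares and discarding small coefficients."""
    n_funcs = len(Phi[0])
    active = list(range(n_funcs))
    coeffs = [0.0] * n_funcs
    for _ in range(rounds):
        if not active:
            break
        fitted = least_squares(Phi, y, active)
        coeffs = [0.0] * n_funcs
        for i, ai in zip(active, fitted):
            coeffs[i] = ai
        active = [i for i in active if abs(coeffs[i]) >= threshold]
    return [c if i in active else 0.0 for i, c in enumerate(coeffs)]

# Made-up observations of the system x' = -2x.
xs = [0.1 * k for k in range(1, 21)]
x_dot = [-2.0 * x for x in xs]

# Library of elementary functions evaluated on the data: 1, x, x^2.
Phi = [[1.0, x, x * x] for x in xs]

a = sparse_fit(Phi, x_dot, threshold=0.1)   # only the coefficient of x survives
```

With real data the derivatives would have to be estimated from the observations, and noise makes the choice of threshold matter.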

Application to Mathematics: Predicting Hodge Numbers

1 Let $W\mathbb{P}^4$ be weighted projective space.

2 Large finite number of 3-dim Calabi-Yau $M_a \subset W\mathbb{P}^4$. Each cut out by a homogeneous polynomial of degree $w = \sum_{i=0}^{4} w_i$.

3 Of interest to string theorists to compute Hodge numbers $h^{i,j}$.

4 To each $M_a$ associate the data vector $x_a$ of coefficients of the defining polynomial.⁷

5 Training set: $X = \{x_1, \ldots, x_m\}$ and $Y = \{h^{2,1}(M_1), \ldots, h^{2,1}(M_m)\}$.

6 In He 2017, a Neural Network was trained on the above dataset to predict whether $h^{2,1}(M)$ is large (> 50) or not large (≤ 50), given $M$.

7 They report 94.4% accuracy on unseen data.

⁷ Plus possibly some side info like $\chi$.

Thanks! Any questions?

Figure: Neural Style Transfer from Johnson, Alahi, and Fei-Fei 2016

References I

Brown, Tom B et al. (2017). “Adversarial patch”. In: arXiv preprint arXiv:1712.09665.

Brunton, Steven L, Joshua L Proctor, and J Nathan Kutz (2016). “Discovering governing equations from data by sparse identification of nonlinear dynamical systems”. In: Proceedings of the National Academy of Sciences 113.15, pp. 3932–3937.

Fernando, Chrisantha et al. (2017). “Pathnet: Evolution channels gradient descent in super neural networks”. In: arXiv preprint arXiv:1701.08734.

Goodfellow, Ian et al. (2016). Deep learning. Vol. 1. MIT Press, Cambridge.

He, Yang-Hui (2017). “Deep-learning the landscape”. In: arXiv preprint arXiv:1706.02714.

References II

Hinton, Geoffrey E, Simon Osindero, and Yee-Whye Teh (2006). “A fast learning algorithm for deep belief nets”. In: Neural computation 18.7, pp. 1527–1554.

Johnson, Justin, Alexandre Alahi, and Li Fei-Fei (2016). “Perceptual losses for real-time style transfer and super-resolution”. In: European Conference on Computer Vision. Springer, pp. 694–711.

Karpathy, Andrej and Li Fei-Fei (2015). “Deep visual-semantic alignments for generating image descriptions”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3128–3137.

References III

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton (2012). “Imagenet classification with deep convolutional neural networks”. In: Advances in neural information processing systems, pp. 1097–1105.

Robinson, David (2018). What’s the difference between data science, machine learning, and artificial intelligence? http://varianceexplained.org. Blog.