
Machine Learning for Mathematicians


Daniel McKenzie

Tuesday, March 6th, 2018


Why should we care about Machine Learning?

1 Necessary for non-academic jobs.

2 Can be useful in your research.

3 Your (future) students will need to know about it.


An outline of this talk

1 Disambiguation of buzzwords.

2 Simple (yet effective) approaches.

3 Deep approaches.

4 Survey of applications.

5 Current research trends.


What do we mean by Data?

1 Could be images, audio signals, stock prices, results of surveys, etc.

2 Can always vectorize.

Example (Greyscale Images)

Suppose each I_a is a 28 × 28 array of pixel values. Each pixel value is a number between 0 and 255, with 0 = black and 255 = white. Can think of each I_a as a 28 × 28 matrix A^a with entries A^a_{ij}. Can make this a vector by stacking the rows: x_a = [A_{1,1}, ..., A_{1,28}, A_{2,1}, ..., A_{2,28}, ..., A_{28,28}] (a sketch in code follows this list).

More sophisticated approaches use the Fourier Transform or Wavelets (Applied Harmonic Analysis). Each entry of the vector is called a feature.

3 Three V’s of Big Data: Variety, Volume and Velocity.
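Below is a minimal sketch of the stacking operation (in Python with NumPy; the code and the random image A are illustrative stand-ins, not part of the original talk):

```python
import numpy as np

# Hypothetical 28x28 greyscale image: integer pixel values in [0, 255].
A = np.random.randint(0, 256, size=(28, 28))

# Stack the rows into a single vector x_a of length 784 = 28 * 28:
# A[0,0], ..., A[0,27], A[1,0], ..., A[1,27], ..., A[27,27].
x_a = A.reshape(-1)
assert x_a.shape == (784,)
```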


Data Science, Machine Learning, and Artificial Intelligence [1]

1 Data Science: Produce insights from data for humans.

2 Machine Learning: Find a function f that predicts y from input x. E.g. f(image) = 'cat'. How f does this is often unclear.

3 Artificial Intelligence: Produce or recommend an action from data. E.g. AlphaGo, self-driving cars.

Caution: Distinction between 'general' AI (a long way off/impossible) and 'single purpose' AI (AlphaGo, self-driving cars).

[1] Robinson 2018.


Data Science

Data Scientists use

1 Statistical know-how to 'wrangle' complex data in a variety of formats into a clean, usable (vectorized) data set X.

2 Algorithms (Regression, Data Clustering, Neural Networks, etc.) to extract insights from X. E.g.: identify a trend/correlation, find outliers (fraud prevention), compute quantities of interest (likelihood of a certain type of consumer renewing a cable contract).

3 Domain-specific knowledge to evaluate the appropriateness of the above,

to produce easily interpretable summaries (pie charts, reports, visualizations) that inform the decision-making of other parties (management, sales team, R&D, government).


(Supervised) Machine Learning

1 Model Problem: Identify people from pictures.

2 Key assumption: Let D be the domain of interest (e.g. all possible 28 × 28 pictures). Let C be the codomain of interest (e.g. the names of people we wish to identify). We assume there exists a continuous function f* : D → C mapping all photos of Dan to 'Dan' ∈ C.

3 Goal of Machine Learning: function approximation. Find an approximation f# to f*.

4 Learning f#: Given a training set X = {x_1, ..., x_n} ⊂ D and known labels Y = {y_1, ..., y_n} ⊂ C, find a function f# such that f#(x_i) ≈ y_i for all i.

Caution: Generalizability is very important. Need to be confident that, for x ∉ X, f#(x) ≈ f*(x).


Artificial Intelligence

This slide intentionally left blank


Simple Approaches to Machine Learning [2]

1 Let P be a class of 'easy functions' (e.g. piecewise polynomial). Find f# as

f# = argmin{ L(f, X) : f ∈ P }.

Think L(f, X) = Σ_{i=1}^{n} ‖f(x_i) − y_i‖². Regression, Splines, Finite Elements.

2 K-Nearest Neighbours. Let N_K(x) = {x_{i_1}, ..., x_{i_K}} be the K training points nearest to x. Compute

f#(x) = (1/K) Σ_{j=1}^{K} y_{i_j}

(a sketch in code follows this slide).

3 Support Vector Machine.

4 Decision trees.

[2] Goodfellow et al. 2016.
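A minimal sketch of the K-Nearest Neighbours rule from item 2, assuming Euclidean distance; the toy data and query point are hypothetical:

```python
import numpy as np

def knn_predict(x, X, Y, K=5):
    """Average the labels of the K training points nearest to x."""
    dists = np.linalg.norm(X - x, axis=1)  # distance from x to every x_i
    nearest = np.argsort(dists)[:K]        # indices i_1, ..., i_K of the K nearest
    return Y[nearest].mean()               # f#(x) = (1/K) * sum_j y_{i_j}

X = np.random.rand(100, 2)                 # toy training set: 100 points in R^2
Y = np.sin(X[:, 0]) + X[:, 1]              # toy labels
print(knn_predict(np.array([0.5, 0.5]), X, Y))
```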


Simple Approaches: Logistic Regression

Suppose |C| = 2, e.g. C = {'Dan', 'Not Dan'}. Define the sigmoid/logistic function g(u) = 1/(1 + e^{−u}). Look for f# of the form:

f_w(x) = 'Dan' if g(w⊤x) ≈ 1, 'Not Dan' if g(w⊤x) ≈ 0.

That is, f# = argmin{ L(f_w, X) : w ∈ R^n }. Can think of z = g(w⊤x) as the probability that the image contains Dan (a sketch in code follows the figure).

Figure: Schematic depiction of Perceptron
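A minimal sketch of the classifier f_w above, assuming the weight vector w has already been found by minimizing L(f_w, X); the weights and image here are random stand-ins:

```python
import numpy as np

def g(u):
    # Sigmoid/logistic function g(u) = 1 / (1 + e^{-u}).
    return 1.0 / (1.0 + np.exp(-u))

def f_w(x, w):
    z = g(w @ x)                 # z = g(w^T x): probability the image contains Dan
    return "Dan" if z >= 0.5 else "Not Dan"

w = np.random.randn(784)         # hypothetical learned weights for 28x28 images
x = np.random.rand(784)          # hypothetical vectorized image
print(f_w(x, w))
```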


(Shallow) Neural Networks

Essentially iterated Logistic Regression:

Figure: Schematic depiction of 2-layer Neural Network


(Shallow) Neural Networks cont.

Notation: f_W denotes a Neural Network with weights W = {w_1, ..., w_5}; f_W(x) = z. Typically, z_1 = probability x is in class 1, z_2 = probability x is in class 2.
Architecture: choice of the number of layers and of neurons per layer.
Activation function: g. Many other choices, but it must be non-linear!
These layers are fully connected.
Need to find a good W. Will vectorize: w = [w_1, w_2, ..., w_5]. Need to solve f# = argmin{ L(f_w, X) : w ∈ R^{n_1} × R^{n_2} × ... × R^{n_5} }.
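A minimal sketch of the forward pass of such a network, collapsing the per-neuron weights w_1, ..., w_5 of the schematic into two weight matrices for brevity; all sizes and weights are hypothetical:

```python
import numpy as np

def g(u):
    return 1.0 / (1.0 + np.exp(-u))    # the (non-linear!) activation function

def f_W(x, W1, W2):
    h = g(W1 @ x)                      # hidden layer (fully connected)
    z = g(W2 @ h)                      # output layer: z_k ~ probability of class k
    return z

W1 = np.random.randn(10, 784)          # layer 1: 784 inputs -> 10 neurons
W2 = np.random.randn(2, 10)            # layer 2: 10 neurons -> 2 classes
z = f_W(np.random.rand(784), W1, W2)
print(z)                               # z[0] ~ P(class 1), z[1] ~ P(class 2)
```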


Gradient Descent

1 The problem: Find the minimum of F : R^m → R. Can assume that F is differentiable.

2 Know that −∇F(w) ∈ R^m points in the direction of steepest decrease of F at w.

3 Gradient Descent Algorithm: w_{k+1} = w_k − ε∇F(w_k).

4 For Neural Networks: F(w) = L(f_w, X) (think: L(f_w, X) = Σ_{i=1}^{n} ‖f_w(x_i) − y_i‖²). Randomly initialize w_0. Compute w_{k+1} using gradient descent until 'good enough' (a sketch in code follows this list).

5 Issue 1: Computing ∇L can be costly (typically one uses Stochastic Gradient Descent).

6 Issue 2: L is usually (highly) non-convex. No guarantee that Gradient Descent will converge to a global minimum.
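A minimal sketch of the algorithm on a toy objective (F(w) = ‖w‖², whose minimum is at 0, standing in for the non-convex loss L):

```python
import numpy as np

def F(w):
    return np.sum(w ** 2)         # toy objective

def grad_F(w):
    return 2 * w                  # its gradient

w = np.random.randn(5)            # randomly initialize w_0
eps = 0.1                         # step size epsilon
for k in range(100):              # iterate until 'good enough'
    w = w - eps * grad_F(w)       # w_{k+1} = w_k - eps * grad F(w_k)
print(F(w))                       # near the minimum value 0
```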


Skills necessary for ML

For Undergrads

1 Coursework: Multivariable Calculus, Linear Algebra, Numerical Analysis, Probability.

2 Online resources: http://cs229.stanford.edu/, https://www.coursera.org/learn/machine-learning

Additional resources for Grads

1 Coursework: Harmonic Analysis, Image Processing, Statistics.

2 Some programming.

3 Deep learning book: http://www.deeplearningbook.org/

4 18.657: Graduate Course on Mathematics of Machine Learning, taught at MIT (all materials/lecture notes available online)

5 Blogs: http://nuit-blanche.blogspot.com/


Deep Neural Networks

1 Key Insight: Vectorizing/feature extraction is the most important step.

2 Many techniques from Applied Harmonic Analysis (e.g. Wavelets, Curvelets, ...) could be used.

3 Deep Learning: Use many convolutional layers to extract good, problem-specific features. Then use a few fully connected layers to classify (a sketch of a convolution follows this slide).

4 Hinton, Osindero, and Teh 2006 [3] were the first to show this was feasible.

5 Krizhevsky, Sutskever, and Hinton 2012 presented a Deep NN that halved the previous error rate for image classification.

6 Key Drivers of DL: Increased processing power (GPUs). Large training sets (sourced from the internet).

[3] Geoff Hinton is the great-great-grandson of George Boole, inventor of Boolean logic.
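A minimal sketch of the convolutional feature extraction referred to in item 3, using one hand-written 3 × 3 edge filter on a random stand-in image; a real deep network learns many such filters rather than fixing them by hand:

```python
import numpy as np

def conv2d(image, kernel):
    # 'Valid' 2-D convolution (strictly, cross-correlation, as in most DL libraries).
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(28, 28)            # hypothetical greyscale image
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])    # responds strongly to vertical edges
features = conv2d(image, vertical_edge)   # a 26x26 feature map
print(features.shape)
```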


Deep Neural Networks [4]

[4] Figure from: https://developingideas.me/deepneuralnetworkoverview/


Prototypical Applications of Machine Learning

1 Handwritten Digit Classification: State-of-the-art algorithms are > 99.75% accurate.

2 Automated Captioning: Given an image, the algorithm should output a brief sentence describing what is going on.

3 Natural Language Processing: Alexa, Siri, et al. Sentiment Analysis.

Figure: First two pictures from Karpathy and Fei-Fei 2015


Current Research Trends

1 Dealing with data scarcity.

2 Regularization and priors.

3 Transfer Learning: Getting a neural network trained to do one thing (e.g. play 'Pong') to learn to do another thing quickly (e.g. play 'Seaquest') (see Fernando et al. 2017).

4 What the hell is actually going on here? It is still not clear how deep neural networks do what they do. This leaves them susceptible to manipulation (adversarial attacks).


An Adversarial Patch [5]

[5] Brown et al. 2017.


Applications to Mathematics: Data-Driven Dynamical Systems [6]

1 For many physical/biological systems of interest: ẋ(t) = f(x(t)).

2 Can usually collect historical data via observation: X = {x(t_1), x(t_2), ..., x(t_n)} and Y = {ẋ(t_1), ẋ(t_2), ..., ẋ(t_n)}.

3 Model: Assume that f(x(t)) is a sparse linear combination of elementary functions φ_1, ..., φ_N (e.g. polynomials, trig. functions, etc.).

4 Use Machine Learning to find an optimal f# = Σ_{i=1}^{N} a_i φ_i (strong connections with Compressive Sensing; a sketch in code follows this slide).

[6] Brunton, Proctor, and Kutz 2016.
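A minimal sketch in the spirit of item 4 and of Brunton, Proctor, and Kutz 2016: build a library Φ of elementary functions on the data and solve for sparse coefficients a by thresholded least squares. The system (ẋ = −2x) and the library are toy stand-ins, not from the talk:

```python
import numpy as np

t = np.linspace(0, 2, 200)
x = np.exp(-2 * t)                       # observed trajectory of xdot = -2x
xdot = np.gradient(x, t)                 # Y: derivatives estimated from the data

# Library of elementary functions phi_i evaluated on the data: [1, x, x^2, x^3].
Phi = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

a = np.linalg.lstsq(Phi, xdot, rcond=None)[0]    # ordinary least squares
for _ in range(10):                              # threshold to enforce sparsity
    a[np.abs(a) < 0.1] = 0
    big = np.abs(a) >= 0.1
    a[big] = np.linalg.lstsq(Phi[:, big], xdot, rcond=None)[0]
print(a)    # ideally close to [0, -2, 0, 0], i.e. f(x) = -2x
```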


Application to Mathematics: Predicting Hodge Numbers

1 Let WP^4 be a weighted projective space.

2 There is a large but finite number of 3-dimensional Calabi-Yau manifolds M_a ⊂ WP^4, each cut out by a homogeneous polynomial of degree w = Σ_{i=0}^{4} w_i.

3 It is of interest to string theorists to compute the Hodge numbers h^{i,j}.

4 To each M_a associate the data vector x_a of coefficients of the defining polynomial [7].

5 Training set: X = {x_1, ..., x_m} and Y = {h^{2,1}(M_1), ..., h^{2,1}(M_m)}.

6 In (He 2017), a Neural Network was trained on the above dataset to predict whether h^{2,1}(M) is large (> 50) or not large (≤ 50), given M (a hedged sketch follows this slide).

7 They report 94.4% accuracy on unseen data.

[7] Plus possibly some side info like χ.
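A hedged sketch of the shape of this experiment; the data here are random stand-ins (so the printed score is meaningless), and scikit-learn's MLPClassifier is used for illustration rather than the architecture of (He 2017):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

m, d = 1000, 50                       # hypothetical: m manifolds, d coefficients each
X = np.random.rand(m, d)              # x_a: coefficients of the defining polynomial
h21 = np.random.randint(0, 101, m)    # stand-in Hodge numbers h^{2,1}(M_a)
y = (h21 > 50).astype(int)            # label: 1 if h^{2,1} is 'large' (> 50)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
clf.fit(X[:800], y[:800])             # train on part of the data...
print(clf.score(X[800:], y[800:]))    # ...and score on unseen data
```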


Thanks! Any questions?

Figure: Neural Style Transfer from Johnson, Alahi, and Fei-Fei 2016


References I

Brown, Tom B., et al. (2017). "Adversarial patch". In: arXiv preprint arXiv:1712.09665.

Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz (2016). "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". In: Proceedings of the National Academy of Sciences 113.15, pp. 3932-3937.

Fernando, Chrisantha, et al. (2017). "PathNet: Evolution channels gradient descent in super neural networks". In: arXiv preprint arXiv:1701.08734.

Goodfellow, Ian, et al. (2016). Deep Learning. Vol. 1. MIT Press, Cambridge.

He, Yang-Hui (2017). "Deep-learning the landscape". In: arXiv preprint arXiv:1706.02714.


References II

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh (2006). "A fast learning algorithm for deep belief nets". In: Neural Computation 18.7, pp. 1527-1554.

Johnson, Justin, Alexandre Alahi, and Li Fei-Fei (2016). "Perceptual losses for real-time style transfer and super-resolution". In: European Conference on Computer Vision. Springer, pp. 694-711.

Karpathy, Andrej, and Li Fei-Fei (2015). "Deep visual-semantic alignments for generating image descriptions". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128-3137.


References III

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton (2012). "ImageNet classification with deep convolutional neural networks". In: Advances in Neural Information Processing Systems, pp. 1097-1105.

Robinson, David (2018). "What's the difference between data science, machine learning, and artificial intelligence?" http://varianceexplained.org. Blog.
