Machine learning for pundits, by Javier Ramirez at teowaki

Machine Learning for pundits, armchair commentators, and other people without neurons

Javier Ramirez @supercoco9

(or how to follow a conversation about machine learning without missing a thing, and start using it in your projects)

Any Mario Bros fans?

http://www.youtube.com/watch?v=_-Gc6diodcY&t=15

Name the TV show where these actors star

The Super Mario Bros level and the celebrities you’ve seen have been “imagined” by a computer

Makoto Koike designed a system to classify cucumbers by type and size at his parents’ farm.

He used a simple TensorFlow model running on a Raspberry Pi, connected to a server that was also running TensorFlow

Machine learning applied to real life

If Makoto could think of a real use case, you probably can find yours too.

Machine learning applied to real life

http://www.youtube.com/watch?v=4HCE1P-m1l8&t=10

This man, a construction worker in the West Bank settlement of Beitar Illit, near Jerusalem, posted on Facebook a picture of himself leaning against a bulldozer with the caption “یصبحھم”, which Facebook's automatic translation rendered as “attack them” or “hurt them”

Police officers arrested the man later that day, according to Israeli newspaper Haaretz, after they were notified of the post. They questioned him for several hours, suspicious he was planning to use the pictured bulldozer in a vehicle attack.

Only the translation was wrong: the caption actually said “good morning”

You probably think machine learning sounds cool, but you need to be an expert to apply it

If you are in this talk...

SELECT LastName, FirstName, OrderID, OrderDate
FROM Employees e JOIN Orders o
  ON e.EmployeeID = o.EmployeeID
WHERE OrderDate < '1996-12-01' AND LastName < 'D'

Can you tell what this does?

You don’t need to be an expert on DB internals to use a database efficiently

...and the same is true for machine learning

Full code for social recommendations with ALS

Many ML frameworks include an ALS implementation

...you don’t need to know the maths (matrix factorization in this case) to build your recommendation engine

What is machine learning and why it’s relevant

In traditional programming, an engineer writes explicit, step-by-step instructions for the computer to follow.

With machine learning, programmers don't encode computers with instructions. They train them.

If you want to teach a neural network to recognize a cat, you don't tell it to look for whiskers, ears, fur, and eyes. You show it thousands and thousands of photos of cats, and eventually it works things out.

If it keeps misclassifying foxes as cats, you don't rewrite the code. You just keep coaching it.

Jason Tanz, Wired, 2016

Alan Turing wrote one of the first papers on the subject in 1950. In “Computing Machinery and Intelligence” he imagined a learning machine that could hold a written conversation and trick a human into thinking it was a real person.

… he called this “the imitation game”

Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain. (...) Our hope is that there is so little mechanism in the child brain that something like it can be easily programmed. The amount of work in the education we can assume, as a first approximation, to be much the same as for the human child.

We have thus divided our problem into two parts. The child programme and the education process. These two remain very closely connected. We cannot expect to find a good child machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications

One may hope, however, that this process will be more expeditious than evolution. The survival of the fittest is a slow method for measuring advantages. The experimenter, by the exercise of intelligence, should be able to speed it up. Equally important is the fact that he is not restricted to random mutations. If he can trace a cause for some weakness he can probably think of the kind of mutation which will improve it.

How do machines learn?

● By memorizing (not very interesting, can only predict what you know)

● By being shown several examples of one thing, so we can identify what the different examples have in common

● By trial and error, or by rewards/punishments

● By observing an expert doing a demonstration

● By being exposed to many different things, and trying to apply logic and find patterns (as in psychometric aptitude tests)

● And in many other ways, but this is just a convenient simplification :)

How do humans learn?

● By memorizing (not desirable in ML. We call this “overfitting”)

● By being shown several examples of one thing. We call this “Supervised learning”

● By trial and error, or by rewards/punishments. We call this “Reinforcement learning”

● By observing an expert doing a demonstration. We call this “Observational learning” or “Apprenticeship learning”

● By being exposed to many different things. We call this “Unsupervised learning”

How do machines learn?

In all cases, we start from several inputs (or features, or examples), we train the system (or model), and we ask questions (or evaluate) until the machine is answering (or predicting, or inferring) correctly most of the time.

We repeat the cycle of training and evaluating for a set number of times (or epochs), or until the model is scoring a result close to perfect (the loss function is minimized).

If the model is not improving its answers for a few training cycles in a row (underfitting), we should stop and think about whether we are using the wrong inputs or an inappropriate training model.

The model should reason its way to its answers. If it is just memorizing all the inputs (overfitting), it will be useless when we try to use it with new data it has never seen.

But.. how do machines learn?
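
The train-and-evaluate cycle can be sketched in a few lines. This is only an illustration under stated assumptions: the toy data, the learning rate, and the names `train`, `patience` and `tol` are made up for the example and do not come from any particular framework.

```python
import numpy as np

def train(x, y, epochs=500, lr=0.05, patience=5, tol=1e-9):
    """Fit y ~ w * x + b by gradient descent, one update per epoch."""
    w, b = 0.0, 0.0
    history = []   # loss measured at the start of each epoch
    stalled = 0    # epochs in a row without a meaningful improvement
    for _ in range(epochs):
        error = (w * x + b) - y
        loss = float(np.mean(error ** 2))      # mean squared error
        history.append(loss)
        if len(history) > 1 and history[-2] - loss < tol:
            stalled += 1
        else:
            stalled = 0
        if stalled >= patience:                # not improving any more: stop
            break
        # move each parameter a small step against its gradient
        w -= lr * 2 * float(np.mean(error * x))
        b -= lr * 2 * float(np.mean(error))
    return w, b, history

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # toy inputs
y = 2.0 * x + 1.0                         # toy outputs the model should discover
w, b, history = train(x, y)
print(f"w={w:.3f} b={b:.3f} epochs={len(history)} final loss={history[-1]:.2e}")
```

The loop stops either after the set number of epochs or when the loss has stopped improving for a few epochs in a row, which is the point where you would step back and review your inputs or your model.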

In his book “Outliers”, Malcolm Gladwell repeatedly mentions the “10,000-Hour Rule”, claiming that the key to achieving world-class expertise in any skill is, to a large extent, a matter of practicing the correct way for a total of around 10,000 hours (though the authors of the original study this was based on have disputed Gladwell's usage).

Machine learning systems need more input data than a human brain to learn and extract patterns, but they can train very efficiently because they are able to try many different possibilities in a short time. In effect they can reach the 10,000 hours in minutes, hours, or weeks.

Remember Neo learning martial arts in 10 hours of training in The Matrix?

The secret behind ML

What kind of things can a machine learn?

Based on the input data, we can:

● Predict an amount, or predict the next in a sequence

● Classify the inputs into pre-defined categories (image recognition would fall here)

● Group (cluster) the data by similarity, without predefined categories

● Detect anomalies

● Decide our next action (in a game, when driving a car…)

● Create recommendations

● Generate new data that would look like valid inputs

● Compress or transcode the inputs

● (... the list would be longer, but these are some of the most common patterns)

Problems we can solve with ML

Overview of the most common algorithms

… to approach the problems we just saw

● Linear regression (also Bayesian)

● Logistic regression (also Bayes point machines)

● K-means clustering

● Decision trees (and random forests)

● Support Vector Machines

● Alternating Least Squares (ALS)

● Neural networks of different types, in particular

    ○ Single layer vs deep neural networks

    ○ Feed-forward networks

        ■ Autoencoders

        ■ Convolutional networks

        ■ Generative adversarial networks

    ○ Recurrent networks (and Long Short Term Memory networks)

The most common algorithms

Problem:

I need to predict a continuous number based on several inputs

Algorithm:

Learn the “magic” combination of weights (the importance of each input) whose weighted sum, plus a bias term, lands as close as possible to the observed output

Examples:

Dynamic pricing

Stock Market forecasts

Weather predictions

Demand forecast

Financial risk analysis

Linear regression
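
A minimal sketch of linear regression, assuming NumPy is available. The ice cream data, its column meanings, and the `predict` helper are hypothetical, invented for the example; the only library call used is `numpy.linalg.lstsq`, which solves the least-squares problem directly.

```python
import numpy as np

# Hypothetical data: [temperature in C, is_weekend] -> ice creams sold.
# The outputs were generated as 5 * temperature + 20 * is_weekend + 10.
X = np.array([[20.0, 0.0], [25.0, 0.0], [30.0, 1.0], [35.0, 1.0], [22.0, 1.0]])
y = np.array([110.0, 135.0, 180.0, 205.0, 140.0])

# Append a column of ones so the bias is learned as one more weight
X1 = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
weights, bias = coef[:2], coef[2]

def predict(features):
    """Weighted sum of the inputs plus the bias."""
    return float(np.dot(features, weights) + bias)

print("weights:", weights, "bias:", bias)
print("prediction for 28 C on a weekday:", predict([28.0, 0.0]))
```

Because the toy outputs follow an exact linear rule, the learned weights recover it; with real data the fit would only be approximate.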

Problem:

I need to classify something into two (binary) or more (multiclass) categories based on some inputs

Algorithm:

Find the “magic” combination of weights that produces each of the observed categories. Output a probability for each category given a set of inputs.

Examples:

Based on a text, decide if it’s spam

Based on a support ticket, decide which department to send it to

Predict if a user will convert based on web navigation and age

Predict health diagnostics

Logistic regression
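
A minimal sketch of binary logistic regression trained by gradient descent, assuming NumPy. The "minutes on site" data and the names `train_logistic` and `predict_proba` are hypothetical; real libraries use more robust solvers and add regularization.

```python
import numpy as np

def sigmoid(z):
    """Squash any number into the (0, 1) range so it reads as a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.3, epochs=3000):
    """Learn weights and bias by gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)               # current predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)   # gradient of the mean log loss
        b -= lr * float(np.mean(p - y))
    return w, b

# Hypothetical data: minutes spent on the site -> did the user convert (1) or not (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
w, b = train_logistic(X, y)

def predict_proba(minutes):
    """Probability of the positive category for one input."""
    return float(sigmoid(np.array([minutes]) @ w + b))

print("P(convert | 1 minute)  =", round(predict_proba(1.0), 3))
print("P(convert | 4 minutes) =", round(predict_proba(4.0), 3))
```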

Problem:

I need to group things together, but I don’t know the categories (unsupervised). I want to group by similarity of inputs

Algorithm:

Decide how many clusters we want, pick a random starting point (centroid) for each cluster, and assign each input to its nearest centroid. Recalculate the mean of the inputs in each cluster, repositioning the centroid. Repeat until the centroids don’t change. At that point every input will be in its most similar cluster

Examples:

Segment users

Group news by topics

Classify houses by value, location, and condition

Find the best location for a store

K-means clustering
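
The steps of the algorithm can be sketched as follows, assuming NumPy. The function name `kmeans`, its parameters, and the six toy points are assumptions for the example; this version picks the starting centroids among the data points.

```python
import numpy as np

def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, labels) after repeating assign/recompute until stable."""
    rng = np.random.default_rng(seed)
    # random starting centroids, chosen among the points themselves
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)        # nearest centroid for each point
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):   # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels

points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.0], [8.0, 9.5]])
centroids, labels = kmeans(points, k=2)
print(labels)
```

Which cluster gets number 0 or 1 depends on the random start, so only the grouping is meaningful, not the label values.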

Problem:

I want to predict an amount, or classify something. I know (supervised) the observed inputs and outputs

Decision trees

Algorithm:

Infer a decision tree where each step gives True or False for a range of input values

Examples:

Same examples we have seen for both linear regression and classification
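
A minimal sketch of how a decision tree could be inferred for classification, in plain Python. The splitting rule (fewest misclassified examples), the dictionary layout, and the cucumber-grading data are assumptions for the example; libraries typically use criteria such as Gini impurity and many refinements.

```python
from collections import Counter

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def mistakes(labels):
    """How many labels differ from the majority label."""
    return len(labels) - Counter(labels).most_common(1)[0][1]

def build_tree(rows, labels, depth=0, max_depth=3):
    """Return nested True/False questions (dicts), with labels at the leaves."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    best = None   # (misclassified, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            yes = [l for row, l in zip(rows, labels) if row[f] <= t]
            no = [l for row, l in zip(rows, labels) if row[f] > t]
            score = mistakes(yes) + mistakes(no)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                     # identical rows: nothing left to split
        return majority(labels)
    _, f, t = best
    yes = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    no = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return {
        "feature": f, "threshold": t,
        "yes": build_tree([r for r, _ in yes], [l for _, l in yes], depth + 1, max_depth),
        "no": build_tree([r for r, _ in no], [l for _, l in no], depth + 1, max_depth),
    }

def predict(tree, row):
    """Answer the questions from the root until reaching a leaf."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["feature"]] <= tree["threshold"] else tree["no"]
    return tree

# Hypothetical cucumbers: [length in cm, diameter in cm] -> grade
rows = [[10, 2.0], [12, 2.5], [22, 3.0], [25, 3.5], [24, 6.0], [26, 5.5]]
labels = ["standard", "standard", "premium", "premium", "standard", "standard"]
tree = build_tree(rows, labels)
print(predict(tree, [23, 3.2]), predict(tree, [23, 7.0]))
```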

Problem:

Decision trees might not be very accurate

Random forests

Algorithm:

Train several decision trees on random subsets of the same data, then run predictions on each and take the mean (for amounts) or the majority vote (for categories).

This pattern is called “ensemble” and can be applied to any other ML model
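
The ensemble pattern can be sketched with very weak models. In this illustration each "tree" is a single threshold (a stump) trained on a random resample of the data, and the ensemble answers by majority vote. The names `train_stump` and `bagged_stumps` and the noisy toy data are assumptions; a real random forest also randomizes which features each tree may use.

```python
import random
from collections import Counter

def train_stump(samples):
    """Weak learner: one threshold on one number. samples = [(value, label), ...]"""
    values = sorted({v for v, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]

    def errors(t):
        # the stump predicts True when the value is above the threshold
        return sum((v > t) != label for v, label in samples)

    threshold = min(candidates, key=errors)
    return lambda v: v > threshold

def bagged_stumps(samples, n_models=25, seed=0):
    """Train each stump on a resample (with replacement), predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        resample = [rng.choice(samples) for _ in samples]
        models.append(train_stump(resample))

    def predict(v):
        votes = Counter(model(v) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data: the label is True for values of 10 or more, with two noisy labels
data = [(x, x >= 10) for x in range(20)]
data[8] = (8, True)
data[12] = (12, False)
predict = bagged_stumps(data)
print(predict(3), predict(16))
```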

Problem:

I want to predict an amount, or classify something. I know (supervised) the observed inputs and outputs

Support Vector Machines

Algorithm:

Separate the inputs with a hyperplane, leaving the widest possible margin, so input data is on one side or the other. Keep dividing with more hyperplanes for multi-class classification, or for the possible values when predicting an amount
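
A minimal sketch of a linear SVM for two classes, assuming NumPy. It looks for the separating hyperplane with simple subgradient steps on the hinge loss; the function name, the hyperparameters, and the toy points are assumptions, and real libraries use dedicated solvers and kernels instead.

```python
import numpy as np

def train_linear_svm(X, y, epochs=300, lr=0.01, reg=0.01):
    """y must contain -1 or +1. Returns (w, b) describing the hyperplane w.x + b = 0."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:
                # wrong side, or inside the margin: push the hyperplane away
                w += lr * (yi * xi - reg * w)
                b += lr * yi
            else:
                # correctly classified with margin: only shrink w slightly
                w -= lr * reg * w
    return w, b

X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = train_linear_svm(X, y)

def classify(point):
    """Which side of the hyperplane the point falls on."""
    return 1 if np.dot(point, w) + b > 0 else -1

print(classify([1.0, 1.5]), classify([6.0, 6.0]))
```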

Problem:

I want to recommend items to users, based on the implicit or explicit ratings of other users

Alternating Least Squares

Algorithm:

Apply matrix factorization to the sparse matrix of user ratings to produce a matrix where each item has a predicted score for each user. The algorithm can detect latent features and it’s efficient with sparse data

Examples:

Netflix movie recommendation

Spotify artist recommendation

Amazon product recommendation
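
A minimal sketch of Alternating Least Squares for explicit ratings, assuming NumPy. This is not the code from the talk: the function `als`, its parameters, and the tiny ratings matrix (where 0 marks a missing rating) are assumptions for the example. It alternates between fixing the item factors to solve for the users and fixing the user factors to solve for the items.

```python
import numpy as np

def als(R, observed, rank=2, reg=0.1, iterations=50, seed=0):
    """Factor R ~ U @ V.T using only the entries where `observed` is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))   # latent user features
    V = rng.normal(scale=0.1, size=(n_items, rank))   # latent item features
    penalty = reg * np.eye(rank)
    for _ in range(iterations):
        # item factors fixed: one small regularized least-squares solve per user
        for u in range(n_users):
            rated = observed[u]
            Vo = V[rated]
            U[u] = np.linalg.solve(Vo.T @ Vo + penalty, Vo.T @ R[u, rated])
        # user factors fixed: one solve per item
        for i in range(n_items):
            raters = observed[:, i]
            Uo = U[raters]
            V[i] = np.linalg.solve(Uo.T @ Uo + penalty, Uo.T @ R[raters, i])
    return U, V

# Rows are users, columns are items, 0 means "not rated yet"
R = np.array([
    [5, 0, 1, 1],
    [5, 5, 1, 1],
    [5, 5, 1, 1],
    [1, 1, 5, 5],
    [1, 1, 0, 5],
    [1, 1, 5, 5],
], dtype=float)
observed = R > 0
U, V = als(R, observed)
predictions = U @ V.T
rmse = float(np.sqrt(np.mean((predictions[observed] - R[observed]) ** 2)))
print("error on known ratings:", round(rmse, 3))
print("predicted rating, user 0 / item 1:", round(float(predictions[0, 1]), 2))
```

The two missing cells are the recommendations: the score predicted for them comes from users with similar tastes.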

A neural network can do anything that can be done with the other algorithms, often with better results.

But the training time and complexity will be much higher

… so don’t use a neural network if there is a suitable simpler model

What’s a neural network

A calculating machine that tries to approximate a function based on the inputs to the system.

They are based on a collection of connected units called artificial neurons. Each connection between neurons can transmit a signal to another neuron. Neurons and synapses may also have a weight that varies as learning proceeds, which can increase or decrease the strength of the signal that it sends downstream

Types of neural networks

By “shape” or number of layers:

Single Layer Neural Networks

Deep Neural Networks

More layers allow the network to learn more patterns. Hidden layers can be specialized depending on the task we are doing. In general all the neurons in one layer are fully connected to all neurons in the next layer.
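
A minimal sketch of how a signal flows through fully connected layers, assuming NumPy. The weights here are random and untrained, and the layer sizes, the sigmoid activation, and the `forward` helper are assumptions for the example; training would consist of adjusting those weights.

```python
import numpy as np

def sigmoid(z):
    """Activation function: squashes the summed signal into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Push one input vector through a list of (weights, biases) layers."""
    activation = x
    for weights, biases in layers:
        # each neuron sums every output of the previous layer, scaled by its weights
        activation = sigmoid(weights @ activation + biases)
    return activation

rng = np.random.default_rng(0)
# 3 inputs -> hidden layer with 4 neurons -> 1 output neuron
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),
    (rng.normal(size=(1, 4)), np.zeros(1)),
]
output = forward(np.array([0.5, -1.0, 2.0]), layers)

# A single neuron with weights [1, -1]: equal inputs cancel out, so it outputs 0.5
single = forward(np.array([2.0, 2.0]), [(np.array([[1.0, -1.0]]), np.zeros(1))])
print(output, single)
```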

Types of neural networks

By how data flows:

Feed Forward networks

Recurrent Networks

Recurrent networks can take into consideration their own output from a previous step, so they are very good at predicting sequences
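
A minimal sketch of the recurrent idea, assuming NumPy: the network keeps a hidden state and mixes it with each new input, so the order of the sequence matters. The function `run_rnn`, the tanh activation, and the hand-picked weights are assumptions for the example.

```python
import numpy as np

def run_rnn(sequence, W_in, W_rec, b):
    """Feed a sequence step by step, carrying the hidden state forward."""
    h = np.zeros(W_rec.shape[0])          # hidden state starts empty
    for x in sequence:
        # the new state depends on the current input AND the previous state
        h = np.tanh(W_in @ x + W_rec @ h + b)
    return h

W_in = np.array([[1.0], [0.5]])               # input -> hidden weights
W_rec = np.array([[0.0, 0.5], [0.5, 0.0]])    # hidden -> hidden weights
b = np.zeros(2)

# The same values in a different order leave the network in a different state
early = run_rnn([np.array([1.0]), np.array([0.0]), np.array([0.0])], W_in, W_rec, b)
late = run_rnn([np.array([0.0]), np.array([0.0]), np.array([1.0])], W_in, W_rec, b)
print(early, late)
```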

Common feed forward neural networks (i)

Common feed forward neural networks (ii): Convolutional network

Common feed forward neural networks (iii): Generative Adversarial Network

Some applications of Recurrent networks

Anything where a sequence is important:

● Speech recognition

● Handwriting recognition

● Music composition

● Time series prediction

● Protein homology detection

● Automatic writing

● Fake lip sync

Some applications of Recurrent networks

http://www.youtube.com/watch?v=9Yq67CjDqvw&t=200

So, to apply machine learning, I will work mostly on choosing and implementing a model, right?

...sure, and once I have Photoshop, my website is finished

Spend a huge amount of time preparing your training dataset

Bad input data + Best algorithm ever = Bad predictions

Some things that you need to do:

● If you have a small dataset, you need to get more data

● Apply random transformations to your input data, so the system can learn better

● Make sure you have enough examples of good and bad data

● Label your examples (if you are doing supervised learning). This can be a huge task

● Transform your data so everything is a number (there are ready-made methods for this, for example word2vec, bag of words, and one-hot encoding)

● Remove outliers and incorrect data (lots of Python libraries to help you here)

● Reduce the total number of features by combining or removing some of them (you can use Principal Component Analysis or Factor Analysis for this)

● Engineer new features from related inputs, using your human insights (for example, you know the day of the week and the hour of the day are related if you are trying to predict demand)

And many more things. Be prepared to spend a huge amount of time curating your datasets
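
Two of the steps above, turning everything into numbers and engineering a feature from related inputs, can be sketched in plain Python. The record layout, the `to_features` helper, and the "weekend evening" feature are hypothetical choices made for the example.

```python
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def one_hot(value, categories):
    """One-hot encoding: a list of zeros with a single 1 marking the category."""
    return [1 if value == category else 0 for category in categories]

def to_features(record):
    """Turn a raw record such as {"day": "sat", "hour": 20} into a list of numbers."""
    day, hour = record["day"], record["hour"]
    # engineered feature: day and hour are related when predicting demand
    weekend_evening = 1 if day in ("sat", "sun") and hour >= 18 else 0
    return one_hot(day, DAYS) + [hour / 23, weekend_evening]

print(to_features({"day": "sat", "hour": 20}))
print(to_features({"day": "tue", "hour": 9}))
```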

Can you trust your ML system?

Validation is very important.

Split your data into three parts: training, validation, and testing
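
A minimal sketch of a three-way split in plain Python. The 70/15/15 proportions and the name `split_dataset` are assumptions for the example; the important parts are shuffling first and never letting an example appear in more than one part.

```python
import random

def split_dataset(examples, train=0.7, validation=0.15, seed=42):
    """Shuffle a copy of the examples and cut it into three disjoint parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)     # fixed seed: reproducible split
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                            # used to fit the model
        shuffled[n_train:n_train + n_validation],      # used to tune it while training
        shuffled[n_train + n_validation:],             # kept aside for the final check
    )

training, validation, testing = split_dataset(list(range(100)))
print(len(training), len(validation), len(testing))
```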

...and try to get insights from the predictions

Popular tools in the ecosystem

http://scikit-learn.org/

http://deeplearning.net/software/theano/ (deprecated)

http://torch.ch/

https://www.tensorflow.org/

https://www.microsoft.com/en-us/cognitive-toolkit/ (CNTK)

http://caffe.berkeleyvision.org/ and https://caffe2.ai/

https://keras.io/

https://mxnet.incubator.apache.org/

https://spark.apache.org/docs/latest/ml-guide.html

http://mahout.apache.org/

https://opencv.org/

https://developer.nvidia.com/cuda-downloads (not open source)

These are some of the most popular open source tools for ML.

You can also find hosted models (some very open, some proprietary) at AWS, Google Cloud Platform, MS Azure, and IBM Watson

THANK YOU!


If you still have something you'd like to tell me, send me a tweet and we'll chat after my talk

http://www.youtube.com/watch?v=V1eYniJ0Rnk
