big data e deep learning - uniroma1.itispac.diet.uniroma1.it/scardapane/wp-content/...big data e...

Simone Scardapane

Big Data and Deep Learning

Toward a new generation of intelligent programs (maybe)

What is data for?

https://datafloq.com/read/understanding-sources-big-data-infographic/338

Google Flu Trends

The big data hope

“…we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.”

Ginsberg, Jeremy, et al. "Detecting influenza epidemics using search engine query data." Nature 457.7232 (2009): 1012-1014.

Initially, the researchers expected to achieve this with an essentially linear model relating the fraction of flu-related queries Q to the fraction of physician visits P:

logit(P) = β0 + β1 × logit(Q) + ε
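To make the model above concrete, here is a minimal Java sketch that estimates β0 and β1 by ordinary least squares on the logit-transformed values. The class name LogitFit, its method names, and the sample numbers are illustrative assumptions; they are not taken from Ginsberg et al., and the paper's actual data and fitting procedure are not reproduced here.

```java
class LogitFit {
    /** Log-odds transform, defined for 0 < p < 1. */
    static double logit(double p) {
        return Math.log(p / (1.0 - p));
    }

    /** Inverse of logit: maps any real number back into (0, 1). */
    static double invLogit(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Least-squares fit of logit(P) = b0 + b1 * logit(Q); returns {b0, b1}. */
    static double[] fit(double[] q, double[] p) {
        int n = q.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += logit(q[i]) / n;
            meanY += logit(p[i]) / n;
        }
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = logit(q[i]) - meanX;
            sxy += dx * (logit(p[i]) - meanY);
            sxx += dx * dx;
        }
        double b1 = sxy / sxx;
        return new double[] {meanY - b1 * meanX, b1};
    }

    /** Estimated fraction of physician visits for a given query fraction. */
    static double predict(double[] beta, double q) {
        return invLogit(beta[0] + beta[1] * logit(q));
    }

    public static void main(String[] args) {
        // Made-up weekly fractions, for illustration only.
        double[] q = {0.001, 0.002, 0.004, 0.008};
        double[] p = {0.010, 0.018, 0.033, 0.060};
        double[] beta = fit(q, p);
        System.out.println("beta0 = " + beta[0] + ", beta1 = " + beta[1]);
        System.out.println("estimated P for Q = 0.003: " + predict(beta, 0.003));
    }
}
```

Working in logit space keeps every prediction inside (0, 1), which is why the sketch transforms back with invLogit before reporting a fraction.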

The big data hubris

“GFT […] missed by a very large margin in the 2011–2012 flu season and has missed high for 100 out of 108 weeks starting with August 2011.”

"“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis."

Lazer, David, et al. "The parable of Google Flu: traps in big data analysis." Science 343.6176 (2014): 1203-1205.

Predictive policing

“PredPol, [is a] “predictive policing” software program that shovels historical crime data through a proprietary algorithm and spits out the 10 to 20 spots most likely to see crime over the next shift.”

"Santa Cruz saw burglaries drop by 11% and robberies by 27% in the first year of using the software."

Server And Protect: Predictive Policing Firm PredPol Promises To Map Crime Before It Happens (Forbes, 2015)

http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing/

What can it be used for?

Can we use this data to predict what users will type?

Swiftkey Releases Predictive Keyboard Built On A Neural Network

http://digitalcallout.com/how-much-data-do-we-generate-every-day/

http://techcrunch.com/2015/10/08/swiftkey-releases-predictive-keyboard-built-on-a-neural-network/

Image recognition

Microsoft, Google Beat Humans at Image Recognition

http://www.eetimes.com/document.asp?doc_id=1325712

OK, but how?

Machine learning

This is a duck:

https://it.wikipedia.org/wiki/Anas_platyrhynchos#/media/File:Anas_platyrhynchos_male.jpg

This is NOT a duck:

https://it.wikipedia.org/wiki/Quercus#/media/File:Quercus_pubescens_Tuscany.jpg

How do we get the computer to understand this?
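One possible answer is to let the program learn the distinction from labeled examples rather than hand-coding rules. The Java sketch below trains a single perceptron with the classic error-driven update. Everything specific in it is a hypothetical illustration: the class name DuckPerceptron, the three hand-made features (has feathers, has beak, has leaves), and the tiny data set. A real image classifier would work from pixels, not from features like these.

```java
class DuckPerceptron {
    final double[] w;   // one weight per feature
    double b;           // bias term

    DuckPerceptron(int numFeatures) {
        w = new double[numFeatures];
    }

    /** Returns 1 ("duck") if the weighted sum is positive, else 0. */
    int predict(double[] x) {
        double s = b;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * x[j];
        }
        return s > 0 ? 1 : 0;
    }

    /** Perceptron rule: adjust weights only on misclassified examples. */
    void train(double[][] xs, int[] ys, int epochs, double learningRate) {
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < xs.length; i++) {
                int error = ys[i] - predict(xs[i]);
                if (error != 0) {
                    for (int j = 0; j < w.length; j++) {
                        w[j] += learningRate * error * xs[i][j];
                    }
                    b += learningRate * error;
                }
            }
        }
    }

    /** Trains on a toy data set: {hasFeathers, hasBeak, hasLeaves}. */
    static DuckPerceptron trainDemo() {
        double[][] xs = {
            {1, 1, 0},  // a duck
            {0, 0, 1},  // an oak tree
            {0, 0, 0},  // a rock
        };
        int[] ys = {1, 0, 0};
        DuckPerceptron model = new DuckPerceptron(3);
        model.train(xs, ys, 10, 1.0);
        return model;
    }

    public static void main(String[] args) {
        DuckPerceptron model = trainDemo();
        System.out.println("duck? " + model.predict(new double[] {1, 1, 0}));
        System.out.println("oak?  " + model.predict(new double[] {0, 0, 1}));
    }
}
```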

Artificial neural networks

http://neuralnetworksanddeeplearning.com/chap1.html

A biological inspiration

A key ingredient: multiple layers of processing

Urbanski, M., Coubard, O. A., & Bourlon, C. (2014). Visualizing the blind brain: brain imaging of visual field defects from early recovery to rehabilitation techniques. Frontiers in integrative neuroscience, 8.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4179723/
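A small example of why multiple layers matter: XOR cannot be computed by any single threshold unit, but it becomes easy once a hidden layer is placed in between. The Java sketch below uses hand-picked weights rather than learned ones, and the class name XorNetwork is a hypothetical illustration.

```java
class XorNetwork {
    /** Threshold activation: fires (1) only for a positive input. */
    static int step(double z) {
        return z > 0 ? 1 : 0;
    }

    /** Two-layer network: hidden units compute OR and AND, output combines them. */
    static int xor(int x1, int x2) {
        int h1 = step(x1 + x2 - 0.5);   // hidden unit 1: x1 OR x2
        int h2 = step(x1 + x2 - 1.5);   // hidden unit 2: x1 AND x2
        return step(h1 - h2 - 0.5);     // output: OR but not AND
    }

    public static void main(String[] args) {
        for (int x1 = 0; x1 <= 1; x1++) {
            for (int x2 = 0; x2 <= 1; x2++) {
                System.out.println(x1 + " XOR " + x2 + " = " + xor(x1, x2));
            }
        }
    }
}
```

The hidden layer re-represents the input (as "at least one is on" and "both are on"), and the output unit only has to separate those two intermediate features.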

A (very brief) history of neural networks

• 1957: Frank Rosenblatt introduces the perceptron

• 1970s: "AI Winter"

• 1980s: the first "renaissance" of neural networks

• Partial abandonment until 2000

• Since 2006: deep networks, the second "renaissance"

Triggering factors

1. New algorithms for training networks with several hidden layers (unsupervised initialization, etc.).

2. Training sets with many millions of examples ("big data").

3. Large computational capacity: clusters, GPUs, etc.

And Google?

Google is one of the leading players in the field:

• 2012: trains a neural network with over 1 billion parameters on frames extracted from YouTube

• 2014: acquires DeepMind for an estimated $500 million

• 2015: releases the distributed machine learning framework TensorFlow as open source

The "cat neuron"

Le, Q. V. (2013, May). Building high-level features using large scale unsupervised learning. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (pp. 8595-8598). IEEE.

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6639343

"We built a cat detector!"

Layers of representation

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521, 436-444.

http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html

What does a deep network see?

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer International Publishing.

http://link.springer.com/chapter/10.1007/978-3-319-10590-1_53

Deep dreams

The best images from Google's Deep Dream software

http://www.dazeddigital.com/artsandculture/gallery/20085/4/the-best-images-from-google-s-deep-dream-software

Fooling a neural network

Nguyen A, Yosinski J & Clune J. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In Computer Vision and Pattern Recognition (CVPR ’15), IEEE, 2015.

http://arxiv.org/pdf/1412.1897v4.pdf

And what about us?

Structuring the data

https://cloud.google.com/prediction/docs/developer-guide#trainingtheapi
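As a rough illustration of what structuring the data can look like, the Java sketch below formats one training example as a CSV row. It assumes the layout described in the developer guide linked above: the target value in the first column, the features after it, and text values wrapped in double quotes. The class TrainingRow and its methods are hypothetical helpers, not part of any Google library, and the exact format should be checked against the guide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class TrainingRow {
    /** Builds one CSV row: the label first, then each feature in order. */
    static String toCsv(Object label, List<?> features) {
        StringJoiner row = new StringJoiner(",");
        row.add(field(label));
        for (Object feature : features) {
            row.add(field(feature));
        }
        return row.toString();
    }

    /** Numbers are written as-is; anything else is quoted, doubling inner quotes. */
    static String field(Object value) {
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value.toString().replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Made-up example: a text feature and a numeric feature.
        System.out.println(toCsv("duck", Arrays.asList("has feathers and a beak", 2)));
    }
}
```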

Prediction API


Unique ID assigned to the model

Path of the training file in Cloud Storage
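The two values above are what a training request carries. As a sketch only, the Java snippet below assembles such a request body as a JSON string. The field names id and storageDataLocation are an assumption based on the Prediction API documentation and should be verified there; the class TrainingRequest, the model ID, and the bucket path are hypothetical.

```java
class TrainingRequest {
    /** Escapes backslashes and double quotes so the value is a valid JSON string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the JSON body: the model's unique ID and the training file location. */
    static String body(String modelId, String storageDataLocation) {
        return "{\"id\": \"" + escape(modelId) + "\", "
             + "\"storageDataLocation\": \"" + escape(storageDataLocation) + "\"}";
    }

    public static void main(String[] args) {
        // Hypothetical model ID and Cloud Storage path (bucket/object).
        System.out.println(body("my-model", "my-bucket/train.csv"));
    }
}
```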


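Requesting a prediction


// Imports assume the google-api-services-prediction client library.
import com.google.api.services.prediction.Prediction;
import com.google.api.services.prediction.model.Input;
import com.google.api.services.prediction.model.InputInput;
import com.google.api.services.prediction.model.Output;

// Client for the Prediction API; transport, credentials and JSON factory are set up elsewhere.
Prediction prediction = new Prediction(httpTransport, requestInitializer, jsonFactory);

// Wrap the feature values (params) as a single CSV instance.
Input input = new Input();
InputInput inputInput = new InputInput();
inputInput.setCsvInstance(params);
input.setInput(inputInput);

// Query the trained model identified by modelId.
Output output = prediction.trainedmodels().predict(modelId, input).execute();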


The future (maybe)

Self-driving cars?

https://www.google.com/selfdrivingcar/

Machines that "talk"?

http://googleresearch.blogspot.it/2014/11/a-picture-is-worth-thousand-coherent.html

Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2014). Show and tell: A neural image caption generator. arXiv preprint arXiv:1411.4555.

http://arxiv.org/abs/1411.4555

Big Data and Deep Learning

Simone Scardapane

GDG L-ab Member

PhD Student @ La Sapienza

Thank you for your attention!
