BigData@Chalmers Machine Learning Business Intelligence, Culturomics and Life Sciences Devdatt Dubhashi LAB (Machine Learning. Algorithms, Computational Biology) D&IT Chalmers

Post on 08-Jun-2020

Page 1:

BigData@Chalmers

Machine Learning

Business Intelligence, Culturomics and Life Sciences

Devdatt Dubhashi

LAB (Machine Learning, Algorithms, Computational Biology)

D&IT Chalmers

Page 6:

Entity Disambiguation

• Match names in text with the entity behind them

• Fundamental problem, addressed at annual competitions like SemEval

• Disambiguation is needed everywhere: databases, web mining, linguistics, …

• Used at Recorded Future (exemplified next!)

Page 12:

“Judge a man by the company he keeps.” - Euripides

Page 13:

[Figure: “Chris Anderson” linked to the entities India, Oxford Uni., Pakistan, TED, Future Publishing and San Francisco]

Page 14:

Chris Anderson

Page 15:

Graph Communities

Page 16:

Classification with Graph Kernels

Page 17:

Graph Embeddings and Kernels

[Figure: graph on nodes 1 to 5, annotated with 1/√ϑ]

• Embed discrete combinatorial object (graph) into continuous Euclidean space

• Define kernel based on geometry of Euclidean space

• V. Jethava et al., NIPS 2012, JMLR 2013

• T. Kerola, L. Hermansson, V. Jethava, F. Johansson, CIKM 2013

• F. Johansson, V. Jethava et al., ICML 2014
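
The two steps on this slide (embed a graph into Euclidean space, then define a kernel from the geometry there) can be illustrated with a very small Python sketch. This is not the embedding used in the cited papers: as a stand-in, each graph is mapped to a hand-picked three-dimensional feature vector (edge density, triangle density, normalized maximum degree), and a Gaussian (RBF) kernel is taken on those vectors. The function names and the choice of features are assumptions made only for illustration.

```python
import math
from itertools import combinations


def embed_graph(n_nodes, edges):
    """Map an undirected graph (n_nodes >= 3) to a point in R^3.

    Illustrative features only: edge density, triangle density and
    normalized maximum degree. Nodes are labelled 0 .. n_nodes - 1.
    """
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n_edges = sum(len(nb) for nb in adj.values()) / 2
    n_triangles = sum(
        1
        for a, b, c in combinations(range(n_nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    max_degree = max(len(nb) for nb in adj.values())
    return (
        n_edges / math.comb(n_nodes, 2),
        n_triangles / math.comb(n_nodes, 3),
        max_degree / (n_nodes - 1),
    )


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel on the embedded points: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def graph_kernel(graph_a, graph_b, gamma=1.0):
    """Kernel between two graphs, each given as (n_nodes, edges)."""
    return rbf_kernel(embed_graph(*graph_a), embed_graph(*graph_b), gamma)


triangle = (3, [(0, 1), (1, 2), (0, 2)])
path = (3, [(0, 1), (1, 2)])
print(graph_kernel(triangle, triangle), graph_kernel(triangle, path))
```

Because the RBF kernel is positive semidefinite on any fixed vector embedding, the resulting graph kernel can be passed to a kernel classifier such as an SVM; how informative it is depends entirely on the embedding chosen.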

Page 18:

Demonstrator at Recorded Future

• Classifies names as ambiguous or unique

• Uses graph classification to classify occurrence graphs of names

• Achieved state-of-the-art results (CIKM, 2013).

• Powerful extension for complete disambiguation in progress …

• Parallel/Distributed implementation in GraphLab

Ambiguous or Unique?
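
As a rough illustration of what an occurrence graph for a name might look like, the Python sketch below builds a co-occurrence graph from the entities mentioned alongside a name and counts its connected components. This is a naive heuristic chosen for brevity, not the graph-kernel classifier described on the slide; the function names and the placeholder entities (`entity_a`, `entity_x`, …) are hypothetical.

```python
from itertools import combinations


def occurrence_graph(mentions):
    """Build a co-occurrence graph.

    `mentions` is a list of sets; each set holds the entities that appear
    together with the name in one document. Entities that share a document
    are connected by an edge.
    """
    graph = {}
    for entities in mentions:
        for entity in entities:
            graph.setdefault(entity, set())
        for a, b in combinations(sorted(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def count_components(graph):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node] - seen)
    return components


def looks_ambiguous(mentions):
    """Naive signal: several disconnected groups of company hint at several entities."""
    return count_components(occurrence_graph(mentions)) > 1


# Placeholder data: two documents share entity_b, a third is unrelated.
example = [
    {"entity_a", "entity_b"},
    {"entity_b", "entity_c"},
    {"entity_x", "entity_y"},
]
print(looks_ambiguous(example))
```

A learned classifier over whole occurrence graphs can pick up on far subtler structure than component counts; the sketch only shows the kind of input such a classifier would see.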

Page 19:

Towards a knowledge-based culturomics

• Språkbanken (Swedish Language Bank), University of Gothenburg

• Language Technology, Lund University

• LAB Group, Department of Computer Science and Engineering, Chalmers University of Technology

Page 21:

Word Embeddings

Page 23:

Deep Learning (Neural Networks)

• Revolutionized vision and speech systems

• Dramatic improvements in image classification, near human level

• Skype real-time translation from English to Chinese

Page 24:

Word Embeddings capture meaning

Page 25:

Dealing with information overload

Page 26:

Document summarization

Word vectors + multiple kernel learning + submodular optimization

M. Kågebäck, O. Mogren et al., “Extractive Summarization using Continuous Vector Space Models”, CVSC workshop at EACL 2014
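
The submodular-optimization ingredient can be sketched in a few lines of Python. The sketch assumes a facility-location coverage objective and a single cosine similarity between sentence vectors; it leaves out multiple kernel learning entirely and is not claimed to match the cited paper's objective. The sentence vectors in the usage lines are made-up stand-ins for real sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coverage(selected, sims):
    """Facility-location objective: how well each sentence is covered by its
    most similar selected sentence. Monotone submodular for sims >= 0."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sims)


def greedy_summary(vectors, budget):
    """Greedily pick up to `budget` sentence indices, each time adding the
    sentence that yields the largest objective value."""
    sims = [[cosine(u, v) for v in vectors] for u in vectors]
    selected = []
    while len(selected) < min(budget, len(vectors)):
        candidates = [j for j in range(len(vectors)) if j not in selected]
        best = max(candidates, key=lambda j: coverage(selected + [j], sims))
        selected.append(best)
    return selected


# Hypothetical 2-D sentence vectors: two sentences per topic.
sentence_vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
picked = greedy_summary(sentence_vectors, budget=2)
print(picked)
```

For monotone submodular objectives under a cardinality budget, this greedy rule is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value, which is the usual reason for choosing it in extractive summarization.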

Page 29:

Word sense induction

[Figure: instance cloud for 'power', showing two groups of context words]

Group 1: him, political, her, government, god, influence, state, came, us, act, labour, given, council, about, authority

Group 2: energy, unit, system, battery, x, performance, high, allows, engine, equipment, processing, systems, failure, management, provide

M. Kågebäck, F. Johansson et al., “Neural context embeddings for automatic discovery of word senses”, NAACL 2015 Workshop on Vector Space Modeling for NLP

Used an innovative clustering technique.

Exploited word and context vectors.
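
The general recipe (represent each occurrence of a word by a vector built from its context, then cluster the occurrences) can be sketched as follows. Plain k-means with a deterministic farthest-point initialisation stands in for the clustering technique of the cited paper, which the slide does not describe, and the 2-D word vectors are invented purely for illustration.

```python
def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Hypothetical word vectors (real ones would come from a trained model).
word_vectors = {
    "government": (1.0, 0.1),
    "political": (0.9, 0.2),
    "state": (0.8, 0.0),
    "battery": (0.1, 1.0),
    "energy": (0.0, 0.9),
    "engine": (0.2, 0.8),
}

# Each context is the set of words around one occurrence of 'power'.
contexts = [
    ["political", "government"],
    ["state", "government"],
    ["battery", "energy"],
    ["engine", "energy"],
]
context_vectors = [mean_vector([word_vectors[w] for w in ctx]) for ctx in contexts]
sense_labels = kmeans(context_vectors, k=2)
print(sense_labels)
```

Each resulting cluster is read as one induced sense of the target word; here the number of senses `k` is fixed by hand, whereas a practical system would need some way of choosing it.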

Page 30:

Senses of ‘paper’

Visualization using t-SNE

Medium

Essay

Scholarly article

Newspaper

Newspaper firm

Material

Page 35:

Probabilistic Regulation of Metabolism

• Prediction of metabolic changes due to genetic or environmental perturbations

• Diagnosing metabolic disorders

• Discovering novel drug targets

Page 36:

Genetic Regulation of Metabolism: Using Factor Graphs and Belief Propagation

Page 37:

Genetic regulatory network consisting of transcription factor genes, target genes and metabolic reactions
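
To give a feel for belief propagation on such a network, here is a minimal Python sketch of sum-product message passing on the smallest possible chain: one transcription factor gene, one target gene and one metabolic reaction, each treated as a binary (off/on) variable. The chain structure and all probabilities are invented for illustration and are not taken from the talk; on a chain like this, belief propagation returns exact marginals.

```python
def pass_message(incoming, conditional):
    """Sum-product message along one edge of the chain:
    out[y] = sum over x of incoming[x] * conditional[x][y]."""
    return [
        sum(incoming[x] * conditional[x][y] for x in range(len(incoming)))
        for y in range(len(conditional[0]))
    ]


# Hypothetical conditional probability tables.
# Rows index the parent state (off, on); columns index the child state (off, on).
TARGET_GIVEN_TF = [[0.8, 0.2], [0.2, 0.8]]
REACTION_GIVEN_TARGET = [[0.9, 0.1], [0.1, 0.9]]


def reaction_marginal(p_tf_active):
    """Marginal [P(reaction off), P(reaction on)] for a given prior
    probability that the transcription factor gene is active."""
    prior = [1.0 - p_tf_active, p_tf_active]
    to_target = pass_message(prior, TARGET_GIVEN_TF)
    return pass_message(to_target, REACTION_GIVEN_TARGET)


baseline = reaction_marginal(0.5)  # unperturbed prior
knockout = reaction_marginal(0.0)  # genetic perturbation: TF gene switched off
print(baseline, knockout)
```

Changing the prior on the transcription factor and re-running the messages is a toy version of predicting metabolic changes under a genetic perturbation; real regulatory networks have many genes and loops, where belief propagation becomes approximate.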

Page 38:

Privacy

• Data mining with Differential Privacy

• Programming language technology for differential privacy (Sands)

• Privacy policies for social networks (Schneider)
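
As a concrete example of differential privacy in data mining, the sketch below applies the standard Laplace mechanism to a counting query: the true count is released with Laplace noise of scale sensitivity/ε, where a count has sensitivity 1. The function name, the sample records and the choice of ε are assumptions for illustration, and the code is a teaching sketch rather than a hardened implementation.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon
    # The difference of two independent Exponential(rate = 1/scale) draws
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical records: ages of individuals in a small data set.
ages = [23, 35, 41, 52, 67, 29]
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=random.Random(0))
print(noisy)
```

Smaller values of ε give stronger privacy but noisier answers; repeated queries consume privacy budget, so the ε values of all released answers add up.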

Page 39:

Chalmers Machine Learning Summer School 2015

Page 40:

Big Data Analytics May 25-29

• Hadoop

• Spark

• Spotfire

Page 41:

• SVMs and Kernel Methods

• Graph Theoretic Methods

• Probabilistic Graphical Models

• Deep Learning

• Bayesian Decision Theory

• Reinforcement Learning

• Business intelligence

• Natural Language Technology

• Life Sciences

• Transport (Volvo)

• Infectious disease epidemiology

• Medical Imaging

• Political Science …

Page 42:

Machine Learning

Probability and Statistics

Algorithms

Optimization

Databases

Multicores/GPUs

Security

Privacy

Data Science

Parallel programming

Sparse modelling

Page 43:

Chalmers Data-X ?

• Life Science and Engineering

• Transport

• Energy

• Smart Cities (Built Environment)

• Production

• Volvo Cars (connected cars, historical data)

• AstraZeneca (mining medical literature)

• Seal Software (mining legal contracts)

Page 44:

Data Science vs eScience

Data Science:

• Data-centric

• Probabilistic models

• GPUs

• Computational biology, NLP, social sciences …

eScience:

• Computation-centric

• Simulation

• Large clusters/grids

• Physics, turbulent flows, climate …