parameter server on flink - github pagesparameter server on flink an approach for model-parallel...

74
Parameter Server on Flink an approach for model-parallel machine learning 27/09/2018 Distributed Computing and Analytics Workshop Dániel Berecz [email protected] This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 688191.

Upload: others

Post on 20-May-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Parameter Server on Flinkan approach for model-parallel machine learning

27/09/2018

Distributed Computing and Analytics Workshop

Dániel [email protected]

This project has received funding from the European Union’s Horizon 2020research and innovation program under grant agreement No 688191.

About us

• Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI)

• Data Science group

• Strong industry ties

• Ericsson, Bosch, Portugal Telekom, etc.

Agenda

1. Model-parallel training

2. Parameter Server on Flink Streaming

Model-parallel training

𝑅

Recommendation with matrix factorization

5

1

3

5

2

0

0

0

0

Dániel

Gábor

Iron Man Interstellar

Dániel rated Iron Man with 5 stars

𝑅

Recommendation with matrix factorization

𝑈𝑈 ∙ 𝐼 ≈ 𝑅

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

2

Level of actionLevel of drama

X factor

0

0

0

0

Latent factors

Dániel

Gábor

Iron Man Interstellar

Dániel rated Iron Man with 5 stars

𝑅

Recommendation with matrix factorization

𝑈𝑈 ∙ 𝐼 ≈ 𝑅

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

2

Level of actionLevel of drama

X factor

0

0

0

0

Latent factors

Dániel

Gábor

Iron Man Interstellar

min𝑢∗,𝑖∗

(𝑝,𝑞)∈𝜅𝑅

𝑟𝑝𝑞 − 𝜇 − 𝑏𝑝 − 𝑏𝑞 − 𝑢𝑝𝑖𝑞2+

+𝜆

𝑝∈𝜅𝑈

( 𝑢𝑝2+ 𝑏𝑝

2) + 𝜆

𝑞∈𝜅𝐼

( 𝑖𝑞2+ 𝑏𝑞

2)

Dániel rated Iron Man with 5 stars

𝑅

Recommendation with matrix factorization

𝑈𝑈 ∙ 𝐼 ≈ 𝑅

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

2

Level of actionLevel of drama

X factor

?

0

0

0

0

Latent factors

Dániel

Gábor

Iron Man Interstellar

Dániel rated Iron Man with 5 stars

Would Gábor like Interstellar?

𝑅

Recommendation with matrix factorization

𝑈𝑈 ∙ 𝐼 ≈ 𝑅

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

2

Level of actionLevel of drama

X factor

?

0

0

0

0

Latent factors

Dániel

Gábor

Iron Man Interstellar

Dániel rated Iron Man with 5 stars

Would Gábor like Interstellar?

𝑅

Recommendation with matrix factorization

𝑈𝑈 ∙ 𝐼 ≈ 𝑅

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

2

Level of actionLevel of drama

X factor

?

0

0

0

0

Latent factors

Dániel

Gábor

Iron Man Interstellar

Dániel rated Iron Man with 5 stars

Would Gábor like Interstellar?

5 4 -4

325

𝑅

Recommendation with matrix factorization

𝑈𝑈 ∙ 𝐼 ≈ 𝑅

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

2

Level of actionLevel of drama

X factor

?

0

0

0

0

Latent factors

Dániel

Gábor

Iron Man Interstellar

Dániel rated Iron Man with 5 stars

Would Gábor like Interstellar?

5 4 -4

325

3

𝑅

Recommendation with matrix factorization

𝑈𝑈 ∙ 𝐼 ≈ 𝑅

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

2

Level of actionLevel of drama

X factor

3

0

0

0

0

Latent factors

Dániel

Gábor

Iron Man Interstellar

Dániel rated Iron Man with 5 stars

Would Gábor like Interstellar?

5 4 -4

325

3

𝑅

Matrix factorization training

𝑈

item vector

325

532

5 -6 -1

5 4 -4

5

1

3

uservector

5

0

0

0

0

Dániel

Gábor

Iron Man Interstellar

4

2

𝑅

Matrix factorization training

𝑈

item vector

326

532

5 -6 -2

5 4 -4

5

1

3

uservector

5

2

4

0

0

0

0

Dániel

Gábor

Iron Man Interstellar

𝑅

Matrix factorization training

𝑈

item vector

135

532

5 -6 -2

6 1 2

5

1

3

uservector

5

2

4

0

0

0

0

Dániel

Gábor

Iron Man Interstellar

𝑅

Matrix factorization training

𝑈

item vector

135

532

4 -5 -1

6 1 2

5

1

3

uservector

5

2

3

0

0

0

0

Dániel

Gábor

Iron Man Interstellar

model-parallel

Parameter Server on Flink

Parameter Server

workers

server nodes

data

pull push

Parameter Server API

ps.pull(paramId)ps.push(paramId, paramUpdate) workers

server nodes

data

pull push

Parameter Server API

workers

server nodes

data

pull push

ps.pull(paramId)ps.push(paramId, paramUpdate)

def onRecv(data): Unit

Parameter Server API

workers

server nodes

data

pull push

ps.pull(paramId)ps.push(paramId, paramUpdate)

def onRecv(data): Unitdef onPullRecv(paramId, paramValue)

Matrix factorization withParameter Server

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 3

0

0

0

0

Matrix factorization withParameter Server

workers

server nodes

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 3

0

0

0

0

𝑅

Matrix factorization withParameter Server

workers

server nodes

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5

0

0

0

0

𝐼

𝑈

𝑅

2

Matrix factorization withParameter Server

workers

server nodes

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix factorization withParameter Server

workers

server nodes

pull

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix factorization withParameter Server

workers

server nodes

pull answer

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix factorization withParameter Server

workers

server nodes

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix factorization withParameter Server

workers

server nodes

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix factorization withParameter Server

workers

server nodes

push

𝑅𝑈

325

532

5 -6 -1

5 4 -4

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix factorization withParameter Server

workers

server nodes

𝑅𝑈

135

532

5 -6 -1

5 4 -4

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix factorization withParameter Server

workers

server nodes

𝑅𝑈

325

532

5 -6 -1

6 1 2

5

1

3

5 4

0

0

0

0

2

𝐼

𝑈

𝑅

Matrix Factorization code (SGD)

def onRecv(r: Rating) = { def onPullRecv(paramId: Int,param: Vector) = {

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)

def onPullRecv(paramId: Int,param: Vector) = {

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)

}

def onPullRecv(paramId: Int,param: Vector) = {

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)

}

def onPullRecv(paramId: Int,param: Vector) = {

val itemId = paramIdval item = param

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)

}

def onPullRecv(paramId: Int,param: Vector) = {

val itemId = paramIdval item = param

val (r, userId, _) =waitQueues(itemId).pop()

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)

}

def onPullRecv(paramId: Int,param: Vector) = {

val itemId = paramIdval item = param

val (r, userId, _) =waitQueues(itemId).pop()

val user = users(userId)

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)

}

def onPullRecv(paramId: Int,param: Vector) = {

val itemId = paramIdval item = param

val (r, userId, _) =waitQueues(itemId).pop()

val user = users(userId)val (userDelta, itemDelta) =

updateWithSGD(user, item, r)

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)

}

def onPullRecv(paramId: Int,param: Vector) = {

val itemId = paramIdval item = param

val (r, userId, _) =waitQueues(itemId).pop()

val user = users(userId)val (userDelta, itemDelta) =

updateWithSGD(user, item, r)users(userId) += userDelta

Matrix Factorization code (SGD)

def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)

}

def onPullRecv(paramId: Int,param: Vector) = {

val itemId = paramIdval item = param

val (r, userId, _) =waitQueues(itemId).pop()

val user = users(userId)val (userDelta, itemDelta) =

updateWithSGD(user, item, r)users(userId) += userDeltaps.push(itemId, itemDelta)

}

Integration with Flink

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Implementation?

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Implementation?

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

?

Implementation: Loops API

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Implementation: Loops API

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

NOT MATURE

WIP:FLIP-15FLIP-16

Implementation

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Implementation

Parameter Server API

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Implementation

Parameter Server API

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Implementation

Parameter Server API

Pushsameparam?

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Implementation

Parameter Server API

Pushsameparam?

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Write conflict!

Implementation

Parameter Server API

Pushsameparam?

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Write conflict!Synchronization?

Implementation

Parameter Server API

Pushsameparam?

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Write conflict!Asynchronous training!

Framework and library

• Framework• Easy to implement new algorithms

• Library• Matrix Factorization

• Factorization Machine

• Passive Aggressive

• Sketch

Thank you for your attention

Dániel Berecz

@DBerecz

[email protected]

Gábor [email protected]

Source code:https://github.com/FlinkML/flink-parameter-server

https://github.com/rpalovics/Alpenglow

M. Li, et al.: "Scaling Distributed Machine Learning with the Parameter Server" 2014.

K. Crammer, et al.: “Online Passive-Aggressive Algorithms” 2006.

S. Schelter, et al.: "Factorbird - A Parameter Server Approach to Distributed Matrix Factorization." 2014.

R. Gemulla, et al. "Large-scale matrix factorization with distributed stochastic gradient descent” 2011.

Backups

Batch? Streaming?

Batch vs streaming?

Batch vs streaming?Offline vs online?

Online on streaming

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Online on streaming

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

Batch + online combination

Batch + online combination

• 30M music listening Last.fm dataset

• Weekly batch training

• Evaluation weekly average• on every incoming listening

• Around 45.000 users

Batch on Flink Streaming

• Movielens 1M movie rating dataset

• Using 6 nodes, 4 cores each

Implementation: Loops API

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

NOT MATURE

Implementation: Loops API

Parameter Server API

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

NOT MATURE

Implementation: Loops API

Parameter Server API

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

NOT MATURE

busy

Implementation: Loops API

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

NOT MATURE

busy

Implementation: Loops API

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

NOT MATURE

busy

busy

Implementation: Loops API

Parameter Server API

Pull, Push

Answerpull

DataSource Transformations DataSink

DataStream of input data

DataStream of Worker and PS

output

Worker

Parameter Server

NOT MATURE

busy

busy

deadlock