parameter server on flink - github pagesparameter server on flink an approach for model-parallel...
TRANSCRIPT
Parameter Server on Flinkan approach for model-parallel machine learning
27/09/2018
Distributed Computing and Analytics Workshop
Dániel [email protected]
This project has received funding from the European Union’s Horizon 2020research and innovation program under grant agreement No 688191.
About us
• Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI)
• Data Science group
• Strong industry ties
• Ericsson, Bosch, Portugal Telekom, etc.
𝑅
Recommendation with matrix factorization
5
1
3
5
2
0
0
0
0
Dániel
Gábor
Iron Man Interstellar
Dániel rated Iron Man with 5 stars
𝑅
Recommendation with matrix factorization
𝑈𝑈 ∙ 𝐼 ≈ 𝑅
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
2
Level of actionLevel of drama
X factor
0
0
0
0
Latent factors
Dániel
Gábor
Iron Man Interstellar
Dániel rated Iron Man with 5 stars
𝑅
Recommendation with matrix factorization
𝑈𝑈 ∙ 𝐼 ≈ 𝑅
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
2
Level of actionLevel of drama
X factor
0
0
0
0
Latent factors
Dániel
Gábor
Iron Man Interstellar
min𝑢∗,𝑖∗
(𝑝,𝑞)∈𝜅𝑅
𝑟𝑝𝑞 − 𝜇 − 𝑏𝑝 − 𝑏𝑞 − 𝑢𝑝𝑖𝑞2+
+𝜆
𝑝∈𝜅𝑈
( 𝑢𝑝2+ 𝑏𝑝
2) + 𝜆
𝑞∈𝜅𝐼
( 𝑖𝑞2+ 𝑏𝑞
2)
Dániel rated Iron Man with 5 stars
𝑅
Recommendation with matrix factorization
𝑈𝑈 ∙ 𝐼 ≈ 𝑅
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
2
Level of actionLevel of drama
X factor
?
0
0
0
0
Latent factors
Dániel
Gábor
Iron Man Interstellar
Dániel rated Iron Man with 5 stars
Would Gábor like Interstellar?
𝑅
Recommendation with matrix factorization
𝑈𝑈 ∙ 𝐼 ≈ 𝑅
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
2
Level of actionLevel of drama
X factor
?
0
0
0
0
Latent factors
Dániel
Gábor
Iron Man Interstellar
Dániel rated Iron Man with 5 stars
Would Gábor like Interstellar?
𝑅
Recommendation with matrix factorization
𝑈𝑈 ∙ 𝐼 ≈ 𝑅
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
2
Level of actionLevel of drama
X factor
?
0
0
0
0
Latent factors
Dániel
Gábor
Iron Man Interstellar
Dániel rated Iron Man with 5 stars
Would Gábor like Interstellar?
5 4 -4
325
𝑅
Recommendation with matrix factorization
𝑈𝑈 ∙ 𝐼 ≈ 𝑅
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
2
Level of actionLevel of drama
X factor
?
0
0
0
0
Latent factors
Dániel
Gábor
Iron Man Interstellar
Dániel rated Iron Man with 5 stars
Would Gábor like Interstellar?
5 4 -4
325
3
𝑅
Recommendation with matrix factorization
𝑈𝑈 ∙ 𝐼 ≈ 𝑅
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
2
Level of actionLevel of drama
X factor
3
0
0
0
0
Latent factors
Dániel
Gábor
Iron Man Interstellar
Dániel rated Iron Man with 5 stars
Would Gábor like Interstellar?
5 4 -4
325
3
𝑅
Matrix factorization training
𝑈
item vector
325
532
5 -6 -1
5 4 -4
5
1
3
uservector
5
0
0
0
0
Dániel
Gábor
Iron Man Interstellar
4
2
𝑅
Matrix factorization training
𝑈
item vector
326
532
5 -6 -2
5 4 -4
5
1
3
uservector
5
2
4
0
0
0
0
Dániel
Gábor
Iron Man Interstellar
𝑅
Matrix factorization training
𝑈
item vector
135
532
5 -6 -2
6 1 2
5
1
3
uservector
5
2
4
0
0
0
0
Dániel
Gábor
Iron Man Interstellar
𝑅
Matrix factorization training
𝑈
item vector
135
532
4 -5 -1
6 1 2
5
1
3
uservector
5
2
3
0
0
0
0
Dániel
Gábor
Iron Man Interstellar
model-parallel
Parameter Server API
ps.pull(paramId)ps.push(paramId, paramUpdate) workers
server nodes
data
pull push
Parameter Server API
workers
server nodes
data
pull push
ps.pull(paramId)ps.push(paramId, paramUpdate)
def onRecv(data): Unit
Parameter Server API
workers
server nodes
data
pull push
ps.pull(paramId)ps.push(paramId, paramUpdate)
def onRecv(data): Unitdef onPullRecv(paramId, paramValue)
Matrix factorization withParameter Server
workers
server nodes
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5 3
0
0
0
0
𝑅
Matrix factorization withParameter Server
workers
server nodes
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5
0
0
0
0
𝐼
𝑈
𝑅
2
Matrix factorization withParameter Server
workers
server nodes
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix factorization withParameter Server
workers
server nodes
pull
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix factorization withParameter Server
workers
server nodes
pull answer
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix factorization withParameter Server
workers
server nodes
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix factorization withParameter Server
workers
server nodes
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix factorization withParameter Server
workers
server nodes
push
𝑅𝑈
325
532
5 -6 -1
5 4 -4
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix factorization withParameter Server
workers
server nodes
𝑅𝑈
135
532
5 -6 -1
5 4 -4
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix factorization withParameter Server
workers
server nodes
𝑅𝑈
325
532
5 -6 -1
6 1 2
5
1
3
5 4
0
0
0
0
2
𝐼
𝑈
𝑅
Matrix Factorization code (SGD)
def onRecv(r: Rating) = { def onPullRecv(paramId: Int,param: Vector) = {
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)
def onPullRecv(paramId: Int,param: Vector) = {
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)
}
def onPullRecv(paramId: Int,param: Vector) = {
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)
}
def onPullRecv(paramId: Int,param: Vector) = {
val itemId = paramIdval item = param
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)
}
def onPullRecv(paramId: Int,param: Vector) = {
val itemId = paramIdval item = param
val (r, userId, _) =waitQueues(itemId).pop()
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)
}
def onPullRecv(paramId: Int,param: Vector) = {
val itemId = paramIdval item = param
val (r, userId, _) =waitQueues(itemId).pop()
val user = users(userId)
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)
}
def onPullRecv(paramId: Int,param: Vector) = {
val itemId = paramIdval item = param
val (r, userId, _) =waitQueues(itemId).pop()
val user = users(userId)val (userDelta, itemDelta) =
updateWithSGD(user, item, r)
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)
}
def onPullRecv(paramId: Int,param: Vector) = {
val itemId = paramIdval item = param
val (r, userId, _) =waitQueues(itemId).pop()
val user = users(userId)val (userDelta, itemDelta) =
updateWithSGD(user, item, r)users(userId) += userDelta
Matrix Factorization code (SGD)
def onRecv(r: Rating) = {waitQueues(r.itemId).add(r)ps.pull(r.itemId)
}
def onPullRecv(paramId: Int,param: Vector) = {
val itemId = paramIdval item = param
val (r, userId, _) =waitQueues(itemId).pop()
val user = users(userId)val (userDelta, itemDelta) =
updateWithSGD(user, item, r)users(userId) += userDeltaps.push(itemId, itemDelta)
}
Integration with Flink
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Implementation?
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Implementation?
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
?
Implementation: Loops API
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Implementation: Loops API
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
NOT MATURE
WIP:FLIP-15FLIP-16
Implementation
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Implementation
Parameter Server API
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Implementation
Parameter Server API
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Implementation
Parameter Server API
Pushsameparam?
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Implementation
Parameter Server API
Pushsameparam?
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Write conflict!
Implementation
Parameter Server API
Pushsameparam?
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Write conflict!Synchronization?
Implementation
Parameter Server API
Pushsameparam?
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Write conflict!Asynchronous training!
Framework and library
• Framework• Easy to implement new algorithms
• Library• Matrix Factorization
• Factorization Machine
• Passive Aggressive
• Sketch
Thank you for your attention
Dániel Berecz
@DBerecz
Gábor [email protected]
Source code:https://github.com/FlinkML/flink-parameter-server
https://github.com/rpalovics/Alpenglow
M. Li, et al.: "Scaling Distributed Machine Learning with the Parameter Server" 2014.
K. Crammer, et al.: “Online Passive-Aggressive Algorithms” 2006.
S. Schelter, et al.: "Factorbird - A Parameter Server Approach to Distributed Matrix Factorization." 2014.
R. Gemulla, et al. "Large-scale matrix factorization with distributed stochastic gradient descent” 2011.
Online on streaming
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Online on streaming
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
Batch + online combination
• 30M music listening Last.fm dataset
• Weekly batch training
• Evaluation weekly average• on every incoming listening
• Around 45.000 users
Implementation: Loops API
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
NOT MATURE
Implementation: Loops API
Parameter Server API
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
NOT MATURE
Implementation: Loops API
Parameter Server API
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
NOT MATURE
busy
Implementation: Loops API
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
NOT MATURE
busy
Implementation: Loops API
Parameter Server API
Pull, Push
Answerpull
DataSource Transformations DataSink
DataStream of input data
DataStream of Worker and PS
output
Worker
Parameter Server
NOT MATURE
busy
busy