© 2018 GridGain Systems, Inc. Distributed Machine Learning with Zero ETL Yury Babak Head of development, GridGain

Post on 21-Aug-2020

Page 1

Distributed Machine Learning with Zero ETL

Yury Babak

Head of development, GridGain

Page 2

Long ETL

Page 3

Long ETL

- X%

- X%

Page 4

Distributed Training

Page 5

Node Crash

Page 6

Apache Ignite

Page 7

Apache Ignite: Replicated Caches

Server Node 1 Server Node 2

Server Node 3 Server Node 4

Client

Page 8

Map Reduce

Page 9

Iterative Optimization Algorithm
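The map-reduce and iterative-optimization slides carry only their titles, so the following is a minimal sketch of how the two ideas can combine, not the talk's actual implementation. It assumes a least-squares loss and plain gradient descent. All function names, the toy data, and the learning rate are hypothetical, and no Apache Ignite API is used. Each iteration maps a partial-gradient function over the partitions and reduces the results by vector addition, so only small gradient vectors move between partitions while the rows stay where they are.

```python
from functools import reduce


def partial_gradient(partition, w):
    """Map step: gradient of the squared-error loss over one partition's rows."""
    grad = [0.0] * len(w)
    for x, y in partition:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj
    return grad


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def train(partitions, n_features, learning_rate=0.1, iterations=500):
    """Gradient descent in which every iteration is one map-reduce pass."""
    n_rows = sum(len(p) for p in partitions)
    w = [0.0] * n_features
    for _ in range(iterations):
        partials = [partial_gradient(p, w) for p in partitions]  # map
        total = reduce(add_vectors, partials)                     # reduce
        w = [wi - learning_rate * gi / n_rows for wi, gi in zip(w, total)]
    return w


# Hypothetical toy data: y = 1 + 2 * x, split into two partitions.
# Each row is ([bias, x], y).
partitions = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
weights = train(partitions, n_features=2)
```

In a real cluster the map step would run on the node that holds each partition; here the list comprehension stands in for that remote execution.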

Page 10

Partition Based Data Set

Page 11

Restoration of partitions after a failure

Page 12

Recovering calculations after failure
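The diagrams for the two failure slides did not survive in the transcript. As a generic illustration only, and not a description of Apache Ignite's actual recovery mechanism, the sketch below assumes each partition has a primary copy and a backup copy on different nodes, and that a failed partial computation is simply re-run on the next live copy. The names `run_on_partition`, `owners`, and `storage` are hypothetical.

```python
class NodeFailure(Exception):
    """Raised in this simulation when no live copy of a partition remains."""


def run_on_partition(task, partition_id, owners, storage):
    """Run task on the primary copy, falling back to backups if a node is gone."""
    for node in owners[partition_id]:
        data = storage.get(node, {}).get(partition_id)
        if data is None:
            continue  # node is down or no longer holds the partition
        return task(data)
    raise NodeFailure(f"no live copy of partition {partition_id}")


# Two partitions, each listed with its primary node first and a backup second.
owners = {0: ["node1", "node2"], 1: ["node2", "node3"]}
storage = {
    "node1": {0: [1, 2, 3]},
    "node2": {0: [1, 2, 3], 1: [4, 5]},
    "node3": {1: [4, 5]},
}

before = sum(run_on_partition(sum, p, owners, storage) for p in owners)
del storage["node2"]  # simulate a node crash
after = sum(run_on_partition(sum, p, owners, storage) for p in owners)
```

Because the per-partition task is deterministic and depends only on the partition's data, re-running it on a backup copy gives the same partial result, so the reduced total is unchanged by the crash.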

Page 13

OLS sample

Loss function

Gradient of loss function

Node 1, Node 2, ..., Node M
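The formulas on this slide were lost in extraction. For reference, the standard ordinary least squares loss and its gradient can be written as follows; the notation is an assumption and may differ from the slide's. Here $X_m$ and $y_m$ denote the rows and labels held on node $m$, and $w$ is the weight vector.

```latex
L(w) = \lVert Xw - y \rVert^2 = \sum_{m=1}^{M} \lVert X_m w - y_m \rVert^2

\nabla L(w) = 2 X^{\top}(Xw - y) = \sum_{m=1}^{M} 2 X_m^{\top}(X_m w - y_m)
```

Both quantities are sums of per-node terms, so each of the M nodes can compute its term locally and only the partial results need to be combined.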

Page 14

Sample 2: LSQR

Page 15

Limitations of Applicability

[Figure: chart labels "Iteration time", "Number of Iterations", and "Time to training"; data labels "SGD", "BS 1 000", and "BS 10"]
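The three quantities named on this slide are related by the usual identity below. This is a reading of the labels, not a formula recovered from the slide, and the symbols are assumptions.

```latex
T_{\text{train}} \approx N_{\text{iter}} \cdot t_{\text{iter}}
```

Under this reading, an algorithm that needs many cheap iterations is dominated by the fixed per-iteration cost of a distributed pass, which is one plausible interpretation of the slide's title.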

Page 16

Want to learn more?

https://ignite.apache.org

https://apacheignite.readme.io/docs

https://github.com/apache/ignite

[email protected]