Post on 12-Jun-2020

Confidential / 3 August 2017

Data Science ❤ Docker

Dockerize Your Data Science Environment

Outlook

• About Me, Detego and RFID

• Docker Basics

• Docker for Data Science Environments

• Connecting your Data Science Environment to other Services

About Me, Detego and RFID

Florian Geigl

PhD from Institute of Interactive Systems and Data Science, Graz University of Technology

Short-Term Scholar at the Information Sciences Institute, University of Southern California

Working as Data Scientist at Detego

Participated in Kaggle competitions

Detego (Latin): reveal, uncover, display

Located in Graz

~35 employees

Fashion-Retail Industry

International Customers

www.detego.com

Docker Basics

https://www.docker.com/what-docker

“… Developers use Docker to eliminate ‘works on my machine’ problems when collaborating on code with co-workers. …”

“everything required to make a piece of software run is packaged into isolated containers. Unlike VMs, containers do not bundle a full operating system - only libraries and settings required to make the software work are needed. This makes for efficient, lightweight, self-contained systems and guarantees that software will always run the same, regardless of where it’s deployed.”

Why Docker?

• Consistent Environments
  • Linux, macOS, Windows
  • AWS, Azure & many more

• Native Performance
  • + CUDA version

• Resource Saving

• Easy Configuration
  • Pre-built/official Images
  • or custom Docker Image

• Easy Mounting of Data

How fast can you set up an Apache server?

Switch between Apache versions?

Set up an identical Apache server on Linux, Mac & Windows?

Live Demo: Apache

docker run -it --rm -p 8888:80 -v C:\path\to\data:/usr/local/apache2/htdocs/ httpd

Image != Container

Image == Class

Container == Instance
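Switching between Apache versions comes down to picking a different image tag. The following is a minimal sketch for Linux and Mac, not part of the original demo: the function name run_apache and the DOCKER_BIN override are hypothetical, and it mounts the current directory instead of the Windows path used above. The tag httpd:2.4 is one example of an official image tag.

```shell
# run_apache [VERSION] [HOST_PORT]
# Serves the current directory with the official httpd image.
# VERSION is an image tag (default: latest), HOST_PORT defaults to 8888.
# DOCKER_BIN is a hypothetical override for the docker executable.
run_apache() {
  apache_version="${1:-latest}"
  apache_host_port="${2:-8888}"
  "${DOCKER_BIN:-docker}" run -it --rm \
    -p "${apache_host_port}:80" \
    -v "$(pwd):/usr/local/apache2/htdocs/" \
    "httpd:${apache_version}"
}
```

Usage: "run_apache 2.4" starts one container from the 2.4 image; calling it again with another tag starts a new container from a different image, which is the class/instance relation from the slide.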

Docker…

• has basically no overhead

• provides native performance

• provides a consistent environment

• allows you to build your own Docker image

• runs on any host OS

• allows you to easily mount data into a container

• starts instantly

• …

Building a Docker Data Science Environment

Building your own Docker Images

e.g.: Ubuntu & vim

“Dockerfile”

FROM ubuntu:latest

RUN apt-get update && apt-get install -y vim

docker build .

-> results in a docker image
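A plain "docker build ." produces an image that can only be addressed by its ID. A small sketch of giving it a name with -t, assuming the Dockerfile sits in the current directory; the function name build_image, the default tag ubuntu-vim and the DOCKER_BIN override are hypothetical:

```shell
# build_image [TAG]
# Builds the Dockerfile in the current directory and tags the resulting image.
# TAG defaults to the hypothetical name "ubuntu-vim".
# DOCKER_BIN is a hypothetical override for the docker executable.
build_image() {
  image_tag="${1:-ubuntu-vim}"
  "${DOCKER_BIN:-docker}" build -t "$image_tag" .
}
```

Usage: after "build_image ubuntu-vim", a container can be started by name with "docker run --rm -it ubuntu-vim vim".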

Docker Data Science Image

Based on Kaggle’s Docker Image: https://hub.docker.com/u/kaggle/

Open-Source: https://github.com/floriangeigl/docker-DataScience

- pull requests are highly welcome

Contained Services:

- Python (2&3)

- R

- Julia

- Jupyter Notebooks

- JupyterLab

- RStudio

“docker pull floriangeigl/datascience”

(-> pulls or updates an image)

Do It – Do It Now!

docker run --rm -it -p 8888:8888 -p 8889:8889 -p 8787:8787 -p 2222:22 -p 9001:9001 -v "${pwd}:/data/" --name dsdocker floriangeigl/datascience /bin/bash

docker run: Creates a container from an image and executes a given command

--rm: Remove the container after shutdown

-p: Map a port from the container to our host machine

(e.g.: HostPort:ContainerPort)

-v: Mount a directory into the container

(e.g.: HostPath:ContainerPath)

pwd = print working directory = current path

floriangeigl/datascience: Docker image

/bin/bash: Executed command

Image != Container

Image == Class

Container == Instance

Live Demo: Data Science Container

8888: Jupyter Notebook

8889: JupyterLab

8787: RStudio Server

22: ssh

9001: supervisord (status of services; restart services; logs…)

Best Practice #1: Aliases

Windows PowerShell:

run “notepad $PROFILE”

add “function dsdocker {docker run --rm -i -t -p … -v "${pwd}:/data/" …}”

restart PowerShell & use your new “dsdocker” command

Linux & Mac:

add an alias (or shell function) for the command

replace ${pwd} with $(pwd)

-> see: https://github.com/floriangeigl/docker-DataScience
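For Linux and Mac, the same idea can be sketched as a shell function, e.g. in ~/.bashrc or ~/.zshrc. This is an illustration under assumptions, not the version from the repository: it reuses the full docker run command from the “Do It” slide, uses $(pwd) instead of ${pwd}, and adds a hypothetical DOCKER_BIN override plus an optional command argument.

```shell
# dsdocker [COMMAND...]
# Starts the data science container with the current directory mounted at /data/.
# Without arguments it opens /bin/bash; otherwise it runs the given command.
# DOCKER_BIN is a hypothetical override for the docker executable.
dsdocker() {
  if [ "$#" -eq 0 ]; then
    set -- /bin/bash
  fi
  "${DOCKER_BIN:-docker}" run --rm -it \
    -p 8888:8888 -p 8889:8889 -p 8787:8787 -p 2222:22 -p 9001:9001 \
    -v "$(pwd):/data/" \
    --name dsdocker \
    floriangeigl/datascience "$@"
}
```

A function is used rather than a plain alias so that $(pwd) is evaluated on every call, not once when the shell starts.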

Best Practice #2 – Fixed Project Structure

• Cookiecutter: https://github.com/drivendata/cookiecutter-data-science

Known Bugs

Issues with the German (de) keyboard layout:

Can’t type “\” on a German keyboard in Chrome & IE

https://github.com/jupyter/notebook/issues/2379#issuecomment-301268937

-> workaround: use Firefox

Connecting to other Services

Databases anyone?

Get your hands dirty on various technologies

https://hub.docker.com/u/library/

Docker-Compose

version: "3.1"
services:
  datascience:
    image: floriangeigl/datascience:latest
    ports:
      - "8888:8888"
      - "8889:8889"
      - "8787:8787"
      - "9001:9001"
    volumes:
      - ./:/data/
    links:
      - mongo
      - cassandra
  mongo:
    image: mongo:latest
    # persistent storage
    volumes:
      - ./data/mongo/:/data/db
  cassandra:
    image: cassandra:latest

Accessible ports: 8888, 8889, 8787, …

Hostname: mongo

Hostname: cassandra

Do It – Do It Now!

Go to /path/to/compose-file

Run “docker-compose(.exe) up”

Use the stack

Shutdown: Ctrl+C

Remove used containers: “docker-compose(.exe) rm”
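As an alternative to keeping the stack in the foreground, the steps above can be sketched as a small wrapper that runs it in the background. The function name stack and the COMPOSE_BIN override (e.g. for docker-compose.exe) are hypothetical; up -d, ps and down are standard docker-compose subcommands.

```shell
# stack {start|status|stop}
# Wrapper around docker-compose for the compose file in the current directory.
# COMPOSE_BIN is a hypothetical override for the docker-compose executable.
stack() {
  compose_bin="${COMPOSE_BIN:-docker-compose}"
  case "$1" in
    start)  "$compose_bin" up -d ;;   # start all services in the background
    status) "$compose_bin" ps ;;      # list the containers of this stack
    stop)   "$compose_bin" down ;;    # stop and remove the containers and network
    *)
      echo "usage: stack {start|status|stop}" >&2
      return 1
      ;;
  esac
}
```

Usage: run "stack start" in /path/to/compose-file, work with the services, then "stack stop". Note that "down" also removes the containers, so a separate "rm" is not needed in this variant.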

Questions?
