Workflow around modelling in Data Science / R

Marek Rogala, CTO @ Appsilon Data Science

eRum 2016, Poznań

Post on 13-Apr-2017


Quick intro

Mission: transform how businesses use their data by combining top-notch data science and software engineering.

What inspires us:

• Transfer knowledge across industries.

• Get a solution into production quickly & iterate.

• Effectively gather expert knowledge to make models thrive.

Paweł Przytuła, Data Engineering Lead

Filip Stachura, Co-founder, CEO

Olga Mierzwa, Data Scientist

Marek Rogala, CTO & Co-founder

Krzysiek Wróbel, Data Scientist

The problem

Data Science projects get disorganized rapidly.

Need to reproduce, compare, reuse past results.

• crunch the data

• build models

• evaluate results

• translate into insights

• gather domain knowledge

Bring order to chaos!

Fast experimentation

versus

reproducibility and reusability.

Lightweight Data Science Workflow

There’s no one-size-fits-all.

• Unified structure for projects.

• Do not constrain ourselves too much.

• Self-documenting, no extra code.

• Reproduce past results.

• Communicate with semi-technical stakeholders.

Dataflows – sample project

Dataflows

• Several workflows in a project

• Workflow = a series of steps

• Each workflow is parametrized

• Single, long-lived R session
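The idea of a workflow as a parametrized series of steps can be sketched in plain R. This is only an illustration under stated assumptions, not the Dataflows API: `run_workflow`, the step names and the `params` fields are hypothetical, and the built-in `cars` dataset stands in for project data.

```r
# Hypothetical helper (not part of Dataflows): run a list of step functions
# in order, feeding each step the previous result and the same parameters.
run_workflow <- function(steps, input, params) {
  Reduce(function(data, step) step(data, params), steps, input)
}

# Each step takes (data, params) and returns the input for the next step.
steps <- list(
  filter_rows = function(data, params) data[data$speed >= params$min_speed, ],
  fit_model   = function(data, params) lm(params$formula, data = data)
)

# The whole workflow is controlled by a single parameter list.
params <- list(min_speed = 10, formula = dist ~ speed)

model <- run_workflow(steps, cars, params)
print(coef(model))
```

Because every step reads from one `params` list, re-running the workflow with different parameters in the same R session only requires a new list, not changes to the steps.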

What we’re working on

• Seamless 100% reproducibility based on Git

• Caching of intermediate results

• Support multiple languages
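Caching of intermediate results could look roughly like the sketch below. It is an assumption about one possible approach, not how Dataflows implements it: `cached_step` is a hypothetical helper that stores a step's result in an `.rds` file keyed by the step name and its parameters, using only base R and the bundled `tools` package.

```r
# Hypothetical helper: compute a step once per (name, params) pair and
# reuse the stored result on later calls.
cached_step <- function(name, params, compute, cache_dir) {
  # Build a deterministic key from the step name and the deparsed parameters.
  key_file <- tempfile()
  on.exit(unlink(key_file))
  writeLines(c(name, deparse(params)), key_file)
  key <- unname(tools::md5sum(key_file))

  path <- file.path(cache_dir, paste0(name, "-", key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: skip the computation
  }
  result <- compute(params)
  saveRDS(result, path)    # cache miss: compute and store
  result
}

# Count how often the "expensive" step actually runs.
calls <- 0
slow_mean <- function(params) {
  calls <<- calls + 1
  mean(cars$dist[cars$speed >= params$min_speed])
}

cache_dir <- tempfile("workflow-cache-")
dir.create(cache_dir)

first  <- cached_step("mean_dist", list(min_speed = 10), slow_mean, cache_dir)
second <- cached_step("mean_dist", list(min_speed = 10), slow_mean, cache_dir)
```

In this sketch the key ignores changes to the code and to the input data. One way to cover those, in the spirit of Git-based reproducibility, would be to add the current commit hash to the key; that extension is an assumption and is not shown here.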

Similar tools

• github.com/appsilon/dataflows-workflow

• Remake (github.com/richfitz/remake)

• Ruigi (github.com/kirillseva/ruigi)

• Flowr (github.com/sahilseth/flowr)

• Drake (github.com/Factual/drake)

• ...

Marek Rogala ✉️ [email protected]

Invest in your workflow!