Workflow around modelling in Data Science / R
Marek Rogala, CTO @ Appsilon Data Science
eRum 2016, Poznań
Quick intro
Mission: transform how businesses use their data by combining top-notch data science and software engineering.
What inspires us:
• Transfer knowledge across industries.
• Get a solution into production quickly & iterate.
• Effectively gather expert knowledge to make models thrive.
Paweł Przytuła, Data Engineering Lead
Filip Stachura, Co-founder, CEO
Olga Mierzwa, Data Scientist
Marek Rogala, CTO & Co-founder
Krzysiek Wróbel, Data Scientist
The problem
Data Science projects get disorganized rapidly.
Need to reproduce, compare, reuse past results.
• crunch the data
• build models
• evaluate results
• translate into insights
• gather domain knowledge
Lightweight Data Science Workflow
There’s no one-size-fits-all.
• Unified structure for projects.
• Don't constrain ourselves too much.
• Self-documenting, no extra code.
• Reproduce past results.
• Communicate with semi-technical stakeholders.
Dataflows
• Several workflows in a project
• Workflow = a series of steps
• Each workflow is parametrized
• Single, long-lived R session
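To make the idea of a parametrized workflow concrete, here is a minimal sketch in base R. It is not the dataflows API: `run_workflow`, the step functions, and the parameter names are all hypothetical, chosen only to show a workflow as an ordered list of steps that share one set of parameters.

```r
# Hypothetical sketch, not the dataflows API.
# A workflow is an ordered list of step functions. Each step receives
# the previous step's output and the workflow's parameters.
run_workflow <- function(steps, input, params = list()) {
  Reduce(function(data, step) step(data, params), steps, input)
}

# Step 1: keep only the rows selected by the parameters.
filter_rows <- function(data, params) {
  data[data$cyl %in% params$cylinders, , drop = FALSE]
}

# Step 2: fit a linear model using the formula given in the parameters.
fit_model <- function(data, params) {
  lm(params$formula, data = data)
}

# Step 3: reduce the fitted model to a single quality measure.
evaluate <- function(model, params) {
  summary(model)$r.squared
}

steps <- list(filter_rows, fit_model, evaluate)

# The same workflow can be re-run with different parameters.
r2_small <- run_workflow(steps, mtcars,
                         list(cylinders = c(4, 6), formula = mpg ~ wt))
r2_all   <- run_workflow(steps, mtcars,
                         list(cylinders = c(4, 6, 8), formula = mpg ~ wt + hp))
```

Because the parameters travel with the call, two runs of the same workflow can be compared directly inside one R session.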
What we’re working on
• Seamless 100% reproducibility based on Git
• Caching of intermediate results
• Support multiple languages
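Caching of intermediate results could look roughly like the sketch below. This is an assumption about one possible approach, not the implementation being built: `cached_step` and the in-memory `cache` environment are hypothetical names, and the cache key is derived only from the step name and its parameters, so it would not notice a change in the input data.

```r
# Hypothetical sketch: in-memory cache that lives as long as the R session.
cache <- new.env()

cached_step <- function(name, step, data, params) {
  # Key built from the step name and the deparsed parameters.
  # Assumption: the input data does not change between calls.
  key <- paste(name, paste(deparse(params), collapse = ""), sep = ":")
  if (!exists(key, envir = cache, inherits = FALSE)) {
    assign(key, step(data, params), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

# A step that records how often it is actually executed.
counter <- new.env()
counter$n <- 0
column_mean <- function(data, params) {
  counter$n <- counter$n + 1
  mean(data[[params$column]])
}

first  <- cached_step("mean", column_mean, mtcars, list(column = "mpg"))
second <- cached_step("mean", column_mean, mtcars, list(column = "mpg"))  # served from cache
other  <- cached_step("mean", column_mean, mtcars, list(column = "wt"))   # new parameters, recomputed
```

An in-memory environment fits the single, long-lived R session described earlier; persisting results across sessions would need an on-disk store instead.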
Similar tools
• github.com/appsilon/dataflows-workflow
• Remake (github.com/richfitz/remake)
• Ruigi (github.com/kirillseva/ruigi)
• Flowr (github.com/sahilseth/flowr)
• Drake (github.com/Factual/drake)
• ...