Build your own data challenge, or just organize team work

RAMP-WORKFLOW & RAMP-KITS. BALÁZS KÉGL, Université Paris-Saclay / CNRS, Center for Data Science Paris-Saclay

Upload: balazs-kegl

Post on 22-Jan-2018


TRANSCRIPT

RAMP-WORKFLOW & RAMP-KITS

Université Paris-Saclay / CNRS

BALÁZS KÉGL

Center for Data Science Paris-Saclay


Most classical data challenges are HR and publicity events


We decided to turn them into a tool for

1. Collaborative prototyping
2. Teaching
3. Data science team management


We are open sourcing it

toolkit: https://github.com/paris-saclay-cds/ramp-workflow

examples: https://github.com/ramp-kits


Funded by Université Paris-Saclay


RAMP.STUDIO DATA CHALLENGE WITH CODE SUBMISSION


THE POWER OF THE (COLLABORATING) CROWD

[Figure: a plot spanning a competitive phase and a collaborative phase, annotated with "what you achieved with a well-tuned deep net", "the diversity gap", and "the human blender gap"]


OPEN PHASE LETS PARTICIPANTS CATCH UP: THE GOAL OF TEACHING



COMMUNICATION AND REUSE


You can

1. Participate in upcoming RAMPs
2. Use RAMP in teaching or training


Setting up the RAMP was long and hard.


Separate workflow building and workflow optimization


Before solving the problem, set it up

(even put it into production)

RAMP-WORKFLOW & RAMP-KITS

• toolkit: https://github.com/paris-saclay-cds/ramp-workflow
  • for designing workflows
  • a set of ready-made metrics, workflows, CV schemes, and data readers
  • a single command-line test script
• examples: https://github.com/ramp-kits
  • a zoo of problems, experiments, and workflows
  • (at least) one initial solution


CLASSIFYING AND REGRESSING ON MOLECULAR SPECTRA


[Diagram: chemotherapy drug in elastic pocket → laser spectrometer → molecular spectra; the spectra pass through feature extractor 1 and feature extractor 2, which feed a classifier (output: drug type) and a regressor (output: concentration)]
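The two-output workflow on this slide can be illustrated with a minimal sketch in plain Python. It does not use rampwf, and everything in it is an assumption made for illustration: the pairing of feature extractor 1 with the classifier and feature extractor 2 with the regressor, the toy models, the function and class names, and the made-up spectra.

```python
from statistics import mean


def extract_for_classifier(spectrum):
    """Feature extractor 1 (hypothetical): keep only the shape of the spectrum."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def extract_for_regressor(spectrum):
    """Feature extractor 2 (hypothetical): keep only the overall intensity."""
    return [sum(spectrum)]


class NearestCentroidClassifier:
    """Toy classifier: predicts the label of the closest class centroid."""

    def fit(self, X, labels):
        self.centroids = {}
        for label in set(labels):
            rows = [x for x, lab in zip(X, labels) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]


class ProportionalRegressor:
    """Toy regressor: least-squares fit of concentration = slope * intensity."""

    def fit(self, X, y):
        num = sum(x[0] * t for x, t in zip(X, y))
        den = sum(x[0] ** 2 for x in X)
        self.slope = num / den
        return self

    def predict(self, X):
        return [self.slope * x[0] for x in X]


# Made-up spectra: drug "A" peaks in the first channel, drug "B" in the last.
spectra = [[4, 1, 1], [8, 2, 2], [1, 1, 4], [2, 2, 8]]
drug_types = ["A", "A", "B", "B"]
concentrations = [1.0, 2.0, 1.0, 2.0]

clf = NearestCentroidClassifier().fit(
    [extract_for_classifier(s) for s in spectra], drug_types)
reg = ProportionalRegressor().fit(
    [extract_for_regressor(s) for s in spectra], concentrations)

new_spectrum = [6, 1.5, 1.5]
pred_type = clf.predict([extract_for_classifier(new_spectrum)])[0]
pred_conc = reg.predict([extract_for_regressor(new_spectrum)])[0]
```

Keeping the two feature extractors separate lets each prediction task use its own view of the same spectrum.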


FORECASTING EL NIÑO SIX MONTHS AHEAD


[Diagram: time series (… 300.14 299.83 298.76 299.87 299.82 300.15 300.10 299.50 …) → time series feature extractor → x (a fixed-length feature vector) → regressor]
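The step from a time series to a fixed-length feature vector can be sketched as follows. This is a minimal illustration in plain Python, not rampwf code: the window length, the choice of features, the function names, and the toy series are all assumptions. Only the six-step lookahead comes from the slide title.

```python
from statistics import mean

LOOKAHEAD = 6   # forecast horizon in time steps ("six months ahead")
WINDOW = 4      # hypothetical number of past values used as features


def extract_features(series, t, window=WINDOW):
    """Return a fixed-length feature vector built from the window ending at t."""
    past = series[t - window + 1 : t + 1]
    # Features: the raw window plus its mean and its overall change.
    return past + [mean(past), past[-1] - past[0]]


def make_dataset(series, window=WINDOW, lookahead=LOOKAHEAD):
    """Pair each feature vector at time t with the target value at t + lookahead."""
    X, y = [], []
    for t in range(window - 1, len(series) - lookahead):
        X.append(extract_features(series, t, window))
        y.append(series[t + lookahead])
    return X, y


# Toy series for illustration only (not real measurements).
series = [300.0 + 0.1 * (i % 5) for i in range(20)]
X, y = make_dataset(series)
```

Any regressor that accepts fixed-length rows could then be fit on `X` and `y`. The feature extractor never reads values after time `t`, so the forecast does not leak future information.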


A SINGLE SCRIPT TO DEFINE THE BUNDLE

[Diagram: the bundle combines data connectors, a cross-validation scheme, a workflow (FE → CLF) taking X and producing ypred, and a score type that turns ypred into a score]
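To make the parts of the bundle concrete, here is a minimal sketch of one script that defines a data connector, a cross-validation scheme, a workflow (FE followed by CLF), and a score type. It is written in plain Python and is not the rampwf API. All names, the toy data, and the toy classifier are assumptions chosen for illustration.

```python
def get_data():
    """Data connector (toy): return features X and binary labels y."""
    X = [[i / 40] for i in range(40)]
    y = [int(x[0] > 0.5) for x in X]
    return X, y


def get_cv(n_samples, n_folds=4):
    """Cross-validation scheme: yield (train_indices, valid_indices) per fold."""
    indices = list(range(n_samples))
    for k in range(n_folds):
        valid = indices[k::n_folds]
        valid_set = set(valid)
        train = [i for i in indices if i not in valid_set]
        yield train, valid


def accuracy(y_true, y_pred):
    """Score type: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class FeatureExtractor:
    """FE step of the workflow (identity transform in this toy)."""

    def transform(self, X):
        return X


class Classifier:
    """CLF step: threshold at the midpoint between the two class means."""

    def fit(self, X, y):
        zeros = [x[0] for x, t in zip(X, y) if t == 0]
        ones = [x[0] for x, t in zip(X, y) if t == 1]
        self.threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
        return self

    def predict(self, X):
        return [int(x[0] > self.threshold) for x in X]


def run_workflow(fe, clf, X, y, train, valid):
    """Workflow: FE then CLF, fit on the training fold, predict on the validation fold."""
    X_feat = fe.transform(X)
    clf.fit([X_feat[i] for i in train], [y[i] for i in train])
    return clf.predict([X_feat[i] for i in valid])


X, y = get_data()
scores = []
for train, valid in get_cv(len(X)):
    y_pred = run_workflow(FeatureExtractor(), Classifier(), X, y, train, valid)
    scores.append(accuracy([y[i] for i in valid], y_pred))
```

In this sketch a submission would only replace `FeatureExtractor` and `Classifier`. The data, the folds, and the score stay fixed, so different submissions remain comparable.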


A SINGLE EXECUTABLE TO TEST THE SUBMISSIONS

• Keep your different submissions in a simple file structure

• Communicate them on git

• Execute them also from the notebook
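A simple file structure makes it possible to test every submission with the same code, from a script or from a notebook. The sketch below is not the rampwf test script. It assumes a hypothetical layout `submissions/<name>/classifier.py` in which each file exposes a `predict` function, and it uses made-up data and toy submissions.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_submission(submissions_dir, name, module="classifier"):
    """Import <submissions_dir>/<name>/<module>.py and return the module object."""
    path = Path(submissions_dir) / name / f"{module}.py"
    spec = importlib.util.spec_from_file_location(f"{name}_{module}", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Two toy submissions, written to a temporary directory for the demo.
SUBMISSIONS = {
    "always_zero": "def predict(X):\n    return [0 for _ in X]\n",
    "threshold": "def predict(X):\n    return [int(x > 0.5) for x in X]\n",
}

X = [0.1, 0.4, 0.6, 0.9]
y = [0, 0, 1, 1]

results = {}
with tempfile.TemporaryDirectory() as root:
    for name, source in SUBMISSIONS.items():
        folder = Path(root) / "submissions" / name
        folder.mkdir(parents=True)
        (folder / "classifier.py").write_text(source)
    # Test every submission in the folder with the same data and the same score.
    for folder in sorted((Path(root) / "submissions").iterdir()):
        submission = load_submission(folder.parent, folder.name)
        results[folder.name] = accuracy(y, submission.predict(X))
```

Because each submission is an ordinary folder of Python files, it can be versioned and shared on git like any other code.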


You can

1. Use rampwf for your own workflows
2. Use rampwf to organize workflow building and optimization in an internal data science team
3. Submit it to us if you want to run a data challenge


toolkit: github.com/paris-saclay-cds/ramp-workflow

examples: github.com/ramp-kits

blogs: medium.com/@balazskegl

slack: ramp-studio.slack.com

frontend: www.ramp.studio

mail: [email protected]