VSSML16 L7. REST API, Bindings, and Basic Workflows


Upload: bigml-inc

Post on 07-Feb-2017


Page 1: VSSML16 L7. REST API, Bindings, and Basic Workflows

Automating Machine Learning

API, bindings, BigMLer and Basic Workflows

#VSSML16

September 2016

#VSSML16 Automating Machine Learning September 2016 1 / 43

Page 2: VSSML16 L7. REST API, Bindings, and Basic Workflows

Outline

1 Machine Learning workflows

2 Client-side workflows: REST API and bindings

3 Client-side workflows: Bigmler

4 Server-side workflows: WhizzML

5 Example Workflow Walk-throughs


Page 4: VSSML16 L7. REST API, Bindings, and Basic Workflows

Machine Learning as a System Service

The goal: Machine Learning as a system-level service

The means

• APIs: ML building blocks

• Abstraction layer over feature engineering

• Abstraction layer over algorithms

• Automation


Page 5: VSSML16 L7. REST API, Bindings, and Basic Workflows

Machine Learning workflows


Page 6: VSSML16 L7. REST API, Bindings, and Basic Workflows

Machine Learning workflows, for real


Page 7: VSSML16 L7. REST API, Bindings, and Basic Workflows

Higher-level Machine Learning


Page 8: VSSML16 L7. REST API, Bindings, and Basic Workflows

Outline

1 Machine Learning workflows

2 Client-side workflows: REST API and bindings

3 Client-side workflows: Bigmler

4 Server-side workflows: WhizzML

5 Example Workflow Walk-throughs


Page 9: VSSML16 L7. REST API, Bindings, and Basic Workflows

Example workflow: Batch Centroid

Objective: Label each row in a Dataset with its associated centroid.

We need to...

• Create Dataset

• Create Cluster

• Create BatchCentroid from Cluster and Dataset

• Save BatchCentroid as new Dataset


Page 10: VSSML16 L7. REST API, Bindings, and Basic Workflows

Example workflow: building blocks

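# Create a dataset from an existing source
curl -X POST "https://bigml.io/dataset?$AUTH" \
     -H "Content-Type: application/json" \
     -d '{"source": "source/56fbbfea200d5a3403000db7"}'

# Create a cluster from the dataset
curl -X POST "https://bigml.io/cluster?$AUTH" \
     -H "Content-Type: application/json" \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'

# Create a batch centroid from the cluster and the dataset
curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
     -H "Content-Type: application/json" \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65",
          "cluster": "cluster/33e2e231a34fff333000b65"}'

# Retrieve the resulting dataset
curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"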
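The calls above share one shape: a resource-type path, an authentication query string, and (for creation) a JSON body. A minimal Python sketch of that shape follows, using only the standard library and building the requests without sending them. The helper name `build_request` and the placeholder credentials are illustrative assumptions, and the `username=...;api_key=...` auth format is assumed rather than taken from the slides.

```python
import json
import urllib.request

BIGML_URL = "https://bigml.io"  # base URL used in the curl calls


def build_request(resource_type, payload=None, auth="username=alice;api_key=SECRET"):
    """Build (but do not send) a request for a BigML-style REST endpoint.

    `auth` is a placeholder query string (assumption). A payload means
    "create" (POST with a JSON body); no payload means "retrieve" (GET).
    """
    url = f"{BIGML_URL}/{resource_type}?{auth}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# The first two building blocks of the workflow, expressed as requests.
dataset_req = build_request("dataset", {"source": "source/56fbbfea200d5a3403000db7"})
cluster_req = build_request("cluster", {"dataset": "dataset/43ffe231a34fff333000b65"})
```

Passing either request to `urllib.request.urlopen` would perform the call; that step is left out here because it needs real credentials.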


Page 11: VSSML16 L7. REST API, Bindings, and Basic Workflows

Example workflow: Web UI


Page 12: VSSML16 L7. REST API, Bindings, and Basic Workflows

Example workflow: Python bindings

from bigml.api import BigML

api = BigML()
source = 'source/5643d345f43a234ff2310a3e'

# create dataset and cluster, waiting for each to finish
dataset = api.create_dataset(source)
api.ok(dataset)
cluster = api.create_cluster(dataset)
api.ok(cluster)

# create a batch centroid whose output is saved as a new dataset
batch_centroid = api.create_batch_centroid(cluster, dataset,
                                           {'output_dataset': True,
                                            'all_fields': True})
# wait again, via polling, until the job is finished
api.ok(batch_centroid)
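The "wait via polling" step can be pictured as a loop that re-fetches the resource until its status says it is done. The sketch below is not the bindings' implementation: `wait_until_finished`, `make_fake_fetch`, and the resource shape `{"status": {"code": ...}}` are hypothetical, and the status codes (5 for finished, -1 for faulty) are assumed.

```python
import time

FINISHED, FAULTY = 5, -1  # status codes assumed for this sketch


def wait_until_finished(fetch, resource_id, delay=0.0, max_retries=10):
    """Poll `fetch(resource_id)` until the resource is finished.

    `fetch` is a hypothetical callable returning a dict shaped like
    {"status": {"code": <int>}}. Returns True when finished, False when
    the resource is faulty or the retries run out.
    """
    for _ in range(max_retries):
        code = fetch(resource_id)["status"]["code"]
        if code == FINISHED:
            return True
        if code == FAULTY:
            return False
        time.sleep(delay)
    return False


def make_fake_fetch(codes):
    """Return a stand-in for a remote GET that replays the given status codes."""
    remaining = iter(codes)
    return lambda _resource_id: {"status": {"code": next(remaining)}}
```

Every client language has to repeat this kind of bookkeeping, which is part of the motivation for moving workflows to the server.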


Page 13: VSSML16 L7. REST API, Bindings, and Basic Workflows

Outline

1 Machine Learning workflows

2 Client-side workflows: REST API and bindings

3 Client-side workflows: Bigmler

4 Server-side workflows: WhizzML

5 Example Workflow Walk-throughs


Page 14: VSSML16 L7. REST API, Bindings, and Basic Workflows

Higher-level Machine Learning


Page 15: VSSML16 L7. REST API, Bindings, and Basic Workflows

Simple workflow in a one-liner

# 1-click cluster
bigmler cluster \
    --output-dir output/job \
    --train data/iris.csv \
    --test-datasets output/job/dataset \
    --remote \
    --to-dataset

# the created dataset id:
cat output/job/batch_centroid_dataset


Page 16: VSSML16 L7. REST API, Bindings, and Basic Workflows

Simple automation: “1-click” tasks

# "1-click" ensemble

bigmler --train data/iris.csv \
    --number-of-models 500 \
    --sample-rate 0.85 \
    --output-dir output/iris-ensemble \
    --project "vssml tutorial"

# "1-click" dataset with parameterized fields
bigmler --train data/diabetes.csv \
    --no-model \
    --name "4-featured diabetes" \
    --dataset-fields \
    "plasma glucose,insulin,diabetes pedigree,diabetes" \
    --output-dir output/diabetes \
    --project vssml_tutorial


Page 17: VSSML16 L7. REST API, Bindings, and Basic Workflows

Rich, parameterized workflows: cross-validation

# --cross-validation: parameterized input
# --k-folds: number of folds during validation
bigmler analyze --cross-validation \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 3 \
    --output-dir output/diabetes-validation
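To make the `--k-folds` parameter concrete: in k-fold cross-validation the rows are divided into k parts, and each part is held out once for testing while the others are used for training. The sketch below shows that partitioning on row indices only; it is a generic illustration, not what BigMLer does internally, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k (train, test) pairs.

    Rows are dealt round-robin into k folds; each fold serves once as the
    test set while the remaining folds together form the training set.
    """
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(r for j, fold in enumerate(folds) if j != i for r in fold)
        splits.append((train, test))
    return splits


# Three folds over nine rows, matching --k-folds 3 in spirit.
splits = k_fold_indices(9, 3)
```

With three folds, three models are trained and evaluated, and the evaluations are then averaged.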


Page 18: VSSML16 L7. REST API, Bindings, and Basic Workflows

Rich, parameterized workflows: feature selection

# --features: parameterized input
# --k-folds: number of folds during validation
# --staleness: stopping criterion
# --optimize: optimization metric
# --penalty: algorithm parameter
bigmler analyze --features \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 2 \
    --staleness 2 \
    --optimize precision \
    --penalty 1 \
    --output-dir output/diabetes-features-selection
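One way to read the staleness, optimization metric, and penalty parameters is as the knobs of a search over feature subsets. The sketch below is a generic greedy forward selection written for illustration; the search strategy, the way the penalty is applied, and the name `select_features` are assumptions and may differ from what BigMLer actually runs.

```python
def select_features(features, score, staleness=2, penalty=0.0):
    """Greedy forward selection with a staleness stop and a per-feature penalty.

    `score(subset)` is a caller-supplied callable returning the metric to
    maximize (for example, cross-validated precision). Each selected
    feature costs `penalty`; the search stops after `staleness`
    consecutive rounds that fail to improve the best penalized score.
    """
    selected, best, best_score, stale = [], [], float("-inf"), 0
    remaining = list(features)
    while remaining and stale < staleness:
        # Try adding each remaining feature and keep the best candidate.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [candidate]
        remaining.remove(candidate)
        penalized = score(selected) - penalty * len(selected)
        if penalized > best_score:
            best, best_score, stale = list(selected), penalized, 0
        else:
            stale += 1
    return best
```

In a real run, `score` would hide a full k-fold evaluation, which is why each round of the search is expensive and a stopping criterion matters.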


Page 19: VSSML16 L7. REST API, Bindings, and Basic Workflows

Outline

1 Machine Learning workflows

2 Client-side workflows: REST API and bindings

3 Client-side workflows: Bigmler

4 Server-side workflows: WhizzML

5 Example Workflow Walk-throughs


Page 20: VSSML16 L7. REST API, Bindings, and Basic Workflows

Client-side Machine Learning Automation

Problems of client-side solutions

Complexity: Lots of details outside the problem domain

Reuse: No inter-language compatibility

Scalability: Client-side workflows are hard to optimize

Extensibility: Bigmler hides complexity at the cost of flexibility

Not enough abstraction


Page 21: VSSML16 L7. REST API, Bindings, and Basic Workflows

Higher-level Machine Learning


Page 22: VSSML16 L7. REST API, Bindings, and Basic Workflows

Server-side Machine Learning


Page 23: VSSML16 L7. REST API, Bindings, and Basic Workflows

WhizzML in a Nutshell

• Domain-specific language for ML workflow automation
  - High-level problem and solution specification

• Framework for scalable, remote execution of ML workflows
  - Sophisticated server-side optimization
  - Out-of-the-box scalability
  - Client-server brittleness removed
  - Infrastructure for creating and sharing ML scripts and libraries


Page 24: VSSML16 L7. REST API, Bindings, and Basic Workflows

WhizzML REST Resources

Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.

Script: Executable code that describes an actual workflow.

• Imports: List of libraries with code used by the script.

• Inputs: List of input values that parameterize the workflow.

• Outputs: List of values computed by the script and returned to the user.

Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
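The phrase "a complete set of inputs" can be made concrete with a small check: every input a script declares must receive a value before the script can run. The sketch below assumes declared inputs are dicts with "name" and "type" keys and that supplied values arrive as [name, value] pairs; the optional "default" key and the helper name `missing_inputs` are assumptions of this illustration.

```python
def missing_inputs(declared, supplied):
    """List declared input names that have neither a supplied value nor a default.

    `declared` is a list of dicts with "name" and "type" keys; the optional
    "default" key is an assumption of this sketch. `supplied` is a list of
    [name, value] pairs.
    """
    given = {name for name, _value in supplied}
    return [spec["name"] for spec in declared
            if spec["name"] not in given and "default" not in spec]


# Hypothetical declaration for a script that takes a dataset and a field.
declared = [{"name": "input-dataset", "type": "dataset-id"},
            {"name": "field", "type": "string"}]
```

An execution would only be attempted when `missing_inputs` returns an empty list.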


Page 25: VSSML16 L7. REST API, Bindings, and Basic Workflows

Different ways to create WhizzML Scripts/Libraries

• Github

• Script editor

• Gallery

• Other scripts

• Scriptify


Page 26: VSSML16 L7. REST API, Bindings, and Basic Workflows

Basic workflow in WhizzML

(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))


Page 27: VSSML16 L7. REST API, Bindings, and Basic Workflows

Basic workflow in WhizzML: Usable by any binding

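from bigml.api import BigML

api = BigML()

# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'

# define parameters: a list of [name, value] pairs under the 'inputs' key
args = {'inputs': [['source', 'source/5643d345f43a234ff2310a3e']]}

# execute and wait until the execution finishes
execution = api.create_execution(script, args)
api.ok(execution)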


Page 28: VSSML16 L7. REST API, Bindings, and Basic Workflows

Basic workflow in WhizzML: Trivial parallelization

;; Workflow for 1 resource
(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))


Page 29: VSSML16 L7. REST API, Bindings, and Basic Workflows

Basic workflow in WhizzML: Trivial parallelization

;; Workflow for any number of resources
(let (datasets (map create-dataset sources)
      clusters (map create-cluster datasets)
      params {"output_dataset" true "all_fields" true})
  (map (lambda (d c) (create-batchcentroid d c params))
       datasets
       clusters))
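For contrast, here is roughly what the same fan-out costs on the client side: the caller has to manage a worker pool explicitly. The sketch uses stub functions in place of remote calls, so `create_resource`, `label_with_centroids`, and `label_all` are hypothetical names and the ids they return are fake.

```python
from concurrent.futures import ThreadPoolExecutor


def create_resource(kind, *parents):
    """Stand-in for a remote 'create' call: returns a fake id derived from its parents."""
    suffix = "+".join(p.split("/", 1)[1] for p in parents)
    return f"{kind}/{suffix}"


def label_with_centroids(source):
    """Run the dataset -> cluster -> batch centroid chain for one source."""
    dataset = create_resource("dataset", source)
    cluster = create_resource("cluster", dataset)
    return create_resource("batchcentroid", dataset, cluster)


def label_all(sources, workers=4):
    """Run the single-source workflow for every source, a few at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the inputs.
        return list(pool.map(label_with_centroids, sources))
```

In the WhizzML version the `map` calls express the same intent and leave scheduling to the server.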


Page 30: VSSML16 L7. REST API, Bindings, and Basic Workflows

Basic workflows in WhizzML: automatic generation


Page 31: VSSML16 L7. REST API, Bindings, and Basic Workflows

Standard functions

• Numeric and relational operators (+, *, <, =, ...)

• Mathematical functions (cos, sinh, floor ...)

• Strings and regular expressions (str, matches?, replace, ...)

• Flatline generation

• Collections: list traversal, sorting, map manipulation

• BigML resources manipulation
  - Creation: create-source, create-and-wait-dataset, etc.
  - Retrieval: fetch, list-anomalies, etc.
  - Update: update
  - Deletion: delete

• Machine Learning Algorithms (SMACDown, Boosting, etc.)


Page 32: VSSML16 L7. REST API, Bindings, and Basic Workflows

Outline

1 Machine Learning workflows

2 Client-side workflows: REST API and bindings

3 Client-side workflows: Bigmler

4 Server-side workflows: WhizzML

5 Example Workflow Walk-throughs


Page 33: VSSML16 L7. REST API, Bindings, and Basic Workflows

Model or Ensemble?

• Split a dataset into training and test parts

• Create a model and an ensemble with the training dataset

• Evaluate both with the test dataset

• Choose the one with better evaluation (f-measure)

https://github.com/whizzml/examples/tree/master/model-or-ensemble


Page 34: VSSML16 L7. REST API, Bindings, and Basic Workflows

Model or Ensemble?

;; Functions for creating the two dataset parts

;; Sample a dataset taking a fraction of its rows (rate) and
;; keeping either that fraction (out-of-bag? false) or its
;; complement (out-of-bag? true)
(define (sample-dataset origin-id rate out-of-bag?)
  (create-dataset {"origin_dataset" origin-id
                   "sample_rate" rate
                   "out_of_bag" out-of-bag?
                   "seed" "example-seed-0001"}))

;; Create in parallel two complementary parts of a dataset
;; using the sample function twice. Return a list of the two
;; new dataset ids.
(define (split-dataset origin-id rate)
  (list (sample-dataset origin-id rate false)
        (sample-dataset origin-id rate true)))
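The split works because both samples use the same seed: the same rows are marked as sampled each time, so asking once for the sample and once for its complement yields two disjoint parts that together cover the dataset. The sketch below mimics that behaviour on an in-memory list; it illustrates the semantics only and does not reproduce BigML's sampling, and the names `sample_rows` and `split_rows` are hypothetical.

```python
import random


def sample_rows(rows, rate, out_of_bag, seed):
    """Return the sampled fraction of `rows`, or its complement if out_of_bag.

    The same seed always marks the same rows as "in the bag", so two calls
    that differ only in `out_of_bag` produce disjoint, complementary parts.
    """
    rng = random.Random(seed)
    in_bag = [rng.random() < rate for _ in rows]
    return [row for row, picked in zip(rows, in_bag) if picked != out_of_bag]


def split_rows(rows, rate, seed="example-seed-0001"):
    """Return [training part, test part] as complementary samples."""
    return [sample_rows(rows, rate, False, seed),
            sample_rows(rows, rate, True, seed)]
```

With a rate of 0.8, roughly 80% of the rows land in the first part and the rest in the second.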


Page 35: VSSML16 L7. REST API, Bindings, and Basic Workflows

Model or Ensemble?

;; Functions to create an ensemble and to extract the f-measure
;; from an evaluation, given its id.
(define (make-ensemble ds-id size)
  (create-ensemble ds-id {"number_of_models" size}))

(define (f-measure ev-id)
  (let (ev-id (wait ev-id) ;; because fetch doesn't wait
        evaluation (fetch ev-id))
    (evaluation ["result" "model" "average_f_measure"])))
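The last expression in `f-measure` follows a path of keys through the fetched evaluation, which is a nested map. A Python sketch of the same lookup on a plain dictionary is shown below; the helper name `get_in` and the sample evaluation value are illustrative assumptions.

```python
from functools import reduce

_MISSING = object()  # sentinel marking a path that could not be followed


def get_in(mapping, path, default=None):
    """Follow `path` (a list of keys) through nested dicts.

    Returns `default` as soon as a key along the path is absent.
    """
    def step(current, key):
        if isinstance(current, dict) and key in current:
            return current[key]
        return _MISSING

    result = reduce(step, path, mapping)
    return default if result is _MISSING else result


# Hypothetical fragment of a fetched evaluation resource.
evaluation = {"result": {"model": {"average_f_measure": 0.91}}}
f_measure = get_in(evaluation, ["result", "model", "average_f_measure"])
```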


Page 36: VSSML16 L7. REST API, Bindings, and Basic Workflows

Model or Ensemble?

;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset {"source" src-id})
        [train-id test-id] (split-dataset ds-id 0.8)
        m-id (create-model train-id)
        e-id (make-ensemble train-id 15)
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))

;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))


Page 37: VSSML16 L7. REST API, Bindings, and Basic Workflows

Transforming item counts to features

basket           milk  eggs  flour  salt  chocolate  caviar
milk,eggs        Y     Y     N      N     N          N
milk,flour       Y     N     Y      N     N          N
milk,flour,eggs  Y     Y     Y      N     N          N
chocolate        N     N     N      N     Y          N
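The table can be reproduced in a few lines of plain Python, which makes the target of the transformation explicit before it is expressed in Flatline. The function name `basket_features` and the comma separator are assumptions of this sketch.

```python
def basket_features(baskets, items, yes="Y", no="N", separator=","):
    """Turn each comma-separated basket into one Y/N flag per item."""
    rows = []
    for basket in baskets:
        present = set(basket.split(separator))
        rows.append({item: (yes if item in present else no) for item in items})
    return rows


ITEMS = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
rows = basket_features(
    ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"], ITEMS)
```

Each dictionary in `rows` corresponds to one line of the table, with one key per new column.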


Page 38: VSSML16 L7. REST API, Bindings, and Basic Workflows

Item counts to features with Flatline

(if (contains-items? "basket" "milk") "Y" "N")

(if (contains-items? "basket" "eggs") "Y" "N")

(if (contains-items? "basket" "flour") "Y" "N")

(if (contains-items? "basket" "salt") "Y" "N")

(if (contains-items? "basket" "chocolate") "Y" "N")

(if (contains-items? "basket" "caviar") "Y" "N")

Parameterized code generation

• Field name

• Item values

• Y/N category names
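The six expressions differ only in the item name, so they can be generated from the three parameters listed here. A minimal Python sketch of that idea follows; it only builds the Flatline strings, and the function name `field_flatline` mirrors the parameters rather than any library API.

```python
import json


def field_flatline(field, item, yes="Y", no="N"):
    """Build the Flatline expression flagging whether `field` contains `item`."""
    # json.dumps adds the double quotes and escapes any quotes inside a value.
    quoted = [json.dumps(value) for value in (field, item, yes, no)]
    return "(if (contains-items? {} {}) {} {})".format(*quoted)


ITEMS = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
expressions = [field_flatline("basket", item) for item in ITEMS]
```

Changing the field name, the item list, or the category labels now means changing arguments instead of editing six near-identical lines.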


Page 39: VSSML16 L7. REST API, Bindings, and Basic Workflows

Flatline code generation with WhizzML

"(if (contains-items? \"basket\" \"milk\") \"Y\" \"N\")"

(let (field "basket"
      item "milk"
      yes "Y"
      no "N")
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))

(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))


Page 42: VSSML16 L7. REST API, Bindings, and Basic Workflows

Flatline code generation with WhizzML

(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))

(define (item-fields field items yes no)
  (for (item items)
    {"field" (field-flatline field item yes no)}))

(define (dataset-item-fields ds-id field)
  (let (ds (fetch ds-id)
        item-dist (ds ["fields" field "summary" "items"])
        items (map head item-dist))
    (item-fields field items "Y" "N")))
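The last function reads the item list from the dataset's own field summary, so the caller never has to enumerate items by hand. The Python sketch below follows the same three steps on a plain dictionary: read the item distribution, keep the first element of each entry, and build one field specification per item. The sample dataset, its field id `000003`, and the counts are made up for illustration.

```python
def dataset_item_fields(dataset, field_id, yes="Y", no="N"):
    """Build one new-field specification per item found in the field's summary."""
    # Each entry of the distribution is assumed to be an [item, count] pair.
    item_dist = dataset["fields"][field_id]["summary"]["items"]
    specs = []
    for item, _count in item_dist:
        expr = f'(if (contains-items? "{field_id}" "{item}") "{yes}" "{no}")'
        specs.append({"field": expr})
    return specs


# Hypothetical fetched dataset with a single items field.
dataset = {"fields": {"000003": {"summary": {
    "items": [["milk", 3], ["flour", 2], ["eggs", 2], ["chocolate", 1]]}}}}
new_fields = dataset_item_fields(dataset, "000003")
```

The resulting list plays the role of the value the WhizzML code passes on for creating the extended dataset.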


Page 43: VSSML16 L7. REST API, Bindings, and Basic Workflows

Flatline code generation with WhizzML

(define output-dataset
  (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
    (create-dataset input-dataset fs)))

{"inputs": [{"name": "input-dataset",
             "type": "dataset-id",
             "description": "The input dataset"},
            {"name": "field",
             "type": "string",
             "description": "Id of the items field"}],
 "outputs": [{"name": "output-dataset",
              "type": "dataset-id",
              "description": "The id of the generated dataset"}]}


Page 44: VSSML16 L7. REST API, Bindings, and Basic Workflows

More information

Resources

• Home: https://bigml.com/whizzml

• Documentation: https://bigml.com/whizzml#documentation

• Examples: https://github.com/whizzml/examples
