shifu plugin-trainer and pmml-adapter

Shifu-Plugin Demo Lisa Hua 7/21/2014

Uploaded by lisa-hua on 19-Jun-2015





DESCRIPTION

Shifu (www.shifu.ml) is a fast and scalable machine learning platform. This presentation briefly describes how to convert Logistic Regression and Neural Network models trained in Encog, Mahout, and Spark to PMML.

TRANSCRIPT

Page 1: Shifu plugin-trainer and pmml-adapter

Shifu-Plugin Demo

Lisa Hua, 7/21/2014

Page 2

Recap

1. Convert PMML back to ML models
2. Integrate into Shifu as Shifu-plugin-*
3. Add examples
4. Performance test for the PMML evaluator

Page 3

Miscellaneous

1. Compatibility issue: Spark depends on Akka 2.2.3, while Shifu uses 2.1.1
2. Spark overview
3. About the showcase:
   ● a video that introduces Shifu
   ● a poster that describes my project
   ● a project title and project description

Page 4

PMML Adapter Demo

Lisa Hua, 06/23/14

ML Framework   Neural Network   Logistic Regression   SVM   Decision Tree
Encog          Support          Support               TBD   None
Spark          None             Support               TBD   TBD
Mahout         Support          Support               TBD   TBD
H2o            TBD              None                  TBD   TBD

Page 5

Outline

1. Neural Network Model Conversion
   a. Encog NN model
   b. Mahout NN model
2. Logistic Regression Model Conversion
   a. Encog LR model (NN)
   b. Spark LR model
   c. Mahout LR model
3. PMML Adapter API and how to extend the PMML Adapter

Page 6

Performance Test

Page 7

protected void initMLModel() {
    ...
    mlModel = new MultilayerPerceptron();
    // addLayer(number of neurons, isFinalLayer, squashFunction); the first layer has one neuron per input field
    mlModel.addLayer(20, false, "Identity");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(1, true, "Sigmoid");
    for (MahoutData data : inputDataSet) {
        mlModel.trainOnline(data.getInput());
        ...
    }
}

protected void adaptToPMML() {
    ...
    Matrix[] matrixList = nnModel.getWeightMatrices();
    ...
}

Squash functions:
1. Only identity and sigmoid are supported for now.
2. squashFunctionList is protected and has no getter, so for now the activation function is set to sigmoid by default.
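The two restrictions above can be expressed as a small name mapping. This is a sketch only. SquashFunctionMapper is a hypothetical helper and is not part of Mahout or the adapter. It assumes PMML's sigmoid is spelled "logistic", as in the activationFunction="logistic" skeleton later in this deck. A null name stands for "squashFunctionList could not be read" and falls back to sigmoid. Any other unsupported name is rejected rather than silently mapped.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Mahout or the adapter): maps a Mahout
 * squash-function name to a PMML activationFunction value.
 */
final class SquashFunctionMapper {
    private SquashFunctionMapper() {}

    static String toPmmlActivation(String mahoutSquashName) {
        if (mahoutSquashName == null) {
            // The layer's squash function cannot be read back, so fall back to sigmoid.
            return "logistic";
        }
        switch (mahoutSquashName.toLowerCase(Locale.ROOT)) {
            case "identity":
                return "identity";
            case "sigmoid":
                return "logistic"; // PMML names the sigmoid "logistic"
            default:
                throw new IllegalArgumentException("unsupported squash function: " + mahoutSquashName);
        }
    }
}
```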

Mahout NN Model - trainOnline()

// in the adapter
for (int k = 1; k < columnSize; k++) {
    neuron.withConnections(new Connection(matrix.get(j, k)));
}
// bias neuron for each layer, set to bias=1
neuron.withConnections(new Connection(matrix.get(j, 0)));

The bias is the first neuron in every layer except the final layer, so column 0 of each weight matrix holds the bias weights.
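The adapter loop above turns each row j of a Mahout weight matrix into one PMML neuron. Columns 1..n become connections. Column 0, the bias, becomes a connection from a constant bias neuron.

The following is a library-free sketch of the same mapping. PmmlCon, PmmlNeuron and MahoutMatrixToNeurons are hypothetical stand-ins, not JPMML or Mahout classes. Neuron ids follow the "layer,index" convention of the PMML skeleton in this deck. Unlike the adapter, this sketch writes column 0 into the PMML bias attribute instead of adding a connection. The weighted sum is the same because the bias input is fixed at 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a PMML <Con from=".." weight=".."/>. */
final class PmmlCon {
    final String from;
    final double weight;

    PmmlCon(String from, double weight) {
        this.from = from;
        this.weight = weight;
    }
}

/** Hypothetical stand-in for a PMML <Neuron id=".." bias="..">. */
final class PmmlNeuron {
    final String id;
    final double bias;
    final List<PmmlCon> cons = new ArrayList<>();

    PmmlNeuron(String id, double bias) {
        this.id = id;
        this.bias = bias;
    }
}

final class MahoutMatrixToNeurons {
    private MahoutMatrixToNeurons() {}

    /**
     * Converts one weight matrix (row j = neuron j of `layer`, column 0 = bias,
     * column k = weight from neuron k-1 of the previous layer) into neurons "layer,j".
     */
    static List<PmmlNeuron> convertLayer(double[][] matrix, int layer) {
        List<PmmlNeuron> neurons = new ArrayList<>();
        for (int j = 0; j < matrix.length; j++) {
            // Column 0 is the bias weight; the bias input is 1, so it can serve as the PMML bias.
            PmmlNeuron neuron = new PmmlNeuron(layer + "," + j, matrix[j][0]);
            for (int k = 1; k < matrix[j].length; k++) {
                neuron.cons.add(new PmmlCon((layer - 1) + "," + (k - 1), matrix[j][k]));
            }
            neurons.add(neuron);
        }
        return neurons;
    }
}
```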

Page 8

protected void evaluatePMML() {
    for (int i = 0; i < mahoutDataSet.size(); i++) {
        Assert.assertEquals(
                getPMMLEvaluatorResult(pmmlEvalResultList.get(i)),
                getMahoutResult(mahoutDataSet.get(i)),
                DELTA); // DELTA = 1e-5
    }
}

private double getMahoutResult(MahoutData data) {
    return mlModel.getOutput(data.getEvalInput()).get(0);
}

Mahout NN Model - getOutput()
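evaluatePMML() passes when every PMML score matches the framework's score within DELTA = 1e-5. The sketch below does the same check outside JUnit, which is handy for reporting the first failing row in a performance run. ScoreComparison is a hypothetical helper. It behaves like JUnit's assertEquals(expected, actual, delta), except that NaN is always treated as a mismatch here.

```java
/** Hypothetical helper mirroring the evaluatePMML() check. */
final class ScoreComparison {
    static final double DELTA = 1e-5;

    private ScoreComparison() {}

    /** Returns the index of the first pair differing by more than delta, or -1 if all agree. */
    static int firstMismatch(double[] pmmlScores, double[] mlScores, double delta) {
        if (pmmlScores.length != mlScores.length) {
            throw new IllegalArgumentException("score lists differ in length");
        }
        for (int i = 0; i < pmmlScores.length; i++) {
            // The negated <= also flags NaN, which never compares as within delta.
            if (!(Math.abs(pmmlScores[i] - mlScores[i]) <= delta)) {
                return i;
            }
        }
        return -1;
    }
}
```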

Page 9

Outline

1. Neural Network Model Conversion
   a. Encog NN model
   b. Mahout NN model
2. Logistic Regression Model Conversion
   a. Encog LR model (NN)
   b. Spark LR model
   c. Mahout LR model
3. PMML Adapter API and how to extend the PMML Adapter

Page 10

Encog LR Model - compute()

protected void initMLModel() {
    ...
    lrModel = (BasicNetwork) networkReader.read(new FileInputStream("EncogLR.lr"));
}

protected void adaptToPMML() {
    ...
    double[] weights = lrModel.getWeights();
    ...
}

protected void evaluatePMML() {
    for (int i = 0; i < dataSet.size(); i++) {
        Assert.assertEquals(getPMMLEvaluatorResult(index++),
                getNextEncogLRResult(mlResultIterator), DELTA);
    }
}

private double getNextEncogLRResult(Iterator<MLDataPair> mlResultIterator) {
    MLData result = lrModel.compute(mlResultIterator.next().getInput());
    return result.getData(0);
}
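Encog stores the LR model as a network (the "(NN)" in the outline). Read that way, the model is a network with no hidden layer and a single sigmoid output neuron. Its score is sigmoid(bias + sum of w_i * x_i), which is the value the PMML side has to reproduce.

This is a minimal sketch under that assumption. LogisticScore is hypothetical. It does not decode Encog's getWeights() array, whose internal ordering is not shown on the slide.

```java
/** Sketch: a logistic-regression score computed like a network with no hidden layer. */
final class LogisticScore {
    private LogisticScore() {}

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Single sigmoid output neuron: sigmoid(bias + sum_i weights[i] * x[i]). */
    static double score(double[] weights, double bias, double[] x) {
        if (weights.length != x.length) {
            throw new IllegalArgumentException("weights and x must have the same length");
        }
        double z = bias;
        for (int i = 0; i < x.length; i++) {
            z += weights[i] * x[i];
        }
        return sigmoid(z);
    }
}
```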

Page 11

Spark LR Model: train() and predict()

protected void initMLModel() {
    ...
    lrModel = LogisticRegressionWithSGD.train(points.rdd(), iterations, stepSize);
}

protected void adaptToPMML() {
    ...
    double[] weights = lrModel.weights();
    ...
}

protected void evaluatePMML() {
    ...
    List<Double> evalList = lrModel.predict(evalRDD).cache().collect();
    for (...) {
        Assert.assertEquals(getPMMLEvaluatorResult(i), evalList.get(i), DELTA);
    }
}

Notes:
1. The method lrModel.weights() returns the intercept followed by the list of feature weights.
2. Compatibility issue: Spark depends on Akka 2.2.3, while Shifu uses 2.1.1. Currently there is a compatibility issue if we change the Akka version of shifu-core from 2.1.1 to 2.2.3. Based on the build history, I suspect the issue lies in Guagua, but the root cause is still unknown to me.
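Note 1 means the adapter has to split the weight array before writing PMML: slot 0 is the intercept and the remaining slots are the feature coefficients. The sketch below takes that intercept-first layout from the note as given; it was not checked against a particular Spark release. It computes the linear margin, which the logistic link then turns into a probability. InterceptFirstWeights is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical wrapper for a weight array laid out as [intercept, w1, ..., wn]. */
final class InterceptFirstWeights {
    final double intercept;
    final double[] coefficients;

    InterceptFirstWeights(double[] raw) {
        if (raw.length == 0) {
            throw new IllegalArgumentException("expected at least the intercept");
        }
        this.intercept = raw[0];                                    // slot 0: intercept
        this.coefficients = Arrays.copyOfRange(raw, 1, raw.length); // slots 1..n: feature weights
    }

    /** Linear margin: intercept + sum_i coefficients[i] * x[i]. */
    double margin(double[] x) {
        if (x.length != coefficients.length) {
            throw new IllegalArgumentException("expected " + coefficients.length + " features");
        }
        double m = intercept;
        for (int i = 0; i < x.length; i++) {
            m += coefficients[i] * x[i];
        }
        return m;
    }
}
```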

Page 12

Mahout LR Model - train() and classifyScalar()

protected void initMLModel() {
    ...
    // numCategories, numFeatures, prior function
    lrModel = new OnlineLogisticRegression(2, 20, new L1());
    for (MahoutDataPair pair : inputDataSet) {
        lrModel.train(pair.getActual(), pair.getFeatureField());
    }
}

protected void adaptToPMML() {
    ...
    // coefficients: a dense matrix of size (numCategories - 1) x numFeatures
    Matrix matrix = lrModel.getBeta();
}

private double getMahoutResult(MahoutDataPair data) {
    // returns a single scalar probability in the case where there are two categories
    return lrModel.classifyScalar(data.getVector());
}
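With two categories, getBeta() has exactly one row, (2 - 1) x numFeatures. classifyScalar() can then be read as the logistic function applied to that row's dot product with the instance. That reading of Mahout is an assumption made for this sketch.

BinaryBetaScorer is hypothetical and adds no separate intercept term. If one is needed, it has to be one of the features. It shows the arithmetic the PMML side must match.

```java
/** Hypothetical sketch of a two-category scalar score from a beta matrix. */
final class BinaryBetaScorer {
    private BinaryBetaScorer() {}

    /** beta is (numCategories - 1) x numFeatures; two categories means a single row. */
    static double classifyScalar(double[][] beta, double[] instance) {
        if (beta.length != 1) {
            throw new IllegalArgumentException("a scalar score needs exactly two categories (one beta row)");
        }
        if (beta[0].length != instance.length) {
            throw new IllegalArgumentException("expected " + beta[0].length + " features");
        }
        double z = 0.0;
        for (int i = 0; i < instance.length; i++) {
            z += beta[0][i] * instance[i]; // dot product of the single beta row with the instance
        }
        return 1.0 / (1.0 + Math.exp(-z)); // logistic link
    }
}
```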

Page 13

Summary of Evaluation Dataset

Model ML Framework Input Data Field Input Data Evaluation Data Nodes in each layer

NeuralNetwork

Encog 2 layers 20 450118

20,45,45,1560

Encog 3 layers 25 450 550 25,20,15,20,1

Mahout 2 layers 20 450118

20,45,45,1560

Mahout 3 layers 25 450 550 25,20,15,20,1

LogisticRegression

Encog 20 450118

560

LogisticRegression

Spark 20 450118

560

LogisticRegression

Mahout 20 450118

560

Page 14

Summary of the Functions

Encog, Neural Network (the Encog Logistic Regression model is built as a neural network)
  Model class: BasicNetwork
  Parent class/interface: MLClassification
  Training method: compute(MLDataSet data)
  Retrieve training result: getWeights(): double[]
  Evaluation method: compute()
  Basic data structure: MLData: Double[]; MLDataSet: Set<Double[]>

Spark, Logistic Regression
  Model class: LogisticRegressionModel
  Parent class/interface: GeneralizedLinearModel, ClassificationModel
  Training method: train(RDD data)
  Retrieve training result: weights(): double[]
  Evaluation method: predict(RDD<Vector>): RDD<Double>
  Basic data structure: RDD (Resilient Distributed Dataset)

Mahout, Neural Network
  Model class: MultilayerPerceptron
  Parent class/interface: NeuralNetwork
  Training method: trainOnline(Vector instance)
  Retrieve training result: getWeightMatrices(): Matrix[]
  Evaluation method: getOutput(Vector): Vector
  Basic data structure: Vector; Matrix: List<Vector>

Mahout, Logistic Regression
  Model class: OnlineLogisticRegression
  Parent class/interface: AbstractOnlineLogisticRegression
  Training method: train(int actual, Vector instance)
  Retrieve training result: getBeta(): Matrix
  Evaluation method: classifyScalar(Vector instance): double

Page 15

Outline

1. Neural Network Model Conversion
   a. Encog NN model
   b. Mahout NN model
2. Logistic Regression Model Conversion
   a. Encog LR model (NN)
   b. Spark LR model
   c. Mahout LR model
3. PMML Adapter API and how to extend the PMML Adapter

Page 16

3. PMML Adapter API

1. For a new ML model conversion:
   a. implement a subclass of PMMLModelBuilder<TargetPMMLModel, SourceMLModel> and implement adaptMLModelToPMML()
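The extension point can be pictured as a generic builder: a target PMML model type, a source ML model type, and one conversion method. The slide does not show the real method signature. The single-argument interface below is therefore a hypothetical stand-in named after the slide's PMMLModelBuilder. ToyLinearModel and RegressionTableSketch are also hypothetical and are not Shifu, Spark or JPMML classes.

```java
/** Hypothetical stand-in for the adapter's extension point (T = target PMML model, S = source ML model). */
interface PMMLModelBuilder<T, S> {
    T adaptMLModelToPMML(S sourceModel);
}

/** Hypothetical trained source model: an intercept plus one weight per feature. */
final class ToyLinearModel {
    final double intercept;
    final double[] weights;

    ToyLinearModel(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights;
    }
}

/** Hypothetical target: the numbers a PMML regression table would carry. */
final class RegressionTableSketch {
    final double intercept;
    final double[] coefficients;

    RegressionTableSketch(double intercept, double[] coefficients) {
        this.intercept = intercept;
        this.coefficients = coefficients;
    }
}

/** A new conversion = one more implementation of the builder interface. */
final class ToyLinearModelBuilder implements PMMLModelBuilder<RegressionTableSketch, ToyLinearModel> {
    @Override
    public RegressionTableSketch adaptMLModelToPMML(ToyLinearModel source) {
        // Defensive copy so later changes to the source model do not leak into the PMML model.
        return new RegressionTableSketch(source.intercept, source.weights.clone());
    }
}
```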

Page 17

Next Step

● Support: supported by the PMML Adapter
● None: the ML framework doesn't support this ML model currently
● TBD: to be determined

ML Framework   Neural Network   Logistic Regression   SVM   Decision Tree
Encog          Support          Support               TBD   None
Spark          None             Support               TBD   TBD
Mahout         Support          Support               TBD   TBD
H2o            TBD              None                  TBD   TBD

Page 18

1. PMML skeleton - Neural Network

<PMML>
  <Header></Header>
  <DataDictionary></DataDictionary>  <!-- specifies the format of the input CSV -->
  <NeuralNetwork functionName="classification">  <!-- the model -->
    <MiningSchema></MiningSchema>  <!-- how to use the input data -->
    <LocalTransformations></LocalTransformations>  <!-- derived fields -->
    <NeuralInputs>  <!-- input layer: which fields are used -->
      <NeuralInput></NeuralInput>
    </NeuralInputs>
    <NeuralLayers>  <!-- hidden and output layers; the input layer is NeuralInputs -->
      <NeuralLayer activationFunction="logistic">
        <Neuron id="X,Y" bias="0.0">
          <Con from="X-1,Y" weight=""/>
        </Neuron>
      </NeuralLayer>
    </NeuralLayers>
    <NeuralOutputs numberOfOutputs="1">
      <NeuralOutput outputNeuron="3,0"></NeuralOutput>
    </NeuralOutputs>
  </NeuralNetwork>
</PMML>
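The skeleton's id scheme makes each NeuralLayer mechanical to generate. Neuron Y of layer X is "X,Y", and each connection points back to "X-1,k". The sketch below renders one layer from a Mahout-style weight matrix, with column 0 as the bias and column k as the weight from neuron k-1 of the previous layer.

NeuralLayerXml is a hypothetical helper that builds the text with a StringBuilder rather than JPMML. Putting the bias in the bias attribute is a design choice of this sketch. It is equivalent to a connection from a constant bias neuron whose output is 1.

```java
/** Hypothetical helper: renders one <NeuralLayer> using the "layer,index" id scheme. */
final class NeuralLayerXml {
    private NeuralLayerXml() {}

    static String render(double[][] matrix, int layer, String activationFunction) {
        StringBuilder xml = new StringBuilder();
        xml.append("<NeuralLayer activationFunction=\"").append(activationFunction).append("\">\n");
        for (int j = 0; j < matrix.length; j++) {
            // Column 0 of row j is the bias of neuron "layer,j".
            xml.append("  <Neuron id=\"").append(layer).append(',').append(j)
               .append("\" bias=\"").append(matrix[j][0]).append("\">\n");
            for (int k = 1; k < matrix[j].length; k++) {
                // Column k connects from neuron k-1 of the previous layer.
                xml.append("    <Con from=\"").append(layer - 1).append(',').append(k - 1)
                   .append("\" weight=\"").append(matrix[j][k]).append("\"/>\n");
            }
            xml.append("  </Neuron>\n");
        }
        xml.append("</NeuralLayer>\n");
        return xml.toString();
    }
}
```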

Page 19

2.1 PMML Neural Network - Mahout

2,3,1
{ 0 => {0:-0.2861259717601905,1:-0.4079344783742465,2:-0.43218273192749174}
  1 => {0:0.223912887382075,1:-0.08865866120943716,2:0.4095464158191267}
  2 => {0:0.14754755237008804,1:0.2638192545136143,2:0.06633581725392071} }
{ 0 => {0:0.04388751672411058,1:-0.35597268769777723,2:0.21149680575173224,3:0.34402628331423807} }
0.5635827615510126,0.5482023969601073,0.5609684690326279,0.5751568027254008,

Framework   Propagation       Weight     Train                         Evaluate
Encog       backpropagation   double[]   MLTrain/Propagation
Mahout      feed-forward      Matrix     network.trainOnline(vector)   network.getOutput(vector)
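One way to read the Mahout dump above is as layer sizes 2,3,1 followed by one weight matrix per pair of adjacent layers. The first matrix has 3 rows of 3 entries and the second has 1 row of 4. In each row, entry 0 is the bias and the rest are one weight per neuron of the previous layer. That interpretation is ours.

The sketch below runs a feed-forward pass with the dumped weights, assuming a sigmoid on every non-input layer. MatrixFeedForward is hypothetical. The input vector in the usage is made up, and the code makes no claim to reproduce the 0.56... values printed in the dump.

```java
/** Hypothetical feed-forward pass over Mahout-style matrices (column 0 = bias). */
final class MatrixFeedForward {
    private MatrixFeedForward() {}

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double[] feedForward(double[][][] matrices, double[] input) {
        double[] current = input;
        for (double[][] matrix : matrices) {
            double[] next = new double[matrix.length];
            for (int j = 0; j < matrix.length; j++) {
                if (matrix[j].length != current.length + 1) {
                    throw new IllegalArgumentException(
                            "row " + j + " must hold 1 bias + " + current.length + " weights");
                }
                double z = matrix[j][0]; // bias weight times the constant bias input 1
                for (int k = 0; k < current.length; k++) {
                    z += matrix[j][k + 1] * current[k];
                }
                next[j] = sigmoid(z); // assumed: sigmoid on every non-input layer
            }
            current = next;
        }
        return current;
    }
}
```

Usage with the dumped weights (hypothetical input {0.0, 0.0}): `MatrixFeedForward.feedForward(new double[][][]{w1, w2}, new double[]{0.0, 0.0})` returns a single-element array, matching the final layer size of 1.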

Page 20

3. PMML Evaluation

public Map<String, Double> evaluateRaw(EvaluationContext context) {
    NeuralNetwork neuralNetwork = getModel();
    Map<String, Double> result = Maps.newLinkedHashMap();
    NeuralInputs neuralInputs = neuralNetwork.getNeuralInputs();
    for (NeuralInput neuralInput : neuralInputs) {
        DerivedField derivedField = neuralInput.getDerivedField();
        FieldValue value = ExpressionUtil.evaluate(derivedField, context);
        ...
        result.put(neuralInput.getId(), (value.asNumber()).doubleValue());
    }
    List<NeuralLayer> neuralLayers = neuralNetwork.getNeuralLayers();
    for (NeuralLayer neuralLayer : neuralLayers) {
        List<Neuron> neurons = neuralLayer.getNeurons();
        for (Neuron neuron : neurons) {
            double z = neuron.getBias(); // the bias for each neuron; should be set to 0
            List<Connection> connections = neuron.getConnections();
            for (Connection connection : connections) {
                double input = result.get(connection.getFrom());
                z += input * connection.getWeight();
            }
            double output = activation(z, neuralLayer);
            result.put(neuron.getId(), output);
        }
        normalizeNeuronOutputs(neuralLayer, result);
    }
    return result;
}

private double activation(double z, NeuralLayer neuralLayer) {
    ...
    switch (activationFunction) {
        case LOGISTIC:
            return 1.0 / (1.0 + Math.exp(-z)); // sigmoid
        case IDENTITY:
            return z; // linear
        ...
    }
}
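The core of evaluateRaw() is a map from neuron id to value. It is seeded with the input neurons and then filled layer by layer with activation(bias + sum of input * weight). The sketch below reproduces that loop in plain Java so it can be tested without a PMML file.

IdKeyedEvaluator and its nested types are hypothetical and are not the JPMML classes. The normalizeNeuronOutputs() step is left out, and only the LOGISTIC and IDENTITY activations from the slide are handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Library-free sketch of the evaluateRaw() loop; not the JPMML classes. */
final class IdKeyedEvaluator {
    private IdKeyedEvaluator() {}

    enum Activation { LOGISTIC, IDENTITY }

    /** A neuron as PMML describes it: id, bias and (from-id, weight) connections. */
    static final class EvalNeuron {
        final String id;
        final double bias;
        final String[] from;
        final double[] weight;

        EvalNeuron(String id, double bias, String[] from, double[] weight) {
            if (from.length != weight.length) {
                throw new IllegalArgumentException("from/weight length mismatch");
            }
            this.id = id;
            this.bias = bias;
            this.from = from;
            this.weight = weight;
        }
    }

    /** One NeuralLayer: a shared activation function plus its neurons. */
    static final class EvalLayer {
        final Activation activation;
        final List<EvalNeuron> neurons;

        EvalLayer(Activation activation, List<EvalNeuron> neurons) {
            this.activation = activation;
            this.neurons = neurons;
        }
    }

    static double activation(double z, Activation f) {
        switch (f) {
            case LOGISTIC:
                return 1.0 / (1.0 + Math.exp(-z)); // sigmoid
            case IDENTITY:
                return z; // linear
            default:
                throw new IllegalArgumentException("unsupported activation: " + f);
        }
    }

    /** Seeds the map with input-neuron values, then fills in every layer in order. */
    static Map<String, Double> evaluate(Map<String, Double> inputs, List<EvalLayer> layers) {
        Map<String, Double> result = new LinkedHashMap<>(inputs);
        for (EvalLayer layer : layers) {
            for (EvalNeuron neuron : layer.neurons) {
                double z = neuron.bias;
                for (int c = 0; c < neuron.from.length; c++) {
                    Double in = result.get(neuron.from[c]);
                    if (in == null) {
                        throw new IllegalStateException("no value for neuron " + neuron.from[c]);
                    }
                    z += in * neuron.weight[c];
                }
                result.put(neuron.id, activation(z, layer.activation));
            }
        }
        return result;
    }
}
```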

Page 21

How to get score from PMML evaluator - EvaluatorTest

// loadPMML(): InputStream is = getResourceAsStream("/pmml/" + getSimpleName() + ".pmml"); return IOUtil.unmarshal(is);
PMML pmml = loadPMML(getClass());
NeuralNetworkEvaluator evaluator = new NeuralNetworkEvaluator(pmml);
InputStream is = getClass().getResourceAsStream("/pmml/NormalizedData.csv");
List<Map<FieldName, String>> input = CsvUtil.load(is);
for (Map<FieldName, String> maps : input) {
    Map<FieldName, NeuronClassificationMap> evaluateList =
            (Map<FieldName, NeuronClassificationMap>) evaluator.evaluate(maps);
    for (NeuronClassificationMap cMap : evaluateList.values()) {
        for (Map.Entry<?, Double> entry : cMap.entrySet()) {
            System.out.println(index++ + ":" + entry.getKey() + ":" + entry.getValue() * 1000);
        }
    }
    List<FieldName> activeFields = evaluator.getActiveFields();
}
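The test feeds the evaluator one map per CSV row, keyed by column name. The sketch below is a library-free stand-in for the CsvUtil.load(is) call in the test. SimpleCsv is hypothetical and uses String keys instead of FieldName. It is deliberately naive: it splits on commas, does not support quoted fields, and rejects rows whose column count differs from the header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the CSV loading step: header row, then one map per data row. */
final class SimpleCsv {
    private SimpleCsv() {}

    static List<Map<String, String>> load(InputStream is) {
        List<Map<String, String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return rows; // empty input: no header, no rows
            }
            String[] header = headerLine.split(",", -1);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cells = line.split(",", -1);
                if (cells.length != header.length) {
                    throw new IllegalArgumentException("column count mismatch: " + line);
                }
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < header.length; i++) {
                    row.put(header[i].trim(), cells[i].trim()); // column name -> raw string value
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```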