shifu plugin-trainer and pmml-adapter
DESCRIPTION
Shifu (www.shifu.ml) is a fast and scalable machine learning platform. This presentation briefly describes how to convert Logistic Regression and Neural Network models from Encog, Mahout, and Spark to PMML.
TRANSCRIPT
Shifu-Plugin Demo
Lisa Hua
7/21/2014
Recap
1. Convert PMML back to ML model
2. Integrate into Shifu as Shifu-plugin-*
3. Add examples
4. Performance test for PMML evaluator
Miscellaneous
1. Compatibility issue: Spark depends on Akka 2.2.3, while Shifu uses 2.1.1
2. Spark overview
3. About showcase
   a. a video that introduces Shifu
   b. a poster that describes my project
   c. a project title and project description
PMML Adapter Demo
Lisa Hua
06/23/14
ML Framework   Neural Network   Logistic Regression   SVM   Decision Tree
Encog          Support          Support               TBD   None
Spark          None             Support               TBD   TBD
Mahout         Support          Support               TBD   TBD
H2O            TBD              None                  TBD   TBD
Outline
1. Neural Network Model Conversion
   a. Encog NN model
   b. Mahout NN model
2. Logistic Regression Model Conversion
   a. Encog LR model (NN)
   b. Spark LR model
   c. Mahout LR model
3. PMML Adapter API and how to extend PMML Adapter
Performance Test
protected void initMLModel() {
    ...
    mlModel = new MultilayerPerceptron();
    // addLayer arguments: layer size (numInputFields for the first layer),
    // isFinalLayer, squashFunction
    mlModel.addLayer(20, false, "Identity");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(1, true, "Sigmoid");
    for (MahoutData data : inputDataSet) {
        mlModel.trainOnline(data.getInput());
        ...
    }
}

protected void adaptToPMML() {
    ...
    // one weight matrix per pair of adjacent layers
    Matrix[] matrixList = mlModel.getWeightMatrices();
    ...
}
squashFunctions:
1. Only identity and sigmoid are supported for now.
2. squashFunctionList is protected and has no getter, so for now we set activationFunction to sigmoid by default.
Mahout NN Model - trainOnline()
// in Adapter
for (int k = 1; k < columnSize; k++) {
    neuron.withConnections(new Connection(matrix.get(j, k)));
}
// bias neuron for each layer, set to bias=1
neuron.withConnections(new Connection(matrix.get(j, 0)));
The bias is the first neuron in every layer except the final layer.
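The bias convention can be illustrated with a small, framework-free sketch. It assumes that row j of a layer's weight matrix holds the bias weight in column 0, followed by one weight per input, and that the bias neuron always outputs 1. The class and method names are hypothetical and are not part of Mahout or the PMML Adapter.

```java
// Hypothetical, framework-free illustration; not Mahout or PMML Adapter code.
final class BiasColumnSketch {

    // Logistic (sigmoid) squashing function.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Computes the outputs of one layer.
    // Assumed layout: weights[j][0] is the bias weight of neuron j (the bias
    // input is fixed at 1), and weights[j][k] for k >= 1 is the weight from
    // input k - 1.
    static double[] layerOutput(double[][] weights, double[] inputs) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double z = weights[j][0] * 1.0; // bias neuron contributes weight * 1
            for (int k = 1; k < weights[j].length; k++) {
                z += weights[j][k] * inputs[k - 1];
            }
            out[j] = sigmoid(z);
        }
        return out;
    }

    public static void main(String[] args) {
        // One neuron: bias weight 0.0, input weights 1.0 and -1.0.
        double[][] weights = { { 0.0, 1.0, -1.0 } };
        double[] out = layerOutput(weights, new double[] { 2.0, 2.0 });
        System.out.println(out[0]); // z = 0 + 2 - 2 = 0, so sigmoid(0) = 0.5
    }
}
```

Reading column 0 separately mirrors the adapter loop, which starts at k = 1 for ordinary inputs and adds the column-0 weight as the connection from the bias neuron.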
protected void evaluatePMML() {
    for (int i = 0; i < mahoutDataSet.size(); i++) {
        Assert.assertEquals(
            getPMMLEvaluatorResult(pmmlEvalResultList.get(i)),
            getMahoutResult(mahoutDataSet.get(i)),
            DELTA); // DELTA = 1e-5
    }
}

private double getMahoutResult(MahoutData data) {
    return mlModel.getOutput(data.getEvalInput()).get(0);
}
Mahout NN Model - getOutput()
Encog LR Model - compute()
protected void initMLModel() {
    ...
    lrModel = (BasicNetwork) networkReader.read(
        new FileInputStream("EncogLR.lr"));
}

protected void adaptToPMML() {
    ...
    double[] weights = lrModel.getWeights();
    ...
}

protected void evaluatePMML() {
    for (int i = 0; i < dataSet.size(); i++) {
        Assert.assertEquals(
            getPMMLEvaluatorResult(index++),
            getNextEncogLRResult(mlResultIterator),
            DELTA);
    }
}

private double getNextEncogLRResult(Iterator<MLDataPair> mlResultIterator) {
    MLData result = lrModel.compute(mlResultIterator.next().getInput());
    return result.getData(0);
}
Spark LR Model: train() and predict()
protected void initMLModel() {
    ...
    lrModel = LogisticRegressionWithSGD.train(points.rdd(), iterations, stepSize);
}

protected void adaptToPMML() {
    ...
    double[] weights = lrModel.weights();
    ...
}

protected void evaluatePMML() {
    ...
    List<Double> sparkEvalList = lrModel.predict(evalRDD).cache().collect();
    for (...) {
        Assert.assertEquals(
            getPMMLEvaluatorResult(i),
            sparkEvalList.get(i),
            DELTA);
    }
}
Notes:
1. The method lrModel.weights() returns the intercept followed by the weight list.
2. Compatibility issue: Spark depends on Akka 2.2.3, while Shifu uses 2.1.1. Changing the Akka version of shifu-core from 2.1.1 to 2.2.3 currently causes a compatibility problem. Based on the build history, I suspect the issue lies in Guagua, but the root cause is still unknown to me.
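The layout described in note 1 matters when the array is mapped to PMML coefficients. The following is a minimal sketch, assuming that layout (intercept at index 0, then one weight per feature). The class and method names are hypothetical, no Spark API is used, and the method returns the raw logistic probability rather than whatever lrModel.predict() returns.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; no Spark classes are used here.
final class InterceptFirstSketch {

    // Assumed layout: params[0] is the intercept,
    // params[1..n] are the feature weights.
    static double intercept(double[] params) {
        return params[0];
    }

    static double[] featureWeights(double[] params) {
        return Arrays.copyOfRange(params, 1, params.length);
    }

    // Logistic regression probability: sigmoid(intercept + w . x).
    static double probability(double[] params, double[] features) {
        double[] w = featureWeights(params);
        if (w.length != features.length) {
            throw new IllegalArgumentException("expected " + w.length + " features");
        }
        double z = intercept(params);
        for (int i = 0; i < w.length; i++) {
            z += w[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[] params = { -1.0, 0.5, 0.25 }; // intercept, w1, w2
        // z = -1.0 + 0.5 * 2.0 + 0.25 * 0.0 = 0, so the probability is 0.5
        System.out.println(probability(params, new double[] { 2.0, 0.0 }));
    }
}
```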
Mahout LR Model - train() and classifyScalar()
protected void initMLModel() {
    ...
    // arguments: numCategories, numFeatures, PriorFunction
    lrModel = new OnlineLogisticRegression(2, 20, new L1());
    for (MahoutDataPair pair : inputDataSet) {
        lrModel.train(pair.getActual(), pair.getFeatureField());
    }
}

protected void adaptToPMML() {
    ...
    // Coefficients: a dense matrix that is (numCategories-1) x numFeatures.
    Matrix matrix = lrModel.getBeta();
}

private double getMahoutResult(MahoutDataPair data) {
    // Returns a single scalar probability in the case where we have two categories.
    return lrModel.classifyScalar(data.getVector());
}
Summary of Evaluation Dataset
Model                ML Framework      Input Data Field   Input Data   Evaluation Data   Nodes in each layer
NeuralNetwork        Encog 2 layers    20                 450          118, 560          20,45,45,1
NeuralNetwork        Encog 3 layers    25                 450          550               25,20,15,20,1
NeuralNetwork        Mahout 2 layers   20                 450          118, 560          20,45,45,1
NeuralNetwork        Mahout 3 layers   25                 450          550               25,20,15,20,1
LogisticRegression   Encog             20                 450          118, 560
LogisticRegression   Spark             20                 450          118, 560
LogisticRegression   Mahout            20                 450          118, 560
Summary of the Functions
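Encog - Neural Network and Logistic Regression
  model class name:           BasicNetwork
  parent class/interface:     MLClassification
  training method:            compute(MLDataSet data)
  retrieve training result:   getWeights(): double[]
  evaluation method:          compute()
  basic data structure:       MLData: Double[], MLDataSet: Set<Double[]>

Spark - Logistic Regression
  model class name:           LogisticRegressionModel
  parent class/interface:     GeneralizedLinearModel, ClassificationModel
  training method:            train(RDD data)
  retrieve training result:   weights(): double[]
  evaluation method:          predict(RDD<Vector>): RDD<Double>
  basic data structure:       RDD: Resilient Distributed Dataset

Mahout - Neural Network
  model class name:           MultilayerPerceptron
  parent class/interface:     NeuralNetwork
  training method:            trainOnline(Vector instance)
  retrieve training result:   getWeightMatrices(): Matrix[]
  evaluation method:          getOutput(Vector): Vector
  basic data structure:       Vector, Matrix: List<Vector>

Mahout - Logistic Regression
  model class name:           OnlineLogisticRegression
  parent class/interface:     AbstractOnlineLogisticRegression
  training method:            train(int actual, Vector instance)
  retrieve training result:   getBeta(): Matrix
  evaluation method:          classifyScalar(Vector instance): double
  basic data structure:       Vector, Matrix: List<Vector>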
3. PMML Adapter API
1. For new ML model conversion
   a. Implement a subclass of PMMLModelBuilder<TargetPMMLModel, SourceMLModel> and implement adaptMLModelToPMML().
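A minimal sketch of this extension point, under stated assumptions: the interface below is a hypothetical stand-in written for this example, not the actual Shifu declaration, and the source and target types are simplified placeholders (a double[] of weights and a plain String) rather than real ML-framework or PMML classes.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for the adapter contract; the real PMMLModelBuilder
// in Shifu may be a class and may declare different methods or parameters.
interface PMMLModelBuilder<TargetPMMLModel, SourceMLModel> {
    TargetPMMLModel adaptMLModelToPMML(SourceMLModel mlModel);
}

// Example implementation: the "source model" is just an array of weights and
// the "target model" is a bracketed, comma-separated coefficient list.
final class WeightsToCoefficientListBuilder implements PMMLModelBuilder<String, double[]> {

    @Override
    public String adaptMLModelToPMML(double[] weights) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (double w : weights) {
            joiner.add(Double.toString(w));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        PMMLModelBuilder<String, double[]> builder = new WeightsToCoefficientListBuilder();
        System.out.println(builder.adaptMLModelToPMML(new double[] { 0.5, -1.0 }));
    }
}
```

Keeping the source and target types as generic parameters lets each framework-specific builder state which model it reads and which PMML model it produces.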
Next Step
● Support: supported by PMML Adapter
● None: the ML framework doesn’t support this ML model currently
● TBD: to be determined

ML Framework   Neural Network   Logistic Regression   SVM   Decision Tree
Encog          Support          Support               TBD   None
Spark          None             Support               TBD   TBD
Mahout         Support          Support               TBD   TBD
H2O            TBD              None                  TBD   TBD
1. PMML skeleton - Neural Network
<PMML>
  <Header></Header>
  <DataDictionary></DataDictionary> (specifies the format of the input CSV)
  <NeuralNetwork functionName="classification"> (the model)
    <MiningSchema></MiningSchema> (how to use the input data)
    <LocalTransformations></LocalTransformations> (specifies derived fields)
    <NeuralInputs>
      <NeuralInput></NeuralInput> (input layer: which fields should be used)
    </NeuralInputs>
    <NeuralLayer activationFunction="logistic"> (one per layer, not including the input layer and output layer)
      <Neuron id="X,Y" bias="0.0">
        <Con from="X-1,Y" weight=""/>
      </Neuron>
    </NeuralLayer>
    <NeuralOutputs numberOfOutputs="1">
      <NeuralOutput outputNeuron="3,0"></NeuralOutput>
    </NeuralOutputs>
  </NeuralNetwork>
</PMML>
2.1 PMML Neural Network - Mahout
2,3,1
{
  0 => {0:-0.2861259717601905,1:-0.4079344783742465,2:-0.43218273192749174}
  1 => {0:0.223912887382075,1:-0.08865866120943716,2:0.4095464158191267}
  2 => {0:0.14754755237008804,1:0.2638192545136143,2:0.06633581725392071}
}
{
  0 => {0:0.04388751672411058,1:-0.35597268769777723,2:0.21149680575173224,3:0.34402628331423807}
}
0.5635827615510126,0.5482023969601073,0.5609684690326279,0.5751568027254008,
         Propagation       Weight     train                         evaluate
Encog    backpropagation   double[]   MLTrain/Propagation
Mahout   feed-forward      Matrix     network.trainOnline(vector)   network.getOutput(vector)
3. PMML Evaluation
public Map<String, Double> evaluateRaw(EvaluationContext context) {
    NeuralNetwork neuralNetwork = getModel();
    Map<String, Double> result = Maps.newLinkedHashMap();
    NeuralInputs neuralInputs = neuralNetwork.getNeuralInputs();
    for (NeuralInput neuralInput : neuralInputs) {
        DerivedField derivedField = neuralInput.getDerivedField();
        FieldValue value = ExpressionUtil.evaluate(derivedField, context);
        ...
        result.put(neuralInput.getId(), (value.asNumber()).doubleValue());
    }
    List<NeuralLayer> neuralLayers = neuralNetwork.getNeuralLayers();
    for (NeuralLayer neuralLayer : neuralLayers) {
        List<Neuron> neurons = neuralLayer.getNeurons();
        for (Neuron neuron : neurons) {
            double z = neuron.getBias(); // the bias for each Neuron, should be set to 0
            List<Connection> connections = neuron.getConnections();
            for (Connection connection : connections) {
                double input = result.get(connection.getFrom());
                z += input * connection.getWeight();
            }
            double output = activation(z, neuralLayer);
            result.put(neuron.getId(), output);
        }
        normalizeNeuronOutputs(neuralLayer, result);
    }
    return result;
}

private double activation(double z, NeuralLayer neuralLayer) {
    ...
    switch (activationFunction) {
        case LOGISTIC:
            return 1.0 / (1.0 + Math.exp(-z)); // Sigmoid
        case IDENTITY:
            return z; // Linear
        ...
    }
}
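One plausible reading of the "should be set to 0" comment is that the adapter stores each bias weight as a connection from a constant bias neuron instead of in the neuron's bias attribute. The sketch below, which is library-free and uses hypothetical names (including the "bias" neuron id), shows that the two encodings give the same neuron output when the accumulation follows the same pattern as evaluateRaw. It is an illustration of that assumption, not JPMML code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-free illustration; these are not JPMML classes.
final class BiasAsConnectionSketch {

    // Minimal stand-in for a PMML <Con from="..." weight="..."/> element.
    static final class Con {
        final String from;
        final double weight;

        Con(String from, double weight) {
            this.from = from;
            this.weight = weight;
        }
    }

    static double logistic(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Same accumulation pattern as evaluateRaw: start from the bias attribute,
    // then add input * weight for every incoming connection.
    // Throws NullPointerException if a connection refers to an unknown id.
    static double neuronOutput(double bias, List<Con> cons, Map<String, Double> values) {
        double z = bias;
        for (Con con : cons) {
            z += values.get(con.from) * con.weight;
        }
        return logistic(z);
    }

    public static void main(String[] args) {
        Map<String, Double> values = new HashMap<>();
        values.put("bias", 1.0); // constant bias neuron (assumed id)
        values.put("0,0", 2.0);  // one ordinary input

        // Variant A: bias weight 0.3 stored in the neuron's bias attribute.
        double a = neuronOutput(0.3, Arrays.asList(new Con("0,0", -0.5)), values);
        // Variant B: bias attribute 0, bias weight stored as a connection.
        double b = neuronOutput(0.0,
                Arrays.asList(new Con("0,0", -0.5), new Con("bias", 0.3)), values);

        System.out.println(Math.abs(a - b) < 1e-12);
    }
}
```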
How to get score from PMML evaluator - EvaluatorTest
PMML pmml = loadPMML(getClass());
// loadPMML does:
//   InputStream is = getResourceAsStream("/pmml/" + getSimpleName() + ".pmml");
//   return IOUtil.unmarshal(is);
NeuralNetworkEvaluator evaluator = new NeuralNetworkEvaluator(pmml);
InputStream is = getClass().getResourceAsStream("/pmml/NormalizedData.csv");
List<Map<FieldName, String>> input = CsvUtil.load(is);
for (Map<FieldName, String> maps : input) {
    Map<FieldName, NeuronClassificationMap> evaluateList =
        (Map<FieldName, NeuronClassificationMap>) evaluator.evaluate(maps);
    for (NeuronClassificationMap cMap : evaluateList.values())
        for (Map.Entry<?, Double> entry : cMap.entrySet())
            System.out.println(index++ + ":" + entry.getKey() + ":" + entry.getValue() * 1000);
    List<FieldName> activeFields = evaluator.getActiveFields();
}