PMML Evaluator
lisahua edited this page Aug 1, 2014
- Example of using PMML Evaluator
- Discussion of the functionName attribute for PMML model
- The name of Mining Fields and Local Transformation Derived Fields
- Normalize data
Here is an example of using the NeuralNetwork evaluator:
    public void testEvaluator() {
        PMML pmml = PMMLUtil.loadPMML(PMMLFILEPATH);
        NeuralNetworkEvaluator evaluator = new NeuralNetworkEvaluator(pmml);
        List<Map<FieldName, String>> input = CsvUtil.load(EVALUATIONDATASET);
        for (Map<FieldName, String> maps : input) {
            switch (evaluator.getModel().getFunctionName()) {
                case REGRESSION:
                    Map<FieldName, Double> regressionTerm = (Map<FieldName, Double>) evaluator.evaluate(maps);
                    for (Double value : regressionTerm.values())
                        System.out.println(value * 1000);
                    break;
                case CLASSIFICATION:
                    Map<FieldName, ClassificationMap<String>> classificationTerm = (Map<FieldName, ClassificationMap<String>>) evaluator.evaluate(maps);
                    for (ClassificationMap<String> cMap : classificationTerm.values())
                        for (Map.Entry<String, Double> entry : cMap.entrySet())
                            System.out.println(entry.getValue() * 1000);
            }
        }
    }
Differences between setting the functionName attribute to regression and to classification
| | regression | classification | notes |
|---|---|---|---|
| Specify function name | `<NeuralNetwork modelName="demoModel" functionName="regression">` | `<NeuralNetwork modelName="demoModel" functionName="classification">` | |
| Output expression | `<FieldRef field="diagnosis_transformed"/>` | `<NormDiscrete field="diagnosis_transformed" value="M"/>` | Neural network models often split categorical and ordinal fields into multiple dummy fields. This kind of normalization is supported in PMML by the element NormDiscrete. |
| PMML evaluator | `Map<FieldName, Double> regressionTerm = (Map<FieldName, Double>) evaluator.evaluate(maps);` | `Map<FieldName, ClassificationMap<String>> classificationTerm = (Map<FieldName, ClassificationMap<String>>) evaluator.evaluate(maps);` | |
- Example 1: regression

    <NeuralNetwork modelName="demoModel" functionName="regression">
    <NeuralOutputs numberOfOutputs="1">
        <NeuralOutput outputNeuron="2,0">
            <DerivedField optype="continuous" dataType="double">
                <FieldRef field="diagnosis"/>
            </DerivedField>
        </NeuralOutput>
    </NeuralOutputs>
- Example 2: regression

    <NeuralNetwork modelName="demoModel" functionName="regression">
    <NeuralOutputs numberOfOutputs="1">
        <NeuralOutput outputNeuron="13">
            <DerivedField optype="continuous" dataType="double">
                <NormContinuous field="amount of claims">
                    <LinearNorm orig="0" norm="0.1"/>
                    <LinearNorm orig="1291.68" norm="0.5"/>
                    <LinearNorm orig="5327.26" norm="0.9"/>
                </NormContinuous>
            </DerivedField>
        </NeuralOutput>
    </NeuralOutputs>
- Example 3: classification

    <NeuralNetwork modelName="demoModel" functionName="classification">

Here is an example that uses the classification function (Iris PMML):

    <NeuralLayer>
        <Neuron id="2,0" bias="36.829174221809204">
            <Con from="1,0" weight="-15.428606782109018" />
            <Con from="1,1" weight="-58.68586577113855" />
            <Con from="1,2" weight="-4.533681748641222" />
        </Neuron>
        <Neuron id="2,1" bias="-3.832065207474468">
            <Con from="1,0" weight="4.803555297576479" />
            <Con from="1,1" weight="4.858790438015236" />
            <Con from="1,2" weight="-12.562463287384077" />
        </Neuron>
    </NeuralLayer>
    <NeuralOutputs numberOfOutputs="2">
        <NeuralOutput outputNeuron="2,0">
            <DerivedField optype="categorical" dataType="string">
                <NormDiscrete field="class" value="Iris-setosa" />
            </DerivedField>
        </NeuralOutput>
        <NeuralOutput outputNeuron="2,1">
            <DerivedField optype="categorical" dataType="string">
                <NormDiscrete field="class" value="Iris-versicolor" />
            </DerivedField>
        </NeuralOutput>
    </NeuralOutputs>
Notes:
- Neural network models often split categorical and ordinal fields into multiple dummy fields. This kind of normalization is supported in PMML by the element NormDiscrete.
- The id given in `outputNeuron` determines which neuron's output supplies the score, while the value attribute of the `NormDiscrete` element determines the item name in the `ClassificationMap<String>`.
- The computed activation of each output neuron is compared with the normalized value of the corresponding target field; the difference between the neuron's activation and the normalized target field determines the prediction error.
- For scoring, the normalization of the target field is used to denormalize the predicted value in the output neuron. Therefore, each Neuron instance that represents an output neuron is additionally connected to a normalized field.
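To make the classification notes concrete, here is a minimal sketch of how a predicted label could be picked from the per-class scores. It assumes that `ClassificationMap<String>` behaves like a `Map<String, Double>` keyed by the `NormDiscrete` value attributes, as the `entrySet()` loop in the evaluator example suggests. The class `WinningLabelSketch`, the method `winningLabel`, and the activation numbers are hypothetical and are not part of the evaluator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the PMML evaluator API.
class WinningLabelSketch {

    // Returns the label with the highest activation, or null for an empty map.
    // On a tie, the entry that comes first in iteration order wins.
    static String winningLabel(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                bestScore = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Keys mirror the NormDiscrete value attributes from Example 3;
        // the activations are made-up numbers.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("Iris-setosa", 0.92);
        scores.put("Iris-versicolor", 0.08);
        System.out.println(winningLabel(scores));
    }
}
```

Taking the highest activation as the winner is the usual reading of a classification network's output; whether the evaluator itself exposes such a method is not stated on this page.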
Please check the rules on the scope of fields in the PMML specification.
- The PMML evaluator first checks whether the field is an input field. If it is, the evaluator returns its value directly.
- If the field is not an input field, the evaluator looks it up among the derived fields of the local transformations and returns the value after the transformation, that is, the normalized data.
    class ExpressionUtil {
        static public FieldValue evaluate(FieldName name, EvaluationContext context){
            Map.Entry<FieldName, FieldValue> entry = context.getFieldEntry(name); //input fields
            if(entry == null){
                DerivedField derivedField = context.resolveDerivedField(name); //get local derived fields
                if(derivedField == null){
                    return null;
                }
                FieldValue value = evaluate(derivedField, context);
                // Make the calculated value available for re-use
                context.declare(name, value);
                return value;
            }
            return entry.getValue();
        }
    }
The implementation that generates neuron inputs from both the local transformations and the mining schema has not been finished.
- Ignore every mining field whose `usageType` is `supplementary`.
- If the field is not used by any of the local transformations, create a neuron input using the name of the mining field.
- If the field is used by a local transformation, create a neuron input with a `FieldRef` to the transformed field's name, following the field order of the mining schema. (Q: What if more than one mining field is used in a single local transformation, and what if more than one local transformation uses a single mining field?)
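The steps above can be sketched as follows. This is only an illustration under simplifying assumptions: each mining field is used by at most one local transformation, and each transformation reads exactly one field, so the open question above is deliberately left unanswered. The class `NeuronInputSketch`, the method `neuronInputFields`, and both map parameters are hypothetical stand-ins rather than evaluator classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All names here are hypothetical stand-ins, not evaluator classes.
class NeuronInputSketch {

    // Returns the field name each neuron input should reference, in mining-schema order.
    // usageTypes:    mining field name -> usageType, iterated in mining-schema order.
    // transformedBy: mining field name -> name of the derived field that transforms it.
    static List<String> neuronInputFields(Map<String, String> usageTypes,
                                          Map<String, String> transformedBy) {
        List<String> inputs = new ArrayList<>();
        for (Map.Entry<String, String> field : usageTypes.entrySet()) {
            if ("supplementary".equals(field.getValue())) {
                continue; // supplementary fields are ignored
            }
            String derived = transformedBy.get(field.getKey());
            // Use the mining field's own name when no transformation uses it.
            inputs.add(derived != null ? derived : field.getKey());
        }
        return inputs;
    }

    public static void main(String[] args) {
        Map<String, String> usageTypes = new LinkedHashMap<>();
        usageTypes.put("age", "active");
        usageTypes.put("comment", "supplementary");
        usageTypes.put("income", "active");

        Map<String, String> transformedBy = new LinkedHashMap<>();
        transformedBy.put("income", "income_normalized");

        // Prints [age, income_normalized]
        System.out.println(neuronInputFields(usageTypes, transformedBy));
    }
}
```

Using insertion-ordered maps keeps the result in mining-schema order without a separate sort step.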
- Example of normalizing data in SparkLogisticRegressionToPMMLTest
Sample code in SparkLogisticRegressionToPMMLTest
    private void evaluate(SparkTestDataGenerator evalInput) {
        for (Map<FieldName, String> map : evalInput.getEvaluatorInput()) {
            ModelEvaluationContext context = new ModelEvaluationContext(null, evaluator);
            context.declareAll(map);
            Vector vector = new DenseVector(evalInput.normalizeData(context));
            Assert.assertEquals(getPMMLEvaluatorResult(map), mlModel.predict(vector), DELTA);
        }
    }
Sample code in SparkTestDataGenerator
    public double[] normalizeData(ModelEvaluationContext context) {
        Model model = pmml.getModels().get(0);
        List<DerivedField> derivedFields = model.getLocalTransformations().getDerivedFields();
        List<Double> transformed = new ArrayList<Double>();
        for (DerivedField df : derivedFields) {
            if (df.getExpression() instanceof NormContinuous) {
                NormContinuous norm = (NormContinuous) df.getExpression();
                transformed.add(Double.parseDouble(NormalizationUtil.normalize(norm, context.getField(norm.getField())).getValue().toString()));
            }
            ...
        }
        int len = transformed.size();
        double[] result = new double[len];
        for (int i = 0; i < len; i++)
            result[i] = transformed.get(i);
        return result;
    }
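For readers unfamiliar with `NormContinuous`, the following sketch shows the piecewise-linear mapping that its `LinearNorm` points describe, using the three points from Example 2. It is not the code behind `NormalizationUtil.normalize`; the class `LinearNormSketch` and its methods are hypothetical. Values outside the listed range are extrapolated along the nearest segment, which is an assumption of this sketch. Running the same interpolation in the opposite direction gives the denormalization used when scoring a regression output.

```java
// Hypothetical illustration of NormContinuous with LinearNorm points.
class LinearNormSketch {

    // (orig, norm) pairs copied from Example 2, sorted by orig.
    static final double[] ORIG = {0.0, 1291.68, 5327.26};
    static final double[] NORM = {0.1, 0.5, 0.9};

    // Linearly interpolates along the segments (xs[i], ys[i]) -> (xs[i+1], ys[i+1]).
    // Inputs outside [xs[0], xs[last]] are extrapolated along the first or last segment.
    static double interpolate(double x, double[] xs, double[] ys) {
        int i = 0;
        while (i < xs.length - 2 && x > xs[i + 1]) {
            i++;
        }
        double slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]);
        return ys[i] + (x - xs[i]) * slope;
    }

    // Maps an original value to its normalized value.
    static double normalize(double orig) {
        return interpolate(orig, ORIG, NORM);
    }

    // Maps a normalized value (for example a neuron activation) back to the original scale.
    static double denormalize(double norm) {
        return interpolate(norm, NORM, ORIG);
    }

    public static void main(String[] args) {
        System.out.println(normalize(1291.68));
        System.out.println(denormalize(0.7));
    }
}
```

Because the norm values increase together with the orig values, swapping the two arrays is enough to invert the mapping.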