public class SplitQualityGini extends SplitQualityMeasure
Constructor and Description |
---|
SplitQualityGini() |
Modifier and Type | Method and Description |
---|---|
double | getWorstValue() Returns the worst value for this quality measure. |
void | initQualityMeasure(double[] classFrequencies, double allOverRecords) Some quality measures, such as information gain, compare the quality of a previous distribution with that of a new one. |
boolean | isBetter(double quality1, double quality2) A Gini index is better if it is larger than the other one. |
boolean | isBetterOrEqual(double quality1, double quality2) A Gini index is better if it is larger than the other one. |
double | measureQuality(double allOverRecords, double[] partitionFrequency, double[][] partitionClassFrequency, double numUnknownRecords) Calculates the Gini split index. |
double | postProcessMeasure(double qualityMeasure, double allOverRecords, double[] partitionFrequency, double numUnknownRecords) The Gini index does not need to post-process the measure. |
String | toString() |
clone
public boolean isBetter(double quality1, double quality2)
isBetter
in class SplitQualityMeasure
quality1 - first quality to compare
quality2 - second quality to compare
public boolean isBetterOrEqual(double quality1, double quality2)
isBetterOrEqual
in class SplitQualityMeasure
quality1 - first quality to compare
quality2 - second quality to compare
public double measureQuality(double allOverRecords, double[] partitionFrequency, double[][] partitionClassFrequency, double numUnknownRecords)
For a data set T, the Gini index is: gini(T) = 1 - SUM(pj * pj), summed over all relative class frequencies pj (pj = Pj/|T|). Pj is the absolute class frequency and |T| the number of records in the data set; for a partition Tx, |Tx| = nx.
The Gini index for the split is: giniSplit(T) = SUM(nx/N * gini(Tx)), summed over all partitions Tx with relative partition frequencies nx/N.
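The two formulas above can be sketched in plain Java as follows. This is a minimal illustration, not the KNIME implementation: the class name GiniSketch and its helper methods are hypothetical, the sketch ignores numUnknownRecords entirely, and it returns the raw giniSplit value from the formula, which may differ from what measureQuality actually returns.

```java
public class GiniSketch {

    /** gini(T) = 1 - SUM(pj * pj), with pj = Pj / |T|. */
    static double gini(double[] classFrequencies, double total) {
        if (total <= 0.0) {
            return 0.0; // assumption: an empty partition contributes no impurity
        }
        double sumOfSquares = 0.0;
        for (double absoluteFrequency : classFrequencies) {
            double p = absoluteFrequency / total; // relative class frequency pj
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    /** giniSplit(T) = SUM(nx / N * gini(Tx)) over all partitions Tx. */
    static double giniSplit(double allOverRecords, double[] partitionFrequency,
            double[][] partitionClassFrequency) {
        double result = 0.0;
        for (int x = 0; x < partitionFrequency.length; x++) {
            double weight = partitionFrequency[x] / allOverRecords; // nx / N
            result += weight * gini(partitionClassFrequency[x], partitionFrequency[x]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two partitions of 10 records: {4, 0} is pure, {3, 3} is evenly mixed.
        double[] partitionFrequency = {4.0, 6.0};
        double[][] partitionClassFrequency = {{4.0, 0.0}, {3.0, 3.0}};
        System.out.println(giniSplit(10.0, partitionFrequency, partitionClassFrequency));
    }
}
```

For this split the pure partition contributes 0 and the mixed partition contributes 0.6 * 0.5, so the giniSplit value is about 0.3.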
measureQuality
in class SplitQualityMeasure
allOverRecords - the overall number of records with known values in the partition to split; corresponds to N in the formula
partitionFrequency - the frequencies of the different partitions; corresponds to nx in the formula
partitionClassFrequency - all class frequencies Pj (second dimension) for all partitions Tx (first dimension)
numUnknownRecords - the number of records with unknown (missing) value of the relevant attribute; used to weight the quality measure
public double getWorstValue()
getWorstValue
in class SplitQualityMeasure
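One common use of a worst value is as the starting point when searching for the best candidate split. The sketch below shows that pattern with a hypothetical stand-in interface, Measure, that mirrors only the getWorstValue and isBetter signatures documented here. The stand-in's worst value (negative infinity) and the candidate quality values are assumptions for illustration and are not taken from KNIME.

```java
public class BestSplitSketch {

    /** Hypothetical stand-in mirroring two documented method signatures. */
    interface Measure {
        double getWorstValue();
        boolean isBetter(double quality1, double quality2);
    }

    /** Assumed "larger is better" measure; the worst value is an assumption. */
    static final Measure LARGER_IS_BETTER = new Measure() {
        @Override
        public double getWorstValue() {
            return Double.NEGATIVE_INFINITY;
        }

        @Override
        public boolean isBetter(double quality1, double quality2) {
            return quality1 > quality2;
        }
    };

    /** Returns the index of the best quality, or -1 if there are no candidates. */
    static int bestIndex(Measure measure, double[] qualities) {
        double best = measure.getWorstValue(); // any real candidate beats this
        int bestIndex = -1;
        for (int i = 0; i < qualities.length; i++) {
            if (measure.isBetter(qualities[i], best)) {
                best = qualities[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        double[] candidateQualities = {0.42, 0.58, 0.31}; // made-up values
        System.out.println(bestIndex(LARGER_IS_BETTER, candidateQualities));
    }
}
```

Starting from the worst value means the first candidate always replaces it, so no special case is needed for the first iteration.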
public void initQualityMeasure(double[] classFrequencies, double allOverRecords)
initQualityMeasure
in class SplitQualityMeasure
classFrequencies - the class frequencies
allOverRecords - the overall count
public String toString()
toString
in class SplitQualityMeasure
public double postProcessMeasure(double qualityMeasure, double allOverRecords, double[] partitionFrequency, double numUnknownRecords)
postProcessMeasure
in class SplitQualityMeasure
qualityMeasure - the quality measure to post-process
allOverRecords - the overall number of known (non-missing) records
partitionFrequency - the frequencies of the potential split partitions
numUnknownRecords - the number of unknown (missing) records
KNIME GmbH, Konstanz, Germany