com.rapidminer.operator.preprocessing.filter
Class TFIDFFilter

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.AbstractExampleSetProcessing
              extended by com.rapidminer.operator.preprocessing.AbstractDataProcessing
                  extended by com.rapidminer.operator.preprocessing.filter.TFIDFFilter
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>

public class TFIDFFilter
extends AbstractDataProcessing

This operator generates TF-IDF values from the input data. The input example set must contain either simple occurrence counts, which are normalized when the term frequency (TF) is calculated, or precomputed term frequency values, in which case no normalization is performed.
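The two input modes described above can be illustrated with a minimal, self-contained sketch. It is not this operator's implementation: the class TfIdfSketch and the method tfIdf are hypothetical, and the weighting used (TF as a count divided by the row total, IDF as the natural logarithm of the number of documents over the document frequency) is one common TF-IDF variant assumed here for illustration. The formula TFIDFFilter actually applies may differ.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of the RapidMiner API. */
final class TfIdfSketch {

    /**
     * Computes TF-IDF weights for a document-by-term matrix.
     *
     * @param values rows are documents, columns are terms; either raw
     *               occurrence counts or precomputed term frequencies
     * @param calculateTermFrequencies if true, each row is treated as raw
     *               counts and divided by its row total (assumed
     *               normalization); if false, values are used as TF as-is
     */
    static double[][] tfIdf(double[][] values, boolean calculateTermFrequencies) {
        int docs = values.length;
        int terms = docs == 0 ? 0 : values[0].length;

        // Document frequency: number of documents in which each term occurs.
        int[] documentFrequency = new int[terms];
        for (double[] row : values) {
            for (int t = 0; t < terms; t++) {
                if (row[t] > 0) {
                    documentFrequency[t]++;
                }
            }
        }

        double[][] result = new double[docs][terms];
        for (int d = 0; d < docs; d++) {
            // With precomputed term frequencies, no normalization is applied.
            double total = calculateTermFrequencies
                    ? Arrays.stream(values[d]).sum()
                    : 1.0;
            for (int t = 0; t < terms; t++) {
                if (values[d][t] <= 0 || total <= 0) {
                    continue; // absent term: weight stays 0
                }
                double tf = values[d][t] / total;
                // Assumed IDF variant: ln(N / df). df >= 1 here because
                // this document contains the term.
                double idf = Math.log((double) docs / documentFrequency[t]);
                result[d][t] = tf * idf;
            }
        }
        return result;
    }
}
```

Under these assumptions, a term that occurs in every document receives a weight of 0, because its IDF is ln(1). Calling tfIdf(counts, true) corresponds to supplying simple occurrence counts, while tfIdf(frequencies, false) corresponds to supplying term frequencies that were already calculated.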

Author:
Ingo Mierswa

Field Summary
static java.lang.String PARAMETER_CALCULATE_TERM_FREQUENCIES
          The parameter name for "Indicates if term frequency values should be generated (must be done if input data is given as simple occurrence counts)."
 
Constructor Summary
TFIDFFilter(OperatorDescription description)
           
 
Method Summary
 ExampleSet apply(ExampleSet exampleSet)
          Delegate for the apply method.
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 ResourceConsumptionEstimator getResourceConsumptionEstimator()
          Subclasses can override this method if they are able to estimate the consumed resources (CPU time and memory), based on their input.
protected  MetaData modifyMetaData(ExampleSetMetaData metaData)
          Subclasses might override this method to define the meta data transformation performed by this operator.
 boolean writesIntoExistingData()
          This method indicates whether the operator will perform a write operation on a cell in an existing column of the example set's ExampleTable.
 
Methods inherited from class com.rapidminer.operator.AbstractExampleSetProcessing
doWork, getExampleSetInputPort, getExampleSetOutputPort, getInputPort, getRequiredMetaData, shouldAutoConnect
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, registerOperator, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, setListParameter, 
setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_CALCULATE_TERM_FREQUENCIES

public static final java.lang.String PARAMETER_CALCULATE_TERM_FREQUENCIES
The parameter name for "Indicates if term frequency values should be generated (must be done if input data is given as simple occurrence counts)."

See Also:
Constant Field Values
Constructor Detail

TFIDFFilter

public TFIDFFilter(OperatorDescription description)
Method Detail

modifyMetaData

protected MetaData modifyMetaData(ExampleSetMetaData metaData)
                           throws UndefinedParameterError
Description copied from class: AbstractExampleSetProcessing
Subclasses might override this method to define the meta data transformation performed by this operator.

Overrides:
modifyMetaData in class AbstractExampleSetProcessing
Throws:
UndefinedParameterError

apply

public ExampleSet apply(ExampleSet exampleSet)
                 throws OperatorException
Description copied from class: AbstractExampleSetProcessing
Delegate for the apply method. The given ExampleSet is already a clone of the input example set, so changing this example set does not affect the original one. Subclasses should avoid cloning again unnecessarily.

Specified by:
apply in class AbstractExampleSetProcessing
Throws:
OperatorException

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects that can be prevented from being consumed. ATTENTION: each call creates new ParameterType instances. To access the already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator

writesIntoExistingData

public boolean writesIntoExistingData()
Description copied from class: AbstractExampleSetProcessing
This method indicates whether the operator will perform a write operation on a cell in an existing column of the example set's ExampleTable. If so, the original example set is copied completely in memory if the original port is used. Note: Subclasses must implement this method. Returning true would be the safe implementation; for backwards compatibility, however, the default implementation returns false.

Overrides:
writesIntoExistingData in class AbstractExampleSetProcessing

getResourceConsumptionEstimator

public ResourceConsumptionEstimator getResourceConsumptionEstimator()
Description copied from class: Operator
Subclasses can override this method if they are able to estimate the consumed resources (CPU time and memory), based on their input. The default implementation returns null.

Specified by:
getResourceConsumptionEstimator in interface ResourceConsumer
Overrides:
getResourceConsumptionEstimator in class Operator


Copyright © 2001-2009 by Rapid-I