com.rapidminer.operator.io
Class ArffExampleSource

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
              extended by com.rapidminer.operator.io.AbstractExampleSource
                  extended by com.rapidminer.operator.io.AbstractDataReader
                      extended by com.rapidminer.operator.io.ArffExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>

public class ArffExampleSource
extends AbstractDataReader

This operator can read ARFF files known from the machine learning library Weka. An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.

ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information. The Header of the ARFF file contains the name of the relation (@RELATION, ignored by RapidMiner) and a list of the attributes, each of which is defined by a starting @ATTRIBUTE followed by its name and its type.

Attribute declarations take the form of an ordered sequence of @ATTRIBUTE statements. Each attribute in the data set has its own @ATTRIBUTE statement, which uniquely defines the name of that attribute and its data type. The order in which the attributes are declared determines their column position in the data section of the file. For example, if an attribute is the third one declared, all of that attribute's values will be found in the third comma-delimited column.

The possible attribute types are:

Valid examples for attribute definitions are
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
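
The two declarations above can be taken apart with a few lines of Java. This is only an illustrative sketch, not the parser used by this operator: the class ArffAttributeSketch and its methods are hypothetical names, the keyword is assumed to be matched case-insensitively, and attribute names are assumed to be unquoted and free of spaces.

```java
import java.util.ArrayList;
import java.util.List;

final class ArffAttributeSketch {

    private static final String KEYWORD = "@ATTRIBUTE";

    /** Splits a declaration such as "@ATTRIBUTE petalwidth REAL" into {name, type}. */
    static String[] parseDeclaration(String line) {
        String trimmed = line.trim();
        // Assumption: the keyword is case-insensitive and must be followed by whitespace.
        boolean hasKeyword = trimmed.length() > KEYWORD.length()
                && trimmed.regionMatches(true, 0, KEYWORD, 0, KEYWORD.length())
                && Character.isWhitespace(trimmed.charAt(KEYWORD.length()));
        if (!hasKeyword) {
            throw new IllegalArgumentException("Not an attribute declaration: " + line);
        }
        // Assumption: the attribute name is unquoted and contains no spaces.
        String[] parts = trimmed.substring(KEYWORD.length()).trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Missing attribute type: " + line);
        }
        return new String[] { parts[0], parts[1].trim() };
    }

    /** Returns the values of a nominal type like "{a,b,c}", or an empty list otherwise. */
    static List<String> nominalValues(String type) {
        List<String> values = new ArrayList<>();
        if (type.startsWith("{") && type.endsWith("}")) {
            for (String value : type.substring(1, type.length() - 1).split(",")) {
                values.add(value.trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] width = parseDeclaration("@ATTRIBUTE petalwidth REAL");
        System.out.println(width[0] + " has type " + width[1]);
        String[] label = parseDeclaration(
                "@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}");
        System.out.println(label[0] + " has values " + nominalValues(label[1]));
    }
}
```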

The ARFF Data section of the file contains the data declaration line @DATA followed by the actual example data lines. Each example is represented on a single line, with carriage returns denoting the end of the example. Attribute values for each example are delimited by commas. They must appear in the order that they were declared in the header section (i.e. the data corresponding to the n-th @ATTRIBUTE declaration is always the n-th field of the example line). Missing values are represented by a single question mark, as in:
4.4,?,1.5,?,Iris-setosa
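
As a rough illustration of how such a line maps onto the declared attributes, the following sketch splits an example line into fields and marks missing values. It is not the implementation used by this operator: ArffDataLineSketch and parseExample are hypothetical names, missing values are represented as null purely for illustration, and quoted values that contain commas are assumed not to occur.

```java
final class ArffDataLineSketch {

    /**
     * Splits one data line into its fields. The n-th field belongs to the
     * n-th declared attribute; a lone "?" becomes null to mark a missing value.
     */
    static String[] parseExample(String line, int attributeCount) {
        // Assumption: no quoted values containing commas. The limit -1 keeps empty trailing fields.
        String[] fields = line.trim().split(",", -1);
        if (fields.length != attributeCount) {
            throw new IllegalArgumentException(
                    "Expected " + attributeCount + " fields but found " + fields.length);
        }
        for (int i = 0; i < fields.length; i++) {
            String value = fields[i].trim();
            fields[i] = value.equals("?") ? null : value;
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] example = parseExample("4.4,?,1.5,?,Iris-setosa", 5);
        for (int i = 0; i < example.length; i++) {
            String shown = example[i] == null ? "(missing)" : example[i];
            System.out.println("attribute " + (i + 1) + ": " + shown);
        }
    }
}
```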

A percent sign (%) introduces a comment and will be ignored during reading. Attribute names or example values containing spaces must be quoted with single quotes ('). Please note that the sparse ARFF format is currently only supported for numerical attributes. Please use one of the other options for sparse data files provided by RapidMiner if you also need sparse data files for nominal attributes.
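
The comment and quoting rules can be sketched in Java as follows. The helper names are hypothetical and the sketch makes simplifying assumptions: only lines that begin with a percent sign are treated as comments, and values that themselves contain a single quote are not handled.

```java
final class ArffTextSketch {

    /** Assumption: a comment is a line whose first non-blank character is '%'. */
    static boolean isCommentLine(String line) {
        return line.trim().startsWith("%");
    }

    /** Wraps a name or value in single quotes if it contains a space. */
    static String quoteIfNeeded(String value) {
        // Values that already contain a single quote are not handled in this sketch.
        return value.indexOf(' ') >= 0 ? "'" + value + "'" : value;
    }

    public static void main(String[] args) {
        System.out.println(isCommentLine("% Iris data set"));
        System.out.println("@ATTRIBUTE " + quoteIfNeeded("petal width") + " REAL");
    }
}
```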

Please have a look at the Iris example ARFF file provided in the data subdirectory of the sample directory of RapidMiner to get an idea of the described data format.

Author:
Ingo Mierswa, Tobias Malbrecht
Keywords:
arff

Nested Class Summary
 
Nested classes/interfaces inherited from class com.rapidminer.operator.io.AbstractDataReader
AbstractDataReader.AttributeColumn, AbstractDataReader.CacheResetParameterObserver, AbstractDataReader.DataSet, AbstractDataReader.TooLongRowLengthException, AbstractDataReader.TooShortRowLengthException, AbstractDataReader.UnexpectedValueTypeException
 
Nested classes/interfaces inherited from class com.rapidminer.operator.io.AbstractReader
AbstractReader.ReaderDescription
 
Field Summary
static java.lang.String PARAMETER_DATA_FILE
          The parameter name for "The path to the data file."
 
Fields inherited from class com.rapidminer.operator.io.AbstractDataReader
PARAMETER_COLUM_ROLE, PARAMETER_COLUM_VALUE_TYPE, PARAMETER_COLUMN_INDEX, PARAMETER_COLUMN_META_DATA, PARAMETER_COLUMN_NAME, PARAMETER_COLUMN_SELECTED, PARAMETER_ERROR_TOLERANT, PREVIEW_LINES, ROLE_NAMES
 
Constructor Summary
ArffExampleSource(OperatorDescription description)
           
 
Method Summary
protected  AbstractDataReader.DataSet getDataSet()
           
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
protected  boolean supportsEncoding()
           
 
Methods inherited from class com.rapidminer.operator.io.AbstractDataReader
addAttributeColumn, addAttributeColumn, attributeNamesDefinedByUser, clearAllReaderSettings, clearReaderSettings, createExampleSet, deleteAttributeMetaDataParamters, fixMetaDataDefinition, getActiveAttributeColumns, getAllAttributeColumns, getAttributeColumn, getColumnCount, getErrorPreviewAsList, getGeneratedMetaData, getImportErrors, getIndexOfActiveAttributeColumn, getIndexOfAttributeColumn, getNewGenericColumnName, getPreviewAsList, getPreviewAsList, getShortPreviewAsList, guessValueTypes, hasParseError, hasParseErrorInColumn, hasParseErrorInRow, isDetectErrorsInPreview, isErrorTolerant, isMetaDataCacheable, isMetaDatafixed, loadMetaDataFromParameters, resetColumnNames, setAnnotations, setAttributeNames, setAttributeNamesDefinedByUser, setDetectErrorsInPreview, setErrorTolerant, setValueTypes, stopReading, writeMetaDataInParameter
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
addAnnotations, canMakeReaderFor, createReader, doWork, getFileParameterForOperator, registerOperator, registerReaderDescription
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getResourceConsumptionEstimator, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, 
setListParameter, setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_DATA_FILE

public static final java.lang.String PARAMETER_DATA_FILE
The parameter name for "The path to the data file."

See Also:
Constant Field Values
Constructor Detail

ArffExampleSource

public ArffExampleSource(OperatorDescription description)
Method Detail

getDataSet

protected AbstractDataReader.DataSet getDataSet()
                                         throws OperatorException,
                                                java.io.IOException
Specified by:
getDataSet in class AbstractDataReader
Throws:
OperatorException
java.io.IOException

supportsEncoding

protected boolean supportsEncoding()
Overrides:
supportsEncoding in class AbstractReader<ExampleSet>

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects that can be prevented from being consumed. ATTENTION! This method creates new ParameterType objects. To access the already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class AbstractDataReader


Copyright © 2001-2009 by Rapid-I