com.rapidminer.operator.io
Class ArffExampleSource
java.lang.Object
com.rapidminer.tools.AbstractObservable<Operator>
com.rapidminer.operator.Operator
com.rapidminer.operator.io.AbstractReader<ExampleSet>
com.rapidminer.operator.io.AbstractExampleSource
com.rapidminer.operator.io.AbstractDataReader
com.rapidminer.operator.io.ArffExampleSource
- All Implemented Interfaces:
- ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>
public class ArffExampleSource
- extends AbstractDataReader
This operator can read ARFF files known from the machine learning library Weka.
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes
a list of instances sharing a set of attributes. ARFF files were developed by the
Machine Learning Project at the Department of Computer Science of The University
of Waikato for use with the Weka machine learning software.
ARFF files have two distinct sections. The first section is the Header information,
which is followed by the Data information. The Header of the ARFF file contains the name
of the relation (@RELATION, ignored by RapidMiner) and a list of the attributes, each of which
is defined by the keyword @ATTRIBUTE followed by its name and its type.
Attribute declarations take the form of an ordered sequence of @ATTRIBUTE statements.
Each attribute in the data set has its own @ATTRIBUTE statement, which uniquely defines
the name of that attribute and its data type. The order in which the attributes are declared
indicates their column position in the data section of the file. For example, if an
attribute is the third one declared, all of that attribute's values will be found in the third
comma-delimited column.
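The declaration-order rule can be sketched as a small header scan. This is a hypothetical helper, not RapidMiner's implementation: it assumes unquoted attribute names without spaces, skips blank lines and %-comments, ignores @RELATION as the text describes, and stops at @DATA. The position of a name in the returned list is its 0-based column index in the data section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: collects ARFF attribute names in declaration order. */
public class ArffHeaderSketch {

    /** Returns attribute names; index i corresponds to the i-th data column (0-based). */
    public static List<String> attributeNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank lines and comments carry no declarations
            }
            String upper = line.toUpperCase(Locale.ROOT); // keywords compared case-insensitively
            if (upper.startsWith("@DATA")) {
                break; // the header ends where the data section begins
            }
            if (upper.startsWith("@ATTRIBUTE")) {
                // "@ATTRIBUTE <name> <type>": the name is the second whitespace-separated token
                // (assumption: the name is unquoted and contains no spaces)
                String[] tokens = line.split("\\s+", 3);
                if (tokens.length >= 2) {
                    names.add(tokens[1]);
                }
            }
            // @RELATION and any other line is ignored
        }
        return names;
    }
}
```

With a header declaring sepallength, petalwidth and class in that order, petalwidth is at index 1, so its values are read from the second comma-delimited field of each data line.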
The possible attribute types are:
- numeric
- integer
- real
- {nominalValue1,nominalValue2,...} for nominal attributes
- string for nominal attributes whose values are not declared in advance (it is, however, recommended to use the nominal definition above whenever possible)
- date [date-format] (currently not supported by RapidMiner)
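A type specification from an @ATTRIBUTE line could be classified along the lines of the list above. The enum and method names here are hypothetical and are not RapidMiner's own value-type constants; the sketch only mirrors the documented behaviour, including rejecting date attributes.

```java
import java.util.Locale;

/** Hypothetical classifier for ARFF attribute type specifications. */
public class ArffTypeSketch {

    /** Illustrative categories mirroring the attribute types listed in the text. */
    public enum Kind { NUMERIC, INTEGER, REAL, NOMINAL, STRING }

    public static Kind classify(String typeSpec) {
        String spec = typeSpec.trim();
        if (spec.startsWith("{") && spec.endsWith("}")) {
            return Kind.NOMINAL; // explicit list of nominal values
        }
        String lower = spec.toLowerCase(Locale.ROOT);
        switch (lower) {
            case "numeric":
                return Kind.NUMERIC;
            case "integer":
                return Kind.INTEGER;
            case "real":
                return Kind.REAL;
            case "string":
                return Kind.STRING; // nominal attribute without a declared value set
            default:
                if (lower.startsWith("date")) {
                    // the text states that date attributes are currently not supported
                    throw new UnsupportedOperationException("date attributes are not supported: " + spec);
                }
                throw new IllegalArgumentException("unknown ARFF attribute type: " + spec);
        }
    }
}
```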
Valid examples of attribute definitions are:
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
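The nominal declaration in the second example carries its value set inside braces. A minimal sketch of extracting that set, assuming no value contains a comma or a quote (quoted nominal values would need a proper tokenizer); the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts the values of a nominal type specification. */
public class NominalValuesSketch {

    /** Splits "{v1,v2,...}" into its values, preserving declaration order. */
    public static List<String> parse(String typeSpec) {
        String spec = typeSpec.trim();
        if (!spec.startsWith("{") || !spec.endsWith("}")) {
            throw new IllegalArgumentException("not a nominal type: " + spec);
        }
        List<String> values = new ArrayList<>();
        String inner = spec.substring(1, spec.length() - 1).trim();
        if (inner.isEmpty()) {
            return values; // "{}" declares no values
        }
        for (String value : inner.split(",")) {
            values.add(value.trim()); // assumption: values are unquoted and comma-free
        }
        return values;
    }
}
```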
The ARFF Data section of the file contains the data declaration line @DATA followed
by the actual example data lines. Each example is represented on a single line, with
carriage returns denoting the end of the example. Attribute values for each example
are delimited by commas. They must appear in the order that they were declared in the
header section (i.e. the data corresponding to the n-th @ATTRIBUTE declaration is
always the n-th field of the example line). Missing values are represented by a single
question mark, as in:
4.4,?,1.5,?,Iris-setosa
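Reading such a dense data line amounts to splitting on commas, checking the field count against the number of declared attributes, and mapping "?" to a missing-value marker. In this hypothetical sketch, missing values become null and all other fields stay strings; a complete reader would convert numeric columns according to their declared types. Quoting and comments are not handled here.

```java
/** Hypothetical helper: splits a dense ARFF data line into its fields. */
public class ArffDataLineSketch {

    /** Returns one entry per declared attribute; "?" becomes null to mark a missing value. */
    public static String[] parse(String line, int expectedColumns) {
        String[] fields = line.split(",", -1); // limit -1 keeps trailing empty fields
        if (fields.length != expectedColumns) {
            throw new IllegalArgumentException(
                "expected " + expectedColumns + " values but found " + fields.length);
        }
        for (int i = 0; i < fields.length; i++) {
            String value = fields[i].trim();
            fields[i] = value.equals("?") ? null : value;
        }
        return fields;
    }
}
```

For the line above with five declared attributes, positions 1 and 3 come back as null and position 4 holds Iris-setosa.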
A percent sign (%) introduces a comment and will be ignored during reading. Attribute
names or example values containing spaces must be quoted with single quotes ('). Please
note that the sparse ARFF format is currently only supported for numerical attributes.
Please use one of the other options for sparse data files provided by RapidMiner if you also
need sparse data files for nominal attributes.
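The comment, quoting and sparse-format rules above can be sketched as two hypothetical helpers. The tokenizer splits on commas outside single quotes and drops everything from an unquoted % onward; it assumes quotes are never escaped inside a value. The sparse expander follows Weka's sparse ARFF convention of 0-based "index value" pairs with unlisted values equal to 0, and, like the restriction stated above, handles numeric values only. Mapping an explicit "?" to Double.NaN is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for comments, single quotes and numeric sparse lines. */
public class ArffLineSketch {

    /** Splits a dense line on unquoted commas; an unquoted '%' starts a comment. */
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\'') {
                inQuotes = !inQuotes; // quote marks delimit a value and are not kept
            } else if (c == '%' && !inQuotes) {
                break; // the rest of the line is a comment
            } else if (c == ',' && !inQuotes) {
                tokens.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty() || !tokens.isEmpty()) {
            tokens.add(last); // a comment-only line yields no tokens
        }
        return tokens;
    }

    /** Expands a numeric sparse line such as "{1 3.5, 3 1.0}" into a dense array. */
    public static double[] expandSparse(String line, int numAttributes) {
        String body = line.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("not a sparse line: " + line);
        }
        double[] values = new double[numAttributes]; // unlisted entries stay 0.0
        String inner = body.substring(1, body.length() - 1).trim();
        if (inner.isEmpty()) {
            return values;
        }
        for (String pair : inner.split(",")) {
            String[] parts = pair.trim().split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed index/value pair: " + pair);
            }
            int index = Integer.parseInt(parts[0]); // 0-based attribute index
            values[index] = parts[1].equals("?") ? Double.NaN : Double.parseDouble(parts[1]);
        }
        return values;
    }
}
```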
Please have a look at the Iris example ARFF file provided in the data subdirectory
of the sample directory of RapidMiner to get an idea of the described data format.
- Author:
- Ingo Mierswa, Tobias Malbrecht
- Keywords:
- arff
Field Summary
static java.lang.String PARAMETER_DATA_FILE
The parameter name for "The path to the data file."
Methods inherited from class com.rapidminer.operator.io.AbstractDataReader
addAttributeColumn, addAttributeColumn, attributeNamesDefinedByUser, clearAllReaderSettings, clearReaderSettings, createExampleSet, deleteAttributeMetaDataParamters, fixMetaDataDefinition, getActiveAttributeColumns, getAllAttributeColumns, getAttributeColumn, getColumnCount, getErrorPreviewAsList, getGeneratedMetaData, getImportErrors, getIndexOfActiveAttributeColumn, getIndexOfAttributeColumn, getNewGenericColumnName, getPreviewAsList, getPreviewAsList, getShortPreviewAsList, guessValueTypes, hasParseError, hasParseErrorInColumn, hasParseErrorInRow, isDetectErrorsInPreview, isErrorTolerant, isMetaDataCacheable, isMetaDatafixed, loadMetaDataFromParameters, resetColumnNames, setAnnotations, setAttributeNames, setAttributeNamesDefinedByUser, setDetectErrorsInPreview, setErrorTolerant, setValueTypes, stopReading, writeMetaDataInParameter
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getResourceConsumptionEstimator, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, setListParameter, setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
PARAMETER_DATA_FILE
public static final java.lang.String PARAMETER_DATA_FILE
- The parameter name for "The path to the data file."
- See Also:
- Constant Field Values
ArffExampleSource
public ArffExampleSource(OperatorDescription description)
getDataSet
protected AbstractDataReader.DataSet getDataSet()
throws OperatorException,
java.io.IOException
- Specified by:
getDataSet
in class AbstractDataReader
- Throws:
OperatorException
java.io.IOException
supportsEncoding
protected boolean supportsEncoding()
- Overrides:
supportsEncoding
in class AbstractReader<ExampleSet>
getParameterTypes
public java.util.List<ParameterType> getParameterTypes()
- Description copied from class:
Operator
- Returns a list of ParameterTypes describing the parameters of
this operator. The default implementation returns an empty list if no
input objects can be retained, and otherwise special parameters for those input
objects that can be prevented from being consumed.
ATTENTION! This will create new parameter types. To access the already existing
parameter types, use getParameters().getParameterTypes().
- Specified by:
getParameterTypes
in interface ParameterHandler
- Overrides:
getParameterTypes
in class AbstractDataReader
Copyright © 2001-2009 by Rapid-I