DataGenerator

Available in Community Designer

Short Description
Ports
Metadata
DataGenerator Attributes
Details
CTL Interface
Java Interface
Examples
Best Practices
See also

Short Description

DataGenerator generates data records using a transformation.

Component: DataGenerator
Data source: generated
Input ports: 0
Output ports: 1-N
Each to all outputs: no
Different to different outputs: yes
Transformation: yes
Transformation required: yes
Java: yes
CTL: yes
Auto-propagated metadata: no


Ports

Port type | Number | Required | Description | Metadata
Output | 0 | yes | For generated data records | Any
Output | 1-N | no | For generated data records | Any

The component can send different records to different output ports using Return Values of Transformations.
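As a rough model of this routing, the sketch below maps a generate() return value to the output ports that receive the record. It is not CloverDX code. The class name and the numeric values of ALL and SKIP are assumptions made for this sketch; the real constants are listed in Return Values of Transformations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of routing a generated record by the value returned
// from generate(). The constant values are illustrative assumptions only.
class ReturnValueRouting {
    static final int ALL = Integer.MAX_VALUE; // assumed: send the record to every connected port
    static final int SKIP = -1;               // assumed: do not send the record anywhere

    // Returns the output port numbers that receive the record.
    static List<Integer> targetPorts(int returnValue, int portCount) {
        List<Integer> ports = new ArrayList<>();
        if (returnValue == ALL) {
            for (int p = 0; p < portCount; p++) {
                ports.add(p);
            }
        } else if (returnValue >= 0 && returnValue < portCount) {
            ports.add(returnValue); // a port number sends the record to that port only
        } else if (returnValue != SKIP) {
            // Other values are treated as errors in this sketch.
            throw new IllegalArgumentException("Unsupported return value: " + returnValue);
        }
        return ports;
    }

    public static void main(String[] args) {
        System.out.println(targetPorts(ALL, 3));  // [0, 1, 2]
        System.out.println(targetPorts(1, 3));    // [1]
        System.out.println(targetPorts(SKIP, 3)); // []
    }
}
```

Returning a port number lets one generate() call decide, record by record, which output receives the data. This is what makes "different to different outputs" possible.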

Metadata

DataGenerator does not propagate metadata.

DataGenerator has no metadata template.

Output metadata fields can have any data type.

Metadata on output ports can differ.

Metadata on all output ports can use Autofilling Functions.

DataGenerator Attributes

Basic

  • Generator [1]

    Definition of how records should be generated, written in the graph in CTL or Java.

  • Generator URL [1]

    Name of an external file, including the path, containing the definition of how records should be generated, written in CTL or Java.

  • Generator class [1]

    Name of an external class defining how records should be generated.

  • Generator source charset

    Encoding of the external file defining the transformation. The default encoding depends on DEFAULT_SOURCE_CODE_CHARSET in defaultProperties.

  • Number of records to generate (required)

    Number of records to be generated. A negative number indicates that the number is unknown at design time. See Generating Variable Number of Records.

Deprecated

  • Record pattern [2]

    String consisting of all constant fields of the generated records. It does not contain values of random or sequence fields. Define the random and sequence fields first. See Record Pattern, Random Fields, and Sequence Fields for more information.

  • Random fields [2]

    Sequence of individual field ranges separated by semicolons. Each range is defined by its minimum and maximum value. The minimum value is included in the range; the maximum value is excluded from it.

    Numeric fields receive random numbers greater than or equal to the minimum value and less than the maximum value. If the minimum and maximum are the same value, the field always equals that value.

    Fields of string and byte data type are defined by their minimum and maximum length. See Random Fields for more information. Example of one individual field range: $salary:=random("0","20000").

  • Sequence fields [2]

    Fields generated by a sequence. They are defined as a sequence of individual field mappings ($field:=IdOfTheSequence) separated by semicolons. The same sequence ID can be used for several fields at the same time. See Sequence Fields for more information.

  • Random seed [2]

    Sets the seed of the random number generator using a single long value. A fixed seed ensures that the values of random fields stay the same on each graph run.

    The random seed affects only the values generated using the Random fields attribute. It does not affect values generated using the Generator, Generator URL, or Generator class attributes; to set the random seed there, use the setRandomSeed() function.

    Possible values: 0-N

[1]  One of these transformation attributes should be specified instead of the deprecated attributes marked by number 2. However, these new attributes are optional.

[2]  These attributes are deprecated now. Define one of the transformation attributes marked by number 1 instead.

Details

DataGenerator generates data according to a pattern instead of reading it from a file, database, or any other data source. To generate data, you can define a generate transformation.

The transformation uses the CTL template for DataGenerator or implements the RecordGenerate interface.

DataGenerator Deprecated Attributes

If you do not define any of the three transformation attributes (Generator, Generator URL, Generator class), you can instead use the deprecated attributes. They define which fields are generated at random (Random fields), which are generated by a sequence (Sequence fields), and which are constant (Record pattern).

Record Pattern

Record pattern is a string containing all constant fields of the generated records (all fields except random and sequence fields). It has the form of a delimited record (with delimiters defined in the metadata on the output port) or a fixed-length record (with sizes defined in the metadata on the output port).
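A minimal sketch of how a delimited record pattern maps onto the constant fields of each generated record follows. It assumes a ";" delimiter and uses hypothetical field names in place of the output port metadata. It only illustrates the idea and is not how CloverDX parses the pattern internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: split a delimited record pattern into the constant fields that
// every generated record shares. Field names and the delimiter are
// hypothetical stand-ins for the output port metadata.
class RecordPatternSketch {
    static Map<String, String> constantFields(String pattern, List<String> constantFieldNames, String delimiter) {
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty values.
        String[] values = pattern.split(Pattern.quote(delimiter), -1);
        if (values.length != constantFieldNames.size()) {
            throw new IllegalArgumentException(
                "Pattern has " + values.length + " values, expected " + constantFieldNames.size());
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            fields.put(constantFieldNames.get(i), values[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // "country" and "currency" are hypothetical constant fields.
        System.out.println(constantFields("CZ;CZK", List.of("country", "currency"), ";"));
        // {country=CZ, currency=CZK}
    }
}
```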

Sequence Fields

Sequence fields can be defined in the dialog that opens after clicking the Sequence fields attribute. The Sequences dialog looks like this:


Figure 48.4. Sequences Dialog


The dialog consists of two panes: all graph sequences are on the left and all Clover fields (names of the fields in metadata) are on the right. Choose the desired sequence in the left pane and drag and drop it onto the desired field in the right pane.


Figure 48.5. A Sequence Assigned


You do not have to assign the same sequence to different Clover fields, but you can if you need to. The two buttons on the right side of the dialog cancel the selected mapping or all mappings.
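The sketch below models sequence-field mappings such as $id:=seqA;$copyId:=seqA;$batch:=seqB, where one sequence ID feeds two fields. The sequence IDs, field names, and start values are hypothetical. The sketch also assumes that each mapped field draws its own next value from the sequence. Check the Sequences documentation for the actual behavior when one sequence is shared.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of sequence fields: field -> sequence ID mappings, where several
// fields may share one sequence. Assumption: every mapped field takes the
// next value of its sequence separately.
class SequenceFieldsSketch {
    private final Map<String, Long> nextValue = new HashMap<>();                // sequence ID -> next value
    private final Map<String, String> fieldToSequence = new LinkedHashMap<>(); // field -> sequence ID

    void addSequence(String sequenceId, long start) {
        nextValue.put(sequenceId, start);
    }

    void map(String field, String sequenceId) {
        if (!nextValue.containsKey(sequenceId)) {
            throw new IllegalStateException("Unknown sequence: " + sequenceId);
        }
        fieldToSequence.put(field, sequenceId);
    }

    // Fills the sequence fields of one generated record.
    Map<String, Long> nextRecord() {
        Map<String, Long> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fieldToSequence.entrySet()) {
            long value = nextValue.get(e.getValue());
            nextValue.put(e.getValue(), value + 1); // advance the shared sequence
            record.put(e.getKey(), value);
        }
        return record;
    }

    public static void main(String[] args) {
        SequenceFieldsSketch s = new SequenceFieldsSketch();
        s.addSequence("seqA", 1);
        s.addSequence("seqB", 100);
        s.map("id", "seqA");
        s.map("copyId", "seqA"); // same sequence reused for a second field
        s.map("batch", "seqB");
        System.out.println(s.nextRecord()); // {id=1, copyId=2, batch=100}
        System.out.println(s.nextRecord()); // {id=3, copyId=4, batch=101}
    }
}
```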

Random Fields

This attribute defines the values of all fields whose values are generated at random. For each field, you can define its range (its minimum and maximum value). These values have the data types of the corresponding fields in the metadata. You can assign random fields in the Edit key dialog that opens after clicking the Random fields attribute.


Figure 48.6. Edit Key Dialog


The dialog contains the Fields pane on the left, the Random fields pane on the right, and the Random ranges pane at the bottom. In the Random ranges pane, you can type the range of the selected random field. You can move fields between the Fields and Random fields panes by clicking the Left arrow and Right arrow buttons.
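The range rules of the Random fields attribute can be sketched as follows: the minimum is included, the maximum is excluded, and equal bounds produce a constant. The Random seed attribute is modelled by seeding java.util.Random. This is an assumption for illustration, and CloverDX's own generator may produce different values for the same seed. The lowercase alphabet for strings and the exclusive maximum length are also assumptions.

```java
import java.util.Random;

// Sketch of Random fields semantics with a fixed seed. Not CloverDX code:
// java.util.Random stands in for the component's random generator.
class RandomFieldsSketch {
    private final Random random;

    RandomFieldsSketch(long seed) {
        this.random = new Random(seed); // same seed -> same sequence of values
    }

    // Numeric field: a value in [min, max), or min when both bounds are equal.
    int randomInt(int min, int max) {
        if (min == max) {
            return min; // equal bounds: the field is constant
        }
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        // nextInt(n) returns a value in [0, n); subtractExact guards against overflow.
        return min + random.nextInt(Math.subtractExact(max, min));
    }

    // String field: the range bounds the length (maximum excluded, by assumption).
    String randomString(int minLength, int maxLength) {
        int length = randomInt(minLength, maxLength);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26))); // lowercase letters: an assumption
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RandomFieldsSketch a = new RandomFieldsSketch(42L);
        RandomFieldsSketch b = new RandomFieldsSketch(42L);
        // Two generators with the same seed produce the same values,
        // which is the stability the Random seed attribute provides.
        System.out.println(a.randomInt(0, 20000) == b.randomInt(0, 20000)); // true
        System.out.println(a.randomInt(5, 5)); // 5
    }
}
```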

CTL Interface

CTL Templates for DataGenerator
Output records or fields

You can specify the transformation in CTL using the Generator or Generator URL attribute.

Simple transformations can be defined on the Transformations tab of the Transform Editor. However, more advanced transformations may not be possible with this approach; for those, you need to use CTL scripting.

CTL Templates for DataGenerator

This transformation template is used only in DataGenerator.

Once you have written your transformation in CTL, you can also convert it to Java code using the corresponding button in the upper right corner of the tab.

Table 48.2. Functions in DataGenerator

CTL Template Functions

boolean init()
Required: No
Description: Initializes the component and sets up the environment and global variables.
Invocation: Called before processing the first record.
Returns: true | false (if false is returned, the graph fails)

integer generate()
Required: Yes
Input parameters: None
Returns: Integer numbers. See Return Values of Transformations for detailed information. Note that when Generating Variable Number of Records, STOP does NOT indicate an error; it finishes the generation successfully.
Invocation: Called repeatedly for each output record.
Description:

Defines the structure and values of all fields of the output record.

If generate() fails and the user has not defined generateOnError(), the whole graph fails.

If any part of generate() fails for an output record and the user has defined generateOnError(), processing continues in generateOnError() at the point where generate() failed.

generateOnError() receives the information that generate() gathered from the code it executed successfully before the failure. The error message and stack trace are also passed to generateOnError().

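Example
function integer generate() {
   // When randomBool() returns false, str2integer("abc") fails
   // and processing continues in generateOnError().
   string myTestString = iif(randomBool(), "1", "abc");
   $out.0.name = randomString(3,5) + " " + randomString(5,7);
   $out.0.salary = randomInteger(20000,40000);
   $out.0.testValue = str2integer(myTestString);
   return ALL;
}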

integer generateOnError(string errorMessage, string stackTrace)
Required: No
Input parameters: string errorMessage, string stackTrace
Returns: Integer numbers. See Return Values of Transformations for detailed information.
Invocation: Called if generate() throws an exception.
Description:

Defines the structure and values of all fields of the output record.

If any part of generate() fails for an output record and the user has defined generateOnError(), processing continues in generateOnError() at the point where generate() failed.

generateOnError() receives the information that generate() gathered from the code it executed successfully before the failure. The error message and stack trace are also passed to generateOnError().

Example
function integer generateOnError(
                      string errorMessage, 
                      string stackTrace) {
   $out.0.name = randomString(3,5) + " " + randomString(5,7);
   $out.0.salary = randomInteger(20000,40000);
   $out.0.stringTestValue = "myTestString is abc";
   return ALL;
}

string getMessage()
Required: No
Description: Returns the error message specified by the user. Called only when generate() or generateOnError() returns a value less than or equal to -2.
Invocation: Called at any time specified by the user.
Returns: string

void preExecute()
Required: No
Input parameters: None
Returns: void
Description: May be used to allocate and initialize resources required by the generate() function. All resources allocated within this function should be released by the postExecute() function.
Invocation: Called during each graph run before the transform is executed.

void postExecute()
Required: No
Input parameters: None
Returns: void
Description: Should be used to free any resources allocated within the preExecute() function.
Invocation: Called during each graph run after the entire transform has been executed.
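The fallback from generate() to generateOnError() described above can be modelled as follows. This is a plain Java sketch, not the component's implementation. A record is simplified to a String, and the class and method names are hypothetical. The demo mirrors the CTL example, where converting "abc" to an integer fails.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Sketch: if generate() throws and a generateOnError() handler exists, the
// handler receives the error message and stack trace and produces the
// record; without a handler the exception propagates (the graph fails).
class ErrorFallbackSketch {
    static String runStep(Supplier<String> generate,
                          BiFunction<String, String, String> generateOnError) {
        try {
            return generate.get();
        } catch (RuntimeException e) {
            if (generateOnError == null) {
                throw e; // no handler defined: the failure is not recovered
            }
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace));
            return generateOnError.apply(e.getMessage(), trace.toString());
        }
    }

    public static void main(String[] args) {
        // Integer.parseInt("abc") throws NumberFormatException, like str2integer("abc").
        Supplier<String> generate = () -> String.valueOf(Integer.parseInt("abc"));
        BiFunction<String, String, String> onError = (message, stackTrace) -> "myTestString is abc";
        System.out.println(runStep(generate, onError)); // myTestString is abc
    }
}
```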

Output records or fields

Output records or fields are accessible only within the generate() and generateOnError() functions.

Warning

None of the other CTL template functions can access outputs.

If you do not follow these rules, a NullPointerException (NPE) is thrown.

Java Interface

The transformation implements methods of the RecordGenerate interface and inherits other common methods from the Transform interface. See Common Java Interfaces. You can also use the Public Clover API.

The RecordGenerate interface has the following methods:

  • boolean init(Properties parameters, DataRecordMetadata[] targetMetadata)

    Initializes the generate class or function. This method is called only once, at the beginning of the generation process. Any object allocation or initialization should happen here.

  • int generate(DataRecord[] target)

    Generates target records. This method is called as one step in the record generation flow.

    Note

    This method allows you to distribute different records to different connected output ports according to the value returned for them. See Return Values of Transformations for more information about return values and their meaning.

  • int generateOnError(Exception exception, DataRecord[] target)

    Generates target records. This method is called as one step in the record generation flow, and only if generate(DataRecord[]) throws an exception.

  • void signal(Object signalObject)

    Can be used to signal to the generator that something happened outside of it.

  • Object getSemiResult()

    Can be used to get intermediate results of the generation. Implementations may or may not support it.
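To show how the methods listed above fit together in one class, here is a self-contained sketch. It is written against a hypothetical stand-in interface, SimpleRecordGenerate, that only mirrors the method names. Its types are simplifications and not the CloverDX API: records are Object[], target metadata is reduced to field counts, and parameters are a Map instead of Properties. The "start" parameter and the ALL value are also assumptions.

```java
import java.util.Map;

// Hypothetical stand-in mirroring the RecordGenerate method names.
// NOT the CloverDX interface: all parameter types are simplified.
interface SimpleRecordGenerate {
    boolean init(Map<String, String> parameters, int[] fieldCounts);
    int generate(Object[][] target);
    int generateOnError(Exception exception, Object[][] target);
    void signal(Object signalObject);
    Object getSemiResult();
}

// Writes a running counter into field 0 of the record for every output port.
class CounterGenerate implements SimpleRecordGenerate {
    static final int ALL = Integer.MAX_VALUE; // assumed "send to all ports" value
    private long counter;

    public boolean init(Map<String, String> parameters, int[] fieldCounts) {
        // One-time initialization; "start" is a hypothetical parameter.
        counter = Long.parseLong(parameters.getOrDefault("start", "0"));
        return true;
    }

    public int generate(Object[][] target) {
        for (Object[] record : target) {
            record[0] = counter; // same value for each output port
        }
        counter++;
        return ALL;
    }

    public int generateOnError(Exception exception, Object[][] target) {
        for (Object[] record : target) {
            record[0] = -1L; // marker value for a failed generation step
        }
        return ALL;
    }

    public void signal(Object signalObject) {
        // No external signals are handled in this sketch.
    }

    public Object getSemiResult() {
        return counter; // optional: exposes progress as an intermediate result
    }

    public static void main(String[] args) {
        CounterGenerate g = new CounterGenerate();
        g.init(Map.of("start", "10"), new int[] {1, 1});
        Object[][] target = { new Object[1], new Object[1] }; // two output ports, one field each
        g.generate(target);
        System.out.println(target[0][0] + " " + target[1][0]); // 10 10
        System.out.println(g.getSemiResult()); // 11
    }
}
```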

Examples

Generating Variable Number of Records

Sometimes the number of records to be generated is not known at design time. In such a case, set the Number of records to generate attribute to a negative number. The component then generates records until the generate() function returns STOP (in this case, STOP is not considered an error). This works for transformations defined in both Java and CTL.

Warning

Note that in the last iteration when STOP is returned, no records will be sent to any of the output ports.

Example 48.3. Generating Variable Number of Records in CTL

    integer total = randomInteger(1, 100);
    integer counter = 0;

    // Generates output record.
    function integer generate() {
        counter++;

        if (counter > total) {
            printLog(info, "Terminating");
            return STOP;
        }

        if ((counter % 10) == 0) {
            printLog(info, "Skipping record # " + counter);
            return SKIP;
        }

        $out.0.value = "Record # " + counter;
	
        return OK;
    }
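The same logic can be traced in plain Java, together with the loop that a negative Number of records to generate implies. This is a sketch of the described behavior, not CloverDX code. The constant values are assumptions, and total is passed to the constructor instead of coming from randomInteger(1, 100), so the result is repeatable.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of variable-count generation: the driver calls generate() until
// it returns STOP. OK records are sent, SKIP records are dropped, and the
// STOP iteration sends nothing. Constant values are illustrative only.
class VariableCountSketch {
    static final int OK = 0, SKIP = -1, STOP = -3; // assumed values

    private final int total; // unknown at design time; fixed here for repeatability
    private int counter = 0;

    VariableCountSketch(int total) {
        this.total = total;
    }

    // Mirrors the CTL generate() above; the record is written into out[0].
    int generate(String[] out) {
        counter++;
        if (counter > total) {
            return STOP; // finishes the generation successfully, not an error
        }
        if (counter % 10 == 0) {
            return SKIP; // every tenth record is not sent
        }
        out[0] = "Record # " + counter;
        return OK;
    }

    // Component side: loop until STOP and collect what reaches port 0.
    static List<String> drive(VariableCountSketch generator) {
        List<String> port0 = new ArrayList<>();
        while (true) {
            String[] record = new String[1];
            int result = generator.generate(record);
            if (result == STOP) {
                break; // no record is sent in the STOP iteration
            }
            if (result == OK) {
                port0.add(record[0]);
            }
        }
        return port0;
    }

    public static void main(String[] args) {
        List<String> sent = drive(new VariableCountSketch(25));
        System.out.println(sent.size());                 // 23 (records 10 and 20 are skipped)
        System.out.println(sent.get(sent.size() - 1));   // Record # 25
    }
}
```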

Best Practices

If Generator URL is used, we recommend explicitly specifying Generator source charset.
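The reason for this recommendation can be shown with generic Java rather than CloverDX's file loading. The same bytes of a transformation file decode to different text depending on the charset assumed by the reader. The sketch saves a comment containing a non-ASCII letter as UTF-8 and then decodes it both correctly and with ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Generic illustration: decoding with the wrong charset corrupts non-ASCII text.
class CharsetSketch {
    // Decodes the raw bytes of a source file with the given charset.
    static String decode(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    public static void main(String[] args) {
        // A CTL-style comment with the letter c-caron (U+010D), written as an escape.
        String original = "// \u010Dau";
        byte[] savedAsUtf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(savedAsUtf8, StandardCharsets.UTF_8).equals(original));      // true
        System.out.println(decode(savedAsUtf8, StandardCharsets.ISO_8859_1).equals(original)); // false
    }
}
```

In UTF-8 the letter takes two bytes, so ISO-8859-1 decoding turns it into two wrong characters.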

See also

Trash
Common Properties of Components
Specific Attribute Types
Common Properties of Readers
Readers Comparison