CloverETL 4.6.1, Component Reference

Chapter 54. Data Partitioning

Components in this category are primarily intended for managing data flow when using Data Partitioning or a CloverETL Cluster environment, which enables massive parallelization of data transformation processing. In a transformation graph running with data partitioning enabled, or in a cluster environment, each component can be executed in multiple instances; this is called component allocation. Component allocation specifies how many instances are executed and where (on which cluster nodes) they run. See the documentation for Data Partitioning or CloverETL Cluster for more details.

In general, data partitioning components can be divided into two subcategories: partitioners and gatherers.

Parallel partitioners distribute data records from a single worker among multiple cluster workers. They are used to change a single-worker allocation into a multiple-worker allocation.

ParallelPartition distributes data records among multiple workers; its algorithm is based on the Partition component.
ParallelLoadBalancingPartition distributes data records among multiple workers; its algorithm is based on the LoadBalancingPartition component.
ParallelSimpleCopy copies data records to multiple workers; its algorithm is based on the SimpleCopy component, so incoming data are duplicated and sent to all output workers.
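To make the three distribution styles concrete, here is a minimal Python sketch, assuming records are plain dicts and each worker is modelled as an in-memory list. All function names are hypothetical and are not part of CloverETL. Hash and round-robin splitting are shown as two common strategies a partitioner may use; the load-balancing model (each record goes to the worker that becomes free earliest) is an assumed simplification of "send to whichever worker is ready".

```python
import zlib
from collections.abc import Iterable
from typing import Any

Record = dict[str, Any]  # hypothetical record type: field name -> value

def stable_bucket(value: Any, n: int) -> int:
    # crc32 is deterministic across runs, unlike Python's salted str hash
    return zlib.crc32(str(value).encode("utf-8")) % n

def hash_partition(records: Iterable[Record], key: str, n_workers: int) -> list[list[Record]]:
    """Key-based split: records with equal key values land on the same worker."""
    workers: list[list[Record]] = [[] for _ in range(n_workers)]
    for rec in records:
        workers[stable_bucket(rec[key], n_workers)].append(rec)
    return workers

def round_robin_partition(records: Iterable[Record], n_workers: int) -> list[list[Record]]:
    """Keyless split: records are dealt to the workers in turn."""
    workers: list[list[Record]] = [[] for _ in range(n_workers)]
    for i, rec in enumerate(records):
        workers[i % n_workers].append(rec)
    return workers

def load_balancing_partition(records: Iterable[Record], cost_per_record: list[float]) -> list[list[Record]]:
    """Assumed model: each record goes to the worker that becomes free earliest,
    so faster workers (lower cost per record) receive more records."""
    free_at = [0.0] * len(cost_per_record)
    workers: list[list[Record]] = [[] for _ in cost_per_record]
    for rec in records:
        w = min(range(len(free_at)), key=lambda i: free_at[i])  # ties go to the lowest index
        workers[w].append(rec)
        free_at[w] += cost_per_record[w]
    return workers

def copy_to_all(records: Iterable[Record], n_workers: int) -> list[list[Record]]:
    """SimpleCopy-like: every worker receives every record (one separate list per worker)."""
    data = list(records)
    return [list(data) for _ in range(n_workers)]
```

For example, `hash_partition(rows, "customer_id", 4)` keeps all rows of one customer on the same worker, which is what later per-key processing on each worker relies on; round-robin and load balancing spread records evenly or by speed but give no such guarantee.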

Parallel gatherers collect data records from multiple cluster workers into a single worker. They are used to change a multiple-worker allocation into a single-worker allocation.

ParallelSimpleGather gathers data records from multiple cluster workers; its algorithm is based on the SimpleGather component.
ParallelMerge gathers data records from multiple cluster workers; its algorithm is based on the Merge component.
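The difference between the two gatherers can be sketched in Python as follows, assuming each worker's output is an in-memory list of dict records and, for the merge, that every list is already sorted by the merge key (the usual precondition for a merge). The function names are hypothetical; this illustrates the idea, not CloverETL's implementation.

```python
import heapq
from collections.abc import Iterator
from itertools import chain
from operator import itemgetter
from typing import Any

Record = dict[str, Any]  # hypothetical record type: field name -> value

def simple_gather(partitions: list[list[Record]]) -> Iterator[Record]:
    """SimpleGather-like: no ordering guarantee across workers.
    Concatenation is just one valid order; a real gatherer may interleave
    records in whatever order they arrive."""
    return chain.from_iterable(partitions)

def merge_gather(partitions: list[list[Record]], key: str) -> Iterator[Record]:
    """Merge-like: each partition must already be sorted by `key`;
    a k-way heap merge then yields one globally sorted stream."""
    return heapq.merge(*partitions, key=itemgetter(key))
```

Note that `heapq.merge` does not check its inputs: if a partition is not sorted by the key, it raises no error and simply produces output that is not sorted, which is why sorted inputs matter for a merge-based gatherer.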

The ParallelRepartition component stands apart from both basic groups of parallel components.

ParallelRepartition changes the partitioning of already partitioned data, that is, it re-partitions them. For example, if your data are already partitioned by a key using the ParallelPartition component and you want to change the key or the number of partitions, this component does it in one step. Without it, you would have to gather all partitioned data to a single worker with a parallel gatherer and then partition them again according to the new rules with a parallel partitioner; the single worker in the middle becomes a bottleneck.
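The following Python sketch contrasts the two approaches described above, assuming hash partitioning on the new key and in-memory lists standing in for workers. `repartition` and `gather_then_partition` are hypothetical names; this is a conceptual model of one-step repartitioning, not how ParallelRepartition is implemented.

```python
import zlib
from itertools import chain
from typing import Any

Record = dict[str, Any]  # hypothetical record type: field name -> value

def stable_bucket(value: Any, n: int) -> int:
    # crc32 is deterministic across runs, unlike Python's salted str hash
    return zlib.crc32(str(value).encode("utf-8")) % n

def repartition(sources: list[list[Record]], new_key: str, m_targets: int) -> list[list[Record]]:
    """One-step repartitioning: each source worker routes its own records
    straight to the target chosen by the new key; no worker holds all the data."""
    targets: list[list[Record]] = [[] for _ in range(m_targets)]
    for source in sources:  # conceptually, each source runs this inner loop in parallel
        for rec in source:
            targets[stable_bucket(rec[new_key], m_targets)].append(rec)
    return targets

def gather_then_partition(sources: list[list[Record]], new_key: str, m_targets: int) -> list[list[Record]]:
    """Two-step alternative: every record first passes through one worker (the bottleneck)."""
    single_worker = list(chain.from_iterable(sources))
    targets: list[list[Record]] = [[] for _ in range(m_targets)]
    for rec in single_worker:
        targets[stable_bucket(rec[new_key], m_targets)].append(rec)
    return targets
```

In this sequential simulation both functions place every record on the same target; the difference is where the work happens, since the two-step version funnels all N source partitions through a single list before splitting them into M targets.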
