Chapter 41. Data Partitioning (Parallel Running)

This chapter describes how to speed up graph runs with the help of data partitioning.

Note

Data partitioning is available in Corporate Server and Cluster. It is not available in local projects.

What Is Data Partitioning

Data partitioning runs parts of a graph in parallel. A component that is a bottleneck of the graph runs in multiple instances, and each instance processes one part of the original data stream.

Figure 41.1. Parallel Run
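
The idea can be sketched in plain Python, outside CloverETL. This is an illustration only: `partition`, `slow_component`, and `run_partitioned` are hypothetical names, the round-robin split is an assumption, and threads stand in for the parallel component instances.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, parts):
    """Split one stream into `parts` sub-streams, round-robin (assumed strategy)."""
    return [records[i::parts] for i in range(parts)]


def slow_component(chunk):
    """Stand-in for the bottleneck component (hypothetical transformation)."""
    return [value * 2 for value in chunk]


def run_partitioned(records, workers):
    """Partition the stream, process each part in its own worker, then gather."""
    chunks = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(slow_component, chunks))
    # Gather: concatenate the partial outputs. In this sketch the original
    # record order is not preserved across partitions.
    return [row for chunk in results for row in chunk]
```

Each worker sees only its own part of the stream, which is what lets the slow step scale with the number of instances.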


The processing can be further scaled to a cluster without modifying the graph.

Partitioned Sandboxes

In CloverETL Cluster, you can partition files with temporary data across multiple cluster nodes using partitioned sandboxes. A file stored in a partitioned sandbox is split into several parts, and each part resides on a different cluster node. This way, you can partition both the processing and the data, which reduces the amount of data transferred between cluster nodes.
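
As a rough mental model, the following Python sketch splits the lines of one file across nodes and lets each node work only on its own part. It does not use any CloverETL API: the function names and node names are hypothetical, and the round-robin assignment is an assumption, since the actual split strategy is not described here.

```python
def split_file(lines, nodes):
    """Assign each line of a file to one node, round-robin (assumed strategy)."""
    parts = {node: [] for node in nodes}
    for index, line in enumerate(lines):
        parts[nodes[index % len(nodes)]].append(line)
    return parts


def process_locally(parts):
    """Each node reads only its own part; here it simply counts its records."""
    return {node: len(part) for node, part in parts.items()}


# Hypothetical two-node cluster holding a five-line file.
parts = split_file(["a", "b", "c", "d", "e"], ["node1", "node2"])
counts = process_locally(parts)
```

Because every node processes the part it already stores, no records have to move between nodes during this step.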

When to Use Data Partitioning

Data partitioning is convenient to speed up processing when:

Way To Speed Up Processing

The way to speed up the run is to partition the data and run the slow component in parallel.

Designer and Server

In Designer and Server, you can speed up processing by copying the slow component and running the copies in parallel.

Figure 41.2. Parallel Run


Scalable Solution in Corporate Server and Cluster

There is a better solution that avoids copying components and is scalable.

Replace Partition with ParallelPartition and SimpleGather with ParallelSimpleGather.

Set the allocation of the components positioned between the cluster components: right-click the component and choose Set Allocation.

Figure 41.3. Parallel Run with Cluster Components


In the Component Allocation dialog, choose By number of workers and enter the number of parallel workers.

Figure 41.4. Component Allocation


Components in your graph will contain text denoting the allocation.

Figure 41.5. Component Allocation


How Does the Data Partitioning Work

Data partitioning runs part of a graph in parallel. The number of parallel workers is set through component allocation, so no components need to be copied. Data-partitioned graphs can take advantage of CloverETL Cluster without modification.
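
The difference from copying components can be pictured as one definition of the work plus a configurable worker count. The Python sketch below is an analogy, not CloverETL code: `transform`, `run_component`, and the `allocation` parameter are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    """Stand-in for the component's per-record logic (hypothetical)."""
    return record.upper()


def run_component(records, allocation=1):
    """Run the same component logic with `allocation` parallel workers."""
    with ThreadPoolExecutor(max_workers=allocation) as pool:
        # pool.map returns results in input order regardless of worker count.
        return list(pool.map(transform, records))


single = run_component(["a", "b", "c"], allocation=1)
scaled = run_component(["a", "b", "c"], allocation=4)
```

Only the `allocation` value changes between the two calls; the component logic is written once.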

Benefits of Data Partitioning

Things to Consider when Going Parallel