Details of the Example Transformation Design

Note that the graph contains four cluster components. These components mark the points where the "node allocation" changes, so the part of the graph they demarcate is highlighted by the red rectangle. The components in this part should run in parallel, which means that the components inside the dotted rectangle need an appropriate allocation. The rest of the graph runs on a single node.

Specification of "node allocation"

Two node allocations are used in the graph:

  • node allocation for components running in parallel (demarcated by the four cluster components)

  • node allocation for the outer part of the graph, which runs on a single node

The single node is specified by the sandbox code used in the URLs of the input data. The following dialog shows the File URL value: "sandbox://data/path-to-csv-file", where "data" is the ID of the server sandbox containing the specified file. It is this local sandbox, "data", that defines the single node.
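As an illustration of how the sandbox ID sits in such a URL, the following minimal Python sketch extracts it with the standard library. The helper name `sandbox_id` is hypothetical and is not part of the product; the sketch only assumes that the part directly after "sandbox://" is the sandbox ID, as described above.

```python
from urllib.parse import urlsplit


def sandbox_id(file_url: str) -> str:
    """Return the sandbox ID of a sandbox:// URL (hypothetical helper)."""
    parts = urlsplit(file_url)
    # The sandbox ID occupies the "host" position of the URL.
    if parts.scheme != "sandbox" or not parts.netloc:
        raise ValueError(f"not a sandbox URL: {file_url!r}")
    return parts.netloc


print(sandbox_id("sandbox://data/path-to-csv-file"))  # prints: data
```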

The part of the graph demarcated by the four cluster components could have its allocation specified by a file URL attribute as well, but this part does not work with files at all, so there is no file URL. Thus, we use the "node allocation" attribute instead. Since components may adopt the allocation from their neighbours, it is sufficient to set it for only one component.

Again, "dataPartitioned" in the following dialog is the sandbox ID.

Let's look at our sandboxes. This project requires three sandboxes: "data", "dataPartitioned" and "PhoneChargesDistributed".

  • data

    • contains input and output data

    • local sandbox (yellow folder), so it has only one physical location

    • accessible only on node "i-4cc9733b", at the specified path

  • dataPartitioned

    • partitioned sandbox (red folder), so it has a list of physical locations on different nodes

    • does not contain any data; since the graph neither reads from nor writes to this sandbox, it is used only to define the "node allocation"

    • in the following figure, the allocation is configured for two cluster nodes

  • PhoneChargesDistributed

    • common sandbox containing the graph file, metadata, and connections

    • shared sandbox (blue folder), so all cluster nodes have access to the same files

If the graph were executed with the sandbox configuration shown in the previous figure, the node allocation would be:

  • components which run on a single node will run only on the "i-4cc9733b" node, according to the location of the "data" sandbox.

  • components allocated according to the "dataPartitioned" sandbox will run on the nodes "i-4cc9733b" and "i-52d05425".
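The resolution described above can be sketched as a small Python model. This is only an illustration under stated assumptions, not the server's implementation: the `Sandbox` class, the `SANDBOXES` table and the `allocation` function are hypothetical names, and the node lists simply restate the configuration from the figure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sandbox:
    """Hypothetical model of a sandbox that can define a node allocation."""
    sandbox_id: str
    kind: str                # "local" or "partitioned"
    nodes: Tuple[str, ...]   # nodes holding a physical location


# Configuration restated from the text above.
SANDBOXES = {
    "data": Sandbox("data", "local", ("i-4cc9733b",)),
    "dataPartitioned": Sandbox(
        "dataPartitioned", "partitioned", ("i-4cc9733b", "i-52d05425")
    ),
}


def allocation(sandbox_id: str) -> List[str]:
    """Return the nodes on which components allocated by this sandbox run."""
    sandbox = SANDBOXES[sandbox_id]
    # A local sandbox has exactly one physical location, hence one node.
    if sandbox.kind == "local" and len(sandbox.nodes) != 1:
        raise ValueError("a local sandbox must have exactly one location")
    return list(sandbox.nodes)


print(allocation("data"))             # ['i-4cc9733b']
print(allocation("dataPartitioned"))  # ['i-4cc9733b', 'i-52d05425']
```

Adding a third location to the "dataPartitioned" entry would, in this model, extend the parallel part of the graph to a third node without touching the single-node part.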