A subgraph is a user-defined reusable component whose logic is implemented as an ETL graph instead of Java code.
A subgraph definition is a regular ETL graph and may use any graph elements (components, connections, lookups, sequences, or parameters).
Subgraphs can be nested; a subgraph definition may use other subgraphs.
A subgraph definition is stored in a separate file with the *.sgrf extension. In the default CloverETL project layout, the directory ${PROJECT}/graph/subgraph is created for storing subgraph files. You can reference this directory via the ${SUBGRAPH_DIR} parameter.
Use the Subgraph component to reference a subgraph in a regular ETL graph. Once configured with a subgraph file, the Subgraph component automatically updates its ports to match the ports of the subgraph definition.
Use subgraphs to visually reduce the number of components in complex ETL graphs and to highlight important processing logic.
Subgraphs allow developing prefabricated blocks of logic that can be used by other members of the development team. This approach to ETL development promotes reusability and standardization.
Subgraphs provide an easy way to create new connectors for web services or databases. Web services communicate over the HTTP protocol and provide data in JSON or XML format, which needs to be preprocessed before use in ETL logic. Subgraphs can hide the parsing logic and provide the data in an easy-to-consume format.
Similarly, for databases with a complex relational structure, DBAs can develop tuned queries that access data via optimized views and indices, then publish the queries as subgraphs that serve as easy-to-use connectors to common data entities.
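As a rough illustration of the kind of parsing logic a web-service connector subgraph might hide, the following Python sketch flattens a nested JSON response into flat records. This is not CloverETL code: the payload shape and the field names (customers, id, name, address, city) are assumptions made up for the example, and in a real subgraph the same work would be done by components in the subgraph body.

```python
import json

# Hypothetical web service response; the structure and field names are
# assumptions made for this illustration only.
RESPONSE = """
{"customers": [
  {"id": 1, "name": "Alice", "address": {"city": "Prague"}},
  {"id": 2, "name": "Bob", "address": {"city": "Brno"}}
]}
"""

def flatten_customers(payload):
    """Turn the nested JSON document into a list of flat records."""
    document = json.loads(payload)
    return [
        {"id": c["id"], "name": c["name"], "city": c["address"]["city"]}
        for c in document["customers"]
    ]

# Consumers of the "connector" only ever see the flat records.
records = flatten_customers(RESPONSE)
for record in records:
    print(record)
```

The caller never deals with the nested structure, which is the point of wrapping such logic in a reusable block.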
Create the body of a subgraph in the same way as an ordinary graph. You can use the same components, structure, and overall approach.
Use connections, lookup tables, dictionaries, etc. All these features are available in subgraphs as well as in graphs.
Define an input and output interface. The interface (the input and output ports of the Subgraph component) is defined by the SubgraphInput and SubgraphOutput components.
Launch as a single unit or from a graph. A subgraph can be launched as a standalone graph or as a component from a parent graph.
An ETL graph defining a subgraph contains the following sections:
Figure 42.1. Subgraph Layout
SubgraphInput represents the inputs of the subgraph.
Each subgraph contains exactly one instance of the SubgraphInput component.
The number of its output ports defines the number of the subgraph's inputs.
SubgraphOutput represents the outputs of the subgraph.
Each subgraph contains exactly one instance of the SubgraphOutput component.
The number of its input ports defines the number of the subgraph's outputs.
The subgraph body contains the implementation of the subgraph logic.
The subgraph body can contain components (e.g. Reader) that are not connected to SubgraphInput or SubgraphOutput, in order to access external data sources or static data sets.
The body of a subgraph may contain multiple phases and define component allocation for execution control. Phases and allocation are applied separately from the parent graph. For phases, this means that when the subgraph is started in a phase of its parent graph, the subgraph's first phase runs, then the second, the third, and so on. After all phases of the subgraph finish, the parent graph considers the subgraph finished and the next phase of the parent graph can start.
Components in the subgraph body can use their own connections, lookups, metadata, and parameters.
The debug input section consists of any components connected to the input ports of the SubgraphInput component.
It can be used to generate test data when developing and testing subgraph logic.
Components in the debug input section are automatically disabled when the subgraph is executed from a parent graph; this is visualized by graying out these components.
The debug output section consists of any components connected to the output ports of the SubgraphOutput component, or assigned to a higher phase than SubgraphOutput.
It can be used to inspect and store test data when developing and testing the subgraph.
Components in the debug output section are automatically disabled when the subgraph is executed from a parent graph; this is visualized by graying out these components.
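The rules for the debug input and debug output sections can be sketched in Python as follows. This is only a simplified model, not CloverETL's implementation: it looks at direct connections only, the Component class and the function names are hypothetical, and the component names in the example layout are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    phase: int = 0

INTERFACE = ("SubgraphInput", "SubgraphOutput")

def debug_components(components, edges):
    """Return names of components in the debug input/output sections.

    edges is a list of (source_name, target_name) pairs.
    """
    phases = {c.name: c.phase for c in components}
    output_phase = phases["SubgraphOutput"]
    debug = set()
    for source, target in edges:
        if target == "SubgraphInput":
            debug.add(source)   # feeds SubgraphInput: debug input
        if source == "SubgraphOutput":
            debug.add(target)   # reads from SubgraphOutput: debug output
    for c in components:
        # A higher phase than SubgraphOutput also counts as debug output.
        if c.name not in INTERFACE and c.phase > output_phase:
            debug.add(c.name)
    return debug

def active_components(components, edges, launched_from_parent):
    """Debug components are disabled only when launched from a parent graph."""
    if not launched_from_parent:
        return [c.name for c in components]
    debug = debug_components(components, edges)
    return [c.name for c in components if c.name not in debug]

# Illustrative subgraph layout (names are assumptions for this example).
components = [
    Component("DataGenerator"), Component("SubgraphInput"),
    Component("Reformat"), Component("SubgraphOutput"),
    Component("Trash"), Component("AuditWriter", phase=1),
]
edges = [
    ("DataGenerator", "SubgraphInput"), ("SubgraphInput", "Reformat"),
    ("Reformat", "SubgraphOutput"), ("SubgraphOutput", "Trash"),
]

print(sorted(debug_components(components, edges)))
print(active_components(components, edges, launched_from_parent=True))
print(active_components(components, edges, launched_from_parent=False))
```

When launched standalone, every component runs, so the test data generator and the inspection components are available during development.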
Figure 42.2. Example of subgraph with multiple output ports
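The phase behavior described for the subgraph body can be illustrated with a small Python model. It is a sketch under simplifying assumptions: components within a phase run in parallel in a real graph, whereas here they are simply logged one by one, and the run_graph function and the step names are hypothetical.

```python
def run_graph(name, phases, log):
    """Run phases in ascending order.

    phases maps a phase number to a list of steps. A step is either a
    component name (str) or a (subgraph_name, subgraph_phases) tuple.
    A nested subgraph runs all of its own phases before the enclosing
    phase of the parent is considered finished.
    """
    for number in sorted(phases):
        log.append(f"{name}: phase {number} started")
        for step in phases[number]:
            if isinstance(step, tuple):
                sub_name, sub_phases = step
                run_graph(sub_name, sub_phases, log)
            else:
                log.append(f"{name}: ran {step}")
        log.append(f"{name}: phase {number} finished")
    return log

# Parent graph with a subgraph placed in its phase 0.
subgraph = ("Subgraph", {0: ["Parse"], 1: ["Aggregate"]})
parent = {0: ["Reader", subgraph], 1: ["Writer"]}

log = run_graph("Parent", parent, [])
for line in log:
    print(line)
```

In the resulting log, both phases of the subgraph complete before the parent's phase 0 finishes, and only then does the parent's phase 1 start.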
While both subgraphs and Jobflow provide a way of creating reusable processing logic, they serve different purposes.
Subgraphs behave the same as other built-in ETL components; they stream data to the parent graph. When used in an ETL graph, they execute in parallel with the other ETL components running in the graph.
Use a subgraph when you need to create a new component that should be used in ETL processing and exchange large amounts of data with other components.
Jobflow, by its nature, provides step-by-step sequential processing. Individual steps in a jobflow do not exchange large amounts of data; instead, they pass status and configuration parameters to each other.
If you need to create logic that should be executed as one of several processing steps, or you want to react to a job's status after its execution, create an ETL graph and call it from Jobflow via ExecuteGraph.
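The difference between the two styles can be sketched in Python. This is an analogy rather than CloverETL code: the generator stands in for a subgraph that passes records downstream as they are produced (it does not model parallel execution), while the step runner stands in for a jobflow whose steps exchange only a small status dictionary. All function and field names are hypothetical.

```python
def read_records():
    """Stand-in for an upstream component producing a stream of records."""
    for i in range(5):
        yield {"id": i, "amount": i * 10}

def subgraph_like(records):
    """Streaming style: each record is passed on as soon as it is processed."""
    for record in records:
        yield {**record, "doubled": record["amount"] * 2}

def jobflow_like(steps):
    """Step-by-step style: steps run one after another and exchange only
    a small status dictionary, not the data itself."""
    status = {"ok": True}
    for step in steps:
        status = step(status)
        if not status["ok"]:
            break   # react to a failed step instead of continuing
    return status

def load_step(status):
    # Hypothetical step: pretends to load data and reports a row count.
    return {**status, "rows_loaded": 5}

def report_step(status):
    return {**status, "report": f"loaded {status['rows_loaded']} rows"}

streamed = list(subgraph_like(read_records()))
final_status = jobflow_like([load_step, report_step])
print(streamed[1])
print(final_status)
```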
Note
Graphs and subgraphs cannot contain cycles (Jobflow can). Thus, subgraphs cannot be called recursively.
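To see why recursion is ruled out, nested subgraph references can be treated as a directed graph in which a recursive call would form a cycle. The following Python sketch detects such a cycle with a depth-first search. It is an illustration only and does not describe how CloverETL validates graphs; the find_cycle function and the file names are hypothetical.

```python
def find_cycle(references, start):
    """Depth-first search over subgraph references.

    references maps a file name to the list of subgraph files it uses.
    Returns the list of files forming a cycle, or None if there is none.
    """
    path = []        # files on the current reference chain, in order
    on_path = set()  # same files, for fast membership checks
    done = set()     # files already fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for child in references.get(node, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        done.add(node)
        return None

    return visit(start)

# Valid nesting: each subgraph only uses subgraphs further down the chain.
nested = {"main.grf": ["clean.sgrf"], "clean.sgrf": ["normalize.sgrf"]}
# Invalid: two subgraphs that would call each other recursively.
recursive = {"a.sgrf": ["b.sgrf"], "b.sgrf": ["a.sgrf"]}

print(find_cycle(nested, "main.grf"))
print(find_cycle(recursive, "a.sgrf"))
```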