Short Description |
Ports |
Metadata |
SpreadsheetDataWriter Attributes |
Details |
Examples |
Best Practices |
Compatibility |
Troubleshooting |
See also |
SpreadsheetDataWriter writes data to spreadsheets – XLS or XLSX files.
The component can insert, overwrite or append data, it makes possible to work with multiline records, it supports formatting and writing data vertically or horizontally. It can be used for filling in forms.
Component | Data output | Input ports | Output ports | Transformation | Transf. required | Java | CTL | Auto-propagated metadata |
---|---|---|---|---|---|---|---|---|
SpreadsheetDataWriter | XLS(X) file | 1 | 0-1 |
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | Incoming records to be written out to a spreadsheet. | Any | |
Output | 0 | For port writing. See Writing to Output Port. | One field (byte , cbyte ). |
SpreadsheetDataWriter does not propagate metadata.
SpreadsheetDataWriter has no metadata template.
Input metadata of SpreadsheetDataWriter can have arbitrary field data types;
writing lists and maps is not supported as there are no list and byte fields in .xls
files.
Attribute | Req | Description | Possible values |
---|---|---|---|
Basic | |||
File URL | yes | Specifies where data will be written to: an XLS or XLSX file, an output port, or a dictionary. See Supported File URL Formats for Writers. In case of writing to output port Type of formatter must be set up to XLS or XLSX. | |
Sheet | A name or number (zero-based) of the sheet to write into. Unless set, a sheet with a default name is created and inserted after all existing sheets. You can specify multiple sheets separated by a semicolon ";". For details on partitioning, see Writing Techniques & Tips for Specific Use Cases. | 0-N | |
Mapping | [1] | A visual editor in which you define how input data is mapped to the output spreadsheet(s). See Details for more information. | |
Mapping URL | [1] | External file containing the mapping definition. | |
Write mode | Determines how data is written to
the output spreadsheet.
Possible values:
In-memory writing modes store all values in memory allowing for faster reading. Suitable for smaller files. In streaming mode (available for XLSX only) the file is written out directly without storing anything in memory. Streaming should thus allow you to write bigger files without running out of memory. | see Description | |
Actions on existing sheets |
Defines what action is performed
if the specified Sheet
already exists in the target spreadsheet.
The attribute works in accordance
with Write mode.
Available options:
| see Description | |
Advanced | |||
Template File URL |
A template spreadsheet file which is duplicated into the output file and populated with data according to the defined mapping. The template can be any spreadsheet, typically containing the header, footer and data sections (one empty line to be replicated during writing). If looking for more tips, see Writing Techniques & Tips for Specific Use Cases. It is required that formats of the output file and the template file match. Usage of XLTX files is limited (see Notes and Limitations), rather than XLTX use XLSX files as templates. | ||
Create directories | If set to true , non existing directories included in
the File URL path will be automatically created.
| false (default) | true | |
Records per file | Maximum number of records that are written to a single file. See Partitioning Output into Different Output Files | 1-N | |
Number of skipped records | Total number of records throughout all output files that will be skipped. See Selecting Input Records . | 0-N | |
Max number of records | Total number of records throughout all output files that will be written out. See Selecting Input Records . | 0-N | |
Partition key | A key whose values control the distribution of records among multiple output files. See Partitioning Output into Different Output Files for more information. | ||
Partition lookup table | [2] | The ID of a lookup table. The table serves for selecting records which should be written to the output file(s). See Partitioning Output into Different Output Files for more information. | |
Partition file tag | By default, output files are numbered.
If this attribute is set to
Key file tag , output files are named
according to values of Partition key
or Partition output fields.
See Partitioning Output into Different Output Files
for more information.
| Number file tag (default) | Key file tag | |
Partition output fields | [2] | Fields of Partition lookup table whose values serve for naming output file(s). See Partitioning Output into Different Output Files for more information. | |
Partition unassigned file name | The name of a file unassigned records should be written into (if there are any). Unless specified, data records whose key values are not contained in Partition lookup table are discarded. See Partitioning Output into Different Output Files for more information. | ||
Sorted input | In case of partitioning into multiple output files is turned on, all output files are open at once. Which could lead to undesirable memory footprint for many output files (thousands). Moreover, for example unix-based OS usually have very strict limitation of number of simultaneously open files (1024) per process. In case you run into one of these limitations, consider to sort the data according partition key using one of our standard sorting components and set this attribute to true. The partitioning algorithm does not need to keep open all output files, just the last one is open at one time. See Partitioning Output into Different Output Files for more information. | false (default) | true | |
Type of formatter | Specifies the formatter to be used.
By default, the component guesses according to the output file extension –
XLS or XLSX .
| Auto (default) | XLS | XLSX | |
Create empty files | If set to false ,
prevents the component from creating empty output file
when there are no input records. | true (default) | false | |
[1] The two mapping attributes are mutually exclusive. You either specify the mapping yourself in Mapping, OR supply it in an external file via Mapping URL. The third option is to leave all mapping blank. [2] Either both or neither of these attributes has to be specified. |
Introduction to Spreadsheet Mapping |
Spreadsheet Mapping Editor |
SpreadsheetDataWriter writes data to XLS or XLSX files.
It offers advanced features for creating spreadsheets:
insert/overwrite/append modes
powerful visual mapping for complex spreadsheets
explicitly defined mapping or dynamic auto-mapping
form writing
multiline records
vertical/horizontal writing
cell formatting support
streaming mode for performance and huge data loads
dynamic file/sheet partitioning
template support
Supported file formats:
XLS: only Excel 97/2003 XLS files (BIFF8)
XLSX: Open Document Format, Microsoft Excel 2007 and newer
Supported outputs:
local or remote (FTP, HTTP, CloverETL Server sandbox, etc. – see File URL in SpreadsheetDataWriter Attributes)
output port
dictionary
A mapping tells the component how to write Clover records to a spreadsheet. The mapping defines where to put metadata information, data, format, writing orientation etc.
In the mapping you define a binding between a Clover field and so called leading cell. Data for that field is written into the spreadsheet beginning at the leading cell position either downwards (vertical orientation; default) or to the right (horizontal).
Each leading cell-field binding is independent of each other. That can be used to create complex mappings (e.g. one record can be mapped to multiple rows; see Rows per record global mapping property)
Each Clover field can be mapped to a spreadsheet cell in one of the following Mapping modes:
Explicit – statically maps a field to a fixed leading cell of your preference. Typically the most used mapping mode for the writer (see Basic Mapping Example). Explicit mode can be combined with other mapping modes.
Tip | |
---|---|
To map a field (or a whole record) explicitly, simply drag the field (record) to the spreadsheet preview area and drop it onto desired location. You can select multiple fields. |
Map by order - dynamic mapping mode; cells in 'by order' mode are filled in left-right-top-down direction with input record fields by the order in which the fields appear in the input metadata. Only fields which are not mapped explicitly and not mapped by name are taken into account.
Map by name - this mode applies only to cases when you are writing to an already existing sheet(s). Cells mapped "by name" are bound to input fields using 'late binding' on runtime according to their actual content, which presumably is a 'header'. The component tries to match the cell's content with a field name or label (see Field Name vs. Label vs. Description) from input metadata. If such a match could be found then the mapped cell is bound to the corresponding input field. If there is no match for the cell (i.e. cell's content is not a field name/label) then the cell is unresolved – no input field could be assigned.
Note that unresolved cells are not a bad thing – you might be writing into say a group of similar templates, each containing just a subset of fields in the input metadata. Mappings with unresolved cells do not result in the graph failing on execution.
This mode comes in handy when you are writing using pre-defined templates (the Template file URL attribute) . See Writing Techniques & Tips for Specific Use Cases.
Note | |
---|---|
Both Map by order and Map by name modes try to automatically map the contents of the output file to the input metadata. Thus these modes are useful in cases when you write into multiple files and you want to design a single 'one-fits-all' generic mapping, typically for multiple templates. Replacing input metadata with another does not require any change in the mapping – it is recomputed accordingly to the mapping logic. |
Implicit – default case when the mapping is blank. The component will assume Write header to true
and map all input fields by order, starting in top left hand corner.
Spreadsheet mapping editor is the place where you define your mapping and its properties. The mapping editor previews sheets of the output file (if any; otherwise shows a blank spreadsheet). However, the same mapping is applied to a whole group of output files/sheets (e.g. when partitioning into multiple sheets or files).
To start mapping, fill in the File URL and (optionally) Sheet attributes with the file (and sheet name) to write into, respectively. After that, edit Mapping to open the spreadsheet mapping editor. When you write into a new (empty) spreadsheet, the mapping editor will appear blank like this
Figure 54.17. Spreadsheet Mapping Editor
In the editor, you map the input fields on the left hand to the spreadsheet on the right hand. Either use mouse drag'n'drop or the Map by name, Map by order buttons to create leading cells in the spreadsheet.
You can see the following parts of the editor:
Toolbar – buttons controlling how you Map Clover fields to spreadsheet data (either by order, or by name) and global Data offsets control (see Advanced Mapping Options for an explanation of data offsets).
Sheet preview area – this is where you create and modify all the mapping of the output file.
Input metadata – Clover fields you can map to spreadsheet cells. This is the metadata assigned to the input edge. (You cannot edit it.)
Properties – controls properties of mapped cells and Global mapping attributes; can be applied to a single or a group of cells at a time
Summary tab – a place where you can neatly review the Clover-to-spreadsheet mapping you have made.
Cells in the preview area highlighted in various colours to identify whether and how they are mapped.
Orange are the leading cells and they form the header. Properties can be adjusted on each orange cell to create complex mappings; see Advanced Mapping Options.
Cells in dashed border, which appear only when a leading cell is selected, indicate the data area.
Yellow cells demonstrate the first record which will be written.
Basic Mapping Example |
Advanced Mapping Options |
A typical example of what you will want to do in SpreadsheetDataWriter is writing into an empty spreadsheet. This section describes how to do that in a few easy steps.
Open Spreadsheet Mapping Editor by editing the Mapping attribute.
Click the whole record in Input metadata (excel_types_nocurrency
in the example below)
and drag it to the spreadsheet preview area to cell A1 and drop.
You will see that for each field of the input record a leading cell is created, producing a default explicit mapping (explained in Introduction to Spreadsheet Mapping). See Figure 54.18, Explicit mapping of the whole record
In Properties (bottom left hand corner),
make sure Write header is set to true
.
This writes field names (labels actually) to leading cells first, followed by actual data; use this whenever you want to output a header.
Furthermore in Properties, notice that Orientation is Vertical. This makes the component produce output by rows (opposite to Horizontal orientation, where writing advances by columns).
Notice that Data offsets (global) is set to 1. That means that data will be written 1 row below the leading cell, making room for the header cell.
Note | |
---|---|
Actually, you will achieve the same result if you leave the mapping blank (implicit mapping). In that case the first row is mapped by order. |
Figure 54.18. Explicit mapping of the whole record
This section provides an explanation of some more advanced concepts building on top of the Basic Mapping Example.
Data offsets |
Rows per record |
Combination of Data offsets and Rows per record |
Max number of records |
Formatting cells (Field with format) |
Cells with Hyperlink |
Data offsets determines the position where data is written relative to the leading cell position.
Basically, its value represents 'a number of rows (in vertical mode) or columns (in horizontal mode) to be skipped before the first record is written (relative to the leading cell)'.
Data offset 0 does not skip anything and data is written right at the leading cell position (Write header option does not work for this setting).
Data offset 1 is typically used when header is to be written at the leading cell position – so you need to shift the actual data by one row down (or column to the right).
Click arrow buttons in the Data offsets (global) control to adjust data offsets for the whole spreadsheet.
Additionally, you can use the spinner in
→ → of each leading cell (orange) to adjust data offset locally, i.e. for a particular column only. Notice how modifying data offset is visualised in the sheet preview – the 'omitted' rows change colour. By following dashed cells, which appear when you click a leading cell, you can quickly check where your record will be written.Tip | |
---|---|
The arrow buttons in Data offsets (global) only shift the data offset property of each cell either up or down. So mixed offsets are retained, just shifted as desired. To set all data offsets to a single value, enter the value into the number field of Data offsets (global). Note that if there are some mixed offsets, the value is displayed in gray. |
Figure 54.19. The difference between global data offsets set to 1 (default) and 3. In the right hand figure, writing would start at row 4 with no data written to rows 2 and 3.
Figure 54.20. Global data offsets is set to 1. In the last column, it is locally changed to 4. In the output file, the initial rows of this column would be blank, data would start at D5.
Rows per record is a Global property specifying a gap between rows. Default value is 1 (i.e. there is no gap). Useful when mapping multiple cells above each other (for a single record) or when you need to print blank rows in between your data. Best imagined if you look at the figure below:
Figure 54.21. With Rows per record set to 2 in leading cells Name and Address, the component always writes one data row, skips one and then writes again. This way various data does not get mixed (overwritten by the other one). For a successful output, make sure Data offsets is set to 2.
Combination of Data offsets (global and local) and Rows per record – you can put the settings described in preceding bullet points together. See example:
Figure 54.22. Rows per record is set to 3. Data in the first and third column will start in their first row (because of their data offsets being 1). The second and fourth columns have data offsets 2 and 4, respectively. The output will, thus, be formed by 'zig-zagged' cells (the dashed ones – follow them to make sure you understand this concept clearly).
Max number of records is a Global property which you can specify via component attributes, too (see SpreadsheetDataWriter Attributes). If you reduce it, you will notice the number of dashed cells in the spreadsheet preview reduces as well (highlighting only the cells which will be written out in fact).
In a spreadsheet, every single cell can have its own format (in Excel, right-click on a cell -> Format cells; Number tab). This format is represented by a format string (not Clover format string, but Excel-specific format string). Since format in Clover is defined globally for a field in metadata, not per record, writing formats to Excel can be tricky. SpreadsheetDataWriter offers two ways of writing Excel-specific format to cells:
Case 1:
You can specify the format for a metadata field (its Format property in metadata). That means all values of the field written
to the sheet will have the specified format.
You need to prefix the Format in metadata with excel:
(e.g. excel:0.000%
for percents with three decimals) because the component ignores standard format strings (as the Clover-to-Excel format conversion is not possible).
Case 2:
You provide two input fields for a single cell: one specifying the cell value and the other defining its format.
Note | |
---|---|
|
Which format is used if both are set?
Do you have the format mapped by the Field with format property? Yes – the component uses it.
Is Field with format not specified OR a value of
that particular field
is empty (null or empty string)? Yes – use Format from the metadata field (if set with excel:
prefix).
See also Field Details.
You can use excel:General
format – either in Field with format
or in metadata Format – the output will be set to general format (Excel terms).
Example 54.8. Writing Excel format
Let us have two fields:
fieldValue (integer)
and
fieldFormat (string)
mapped onto cell A1
(one as value, the other as Field with format).
Imagine these incoming records:
(100, ”#00,0”)
writes value 100
and format ”#00,0” into cell A1
(100, “General”)
writes value 100
into cell A1 and sets its format to General
(100, ””) or (100, null)
writes value 100
into cell A1 and since fieldFormat
is empty it looks into the Format metadata
attribute of fieldValue
(NOT fieldFormat
!):
if there is no format, uses General
if there is the “excel:XYZ” format string, applies format XYZ to the cell
if there is another format (anything not prefixed by excel:
), uses General (Clover-to-Excel format
conversion is not performed)
Note | |
---|---|
When Excel format is specified in
When Excel format is passed in data, as the aforementioned
Note that the |
SpreadsheetDataWriter lets you write hyperlinks into particular cells. Each hyperlink is defined using two input fields. One input field defines text of the link, other field defines target of the link.
Links can be of several types: Document, E-mail, File or URL.
Link is created in Properties pane. Map the field with link text to desired cell, change Hyperlink type (in Properties) to desired type and select field with target in Field with hyperlink address.
Hyperlinks are persisted to file along with font and style (blue and underline).
Example 54.9. Writing hyperlinks
Following are examples of proper addresses for all hyperlink types.
Document
K2
for a link to cell K2 in the same sheet.
'my sheet'!K2
for a link to cell K2 in a sheet with name my sheet
.
mailto:[email protected]
File
report_details.txt
for a relative link to file in the same directory as the spreadsheet file.
C:/path/to/file/report_details.txt
for an absolute link to file.
URL
http://www.cloveretl.com
Writing using template |
Filling forms |
Charts and formulas |
Multiple write passes into one sheet |
Partitioning |
Writing huge files |
Reviewing your mapping |
Sometimes you may want to prepare in advance a nicely formatted template in Excel, maybe including some static headers, footer, etc. and use Clover to just fill in the data for you. And it might be that you will want to reuse the template without overwriting it.
This is where SpreadsheetDataWriter template feature comes in handy. The component can take a previously designed template Excel file (see Template File URL in SpreadsheetDataWriter Attributes ), make a copy of it into the designated output file (see File URL) and write data to it, retaining the rest of the template.
A template can be any Excel file, usually containing three sections: the header, one template row for data and the rest as the footer.
Figure 54.23. Writing into a template. Its original content will not be affected, your data will be written into Name, Surname and Age fields.
Notice the template row. It is a row like any other but in the mapping editor, it is designated as the first row of mapped data. The component duplicates that row each time it writes a new data. This way you can assign arbitrary formatting, colors etc. on this data row and it is applied to all written rows.
The template file is not changed or affected in any other way.
Important | |
---|---|
There is only one reasonable setting when using templates, although all other modes work as expected (they do not, however, produce results that you would want). The settings are:
|
You can use the component to write into forms without affecting its original boxes. Use these settings:
Send just one input record to the component's input containing all the form values. Set File URL to the form file to be filled. Then map the input fields explicitly one by one into corresponding form cells using the preview sheet.
Next, use these settings:
Write header – false
Data offsets (global) – 0 (this ensures data will be written right into the leading cells you have mapped – the orange ones)
If you use Insert, Append or Overwrite modes, formulas and charts that work with the data areas written in Clover will be properly updated when viewed in Excel.
Note | |
---|---|
Generating formulas, charts or other Excel objects is not currently supported. |
You can use multiple sequential writes into a single sheet to produce complex patterns. To do so, set up multiple SpreadsheetDataWriter components writing the same file/sheet and feed them various inputs.
Important | |
---|---|
Do not forget to put multiple components writing to the same file into different phases. Otherwise the graph will fail on write conflict. |
Typically, you will use the Overwrite in sheet (in-memory)
write mode for all components in the sequence.
A neat technique
is partitioning into individual sheets according
to values of a specified key field (or more fields).
Thus you can e.g. write
data for different countries into different sheets.
Simply choose Country
as the partitioning key. This is done by editing
the Sheet attribute; switch to Partition data into sheets
by data fields
and select a field
(or more fields using
Ctrl+click
or
Shift+click
).
Figure 54.24. Partitioning by one data field
You can partition according to more than one field. In that case, output sheet names will be a compound of field names you have selected. Example: You have customer orders stored in one CSV file. You would like to separate them into sheets according to e.g. a name of the shop and a city. Use SpreadsheetDataWriter in create new file mode while partitioning according to the two fields. It will produce sheets like:
Pete's Grocery,New York
Hoboken Deli,New Jersey
Al's Hardware,New York
etc., each of them containing data just for one shop.
Although Excel format is not primarily designed for big data loads, its processing can easily grow to enormous memory requirements.
The format itself has some limitations:
Excel 97/2003 – XLS
Maximum of 65,535 rows and 256 columns per sheet
Maximum number of sheets – 255
Excel 2007 and newer – XLSX
Maximum number of rows: unlimited (but be aware that Excel itself works only with first 1,048,576 rows the file contains). All the data can be read back by SpreadsheetDataReader or other tools that support large files.
Maximum number of columns: 16,384
Maximum number of sheets: unlimited (as long as you have memory)
Tip | |
---|---|
Working with larger spreadsheets is memory consuming and although the component does its best to optimize its memory footprint, bear these few tips in mind:
|
Usually you would use the
Create new file (streaming – XLSX only)
and Overwrite sheet (streaming - XLSX only)
write modes. Other write modes do not support streaming.
In complex mappings with many metadata fields, you might want to check if everything has been mapped properly. Whenever during your work in Spreadsheet Mapping Editor, switch to the Summary tab and observe an overview of leading cells and mappings like this one:
Figure 54.25. Mapping summary
Encryption
Writing of encrypted XLS or XLSX files is not supported (unlike SpreadsheetDataReader which can read encrypted files)
XLTX vs. XLSX templates
For technical reasons it is currently not possible to use an XLTX template for XLSX output. Nevertheless, the difference between XLTX and XLSX files is minimal. Therefore, we recommend you use XLSX as the format for both the template and output files. For XLS and XLT files, there is no such limitation.
Mapping editor on server files
A spreadsheet mapping editor on server files can operate as usual, except for a case when File URL contains wildcard characters. In that case CloverETL Designer is not able to find matching server files and the mapping editor shows no data in the spreadsheet preview. This is going to be fixed in next releases.
Error reporting
There is no error port on the component. By design, either the component configuration is valid and will then succeed in writing records to a file, or it will fail with a fatal error (invalid configuration, no space left on device, etc.). No errors per input record are generated.
Width of columns
If the SpreadsheetDataWriter writes to newly created sheet, or to existing sheet which is cleaned first (i.e. Actions on existing sheets is set to Clear target sheet(s)), the component automatically adjusts width of columns so that it matches width of the most widest cell content in each particular column. Column widths is not adjusted if a template is used or when writing into existing sheet (which is not cleaned first). This means that column widths from template are preserved. Also column widths of already existing sheets are kept when appending/inserting/overwriting data of that sheet.
Lists and Maps
SpreadsheetDataWriter cannot write lists and maps.
Lists of string
s, byte
s and cbyte
s are converted to string.
SpreadsheetDataWriter ignores excel:raw format. When it's set, the component acts as if the format property is empty.
SpreadsheetDataWriter is not a commercial component since CloverETL version 4.0.0-M2.
Since CloverETL 4.0.7 you can write hyperlinks into spreadsheets.
Since 4.4.0-M2, CloverDataWriter can write to output port just to byte
or cbyte
field.
If you write spreadsheet to a dictionary, attribute Type of formatter must be set to XLS or XLSX. It cannot be Auto (default value).