HTTPConnector sends HTTP requests to a web server and receives responses. The request is written in a file or in the graph itself, or it is received through a single input port. The response can be sent to an output port, stored to a specified file, or stored to a temporary file. The path to the file can then be sent to a specified output port.
Component | Same input metadata | Sorted inputs | Inputs | Outputs | Each to all outputs | Java | CTL | Auto-propagated metadata |
---|---|---|---|---|---|---|---|---|
HTTPConnector | - | | 0-1 | 0-2 | - | | | |
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | | For setting various attributes of the component | Any |
Output | 0 | | For response content, response file path, status code, component attributes... | Any |
Output | 1 | | For error details | Any |
HTTPConnector does not propagate metadata.
Metadata templates are available on the ports of HTTPConnector.
You do not have to use metadata templates on input and output edges.
See general details on Metadata Templates.
Table 61.2. HTTPConnector_Request
Field number | Field name | Data type |
---|---|---|
1 | URL | string |
2 | requestMethod | string |
3 | addInputFieldsAsParameters | boolean |
4 | addInputFieldsAsParametersTo | string |
5 | ignoredFields | string |
6 | additionalHTTPHeaderProperties | string |
7 | charset | string |
8 | requestContent | string |
9 | requestContentByte | byte |
10 | inputFileURL | string |
11 | outputFileURL | string |
12 | appendOutput | boolean |
13 | authenticationMethod | string |
14 | username | string |
15 | password | string |
16 | consumerKey | string |
17 | consumerSecret | string |
18 | storeResponseToTempFile | boolean |
19 | temporaryFilePrefix | string |
20 | multipartEntities | string |
21 | rawHTTPHeaders | string[] |
Table 61.3. HTTPConnector_Response
Field number | Field name | Data type | Description |
---|---|---|---|
1 | content | string | The content of the HTTP response as a string. This field will be null if the response is written to a file. |
2 | contentByte | byte | The raw content of the HTTP response as an array of bytes. This field will be null if the response is written to a file. |
3 | outputFilePath | string | The path to the file where the response has been written. Will be null if the response is not written to a file. |
4 | statusCode | integer | HTTP status code of the response. |
5 | header | map[string,string] | Map representing HTTP header properties from response. |
6 | rawHeaders | string[] | |
7 | errorMessage | string | Error message, in case the error output is redirected to a standard output port. |
Table 61.4. HTTPConnector_Error
Field number | Field name | Data type | Description |
---|---|---|---|
1 | errorMessage | string | Error message |
Attribute | Req | Description | Possible values |
---|---|---|---|
Basic | | | |
URL | [1] | URL of the HTTP server the component connects to. May contain one or more placeholders in the form *{<field name>}. See Reading of Remote Files for the URL format. The HTTP, HTTPS, FTP and SFTP protocols are supported. Connecting via a proxy server is also available, for example: http:(proxy://proxyHost:proxyPort)//www.domain.com. | |
Request method | | Method of the request. | GET (default), POST, PUT, PATCH, DELETE, HEAD, OPTIONS, TRACE |
Add input fields as parameters | | Specifies whether additional parameters from the input edge should be added to the URL. Note: When parameters are read from the input edge and put into the query string, they may contain special characters (?, @, :, etc.). Do not replace such characters with %-notation; HTTPConnector URL-encodes them automatically. This feature was introduced in Clover 3.3-M3 and causes backward incompatibility. | false (default), true |
Send parameters in | | Specifies whether input fields should be added to the query string or to the method body. Parameters can be added to the method body only if Request method is set to POST. | QUERY (default), BODY |
Ignored fields | | Specifies which input fields are not added as parameters. A semicolon-separated list of input fields is expected. | |
Additional HTTP headers | | Additional properties of the request that will be sent to the server. A dialog is used to create them; the final form is a sequence of key=value pairs separated by commas, with the whole sequence surrounded by curly braces. The value may refer to a field or parameter using the ${fieldName} or ${parameterName} notation. | |
Multipart entities | | Specifies fields that should be added as multipart entities to a POST request. The field name is used as the entity name. A semicolon-separated list of input fields is expected. | |
Request/response charset | | Character encoding of the input/output files. The default encoding depends on DEFAULT_CHARSET_DECODER in defaultProperties. | UTF-8, other encoding |
Request content | | Request content defined directly in the graph. Can also be specified as the Input file URL or using the requestContent or requestContentByte fields in the Input mapping. | |
Input file URL | | URL of the file from which a single HTTP request is read. See URL File Dialog. | |
Output file URL | | URL of the file to which the HTTP response is written. See URL File Dialog. The output files are not deleted automatically and must be removed by hand or as a part of the transformation. | |
Append output | | By default, any new response overwrites the older one. If you switch this attribute to true, the new response is appended to the old ones. Applies to output files only. | false (default), true |
Input Mapping | | Allows you to set various properties of the component by mapping their values from the input record. | |
Output Mapping | | Allows you to map response data (like content, status code, ...) to the output record. It is also possible to map values from input fields and error details (if Redirect error output is set to true). | |
Error Mapping | | Allows you to map the error message to the output record. It is also possible to map values from input fields and attributes. | |
Redirect error output | | Allows you to redirect error details to the standard output port. | false (default), true |
Advanced | | | |
Raw HTTP Headers [2] | | Additional user-defined HTTP headers defined as text. | e.g. Pragma: no-cache |
Request Cookies | | Defines cookies to be sent in the HTTP request. The values of the cookies can be set up in Input mapping. | |
Response Cookies | | Defines names of response cookies to be used. The mapping can be set up in Output mapping. The names of particular cookies are separated by semicolons. | e.g. cookie1;cookie2 |
Authentication method | | Specifies which authentication method should be used. | HTTP BASIC (default), HTTP DIGEST, ANY |
Username | | Username required to connect to the server. | |
Password | | Password required to connect to the server. | |
OAuth Consumer key | | Consumer key associated with a service. Defines the access token (2-legged OAuth) for signing requests, together with OAuth Consumer secret. | |
OAuth Consumer secret | | Consumer secret associated with a service. Defines the access token (2-legged OAuth) for signing requests, together with OAuth Consumer key. | |
OAuth Access Token [3] | | Additional field used during OAuth authentication. | |
OAuth Access Token secret [3] | | Additional field used during OAuth authentication. | |
Store HTTP response to file | [4] | If this attribute is switched to true, the response is written to temporary files with the prefix specified in the Prefix for response files attribute. The path to these temporary files can be retrieved using Output Mapping. Storing the response to temporary files is necessary if the response body is too large to be stored in a single string data field. The temporary files are deleted automatically after the graph finishes (if it has not run in Debug mode). | false (default), true |
Prefix for response files | | Prefix that will be used in the name of each output file with an HTTP response. Distinguishing numbers are appended to this prefix. | "http-response-" (default), other prefix |
Stream input file | | If the request content is specified by the Input file URL attribute, the input file is uploaded using chunked transfer encoding. Set the attribute to | true (default), false |
Request parameters | | Sets up a parameter whose name differs from the field name in the metadata. It enables the use of parameters whose names cannot be used as metadata field names (e.g. start-date). | |
Disable SSL Certificate Validation | | Disables certificate validation of the page you are connecting to. Use this attribute only if you know what you are doing. Available since CloverETL 4.1.0-M1. | |
Timeout | | How long the component waits for a response. If it does not receive a response within the specified limit, execution of the component fails. Timeout is in milliseconds. | e.g. 5000 |
Retry Count | | How many times the component should retry the request in case of failure. Note that failure does not mean a response status code other than 2xx. A failure is the same condition under which the component uses the error port: the component cannot process the request or response (e.g. an IOException). If it processes the request and gets a response with an error status code (e.g. 500), it is not a failure. | 0 (default) |
Deprecated | | | |
URL from input field | [1] | Name of a string field specifying the target URL you wish to retrieve. The field value may contain placeholders in the form *{<field name>}. See Reading of Remote Files for the URL format. The HTTP, HTTPS, FTP and SFTP protocols are supported. | |
Input field | [4] | Name of the field of the input metadata from which the request content is received. Must be of string data type. May be used for multi HTTP requests. | |
Output field | | Name of the field of the output metadata to which the response is sent. Must be of string data type. May be used for multi HTTP responses. | |
[1] URL must be specified by setting one of the URL or URL from input field attributes, or by mapping it in the Input mapping.
[2] Available since release 3.3.
[3] Available since release 3.5.
[4] The response can be stored either to a file specified in Output file URL or to a temporary file (when Store response file URL to output field is set to
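The placeholder syntax of the URL attribute can be illustrated outside CloverETL. The following Python sketch is only an approximation under stated assumptions: the helper name `expand_url`, the example URL and the sample record are hypothetical, and the sketch assumes that each `*{<field name>}` placeholder is replaced by the value of the input field of that name. The component's actual substitution and encoding rules may differ.

```python
import re

# Matches placeholders of the form *{fieldName}.
PLACEHOLDER = re.compile(r"\*\{([^}]+)\}")


def expand_url(template: str, record: dict) -> str:
    """Replace each *{field} placeholder with the matching record value.

    Hypothetical helper; raises KeyError if a referenced field is missing.
    """
    return PLACEHOLDER.sub(lambda m: str(record[m.group(1)]), template)


# Hypothetical URL template and input record.
url = expand_url(
    "http://www.example.com/orders/*{orderId}/items/*{itemId}",
    {"orderId": 42, "itemId": "A7"},
)
print(url)
```

Raising an error for an unknown field is a design choice of this sketch, not documented behavior of the component.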
Editing the Input mapping attribute opens the Transform Editor where you can decide which component attributes should be set using input record.
Figure 61.4. Transform Editor in HTTPConnector
The dialog provides you with all the power and features known from Transform Editor and CTL.
Note | |
---|---|
All kinds of CTL functions are available to modify the input field value to be used. |
Since release 3.5.4, you can set up multipart entities in the Transform Editor. Input mapping offers new fields derived from the value of the Multipart entities attribute. For example, the value field1;field2 of Multipart entities generates the following fields.
Figure 61.5. Multipart entities in input mapping
The generated fields can be used to control multipart entities.
If you deal with Multipart entities, you have to use POST method.
Multipart entities can be defined in the following ways: as a list of input fields, by mapping the content of a multipart entity, by mapping the content and filename, or by using a file as a multipart entity.
List of input fields: Compatible with previous versions. The Multipart entities attribute contains a semicolon-separated list of fields from the input record. Each field is a multipart entity. The name is the same as the field name; the field value is used as the content.
Map content of multipart entity: Use input mapping to set the content of the multipart entity. The multipart name is the same as the field name, and the content is specified by the mapping.
Map content and filename: The multipart content is set by the mapping, and the request contains an additional multipart header using the mapped filename.
Example 61.1. CTL Mapping and multipart entities
The CTL mapping
    function integer transform() {
        $out.4.field1_EntityContent = "My custom content";
        $out.4.field1_EntityFileNameAttribute = "MyFilename";
        return ALL;
    }
produces the following multipart content.
    CB5PZVJDq5RyTWoZqxvtjlbVM0CrMa3Mt
    Content-Disposition: form-data; name="field1"; filename="MyFilename"
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit
    My custom content
    CB5PZVJDq5RyTWoZqxvtjlbVM0CrMa3Mt
To use a file as a multipart entity, map only the *_File field. Do not map the *_Content field.
$out.3.field3_EntitySourceFile = "${PROJECT}/workspace.prm";
This uploads the file workspace.prm as a multipart entity.
    3xEKe3wUSOl2cRnjwh1UsPVnDOoL7D
    Content-Disposition: form-data; name="field3"; filename="workspace.prm"
    Content-Type: application/octet-stream
    Content-Transfer-Encoding: binary
    ... [here is content of file]
    3xEKe3wUSOl2cRnjwh1UsPVnDOoL7D
The file can be specified by a URL, similarly to the fileURL attribute in readers, but port reading and dictionary reading cannot be used.
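To make the wire format of a file sent as a multipart entity more concrete, here is a minimal Python sketch of the general multipart/form-data layout. It is not the component's implementation: the function name `build_multipart`, the sample file content and the way the boundary is generated are assumptions made for illustration only.

```python
import uuid


def build_multipart(name: str, content: bytes, filename: str,
                    content_type: str = "application/octet-stream"):
    """Return (boundary, body) for a single-part multipart/form-data request."""
    boundary = uuid.uuid4().hex  # random token; any unique string works
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "Content-Transfer-Encoding: binary\r\n"
        "\r\n"  # an empty line separates the part headers from the content
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return boundary, head + content + tail


# Hypothetical file content standing in for workspace.prm.
boundary, body = build_multipart("field3", b"key=value\n", "workspace.prm")
print(body.decode("utf-8"))
```

The entity name corresponds to the field name listed in Multipart entities, and the filename corresponds to the name of the uploaded file.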
Editing the Output mapping attribute opens the Transform Editor where you can decide what should be sent to an output port.
Figure 61.6. Transform Editor in HTTPConnector
The dialog provides you with all the power and features known from Transform Editor and CTL.
To do the mapping in a few basic steps:
Provided you already have some output metadata, just left-click an item in the left-hand pane and drag it onto an output field. This will send the result data to the output.
If you do not have any output metadata:
Drag a Field from the left hand side pane and drop it into the right hand pane (an empty space).
This produces a new field in the output metadata.
You can map various data to output port:
Values of fields from input metadata - you can send values from input fields to the output port. This is mainly useful, when you are using some kind of session identifier for HTTP requests.
Result - provides result data. These include:
content - the content of the HTTP response as a string. This field will be null if the response is written to a file.
contentByte - the raw content of the HTTP response as an array of bytes. This field will be null if the response is written to a file.
outputFilePath - the path to the file where the response has been written. Will be null if the response is not written to a file.
statusCode - HTTP status code of the response.
header - map representing HTTP header properties from response.
rawHeaders - headers of the response.
errorMessage - error message, in case the error output is redirected to a standard output port.
Attributes - provides values of the component attributes:
URL - the URL where the request has been sent.
requestMethod - method that was used for the request.
requestContent - content of the request, that has been sent (if specified as a string).
inputFileUrl - URL of the file containing request content.
Note | |
---|---|
Output mapping uses CTL (you can switch to the Source tab). All kinds of functions are available to modify the value to be stored in the output field.
$out.0.prices = find($in.1.content, "price: .*? USD")
finds all occurrences of the form |
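The effect of such a pattern can be tried outside the graph. The Python sketch below uses `re.findall` as a stand-in for the CTL `find` function, assuming that `find` returns every substring matching the regular expression. The response body is made up for illustration.

```python
import re

# Hypothetical response body; in the component this would be $in.1.content.
content = "apple price: 3 USD, pear price: 4.50 USD, plum price: n/a"

# Non-greedy match from "price: " up to the nearest " USD".
prices = re.findall(r"price: .*? USD", content)
print(prices)  # ['price: 3 USD', 'price: 4.50 USD']
```

The last item is not matched because no " USD" follows it.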
If you leave the output mapping empty, the default output mapping is used:
$out.0.* = $in.0.*; $out.0.* = $in.1.*;
The default mapping has been introduced in version 4.1.0.
Editing the Error mapping attribute opens the Transform Editor where you can map error details to an output port. The behavior is very similar to the Output mapping.
If you leave the error mapping empty, the default error mapping is used:
$out.1.* = $in.0.*; $out.1.* = $in.1.*;
The default mapping has been introduced in version 4.1.0.
When the graph's log level is set to DEBUG, the HTTPConnector prints the HTTP request and response to graph log.
Download the content of the web page www.cloveretl.com using HTTPConnector. Save the result to a file for further processing.
Use the URL and Output file URL attributes. The downloaded page will be saved into the result.html file in the ${DATAOUT_DIR} directory.
Attribute | Value |
---|---|
URL | http://www.cloveretl.com/ |
Output file URL | ${DATAOUT_DIR}/result.html |
Download a document from https://protected.example.org/document.html. The site requires HTTP basic authentication.
Set up the URL, Output file URL, Username and Password attributes. We suggest using secure parameters to store your password.
Attribute | Value |
---|---|
URL | https://protected.example.org/document.html |
Output file URL | ${DATAOUT_DIR}/document.html |
Username | myUserName |
Password | ${PASSWORD} |
An alternative solution is to connect an edge to the first output port instead of filling in the Output file URL attribute. The result will be sent to the edge. No output mapping is necessary.
Download the content of the page http://www.cloveretl.com/. The page is accessible via a proxy on 10.0.3.5 listening on TCP port 3128.
Use the URL attribute. You can use Output file URL to write the result to a file, or connect an output edge.
Attribute | Value |
---|---|
URL | http:(proxy://10.0.3.5:3128)//www.cloveretl.com/ |
Output file URL | ${DATAOUT_DIR}/result.html |
Note: The proxy may introduce some limitations. For example, it may not allow you to connect via HTTPS.
The problem to be solved is similar to the previous example. The difference is that the proxy requires a username (test) and a password (securePassword).
Attribute | Value |
---|---|
URL | http:(proxy://test:securePassword@10.0.3.5:3128)//www.cloveretl.com/ |
Output file URL | ${DATAOUT_DIR}/result.html |
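The proxy notation used in the two examples above can be assembled mechanically. The following Python sketch shows one way to do it; the helper name `proxy_url` is hypothetical, and the sketch assumes that proxy credentials are written as user:password@ in front of the proxy host and that they contain no characters requiring escaping.

```python
from typing import Optional


def proxy_url(target: str, host: str, port: int,
              user: Optional[str] = None,
              password: Optional[str] = None) -> str:
    """Compose scheme:(proxy://[user:password@]host:port)//rest-of-url."""
    scheme, sep, rest = target.partition("://")
    if not sep:
        raise ValueError("target must contain '://'")
    credentials = f"{user}:{password}@" if user is not None else ""
    return f"{scheme}:(proxy://{credentials}{host}:{port})//{rest}"


plain = proxy_url("http://www.cloveretl.com/", "10.0.3.5", 3128)
authenticated = proxy_url("http://www.cloveretl.com/", "10.0.3.5", 3128,
                          "test", "securePassword")
print(plain)
print(authenticated)
```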
Connect to the Twitter API and get some tweets about java.
Use the URL, OAuth Consumer key, OAuth Consumer secret, OAuth Access Token and OAuth Access Token secret attributes.
Connect an edge to the first output port to pass the results by edge, or fill in the Output file URL attribute to write the results to a file.
Attribute | Value |
---|---|
URL | https://api.twitter.com/1.1/search/tweets.json?q=java&count=20 |
OAuth Consumer key | yYjLhENks7mNlt7k4l2hKuHXP |
OAuth Consumer secret | OE1dkaadjJR8LSOFFlakeH4YRlLkaiqnvVlSlAxZmNlrtoHpyI |
OAuth Access Token | 3062213700-IJNdsaG3e4vwUasoro4T5p5V2aOxEwYasvrlVs3 |
OAuth Access Token secret | S2hl7ivynvXI69kzky7Fx3ZJ84ZBCK6vt2G7bW3TFNTO7 |
Note: The credentials in this example are not valid; you have to use your own credentials.
Send a file using multipart entities. The file content is available in the field1 field.
Use the URL, Request method, Multipart entities and Input mapping attributes.
Attribute | Value |
---|---|
URL | http://www.example.com/ |
Request method | POST |
Add input fields as parameters | true |
Multipart entities | field1 |
Input mapping | See the code below |
function integer transform() { $out.4.field1_EntityContent = $in.0.field1; return ALL; }
Map multipart entities in the Input mapping dialog.
Connect to www.my-sometimes-responding-server.com, which sometimes fails to respond. The response has to be returned within 20 seconds; otherwise the connection should be considered nonresponding. Make at most 5 attempts in total.
Use Timeout to set a time limit on the connection to avoid waiting if the server does not reply. If the server responds only sometimes, use Retry count to repeat the request several times.
Attribute | Value |
---|---|
URL | http://www.my-sometimes-responding-server.com/ |
Request method | GET |
Timeout | 20000 |
Retry count | 4 |
Timeout is in milliseconds. 20000 milliseconds is 20 seconds. Retry count set to 4 causes up to 4 additional retries (if necessary). At most five requests are performed in total.
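The retry arithmetic can be sketched in a few lines of Python. This is not the component's code: the function `send_with_retries` and the simulated server are hypothetical, and `OSError` stands in for Java's IOException. The sketch only mirrors the documented rule that an I/O failure triggers a retry, while a response with an error status code does not.

```python
def send_with_retries(send, retry_count: int):
    """Call send() once, then up to retry_count more times on I/O failure.

    Returns (response, attempts). A response with an error status code
    is returned as-is and does not trigger a retry.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return send(), attempts
        except OSError:  # stands in for Java's IOException
            if attempts > retry_count:
                raise  # all retries used up; report the failure


# Hypothetical server: fails twice with an I/O error, then answers 500.
outcomes = [OSError("timeout"), OSError("timeout"), 500]


def flaky_send():
    result = outcomes.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


status, attempts = send_with_retries(flaky_send, retry_count=4)
print(status, attempts)  # 500 3
```

With a retry count of 4, a server that never responds is contacted five times before the failure is reported.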
We recommend explicitly specifying the Request/response charset.
Since v3.3.0-M3 it is no longer necessary to encode field values used as Query parameters before passing them to HTTPConnector - they are encoded automatically. This, however, breaks backward compatibility, so be aware of this fact.
Since v3.3.0-M3 it is possible to use Output mapping to retrieve path to an output file, when the response is stored to a file (whether it is stored to temporary file or user-specified file). The file path is no longer sent to an output port automatically (as was the case for temporary files).
Since v3.5.4 you can map file as a multipart entity. You can map multipart entities in transform editor too.
Since v4.1.0-M1 you can disable SSL Certificate validation.
Since v4.1.0 you can set up Timeout and Retry count.
Since v4.1.0 default output mapping or error mapping is used if output mapping or error mapping is not defined.
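For readers unsure what the automatic encoding of query parameters looks like, the Python sketch below shows standard percent-encoding with `urllib.parse.urlencode`. The parameter names and values are made up, and the exact set of characters that HTTPConnector encodes is an assumption; the sketch only shows why values should not be encoded a second time before they are passed in.

```python
from urllib.parse import urlencode

# Hypothetical input fields that would be added as query parameters.
params = {"q": "what?", "tag": "@home", "time": "10:30"}

# Special characters (?, @, :) are replaced with %-notation.
query = urlencode(params)
print("http://www.example.com/search?" + query)
```

Encoding an already encoded value would turn `%3F` into `%253F`, which is the kind of double encoding the compatibility note warns about.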