The URL attributes may be defined using the URL File Dialog.
Unless explicitly stated otherwise, URL attributes of File Operation components accept multiple URLs separated with a semicolon (';').
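As a minimal sketch of the separator rule above (not the product's own parser), a multi-URL attribute value can be split on the semicolon. The helper name `split_file_urls` is hypothetical and exists only for illustration.

```python
def split_file_urls(attribute):
    """Split a URL attribute value into individual URLs on ';'."""
    # Empty pieces (for example from a trailing semicolon) are dropped.
    return [part.strip() for part in attribute.split(";") if part.strip()]


urls = split_file_urls("/path1/filename1.txt;/path2/filename2.txt")
print(urls)  # ['/path1/filename1.txt', '/path2/filename2.txt']
```

A semicolon that is %-encoded as %3B inside a URL is not treated as a separator by this sketch, which is why literal semicolons within a single URL need to be escaped.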
Important: To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows).
Most protocols support wildcards: ? (question mark) matches one arbitrary character; * (asterisk) matches any number of arbitrary characters.
Note that wildcard support and syntax are protocol-dependent.
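The two wildcards can be illustrated with Python's standard `fnmatch` module. This is only an approximation of the semantics described above: the actual matching is protocol-dependent, `fnmatch` additionally treats `[...]` as a character class, and the file names and the `match_mask` helper are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing used only for this illustration.
names = ["filename1.txt", "filename2.txt", "filename10.txt", "notes.csv"]


def match_mask(candidates, mask):
    """Return the candidate names that satisfy a ?/* wildcard mask."""
    return [name for name in candidates if fnmatchcase(name, mask)]


# '?' stands for exactly one character, so filename10.txt is excluded.
print(match_mask(names, "filename?.txt"))  # ['filename1.txt', 'filename2.txt']
# '*' stands for any number of characters.
print(match_mask(names, "*.txt"))
```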
Here are some examples of possible URLs for File Operations:
/path/filename.txt
One specified file.
/path1/filename1.txt;/path2/filename2.txt
Two specified files.
/path/filename?.txt
All files satisfying the mask.
/path/*
All files in the specified directory.
/path?/*.txt
All .txt files in directories that satisfy the path? mask.
ftp://username:password@server/path/filename.txt
Denotes the path/filename.txt file on a remote server connected via the FTP protocol using username and password.
If the initial working directory differs from the server root directory, use absolute FTP paths; see below.
ftp://username:password@server/%2Fpath/filename.txt
Denotes the /path/filename.txt file on a remote server; the initial slash must be escaped as %2F.
The path is absolute with respect to the server root directory.
ftp://username:password@server/dir/*.txt
Denotes all files satisfying the mask on a remote server connected via the FTP protocol using username and password.
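The difference between the relative and the absolute FTP form can be sketched with Python's `urllib.parse`. The sketch assumes the usual FTP URL convention in which the first slash only separates the server from the path, so an absolute path needs an encoded slash (%2F) as its first character. The helper name `ftp_path` is hypothetical.

```python
from urllib.parse import urlsplit, unquote


def ftp_path(url):
    """Return the path an FTP URL refers to, honouring the %2F convention."""
    raw = urlsplit(url).path  # still %-encoded, e.g. "/%2Fpath/filename.txt"
    # The first "/" only separates the server from the path; drop it, then
    # decode %-escapes so that a leading %2F becomes a real "/".
    return unquote(raw[1:] if raw.startswith("/") else raw)


# Relative to the initial working directory after login:
print(ftp_path("ftp://username:password@server/path/filename.txt"))
# Absolute with respect to the server root directory:
print(ftp_path("ftp://username:password@server/%2Fpath/filename.txt"))
```

The first call yields `path/filename.txt` and the second `/path/filename.txt`, matching the two descriptions above.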
sftp://username:password@server/path/filename.txt
Denotes the filename.txt file on a remote server connected via the SFTP protocol using username and password.
sftp://username:password@server/path?/filename.txt
Denotes all filename.txt files in directories satisfying the mask on a remote server connected via the SFTP protocol using username and password.
http://server/path/filename.txt
Denotes the filename.txt file on a remote server connected via the HTTP protocol.
https://server/path/filename.txt
Denotes the filename.txt file on a remote server connected via the HTTPS protocol.
s3://access_key_id:secret_access_key@s3.amazonaws.com/bucketname/path/filename.txt
Denotes the path/filename.txt object located in the Amazon S3 web storage service in the bucket bucketname.
The connection is established using the specified access key ID and secret access key.
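As a rough illustration of how such a URL decomposes, the first path segment names the bucket and the remainder is the object key. The sketch below is not the product's parser; the helper name `split_s3_url` and the host placeholder `host` are assumptions made for the example.

```python
from urllib.parse import urlsplit


def split_s3_url(url):
    """Split an s3:// URL into (access key ID, bucket, object key)."""
    parts = urlsplit(url)
    # The first path segment is the bucket; everything after it is the key.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return parts.username, bucket, key


# "host" is a placeholder for the S3 endpoint, not a real server name.
print(split_s3_url(
    "s3://access_key_id:secret_access_key@host/bucketname/path/filename.txt"
))
# ('access_key_id', 'bucketname', 'path/filename.txt')
```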
hdfs://CONNECTION_ID/path/filename.txt
Denotes the filename.txt file on Hadoop HDFS. "CONNECTION_ID" stands for the ID of a Hadoop connection defined in the graph.
smb://domain%3Buser:password@server/path/filename.txt
Denotes a file located in a Windows share (Microsoft SMB/CIFS protocol). The URL path may contain wildcards (both * and ? are supported).
The server part may be a DNS name, an IP address or a NetBIOS name. The userinfo part of the URL (domain%3Buser:password) is not mandatory, and any URL reserved character it contains should be escaped using %-encoding, just as the semicolon ; is escaped as %3B in the example (the semicolon is escaped because it collides with the default Clover file URL separator).
The SMB protocol is implemented in the JCIFS library, which may be configured using Java system properties. See Setting Client Properties in the JCIFS documentation for a list of all configurable properties.
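The escaping of the userinfo part can be sketched with `urllib.parse.quote`, which %-encodes every reserved character when called with `safe=""`. The helper name `smb_url` and its parameters are hypothetical; only the resulting URL shape comes from the example above.

```python
from urllib.parse import quote


def smb_url(server, path, user=None, password=None, domain=None):
    """Build an smb:// URL with a %-encoded (optional) userinfo part."""
    userinfo = ""
    if user is not None:
        # Domain and user are joined by ';', which is then encoded as %3B
        # together with any other reserved character.
        name = f"{domain};{user}" if domain else user
        userinfo = quote(name, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        userinfo += "@"
    return f"smb://{userinfo}{server}/{path.lstrip('/')}"


print(smb_url("server", "path/filename.txt", "user", "password", "domain"))
# smb://domain%3Buser:password@server/path/filename.txt
```

Because the semicolon is encoded, the resulting URL can safely be combined with other URLs in a semicolon-separated attribute value.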
A sandbox resource, whether it is in a shared, local or partitioned sandbox, is specified in the graph in the fileURL attributes as a so-called sandbox URL, like this:
sandbox://data/path/to/file/file.dat
where "data" is the code of the sandbox and "path/to/file/file.dat" is the path to the resource from the sandbox root. A graph does not have to run on the node which has local access to the resource.
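The structure of a sandbox URL can be illustrated by splitting it into the sandbox code and the path relative to the sandbox root. This is a sketch using Python's generic URL parser, not the server's own resolution logic; the helper name `parse_sandbox_url` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_sandbox_url(url):
    """Split a sandbox URL into (sandbox code, path from the sandbox root)."""
    parts = urlsplit(url)
    if parts.scheme != "sandbox":
        raise ValueError(f"not a sandbox URL: {url}")
    # The authority component carries the sandbox code; the rest is the path.
    return parts.netloc, parts.path.lstrip("/")


print(parse_sandbox_url("sandbox://data/path/to/file/file.dat"))
# ('data', 'path/to/file/file.dat')
```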