ProActive File Transfer

21.1. Introduction and Concepts

Currently, ProActive supports the following types of transfer:

  • To a remote node (Push)

  • From a remote node (Pull)

The transfer can take place at any of the following moments:

  • Deployment Time: at the beginning of the application, to provide input data.

  • Retrieval Time: at the end of the application, to collect results.

  • During the user application: to transfer information between nodes.

To achieve this, we have implemented File Transfer support in two ways:

  • File Transfer API

  • Descriptor File Transfer support

21.2. File Transfer API

21.2.1. API Definition

package org.objectweb.proactive.api;

public class ProFileTransfer {
    static       RemoteFile pull(Node srcNode, File srcFile, File dstFile) throws IOException;
    static List<RemoteFile> pull(Node srcNode, File[] srcFile, File[] dstFile) throws IOException;

    static       RemoteFile push(File srcFile, Node dstNode, File dstFile) throws IOException;
    static List<RemoteFile> push(File[] srcFile, Node dstNode, File[] dstFile) throws IOException;

    static       RemoteFile transfer(Node srcNode, File srcFile, Node dstNode, File dstFile) throws IOException;
    static List<RemoteFile> transfer(Node srcNode, File[] srcFile, Node dstNode, File[] dstFile) throws IOException;
    static List<RemoteFile> transfer(Node srcNode, File[] srcFile, Node dstNode, File[] dstFile,
                                     int bsize, int numFlyingBlocks) throws IOException;

    static       RemoteFile mkdirs(Node node, File file);
}

These methods are static and handle the transfer of files between ProActive Nodes. The pull methods retrieve a file or directory located on a remote machine to the local machine. The push methods transfer a file or directory available on the local node to the specified remote node. The transfer methods perform transfers between two third-party nodes; the last variant additionally takes bsize and numFlyingBlocks, which control the block size and the number of blocks in flight at the same time. The mkdirs method creates a directory on the remote machine.

File transfers are performed asynchronously. Each of these methods returns a RemoteFile object, which represents both the file transfer operation and the remote file's node and location. The RemoteFile instance is returned immediately, before the file transfer operation has completed, and provides a way to monitor the status of the transfer:

public interface RemoteFile extends Serializable {

  public boolean isFinished();

  public void waitFor() throws IOException;

  public RemoteFile pull(File localDst) throws IOException;

  public RemoteFile push(Node dstNode, File dstRemote) throws IOException;

  public Node getRemoteNode();

  public File getRemoteFilePath();

  public boolean delete() throws IOException;
    
  public boolean exists() throws IOException;
    
  public boolean isDirectory() throws IOException;
    
  public boolean isFile() throws IOException;
}

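The isFinished and waitFor methods are used, respectively, to query and to wait on the file transfer status. The pull method fetches the RemoteFile from the remote node into the local Node, and the push method sends the RemoteFile to another Node. The remaining methods return the remote node and path, and perform basic file operations (delete, exists, isDirectory, isFile) on the remote file.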
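The asynchronous contract can be illustrated without ProActive at all. The following is a hypothetical stand-in, not ProActive's RemoteFile implementation: the class AsyncCopyHandle is invented for this example and simply copies a local file with java.nio.file.Files.copy inside a CompletableFuture, so that the handle is returned before the copy completes and its isFinished and waitFor methods behave like their RemoteFile counterparts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical RemoteFile-like handle: the copy runs in the background and
// the handle is handed back to the caller before the copy has completed.
class AsyncCopyHandle {
    private final CompletableFuture<Path> future;

    private AsyncCopyHandle(CompletableFuture<Path> future) {
        this.future = future;
    }

    // Starts the copy asynchronously and returns immediately.
    static AsyncCopyHandle start(Path src, Path dst) {
        CompletableFuture<Path> f = CompletableFuture.supplyAsync(() -> {
            try {
                return Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e); // suppliers cannot throw checked exceptions
            }
        });
        return new AsyncCopyHandle(f);
    }

    // Non-blocking status query, analogous to RemoteFile.isFinished().
    boolean isFinished() {
        return future.isDone();
    }

    // Blocks until the copy ends and rethrows a copy failure as an IOException,
    // analogous to RemoteFile.waitFor().
    void waitFor() throws IOException {
        try {
            future.join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof UncheckedIOException) {
                throw ((UncheckedIOException) e.getCause()).getCause();
            }
            throw new IOException(e.getCause());
        }
    }
}
```

A caller can poll isFinished() to overlap other work with the transfer, and call waitFor() only when the data is actually needed.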

21.2.2. API Usage Example

In the following example, a Node is deployed using a descriptor file. A file is then pushed from the local Node to the remote Node and pulled back from the remote Node into a different local file. Finally, the computation blocks until this last file transfer operation has finished.

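import java.io.File;

// Package locations may vary across ProActive versions
import org.objectweb.proactive.api.PADeployment;
import org.objectweb.proactive.api.ProFileTransfer;
import org.objectweb.proactive.core.descriptor.data.ProActiveDescriptor;
import org.objectweb.proactive.core.descriptor.data.VirtualNode;
import org.objectweb.proactive.core.filetransfer.RemoteFile;
import org.objectweb.proactive.core.node.Node;

// XML_LOCATION is the path of the deployment descriptor
ProActiveDescriptor pad = PADeployment.getProactiveDescriptor(XML_LOCATION);

VirtualNode testVNode = pad.getVirtualNode("test");
testVNode.activate();
Node testnode = testVNode.getNode();

// Push the local /tmp/test.dat to /tmp/test.dat on the remote node (asynchronous)
RemoteFile remoteFileA = ProFileTransfer.push(new File("/tmp/test.dat"), testnode, new File("/tmp/test.dat"));
// Pull the remote copy back into the local file /tmp/test2.dat (asynchronous)
RemoteFile remoteFileB = remoteFileA.pull(new File("/tmp/test2.dat"));

remoteFileB.waitFor(); // blocking method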

21.2.2.1. How to obtain a Node from an Active Object reference

The Node where an Active Object resides can be obtained in the following way:

Object o  =  PAActiveObject.newActive(...);
...
Node node = PAActiveObject.getActiveObjectNode(o);

21.3. Descriptor File Transfer

File Transfers can also be specified using ProActive Descriptors. The main advantage of this scheme is that input files can be deployed and output files retrieved as part of the descriptor-based deployment. This section concentrates on three topics:

  • XML Descriptor File Transfer Tags

  • Deployment File Transfer

  • Retrieval File Transfer

21.3.1. XML Descriptor File Transfer Tags

File Transfer-related tags are placed at three different parts (or levels) of the descriptor.

The first level is the fileTransferDefinitions tag, which contains a list of FileTransfer definitions. A FileTransfer definition is a high-level representation of the File Transfer, containing mainly the file names. It deliberately contains no low-level information such as hosts, protocols or prefixes (this is the role of the low-level representation). The following example shows a FileTransfer definition named example:

....
</deployment>
<fileTransferDefinitions>
  <fileTransfer id="example">
    <file src="hello.dat" dest="world.dat"/>
    <file src="hello.jar" dest="world.jar"/>
    <file src="hello.class" dest="world.class"/>
    <dir src="exampledir" dest="exampledir"/>
  </fileTransfer>
  <fileTransfer id="anotherExample">
    ...
  </fileTransfer>
  ...
</fileTransferDefinitions>
<infrastructure>
....
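Because a definition only names files and directories, it maps naturally onto a small in-memory structure. The sketch below uses the JDK's DOM parser to read a fileTransferDefinitions fragment into a map from definition id to its entries. It is an illustration only, not ProActive's descriptor parser; the TransferEntry and FileTransferDefinitions classes are hypothetical, and an omitted dest attribute is simply kept as null.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical in-memory form of one <file> or <dir> entry.
class TransferEntry {
    final boolean isDirectory;
    final String src;
    final String dest; // null when the dest attribute is omitted

    TransferEntry(boolean isDirectory, String src, String dest) {
        this.isDirectory = isDirectory;
        this.src = src;
        this.dest = dest;
    }
}

class FileTransferDefinitions {
    // Parses a <fileTransferDefinitions> fragment into: definition id -> entries.
    static Map<String, List<TransferEntry>> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, List<TransferEntry>> result = new LinkedHashMap<>();
        NodeList defs = doc.getElementsByTagName("fileTransfer");
        for (int i = 0; i < defs.getLength(); i++) {
            Element def = (Element) defs.item(i);
            List<TransferEntry> entries = new ArrayList<>();
            NodeList children = def.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node n = children.item(j);
                if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
                Element e = (Element) n;
                String tag = e.getTagName();
                if (!tag.equals("file") && !tag.equals("dir")) continue;
                String dest = e.hasAttribute("dest") ? e.getAttribute("dest") : null;
                entries.add(new TransferEntry(tag.equals("dir"), e.getAttribute("src"), dest));
            }
            result.put(def.getAttribute("id"), entries);
        }
        return result;
    }
}
```

Keeping the definitions free of hosts and prefixes is what lets the same map be reused by every process that references it.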

FileTransfer definitions are referenced by name from the VirtualNode tag, using two attributes: fileTransferDeploy and fileTransferRetrieve. The first corresponds to the file transfer that takes place at deployment time, and the second to the file transfer that the user triggers once the user application is done.

<virtualNode name="exampleVNode" fileTransferDeploy="example" fileTransferRetrieve="example"/>

All the low-level information, such as hosts, usernames, protocols and prefixes, is declared inside each process. fileTransferDeploy and fileTransferRetrieve are specified separately, each with a refid attribute. The refid can either reference a FileTransfer definition directly or be set to the keyword implicit, in which case the reference is inherited from the corresponding VirtualNode. In the following example, both mechanisms reference the example definition: Deploy indirectly (through implicit) and Retrieve directly:

<processDefinition id="xyz">
  <sshProcess>
    ...
    <!-- Inside the process, the FileTransfer tag becomes an element instead of
         an attribute. This happens because FileTransfer information is process specific.
         Note that the destination hostname and username can be omitted,
         and implicitly inferred from the process information. -->

    <fileTransferDeploy refid="implicit"> <!-- referenceID or keyword "implicit" (inherit) -->
      <copyProtocol>processDefault, rcp, scp, pftp</copyProtocol>
      <sourceInfo prefix="/home/user"/>
      <destinationInfo prefix="/tmp" hostname="foo.org" username="smith"/>
    </fileTransferDeploy>

    <fileTransferRetrieve refid="example">
      <sourceInfo prefix="/tmp"/>
      <destinationInfo prefix="/home/user"/>
    </fileTransferRetrieve>
  </sshProcess>
</processDefinition>

In the example above, fileTransferDeploy has an implicit refid, so the File Transfer definition it uses is inherited from the VirtualNode. The first element inside this tag is copyProtocol, which specifies the sequence of protocols that will be tried, in order, to perform the FileTransfer at deployment time. The processDefault keyword stands for the default copy protocol associated with the process: here the process is an sshProcess, so the Secure Copy Protocol (scp) is tried first. To complement the high-level File Transfer definition, further information can be given as attributes of the sourceInfo and destinationInfo elements. For fileTransferDeploy, these attributes are currently prefix, hostname and username.
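The refid rule amounts to a small lookup: an explicit refid names a definition directly, while implicit defers to the attribute of the same name on the VirtualNode. The following minimal sketch uses hypothetical names (RefIdResolver, resolve) and a plain map standing in for the VirtualNode's attributes. What happens when implicit is used but the VirtualNode declares nothing is not specified here; the sketch simply throws, which is an assumption.

```java
import java.util.Map;

class RefIdResolver {
    static final String IMPLICIT = "implicit";

    // kind is "fileTransferDeploy" or "fileTransferRetrieve";
    // vnAttributes holds the attributes declared on the <virtualNode> tag.
    static String resolve(String refid, String kind, Map<String, String> vnAttributes) {
        if (!IMPLICIT.equals(refid)) {
            return refid; // direct reference to a FileTransfer definition
        }
        String inherited = vnAttributes.get(kind); // inherit from the VirtualNode
        if (inherited == null) {
            // Assumption: nothing to inherit is treated as a descriptor error
            throw new IllegalArgumentException(
                    "refid=\"implicit\" but the VirtualNode declares no " + kind);
        }
        return inherited;
    }
}
```

With the VirtualNode declared earlier (fileTransferDeploy="example"), resolve("implicit", "fileTransferDeploy", ...) yields example, the same definition that fileTransferRetrieve references directly.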

For fileTransferRetrieve, no copyProtocol needs to be specified: ProActive uses its internal mechanism (the pftp) to transfer the files. As a consequence, no hostname or username is required.

21.3.1.1. Currently supported protocols for file transfer deployment

  • pftp (ProActive File Transfer Protocol)

  • scp (ssh processDefault)

  • rcp (rsh processDefault)

  • unicore (Unicore processDefault)

  • nordugrid (Nordugrid processDefault)
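The processDefault keyword can be pictured as a substitution step over the comma-separated copyProtocol list, using the process-to-protocol pairs given above (ssh to scp, rsh to rcp, unicore, nordugrid). The sketch below is hypothetical: the CopyProtocolList class, the process type strings and the choices of skipping an unknown default and ignoring later duplicates are assumptions, made only to show why scp ends up first for an sshProcess.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CopyProtocolList {
    // Default copy protocol per process type, as listed in the supported protocols.
    static final Map<String, String> PROCESS_DEFAULT = Map.of(
            "ssh", "scp",
            "rsh", "rcp",
            "unicore", "unicore",
            "nordugrid", "nordugrid");

    // Expands "processDefault" and returns the protocols in the order they are tried.
    static List<String> expand(String copyProtocol, String processType) {
        Set<String> ordered = new LinkedHashSet<>(); // keeps first-seen order
        for (String token : copyProtocol.split(",")) {
            String p = token.trim();
            if (p.isEmpty()) continue;
            if (p.equals("processDefault")) {
                String def = PROCESS_DEFAULT.get(processType);
                if (def == null) continue; // assumption: no known default, skip it
                p = def;
            }
            ordered.add(p); // assumption: a protocol already listed is not retried
        }
        return new ArrayList<>(ordered);
    }
}
```

For the sshProcess example, expand("processDefault, rcp, scp, pftp", "ssh") gives the order scp, rcp, pftp.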

21.3.1.2. Triggering File Transfer Deploy

The File Transfer is triggered (started) when the deployment of the descriptor file is executed. For external protocols (scp, rcp), it takes place before the process deployment; for internal protocols (unicore, nordugrid), it takes place together with the process deployment. In either case, this enables some interesting scenarios, such as transferring the ProActive libraries to the machines being deployed on, on the fly. This means that it is possible to deploy on remote machines that do not have ProActive pre-installed. Going further, when the network allows it, other required libraries such as the JRE (Java Runtime Environment) can be transferred as well.

One protocol behaves differently from the rest: the ProActive File Transfer Protocol (pftp). The pftp uses the ProActive File Transfer API (described earlier) to transfer files between nodes. Its main advantage is that no external copy protocol is required to transfer files at deployment time, so a File Transfer Deploy can still take place with the pftp even if the grid infrastructure provides no way to transfer files. Its main drawback is that ProActive must already be installed on the remote machines, so on-the-fly deployment is not possible.

21.3.1.3. Triggering File Transfer Retrieve

Since the termination of a distributed application is difficult to detect, the user is responsible for triggering the retrieval. To this end, we have provided a specific method that triggers the retrieval of all files associated with a VirtualNode:

import org.objectweb.proactive.core.descriptor.data.*;

public FileWrapper VirtualNode.fileTransferRetrieve();

This triggers the retrieval, using the pftp, of all the files specified in the descriptor from every node deployed through this VirtualNode. For example:

import org.objectweb.proactive.core.descriptor.data.*;

ProActiveDescriptor pad = PADeployment.getProactiveDescriptor(XML_LOCATION);

VirtualNode testVNode = pad.getVirtualNode("example");
testVNode.activate();
Node[] examplenode = testVNode.getNodes();

...

// Trigger the retrieval of the descriptor's files from all nodes of this VirtualNode
FileWrapper fw = testVNode.fileTransferRetrieve();
...
File[] f = fw.getFiles(); // waits for the files to arrive

Calling getFiles() on the returned FileWrapper waits for the files to arrive and returns a File[] array representing all the retrieved files.
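The blocking behaviour of getFiles() can be pictured as waiting on one pending result per deployed node. The sketch below is hypothetical and uses CompletableFuture rather than ProActive futures; the RetrievedFiles class and its add method are invented for the example, and it does not reproduce the real FileWrapper API beyond the idea of a blocking getFiles().

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical collector of per-node retrievals.
class RetrievedFiles {
    private final List<CompletableFuture<File>> pending = new ArrayList<>();

    // Registers one in-progress retrieval (for instance, one per node of the VirtualNode).
    void add(CompletableFuture<File> retrieval) {
        pending.add(retrieval);
    }

    // Blocks until every registered retrieval has completed, then returns all files
    // in registration order.
    File[] getFiles() {
        File[] files = new File[pending.size()];
        for (int i = 0; i < files.length; i++) {
            files[i] = pending.get(i).join(); // waits for this node's file
        }
        return files;
    }
}
```

Waiting only in getFiles() lets the application keep working between triggering the retrieval and actually needing the results.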

21.4. Advanced: FileTransfer Design

This section provides internal details on how File Transfer is implemented. Reading it is not necessary in order to use the File Transfer mechanisms provided by ProActive.

21.4.1. Abstract Definition (High level)

These definitions can be referenced from a VirtualNode. They contain the most basic information of a FileTransfer:

  • A unique definition identification name.

  • Files: source and optionally the destination name.

  • Directories: source and optionally the destination name, plus include and exclude patterns (not yet available).

References from the VirtualNode are made using the unique definition name.

21.4.2. Concrete Definition (Low level)

These definitions contain more architecture specific information, and are therefore contained within the Process:

  • A reference to an abstract definition, or the "implicit" keyword indicating that the reference will be inherited from the VirtualNode.

  • A sequence of Copy Protocols that will be used.

  • Source and Destination information: prefix, username, hostname, file separator, etc...

If some of this information (such as username or hostname) can be inferred from the process, it does not need to be declared in the definition. If it is declared, it overrides the information inferred from the process.

21.4.3. How Deployment File Transfer Works


Figure 21.1. File Transfer Design


When a FileTransfer starts, the abstract and concrete information are merged by the FileTransfer Workshop. The result of this process is a sequence of CopyProtocols, as specified in the concrete definition.
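As an illustration of the merge step, the sketch below combines entries from an abstract definition (file names) with the prefixes of a concrete definition to produce full source and destination paths. It is not the FileTransfer Workshop's code: TransferMerge, mergePaths and PathPair are hypothetical names, and both the use of "/" as separator and the rule that an omitted destination reuses the source name are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result of merging one abstract entry with a concrete definition.
class PathPair {
    final String src;
    final String dst;

    PathPair(String src, String dst) {
        this.src = src;
        this.dst = dst;
    }
}

class TransferMerge {
    // names: {src, dest} pairs from the abstract definition; dest may be null.
    static List<PathPair> mergePaths(List<String[]> names, String srcPrefix, String dstPrefix) {
        List<PathPair> out = new ArrayList<>();
        for (String[] n : names) {
            String dest = (n[1] != null) ? n[1] : n[0]; // assumption: default dest = src name
            out.add(new PathPair(join(srcPrefix, n[0]), join(dstPrefix, dest)));
        }
        return out;
    }

    // Joins a prefix and a relative name with '/' (assumed file separator).
    private static String join(String prefix, String name) {
        if (prefix == null || prefix.isEmpty()) return name;
        return prefix.endsWith("/") ? prefix + name : prefix + "/" + name;
    }
}
```

With the deployment prefixes from the sshProcess example (/home/user and /tmp), the entry hello.dat to world.dat becomes /home/user/hello.dat to /tmp/world.dat.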

The CopyProtocols are tried one after another, before the deployment takes place, until one succeeds. Once one has succeeded, or all have failed, the process deployment takes place.
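The retry rule reduces to a loop over the protocol sequence that stops at the first success, after which deployment continues whatever the outcome. In this hypothetical sketch, each protocol is modelled as a BooleanSupplier that reports success; the ProtocolFallback class is invented, and treating a protocol that throws as a failure is an assumption.

```java
import java.util.Map;
import java.util.function.BooleanSupplier;

class ProtocolFallback {
    // Tries each protocol in iteration order and returns the name of the first one
    // that succeeds, or null if all fail. Either way, the caller then proceeds with
    // the process deployment.
    static String tryInOrder(Map<String, BooleanSupplier> protocols) {
        for (Map.Entry<String, BooleanSupplier> e : protocols.entrySet()) {
            boolean ok;
            try {
                ok = e.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false; // assumption: a protocol that throws counts as a failure
            }
            if (ok) return e.getKey();
        }
        return null;
    }
}
```

The map passed in should preserve insertion order (for example a LinkedHashMap), since that order is the copyProtocol sequence.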

21.4.4. How File Transfer API Works

The File Transfer API is built on top of ProActive's active object and future-based asynchronism model. When a file is pulled from or pushed to a Node, two service Active Objects (AO) are created: one on the local machine and the other on the remote site. The file is then split into blocks, which are transferred over the network through remote invocations between these two AOs.
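The block-based scheme can be sketched with plain Java concurrency, assuming (as their names suggest) that bsize is the block size in bytes and numFlyingBlocks the maximum number of blocks in flight at once. This is a hypothetical, in-process model: BlockTransfer is an invented name, and each block is copied into a destination buffer where ProActive would make a remote invocation between the two service active objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

class BlockTransfer {
    // Copies src into a new array block by block, with at most numFlyingBlocks
    // blocks outstanding at any time.
    static byte[] transfer(byte[] src, int bsize, int numFlyingBlocks) throws Exception {
        if (bsize <= 0 || numFlyingBlocks <= 0) {
            throw new IllegalArgumentException("bsize and numFlyingBlocks must be positive");
        }
        byte[] dst = new byte[src.length];
        Semaphore inFlight = new Semaphore(numFlyingBlocks); // also bounds queued block copies
        ExecutorService pool = Executors.newFixedThreadPool(numFlyingBlocks);
        List<Future<?>> sent = new ArrayList<>();
        try {
            for (int offset = 0; offset < src.length; offset += bsize) {
                final int off = offset;
                final int len = Math.min(bsize, src.length - off);
                final byte[] block = Arrays.copyOfRange(src, off, off + len);
                inFlight.acquire(); // wait while numFlyingBlocks blocks are outstanding
                sent.add(pool.submit(() -> {
                    try {
                        // Stand-in for the remote invocation delivering one block.
                        System.arraycopy(block, 0, dst, off, len);
                    } finally {
                        inFlight.release();
                    }
                }));
            }
            for (Future<?> f : sent) {
                f.get(); // wait for every block and propagate any failure
            }
        } finally {
            pool.shutdown();
        }
        return dst;
    }
}
```

Bounding the blocks in flight keeps memory use independent of the file size while still overlapping several block transfers.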

21.4.5. How Retrieve File Transfer Works

For a given VirtualNode, a File Transfer pull takes place from every node deployed through this VirtualNode. The details of the file transfer are those specified in the descriptor file.