GT 4.0 Index Service: How to Write a Simple Execution Aggregator Information Provider for MDS4

1. Introduction

This document is a starting guide to writing non web-service based information providers for the MDS4. It covers the concepts and walks through a simple example of how to get arbitrary information into the MDS4 using the Execution Aggregator Source, which gathers arbitrary XML information about a registered resource by executing an external script. This is mostly useful when you would like to publish information into the MDS4 from an information source that is not a web service. For web-service based information sources that export known Resource Properties, it is much easier to use the Query Aggregator Source; that source is outside the scope of this document.

This document covers writing a simple information provider that publishes fortune information at a regular interval into the MDS4's Index Service. This example was chosen because it is dynamic and simple, yet it illustrates all the fundamentals of this type of information provider.

2. Choosing (or conforming to) a Schema

The first step to getting information into the MDS4 is to decide which information you would like to publish. Since the data is in XML format, you should choose (or conform to) a schema for the data. This generally means coming up with element names and types, and mapping the data you retrieve from your non web-service based application onto them before putting it into the MDS4. For this example, I'm going to choose this very simple format for the data:

<fortuneInformation>
   <fortuneData>
      ... here is the fortune ...
   </fortuneData>
   <fortuneDateAndTime>
      ... date and time of retrieval ...
   </fortuneDateAndTime>
   <fortuneSourceURL>
      ... the URL of where the fortune was retrieved ...
   </fortuneSourceURL>
</fortuneInformation>

As you can see, that format is very simple. An example output will look like this:

<fortuneInformation>
   <fortuneData>
     186,282 miles per second: It isn't just a good idea, it's the law!
   </fortuneData>
   <fortuneDateAndTime>
     Thu Jul 14 18:16:01 CDT 2005
   </fortuneDateAndTime>
   <fortuneSourceURL>
     http://anduin.eldar.org/cgi-bin/fortune.pl?text_format=yes
   </fortuneSourceURL>
</fortuneInformation>

Once you've chosen how to represent your data in XML format, you can start thinking about how you're going to retrieve and prepare that data for publication.
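
As a minimal sketch of what producing this format from a shell script could look like (the function names xml_escape and emit_fortune_xml are illustrative and not part of MDS4), the text is escaped first so that characters such as "&" and "<" in a fortune do not break the XML:

```shell
#!/bin/bash
# Sketch only: xml_escape and emit_fortune_xml are hypothetical helper names.

# Escape the characters that are special in XML character data.
# "&" must be handled first so the other entities are not double-escaped.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print one fortuneInformation document.
#   $1 = fortune text, $2 = URL the fortune was retrieved from
emit_fortune_xml() {
    local fortune="$1" url="$2"
    echo "<fortuneInformation>"
    echo "   <fortuneData>"
    echo "      $(xml_escape "$fortune")"
    echo "   </fortuneData>"
    echo "   <fortuneDateAndTime>"
    echo "      $(date)"
    echo "   </fortuneDateAndTime>"
    echo "   <fortuneSourceURL>"
    echo "      $(xml_escape "$url")"
    echo "   </fortuneSourceURL>"
    echo "</fortuneInformation>"
}
```

For example, emit_fortune_xml "some fortune text" "http://anduin.eldar.org/cgi-bin/fortune.pl?text_format=yes" prints a document in the format shown above, with the current date and time filled in.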

3. The Code

The second step to getting information into the MDS4 is to write a script (or program) that gathers and formats the appropriate data. This can be C code, a shell script, Perl code, etc. It doesn't matter what methods it uses behind the scenes, so long as it produces well-formed XML data.

For example, if we wanted to publish a fortune into the Index Service (using the free and charitable online service located at http://anduin.eldar.org/cgi-bin/fortune.pl), we could write a simple shell script to retrieve it and format it into our chosen XML schema.

The example implementation is written as a bash shell script for simplicity and has been tested on GNU/Linux only. For this script to publish information, at least one of the following programs must be installed on the system: wget, lynx, or fortune. These programs come standard with most GNU/Linux distributions, and only one of them is required. [ NOTE: Windows users must have something like the Cygwin environment for this to work ]

Download the code: fortune_script.sh.

This file should be saved in your $GLOBUS_LOCATION/libexec/aggrexec directory; the reason is explained in the next section.
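
The downloadable script is not reproduced in this document. As a rough sketch of the "any one of wget, lynx, or fortune" logic described above, the retrieval step might look something like the following. The helper name get_fortune and the order in which the tools are tried are assumptions made for illustration, not necessarily what fortune_script.sh does:

```shell
#!/bin/bash
# Sketch only: this is NOT the actual fortune_script.sh.
# get_fortune is a hypothetical helper; the fallback order is an assumption.

FORTUNE_URL="http://anduin.eldar.org/cgi-bin/fortune.pl?text_format=yes"

# Print a fortune using whichever of wget, lynx, or fortune is found first.
# Returns non-zero if none is available or if the tool that was tried fails.
get_fortune() {
    if command -v wget >/dev/null 2>&1; then
        wget -q -O - "$FORTUNE_URL"      # quiet, write the page to stdout
    elif command -v lynx >/dev/null 2>&1; then
        lynx -dump "$FORTUNE_URL"        # render the page as plain text
    elif command -v fortune >/dev/null 2>&1; then
        fortune                          # local fallback, no network needed
    else
        echo "need one of: wget, lynx, fortune" >&2
        return 1
    fi
}
```

The output of such a function still has to be wrapped in the XML format chosen in section 2 before it is printed, since the Execution Aggregator Source expects XML from the script.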

4. Enabling The Provider

Now that we have the information provider written, the next step is to enable it so that we can test it. This takes three steps. First, come up with a short name (i.e. a mapping) that can be used to reference your provider. Second, copy your provider to the location where it is expected to be found. Finally, register it to the Index Service with the parameters you'd like.

4.1. Establish mapping of your information provider

To establish the mapping of your provider, you need to edit the $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/jndi-config.xml file.

You should see an executableMappings section that looks something like this:

<parameter>
   <name>executableMappings</name>
   <value>
      aggr-test=aggregator-exec-test.sh,
      pingexec=example-ping-exec
   </value>
</parameter>

To add our fortune_script.sh file, let's use fortuneProvider as the mapped name. Our entry would then look like this:

fortuneProvider=fortune_script.sh

With that line added, the entire entry should look like this (note that an extra comma had to be added before our new entry):

<parameter>
   <name>executableMappings</name>
   <value>
      aggr-test=aggregator-exec-test.sh,
      pingexec=example-ping-exec,
      fortuneProvider=fortune_script.sh
   </value>
</parameter>

Note: We are required to establish this mapping for security reasons. When an execution aggregator source is registered, it references this mapping rather than a full path name to a script, which prevents arbitrary registrations from executing arbitrary code. Requiring the mapping to be configured before the Globus container starts guarantees that the system administrator of the deployment has approved the use of the new provider.
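
Since a typo in the mapping is easy to make, a quick sanity check can help. The following is a minimal sketch, assuming one mapping per line as in the example above; has_mapping is a hypothetical helper name, and a plain-text match like this is not a real XML parse:

```shell
#!/bin/bash
# Sketch only: has_mapping is a hypothetical helper, not an MDS4 tool.
# It assumes each "name=script" mapping sits on its own line.
#   $1 = mapped name (left-hand side), $2 = path to jndi-config.xml
has_mapping() {
    local name="$1" config="$2"
    grep -Eq "^[[:space:]]*${name}=[^,[:space:]]+,?[[:space:]]*$" "$config"
}
```

It could be used as: has_mapping fortuneProvider "$GLOBUS_LOCATION/etc/globus_wsrf_mds_index/jndi-config.xml" && echo "mapping found"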

4.2. Copy information provider to correct location

To make sure your provider is in the expected place, it MUST be copied to the $GLOBUS_LOCATION/libexec/aggrexec directory. Notice that the full path of the script was not specified in the mapping above. That's because the path $GLOBUS_LOCATION/libexec/aggrexec is assumed and is prepended at run-time for you. Make sure your file resides in this directory with executable permissions.

Check the listing to make sure:

neillm@glob ~ $ ls -al $GLOBUS_LOCATION/libexec/aggrexec/
total 12
drwxr-xr-x  2 neillm wheel 4096 Jul 16 14:01 .
drwxr-xr-x  6 neillm wheel 4096 Jul  8 14:52 ..
-rwxr-xr-x  1 neillm wheel  345 Jul  8 14:52 aggregator-exec-test.sh
-rwxr-xr-x  1 neillm wheel 1947 Jul 16 13:52 fortune_script.sh
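
The copy and permission steps can be sketched as a small helper. The name install_provider is hypothetical; mode 755 is chosen to match the -rwxr-xr-x permissions in the listing above:

```shell
#!/bin/bash
# Sketch only: install_provider is a hypothetical helper, not an MDS4 tool.
# Copies a provider script into a directory and makes it executable.
#   $1 = path to the script, $2 = destination directory
install_provider() {
    local script="$1" dest_dir="$2"
    cp "$script" "$dest_dir/" && chmod 755 "$dest_dir/$(basename "$script")"
}
```

For example: install_provider fortune_script.sh "$GLOBUS_LOCATION/libexec/aggrexec". It is also worth running the installed script once by hand to confirm that it prints the expected XML. If xmllint happens to be installed (an assumption; it is not required by MDS4), piping the output through "xmllint --noout -" is one way to check that it is well-formed.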

4.3. Configure the registration file

Now that we've completed the first two steps of enabling the provider, all that remains is to decide on the details of the registration to the Index Service.

To do this, you'll need a registration file. There are many types of registrations that can possibly occur, due to the flexibility of the Aggregator Framework. You can view several examples in the $GLOBUS_LOCATION/etc/globus_wsrf_mds_aggregator/example-aggregator-registration.xml file.

For this example, we'll simply use the custom fortune registration file provided, which is specific to the fortune provider we've made that uses the Execution Aggregator Source. It's relatively simple, and the fields worth mentioning are shown here:

<defaultServiceGroupEPR>
   <wsa:Address>https://127.0.0.1:8443/wsrf/services/DefaultIndexService</wsa:Address>
</defaultServiceGroupEPR>

<defaultRegistrantEPR>
   <wsa:Address>https://127.0.0.1:8443/wsrf/services/fortuneProvider</wsa:Address>
</defaultRegistrantEPR>

These fields need to be updated to match the address of your own container. For example, if you're running without security enabled on port 8080 and your host's IP address is www.xxx.yyy.zzz, replace the "https://127.0.0.1:8443" base part of each address with "http://www.xxx.yyy.zzz:8080".
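
If you prefer not to edit the file by hand, the substitution can be sketched with sed. The helper name rebase_registration is hypothetical, and note that the old base address is treated as a regular expression, so its dots match any character (harmless for this purpose):

```shell
#!/bin/bash
# Sketch only: rebase_registration is a hypothetical helper.
# Prints the registration file with every occurrence of the old base
# address replaced by the new one; the input file is left untouched.
#   $1 = old base address, $2 = new base address, $3 = registration file
rebase_registration() {
    local old_base="$1" new_base="$2" file="$3"
    # "|" is used as the sed delimiter because the addresses contain "/".
    sed "s|${old_base}|${new_base}|g" "$file"
}
```

For example: rebase_registration "https://127.0.0.1:8443" "http://www.xxx.yyy.zzz:8080" fortune-provider-registration.xml > my-registration.xml (the output file name is only an example).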

Next, view or modify this section of the fortune-provider-registration.xml file:

<ServiceGroupRegistrationParameters
    xmlns="http://mds.globus.org/servicegroup/client" >

  <!-- Renew this registration every 600 seconds (10 minutes) -->
  <RefreshIntervalSecs>600</RefreshIntervalSecs>
  <Content xsi:type="agg:AggregatorContent"
           xmlns:agg="http://mds.globus.org/aggregator/types">
    <agg:AggregatorConfig xsi:type="agg:AggregatorConfig">
      <agg:ExecutionPollType>

        <!-- Run our script every 300,000 milliseconds (5 minutes) -->
        <agg:PollIntervalMillis>300000</agg:PollIntervalMillis>

        <!-- Specify our mapped ProbeName registered in the
             $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/jndi-config.xml
             file -->
        <agg:ProbeName>fortuneProvider</agg:ProbeName>

      </agg:ExecutionPollType>
    </agg:AggregatorConfig>
    <agg:AggregatorData/>
  </Content>
</ServiceGroupRegistrationParameters>

The relevant fields here that you can configure are the following:

RefreshIntervalSecs - the amount of time, in seconds, that should pass before the registration is renewed. 600 seconds (i.e. 10 minutes) is generally sufficient, and certainly is for this example. (Note: the mds-servicegroup-add utility will perform these renewals for you automatically at this interval.)

PollIntervalMillis - the interval, in milliseconds, at which the specified provider is executed. It's important not to set this value too low, as there's little value in executing the provider extremely frequently given the overhead. For our example, we'll set it to 5 minutes (i.e. 300000 milliseconds), which means the fortune information published in the Index Service will be updated once every 5 minutes.

ProbeName - here is where the executable mapping is put to use. It must exactly match the (left-hand side) name you specified in the $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/jndi-config.xml. For this example, we chose this name to be fortuneProvider, and you can see that's what we've specified.
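
Because the two intervals use different units, it is easy to be off by a factor of 1000. A small sketch of the arithmetic (the function names are illustrative only):

```shell
#!/bin/bash
# Sketch only: hypothetical helpers for converting minutes into the units
# used by RefreshIntervalSecs (seconds) and PollIntervalMillis (milliseconds).
minutes_to_secs()   { echo $(( $1 * 60 )); }
minutes_to_millis() { echo $(( $1 * 60 * 1000 )); }

minutes_to_secs 10     # prints 600, the RefreshIntervalSecs value above
minutes_to_millis 5    # prints 300000, the PollIntervalMillis value above
```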

Download the example registration file, fortune-provider-registration.xml.

4.4. Register with Index Service: run mds-servicegroup-add

Finally, to register our provider with the Index Service, run the mds-servicegroup-add program in a manner similar to this:

neillm@glob ~ $ $GLOBUS_LOCATION/bin/mds-servicegroup-add -s \
https://127.0.0.1:8443/wsrf/services/DefaultIndexService \
fortune-provider-registration.xml

Processing configuration file...
Processed 1 registration entries
Successfully registered
https://127.0.0.1:8443/wsrf/services/fortuneProvider to servicegroup at
https://127.0.0.1:8443/wsrf/services/DefaultIndexService

Note that you will have to specify the proper URI location of your Index Service on the command line and not the one specified above (unless it's the same, of course).

5. An Example Query

To confirm that the provider's output has been published, query the Index Service with wsrf-query:

neillm@glob bin $ ./wsrf-query -s \
https://127.0.0.1:8443/wsrf/services/DefaultIndexService \
"//*[local-name()='fortuneInformation']"

<fortuneInformation xmlns="">
<fortuneData>
They told me you had proven it When they discovered our results About
a month before. Their hair began to curl The proof was valid, more or
less Instead of understanding it But rather less than more. We'd run
the thing through PRL. He sent them word that we would try Don't tell
a soul about all this To pass where they had failed For it must ever
be And after we were done, to them A secret, kept from all the rest
The new proof would be mailed. Between yourself and me. My notion was
to start again Ignoring all they'd done We quickly turned it into code
To see if it would run.
</fortuneData>
<fortuneDateAndTime>
Wed Jul 20 12:36:36 BST 2005
</fortuneDateAndTime>
<fortuneSourceURL>
http://anduin.eldar.org/cgi-bin/fortune.pl?text_format=yes
</fortuneSourceURL>
</fortuneInformation>

This segment of the query output is the fortune data produced by the provider we've just written and configured. As you can see, the fortuneInformation block was published into the Index Service now that the provider has been properly configured and registered.

6. Contact the author

Contact the author at [email protected].