GT 4.0.2 Incremental Release Notes: WS GRAM

1. Introduction

These release notes are for the incremental release 4.0.2. They include a summary of changes since 4.0.1, bug fixes since 4.0.1, and any known problems that still existed at the time of the 4.0.2 release. This page supplements the top-level 4.0.2 release notes at http://www.globus.org/toolkit/releasenotes/4.0.2.

For release notes about 4.0 (including a feature summary, technology dependencies, etc.), see the WS GRAM 4.0 Release Notes.

2. Changes Summary

Overall, WS GRAM in 4.0.2 is significantly more reliable and responsive than it was in 4.0.1.

A focused (and ongoing) effort to make large job loads complete reliably has identified a number of improvements. See Campaign 4197 for the complete details, and note the 6 bugs marked as dependencies of that campaign. One of the 6 improvements (Enh 4330) was a better algorithm for processing jobs from the internal job run queue, which improved responsiveness significantly: during a large throughput run against a 4.0.1 WS GRAM service, a separate simple /bin/date Fork job took more than 10 minutes to return, while in 4.0.2 the same Fork job was processed in roughly 2 minutes.
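
For reference, a job of that kind can be submitted and timed with globusrun-ws. This is an illustrative sketch only; the factory host below is a placeholder:

    # Time a simple /bin/date Fork job (interactive mode waits for completion).
    time globusrun-ws -submit -Ft Fork \
        -F https://gram.example.org:8443/wsrf/services/ManagedJobFactoryService \
        -c /bin/date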

The WS GRAM testing infrastructure has been improved with automated throughput testing - http://skynet-login.isi.edu/gram-testing/. Nightly, the WS GRAM code is checked out of various CVS branches, built, and throughput tests are run. This has helped identify bugs more quickly and eased the effort of resolving problems that are difficult to reproduce; for a good example, see Bug 4235. The WS GRAM service is not released until these tests pass consistently from the CVS release branch, which has helped us provide an overall better WS GRAM service.
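
As a rough illustration only (not the actual test harness), a throughput-style load can be generated with batch submissions; the job count and sleep duration below are arbitrary:

    # Submit 100 batch Fork jobs to build up load, saving each job's EPR.
    for i in `seq 1 100`; do
        globusrun-ws -submit -batch -o job$i.epr -Ft Fork -c /bin/sleep 60
    done
    # Then measure how long a simple interactive job takes under that load.
    time globusrun-ws -submit -Ft Fork -c /bin/date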

There were 38 bug fixes since 4.0.1; here are a few worth highlighting:

  • The SEG reporting of fatal errors was improved to make it easier to identify service installation/configuration problems between WS GRAM and the local Resource Manager. Bug 4229
  • The variable "SCRATCH_DIRECTORY" was not being set in the job's environment; see the example after this list. Bug 4192
  • WS GRAM was not failing the job on some fatal error conditions. Bug 4247 Bug 4279 Bug 3631
  • Job error reporting was improved. Bug 4273 Bug 4241
  • At the (default) INFO logging level, WS GRAM now produces entry and exit logging for each job. Bug 3742
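
A quick way to confirm the SCRATCH_DIRECTORY fix (a hypothetical check, assuming a scratch directory is configured for the target resource manager) is to stream a job's environment and look for the variable:

    # Run /usr/bin/env as a Fork job with streamed output (-s) and check
    # that SCRATCH_DIRECTORY appears in the job's environment.
    globusrun-ws -submit -s -Ft Fork -c /usr/bin/env | grep SCRATCH_DIRECTORY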

3. Bug Fixes

The following bugs were fixed for WS GRAM:

  • Bug #3190: Staging jobs are not recoverable
  • Bug #3631: Delegation fetching errors should cause the job to fail i...
  • Bug #3642: Incorrect canceling of condor jobs
  • Bug #3738: self auth is applied to globusrun-ws, but host auth is st...
  • Bug #3698: user not allowed to override GLOBUS_LOCATION
  • Bug #3002: sudo path isn't set correctly in libexec/globus-sh-tools-...
  • Bug #3812: Absolute Windows paths not registering as absolute paths.
  • Bug #3772: Problem parsing certain PBS scheduler logs
  • Bug #3777: PBS SEG module can stop prematurely on PBS restart
  • Bug #3458: globusrun-ws isn't automatically releasing holds
  • Bug #3966: PBS SEG not refreshing
  • Bug #3931: Failed to retrieve Resource Properties from ManagedJobSer...
  • Bug #3935: GRAM setup scripts must be run from $GLOBUS_LOCATION/setu...
  • Bug #4187: AuthorizationHelper broken on container restart
  • Bug #3185: pbs job manager exit code
  • Bug #4229: Improve SEG fatal Error logging
  • Bug #2266: SEG needs to communicate errors better
  • Bug #3757: Remote exit status 0 ambiguity
  • Bug #4247: JSM registration error not failing job
  • Bug #4192: Missing SCRATCH_DIRECTORY env var.
  • Bug #4279: File mapping errors should fail the job.
  • Bug #4273: globusrun-ws error reports are ugly
  • Bug #4253: unable to monitor job for state changes
  • Bug #4297: PBS SEG module does not follow torque log rotation
  • Bug #4198: Bad reg-ex in condor poll script
  • Bug #4327: Script crashing with "Terminated"
  • Bug #4329: Missing context upon job recovery.
  • Bug #4330: Job Run Queue Processing Algorithm Inefficient
  • Bug #4331: Jobs disappear from state machine under heavy loads.
  • Bug #2397: optimize rft request if running in same container
  • Bug #4241: Allow for multi-line error from scheduler commands
  • Bug #4170: LoadLeveler 3.3.1 includes a GT4 GRAM Scheduler Adapter -...
  • Bug #3770: Incorrect GPT metadata breaks VDT builds
  • Bug #3687: %ENV mispelled %env in globus-job-manager-script.pl
  • Bug #3699: pbs scheduler event generator must be started after logs ...
  • Bug #1039: jobmanager-condor doesn't support the java universe
  • Bug #4159: Improve container error message when SEG execution fails
  • Bug #4111: locking / synch problems
  • Bug #3742: GRAM should log entry/exit of jobs to INFO level
  • Bug #4339: Condor adapter's poll subroutine not working for count > 1

4. Known Problems

The following problems are known to exist for WS GRAM at the time of the 4.0.2 release:

  • Bug #1562: Condor jobs fail when running globus-sh-exec job
  • Bug #2049: Batch providers need a namespace
  • Bug #3571: ant not found during install
  • Bug #3575: SEG dependent on GLOBUS_LOCATION env var
  • Bug #3579: takes over 4 seconds to start a single job with 4.0.1 bra...
  • Bug #3778: WS-GRAM dependant on GT2 job manager
  • Bug #2527: Add a failure case for non-existant queue to scheduler te...
  • Bug #2286: internationalization
  • Bug #3495: queue information, job count not reported to MDS
  • Bug #3726: GlobusRun error message typo
  • Bug #3844: Bad GPT metadata for globus_wsrf_gram_client_tools-1.0
  • Bug #3803: Default scratchDirectory doesn't exist
  • Bug #3384: Inconsistent jobType/count parameter semantics
  • Bug #3892: Out of date performance data?
  • Bug #3672: Streaming with PBS fails
  • Bug #3897: Must modify Globus in order to use authorization callouts
  • Bug #3911: Bug in GRAM documentation for Condor log file
  • Bug #3746: bad link on scheduler tutorial
  • Bug #4116: globus_module_activate( GLOBUS_GRAM_CLIENT_MODULE ) behav...
  • Bug #3675: wsrf gram client file streaming error
  • Bug #4153: gram scheduler test failing - submit202
  • Bug #4161: Managed Job Types, holding, state, userSubject and exitCo...
  • Bug #4162: fill in LINK TO SEG API Doc
  • Bug #4181: Allow File Staging To/From globusrun-ws application witho...
  • Bug #4182: Improve Condor/Fork Job Monitoring for reliability and se...
  • Bug #4178: no job output when streaming with globusrun-ws
  • Bug #4191: globusrun-ws job submission hangs
  • Bug #3910: Bad permissions on condor log file prevents job submissions
  • Bug #3529: setup/postinstall fatal errors should be warnings
  • Bug #4216: Empty submit_test/submitxxx.err causes FAILure in Local t...
  • Bug #4275: globus-gram-local-proxy-tool fails on Solaris
  • Bug #4388: globusrun-ws handles -self differently than java clients
  • Bug #3748: WS-GRAM Plugable Resource Manager Backend
  • Bug #3751: convert persistence data store from files to a database
  • Bug #4207: Enabling dynamic job description variables using softenv
  • Bug #4319: globus_scheduler_event_generator doesn't build with -stat...
  • Bug #3948: Service must release all of its resources on deactivation
  • Bug #4321: globusrun-ws freezes with "-job-delegate" option

5. For More Information

See the WS GRAM documentation for more information about this component.