Developer Documentation Only Below This Point:

Batch management package for PBS, extended for use on a local machine or on LSF

There are two little naming bugbears in here to be aware of.

First, LOGFILE is a variable that sets where the batch processing log goes, whereas logfile() is a method that identifies where the standard output and standard error (and any specifically written logfiles) of a given job execution go.

Second, “queue” refers to three things: the queue of PENDING jobs in the batch, the job processing management software, and the named queue supported by that software. The first is maintained by a directory named PENDING, and referenced by pendingcount(). The third is identified by qm.QUEUESPEC. The second is implemented as an interface to either a LOCAL or PBS queue manager. This was originally called an interface manager, which is more appropriate, but it was too hard to write the code without using the term queue, because “PBS queue” is how it is always described in conversation. Fighting this is somewhat less frustrating than trying to get a non-systems person to say “workflow” instead of “pipeline,” but it was not ultimately worth the hassle, since even the author had difficulty using the term “interface” consistently, in code or in conversation.


whether to re-iterate on incomplete batch

class JobBatch.JobBatch(metadata, resume=None)

Manager for a collection of jobs provided as shell scripts.

Typical invocation is through, but this does execute from a command line.

qmanager The name of a queue manager implementation or an instance thereof.

batchroot The directory to be used for batch management.

jobestimate The expected time to complete a typical run of the given shell script. Default 4 hours. Expressed in the form DnnHnnMnn (days, hours, minutes).

processors Number of processors desired, overridden by gb_memory below. Default 3.

gb_memory Amount of memory required. Successful PBS queueing depends on this. Default 4G.

queue Desired queue. Overridden as needed by queue_manager. DEPRECATED, but it turns out to have its uses if the PBS queue policies change.

logfile Where to write log. Default <batchroot>/jobbatch.log
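The DnnHnnMnn form used by jobestimate can be illustrated with a small parser. This is only a sketch: the helper name parse_jobestimate is hypothetical, and it assumes each letter is followed by a run of digits (for example D00H04M00 for the 4-hour default). The parsing code actually used by JobBatch is not shown in this document.

```python
import re
from datetime import timedelta

# Hypothetical helper, not part of JobBatch. Assumes the letters D, H and M
# each precede their digits, e.g. "D00H04M00" for four hours.
_ESTIMATE = re.compile(r"^D(\d+)H(\d+)M(\d+)$")


def parse_jobestimate(text):
    """Convert a DnnHnnMnn string into a datetime.timedelta."""
    match = _ESTIMATE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not in DnnHnnMnn form: {text!r}")
    days, hours, minutes = (int(part) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


four_hours = parse_jobestimate("D00H04M00")
```

A timedelta is convenient here because it can be compared against elapsed wall-clock time or converted to seconds for a scheduler request.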


Return the jobs not pending, done, or failed.

batching_loop(max_bolus, parallel=False, prototyping=False)

Main driver loop.

As long as there are pending jobs, feed the next bolus (which may be the whole batch in many instances) into the queue and then loop waiting for progress. As jobs finish, any remaining jobs are fed into the queue to maintain a full queue.

parallel => return a continuing status so that other batches in a replicate may start

prototyping => run a single job and await its success before running the entire bolus
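The feed-then-top-up behavior of the main loop can be sketched with an in-memory simulation. Everything here is an assumption made for illustration: the function name batching_loop_sketch, the finish_per_poll stand-in for polling the queue manager, and the reading of max_bolus=0 as "submit the whole batch". The real method works on state directories and a queue manager rather than lists, and the parallel and prototyping flags are not modeled.

```python
from collections import deque


def batching_loop_sketch(jobs, max_bolus, finish_per_poll=1):
    """Simulate feeding jobs in boluses.

    Returns (rounds, done): the jobs submitted in each feeding round and
    the order in which jobs completed.
    """
    pending = deque(jobs)
    running = []
    done = []
    rounds = []
    # Assumption: a max_bolus of 0 means "no limit", i.e. the whole batch.
    limit = max_bolus if max_bolus > 0 else len(pending)
    while pending or running:
        # Top the queue back up to the limit with the next pending jobs.
        bolus = []
        while pending and len(running) < limit:
            job = pending.popleft()
            running.append(job)
            bolus.append(job)
        if bolus:
            rounds.append(bolus)
        # Stand-in for waiting on progress: the oldest running jobs finish.
        finished, running = running[:finish_per_poll], running[finish_per_poll:]
        done.extend(finished)
    return rounds, done


rounds, done = batching_loop_sketch(["a", "b", "c", "d", "e"], max_bolus=2)
```

With max_bolus=2 the first round submits two jobs, and each later round submits one replacement as a running job finishes, so the simulated queue never holds more than two jobs.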


Return the number of jobs currently in the queue (directory) named q


Return the number of jobs that are DONE


Return a list of jobs that are DONE


Return the number of jobs that have encountered an error


Return a list of jobs that have encountered an ERROR


Determine and log the state of each running job. Return the count of jobs that have changed state since the last check.

feed_batch(max_bolus=0, delta=0, prototyping=False)

Log the start of the JobBatch and then submit some or all jobs in PENDING

max_bolus allows a user to force a different injection rate of jobs into the queue than the default behavior would permit.

max_bolus of 1 effectively causes sequential behavior

delta is the number that have recently finished


Return True if all jobs in the batch are completed, either with or without error; False if not.


Return True if all jobs in the batch finished without error


Return the list of jobs currently in the queue (directory) named q. The isdir() check allows us to count all states, including some that may not exist yet, such as STUCK.
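The isdir() guard described above might look like the following. This is a minimal sketch, assuming each state is a subdirectory of batchroot named after the state; the function name jobs_in_state is hypothetical and the sorting is added only to make the result predictable.

```python
import os
import tempfile


def jobs_in_state(batchroot, q):
    """Return the sorted entries of batchroot/q, or [] if q does not exist."""
    path = os.path.join(batchroot, q)
    # A state directory such as STUCK may never have been created.
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "PENDING"))
    open(os.path.join(root, "PENDING", "job1.sh"), "w").close()
    pending = jobs_in_state(root, "PENDING")
    stuck = jobs_in_state(root, "STUCK")
```

Returning an empty list for a missing directory means the count methods can report zero for any state without special cases.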


Standard path to the log file.

pend_job(command, rename=None)

Add a script to the queue, along with a name for the job.


Return the number of jobs with the status of “PENDING”


Return a list of jobs with the status of “PENDING”

prepare_batch(args, jobcount=0)

For an initialized batch, select appropriate queue and number of processors

This WAS horribly broken, because of the timing of writing the jobscript and this subsequent submission. I think it is fixed, but beware.


Return a list of jobs with the status of “QUEUED”

record_job_change(jobid, todir, fromdir=None)

Move jobid from a directory representing one state to another.
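Since each state is represented by a directory, a state change amounts to a rename. The sketch below assumes the state directories sit directly under batchroot and that the job file is named by its jobid. The name move_job is hypothetical, and fromdir is required here because the meaning of fromdir=None in the real method is not documented in this section.

```python
import os
import tempfile


def move_job(batchroot, jobid, todir, fromdir):
    """Move the file for jobid from one state directory to another."""
    src = os.path.join(batchroot, fromdir, jobid)
    dst_dir = os.path.join(batchroot, todir)
    # Create the target state directory on first use.
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, jobid)
    os.rename(src, dst)
    return dst


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "RUNNING"))
    source = os.path.join(root, "RUNNING", "job1.sh")
    open(source, "w").close()
    target = move_job(root, "job1.sh", todir="DONE", fromdir="RUNNING")
    moved = os.path.exists(target)
    gone = not os.path.exists(source)
```

os.rename is atomic on POSIX when source and destination are on the same filesystem, so a job is never visible in two states at once.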

record_job_completion(jobid, fromdir=None)

Change state and log the timestamp of successful completion of the job aliased jobid

record_job_error(jobid, fromdir=None)

Change state and log the timestamp of failure of the job aliased jobid


Change state and log the timestamp of aliased jobid getting queued (not in use)

record_job_start(pending, jobid)

Change state of the job aliased jobid to qm.STATUS_RUNNING


Change state and log the timestamp of aliased jobid getting stuck


For a batch that has run, find jobs that have moved to ERROR and rename them back to PENDING without their jobid suffixes
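Reverting failed jobs can be sketched as a rename from ERROR back to PENDING. The suffix format is an assumption: this document does not say how the jobid is attached, so the sketch guesses a trailing ".<digits>" (for example job1.sh.12345). The names strip_jobid_suffix and revert_errors are hypothetical.

```python
import os
import tempfile


def strip_jobid_suffix(name):
    """Drop a trailing '.<digits>' suffix (assumed format), if present."""
    stem, dot, tail = name.rpartition(".")
    return stem if dot and stem and tail.isdigit() else name


def revert_errors(batchroot):
    """Move every entry in ERROR back to PENDING under its original name."""
    error_dir = os.path.join(batchroot, "ERROR")
    pending_dir = os.path.join(batchroot, "PENDING")
    if not os.path.isdir(error_dir):
        return []
    os.makedirs(pending_dir, exist_ok=True)
    restored = []
    for name in sorted(os.listdir(error_dir)):
        clean = strip_jobid_suffix(name)
        os.rename(os.path.join(error_dir, name),
                  os.path.join(pending_dir, clean))
        restored.append(clean)
    return restored


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "ERROR"))
    open(os.path.join(root, "ERROR", "job1.sh.12345"), "w").close()
    restored = revert_errors(root)
    pending_after = sorted(os.listdir(os.path.join(root, "PENDING")))
```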


For a batch that has run, find jobs that have moved to RUNNING and are no longer running; move them to DONE if done, otherwise move them back to PENDING without their jobid suffixes.


Set the jobcounter to the number of jobs in the pending queue.


Return the number of jobs with the status of “RUNNING”


Return a list of jobs with the status of “RUNNING”


Return a list of jobs with the status of “STUCK”


Make sure the command exists. If so, have the queue manager start the job. Return the job ID of the job, or raise a SyQADAWarning (from manager.submit()).