Release 2.2.1

This release includes bug fixes and minor feature creep.

New Behaviors

1. SyQADA looks for a file called resources/SyQADA.config for administrative options. Currently, this only affects the sending of email when syqada dies in disgrace. This is discussed in Installation and documented in the released resources/SyQADA.config file.

2. A signal handler has been installed that attempts to catch Control-C gracefully and offer to exit. It works when there are local processes running, but ignores Control-C otherwise. The latter behavior is a consequence of Python turning Control-C into a KeyboardInterrupt exception.

3. METADATA now includes the original source path of templates. This is a work in progress to document workflow pedigree and provide an empirical upgrade path for workflows.
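The Control-C handling described in item 2 can be illustrated with a minimal Python sketch. This is not SyQADA's own handler; the names make_interrupt_handler, install_handler, confirm, and on_exit are hypothetical and exist only for this illustration. It relies on the standard signal module: registering a handler for SIGINT replaces Python's default handler, which is the one that raises KeyboardInterrupt.

```python
import signal
import sys


def make_interrupt_handler(confirm, on_exit):
    """Build a SIGINT handler that asks before exiting.

    confirm: callable returning True if the user wants to quit (hypothetical).
    on_exit: callable invoked once the user confirms (hypothetical).
    """
    def handler(signum, frame):
        if confirm():
            on_exit()
        # Otherwise simply return; the interrupted code carries on.
    return handler


def install_handler(confirm=None, on_exit=None):
    """Register the handler for Control-C and return the previous handler."""
    if confirm is None:
        def confirm():
            reply = input("Interrupt received. Exit? [y/N] ")
            return reply.strip().lower() == "y"
    if on_exit is None:
        def on_exit():
            sys.exit(1)
    # Python's default SIGINT handler raises KeyboardInterrupt;
    # registering our own replaces that behavior.
    return signal.signal(signal.SIGINT, make_interrupt_handler(confirm, on_exit))
```

Calling install_handler() once at startup (from the main thread, as the signal module requires) is enough. Keeping the confirm and on_exit callables separate makes the handler easy to exercise without a terminal.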

Bug fixes

1. Nested braces (with whitespace for awk/perl/etc) are now tolerated in templates.

2. Replication now terminates with the first failed replicate rather than repeating a near-certain failure on others.

3. syqada enqueue --working doesn’t barf if the WORKING directory already exists.

Release 2.2

This release includes significant changes from Release 2.1 in quality assurance specification, computational complexity expressions for time and memory allocation, and simplified interpretation of errors, as well as a tiny improvement in protocol nesting.

Significant New Behaviors

Quality Assurance Specifications

The specification qa can be added to any step to provide greater assurance that the output of the jobs is as expected. Quality assurance options include verifying that files are non-empty and that the number of lines, characters, and/or words matches across all outputs. Default QA behavior preserves the original syqada rule that the number of outputs must match for each sample.
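The kinds of checks described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SyQADA's implementation or its qa syntax; the function names qa_outputs and count_lines and the compare_line_counts parameter are hypothetical.

```python
from pathlib import Path


def count_lines(path):
    """Count lines without loading the whole file into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def qa_outputs(paths, compare_line_counts=True):
    """Return a list of problem descriptions; an empty list means QA passed."""
    problems = []
    paths = [Path(p) for p in paths]
    for path in paths:
        if not path.is_file():
            problems.append(f"{path}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{path}: empty")
    # Only compare line counts once every file is present and non-empty.
    if compare_line_counts and not problems:
        counts = {str(path): count_lines(path) for path in paths}
        if len(set(counts.values())) > 1:
            problems.append(f"line counts differ: {counts}")
    return problems
```

Returning a list of problems rather than a bare True/False leaves room to report every failing output at once.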

--noQA at the command line can be used to override QA.

See Quality Assurance for details.

Computational Complexity

Computation times and memory requirements naturally vary based on the size of data. The original one-size-fits-all jobestimate and gb_memory specifications remain acceptable, but Release 2.2 now includes a rudimentary Big-O syntax for specifying the amount of time or memory required for a (cluster) job based on several different expressions of data size: number of samples, number of lines in a file, file size, or, for jobs split by genomic region, length of chromosome.
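To make the idea concrete, here is a minimal Python sketch of scaling a known time or memory cost by data size under an assumed growth rate. It does not reproduce SyQADA's Big-O syntax, which is documented separately; the names GROWTH and scaled_estimate and the growth labels are hypothetical.

```python
import math

# Hypothetical growth functions, keyed by an illustrative label.
GROWTH = {
    "1": lambda n: 1.0,
    "n": lambda n: float(n),
    "nlogn": lambda n: n * math.log2(n) if n > 1 else float(n),
    "n^2": lambda n: float(n) ** 2,
}


def scaled_estimate(reference_cost, reference_size, actual_size, growth="n"):
    """Scale a measured cost (hours, GB, ...) from one data size to another.

    reference_cost was observed at reference_size; sizes could be a number
    of samples, a line count, a file size, or a chromosome length.
    """
    if reference_size < 1 or actual_size < 1:
        raise ValueError("sizes must be at least 1")
    grow = GROWTH[growth]
    return reference_cost * grow(actual_size) / grow(reference_size)
```

For example, a job that took 2 hours on 1000 samples would be estimated at 8 hours on 4000 samples under linear growth, and at 32 hours under quadratic growth.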

See Expressing Computational Complexity in Job Estimates for details.

Error Interpretation

When SyQADA thinks it recognizes an error pattern, it provides help. The original help was so wordy that no one, not even its creator, bothered to look at it, so it has been shortened to a one-liner that provides a reference to a help message displayable with the syqada help command. Matthew Giordano gets credit for this improvement.

See Error Interpretation for details.

Basic improvements

Protocol Nesting

Nested protocol tasknames are now accessible to subsequent tasks by dotted reference for use in added_input. See Protocol Nesting.

syqada describe

syqada describe now displays modestly more detail.

Revised Features tutorial

The tutorial formerly known as REPLICATION has been renamed as Features and reworked to include examples of the major special features.

syqada enqueue now supports a --working option

The --working option (re)runs a designated script in a syqada working directory, placing the logs in a directory named WORKING.

Bug Fix

Added region to {touch_output} template

In addition, the STUCK directory was eliminated to reduce confusion. You may well ask why the same thing has not happened to the QUEUED directory.

Release 2.1

This release is the first one documented simply because it’s the first one that has a chance of being used outside the alpha test group. Release 2.1 makes the first attempt to fulfill the goal of making a SyQADA workflow completely reproducible. To the standard set of options, 2.1 adds the following important changes in behavior (see Improved Reproducibility (New in 2.1) for full details).

Stringency of Workflow Reproducibility

The following options have been added to syqada:

--strictness [LAZY | PATH | PROGRAMPATHS | CACHING=[strict | ignore ] ] (default PROGRAMPATHS,CACHING=strict)
--protocol_caching [strict | ignore | diffs | status | force ] (default strict)
--compatibility 2.0 (omitted by default to obtain current behavior)

The default behavior in 2.1 is:

--strictness PROGRAMPATHS,CACHING=strict

Under the defaults, some key environment variables are removed before job script execution. To obtain the prior, less scrupulous behavior, use --compatibility 2.0 on the command line (see Improved Reproducibility (New in 2.1) for full details).
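As an illustration of what removing environment variables before job script execution can look like, here is a minimal Python sketch. It is not SyQADA's code, and the variables it drops are an assumption made for the example; the names ASSUMED_VARIABLES_TO_DROP and scrubbed_environment are hypothetical.

```python
import os

# Hypothetical list for illustration only; the variables that SyQADA
# actually removes are described in its reproducibility documentation.
ASSUMED_VARIABLES_TO_DROP = ("PYTHONPATH", "PERL5LIB", "LD_LIBRARY_PATH")


def scrubbed_environment(base=None, drop=ASSUMED_VARIABLES_TO_DROP):
    """Return a copy of the environment without the variables in drop.

    base defaults to the current process environment; it is never modified.
    """
    source = os.environ if base is None else base
    return {key: value for key, value in source.items() if key not in drop}
```

The resulting dictionary can be handed to subprocess.run through its env argument, so that the job script sees only the scrubbed environment while the parent process keeps its own.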

Protocol Caching

The default behavior of syqada 2.1 is --protocol_caching strict, meaning that the files referenced by the protocol are cached locally (see Improved Reproducibility (New in 2.1) for full details).

Local Specification of Queueing Interfaces

In 2.1, job queueing interface specifications are made more flexible by placing the key queueing parameters in definitions files in the resources directory. These can be edited for local installation with minimal risk to the syqada source itself. Several default files that correspond to MD Anderson’s clusters and the Scheet lab’s local usage are provided. More details are available at Der Gute Mensch von SechuaQADA.

New Command

A new command has been added:

>>> syqada enqueue [[--script | --bash_script] scriptname] [[-c | --command] command with arguments]

Using either scriptname or command with arguments, build and submit a jobscript to the cluster using the SyQADA queue_manager. The numerous SyQADA parameters used to populate scripts can be specified as command options.

[see syqada enqueue for full details]

New Features

Other new features of 2.1 are the following:

>>> syqada [auto | batch] --prototype

When running syqada auto or batch, --prototype creates all the jobscripts for the desired step and then runs only one of them, quitting without submitting the remainder if that job fails. If the job succeeds, syqada proceeds as usual and a normal bolus of the remaining jobs is submitted for execution.

>>> syqada auto --resume batch-step-prefix (e.g., --resume 09)

--resume assumes that all steps numerically prior to the resume step have finished correctly, and resumes automatic processing from that step, thus saving a great deal of logfile inspection in the case of large multi-step projects. My personal practice is to resume from the step immediately before the one I actually want to execute, just to make sure that it has succeeded.
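The control flow of the two options above can be sketched in Python as follows. This is a simplified illustration, not SyQADA's implementation; run_with_prototype, steps_from, run_job, and submit_batch are hypothetical names, and the step naming (a numeric prefix such as 09 at the start of each step name) is assumed from the --resume 09 example.

```python
import re


def run_with_prototype(jobs, run_job, submit_batch):
    """Run the first job alone; submit the rest only if it succeeds.

    run_job(job) returns True on success; submit_batch(jobs) queues jobs.
    """
    if not jobs:
        return "nothing to do"
    first, remainder = jobs[0], jobs[1:]
    if not run_job(first):
        # The prototype failed, so the remaining jobs are never submitted.
        return "prototype failed"
    submit_batch(remainder)
    return "submitted"


def steps_from(step_names, resume_prefix):
    """Keep the steps whose numeric prefix is at or after resume_prefix."""
    start = int(resume_prefix)
    selected = []
    for name in step_names:
        match = re.match(r"(\d+)", name)
        if match and int(match.group(1)) >= start:
            selected.append(name)
    return selected
```

With steps named 01-align, 09-call, and 10-report, steps_from(..., "09") keeps the last two, on the assumption that everything numbered below 09 already finished correctly.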