Command usage:

SyQADA expects the directory in which it is invoked to be the working directory of a project, not the bin directory in which the executable sits. Exceptions to this are syqada --help, syqada --version, and syqada manual, which will bring up the manual in a browser regardless of your current working directory.

SyQADA is constructed as a collection of modules called by a single driver. The following sub-commands may be used as the first argument to the syqada command. This first list of sub-commands does not require a batch directory as an argument.

  • auto
  • status
  • init
  • manual
  • validate
  • prototype
  • --version
  • --help

These sub-commands, used as the first argument to syqada, require the name of a batch directory as the second argument. The command tool is a synonym for manage.

  • manage
  • batch
  • tool

A sample usage illustrating a typical batch directory name:

syqada batch 01-step-one --step init

The following sub-commands are simply syntactic sugar to simplify the invocation of the batch, manage, and tool sub-commands. They, too, require the name of a batch directory as the second argument. They are described at the tail end of this page at The Syntactic sugar Commands.

  • repend
  • reset
  • purge
  • errors
  • fix
  • stderr
  • stdout

In both lists, each sub-command takes additional parameters identified by two leading dashes. Distinct prefixes of the parameter names are acceptable, for example:

syqada batch 01-step-one --step init

and:

syqada batch 01-step-one --ste init

have identical effect.
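The prefix rule can be illustrated with Python's standard argparse module, which accepts any unambiguous prefix of a long option by default. This is only a sketch: this page does not say how SyQADA parses its arguments, and the parser below is a hypothetical stand-in that declares just the two batch parameters named on this page.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the batch sub-command; not SyQADA's own parser.
    parser = argparse.ArgumentParser(prog="syqada batch")
    parser.add_argument("batchroot")
    parser.add_argument("--step", nargs="+")
    parser.add_argument("--max_bolus", type=int)
    return parser


parser = build_parser()
full = parser.parse_args(["01-step-one", "--step", "init"])
short = parser.parse_args(["01-step-one", "--ste", "init"])

# Both spellings resolve to the same parameter and value.
print(full.step == short.step)
```

A prefix that matched more than one declared parameter would be rejected as ambiguous, which is why the prefixes have to be distinct.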

Both syqada batch and syqada manage operate on a single batchroot, whereas syqada auto evaluates all steps in a protocol, effectively invoking the components of manage, batch and then manage again, on each batchroot in turn.

syqada manual

>>> syqada manual [page-prefix]

will display the manual in your chosen web browser. syqada manual by itself displays the first page. Providing a page name or the case-insensitive prefix of a page name will display that page. You will be prompted for a selection if the prefix matches multiple pages. For example:

>>> syqada manual comm

will display this page; also:

>>> syqada manual help

will display a list of all pages from which to select.
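The page lookup described above amounts to a case-insensitive prefix match. A minimal sketch follows, assuming a plain list of page names. The names in PAGES are taken from titles mentioned on this page and may not match the manual's real page list. The special handling of syqada manual help is not modeled.

```python
def match_pages(prefix, pages):
    """Return the pages whose names start with prefix, ignoring case."""
    wanted = prefix.lower()
    return [page for page in pages if page.lower().startswith(wanted)]


# Illustrative page names only (an assumption, not the real page index).
PAGES = [
    "Command usage",
    "Protocol Construction",
    "The Sample File",
    "Troubleshooting Guide",
]

print(match_pages("comm", PAGES))  # a single match would be displayed directly
print(match_pages("t", PAGES))     # several matches would lead to a prompt
```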

syqada begin

syqada will make a feeble attempt at initial preparations for a project if you invoke:

>>> syqada begin

which will show you a list of the protocols currently available. If you select one of these by number, SyQADA will then create a control directory, ask you for a project name, and create the three project files you need to run the workflow. You will probably need to edit all three files. For most protocols, the config file will list all the parameters you need to specify for a successful run, including a sourcedata specification. You will also need to edit the sample file to contain a list of your samples (and optional phenotypic attributes). Further description of these three files is found below under syqada auto.

syqada validate

>>> syqada validate filename [--config configfile]

This command checks a template, task, or protocol file for complete definition of parameters and either declares it OK or identifies what the problems are. It can be especially useful during the development of a workflow, but also during a new deployment of an existing workflow.

syqada auto

SyQADA expects to work in the directory in which it is invoked, not the bin directory in which the executable sits. By convention, control and configuration files are stored in a subdirectory named control. This convention is codified, in that if you place the configuration, sample, and protocol files in the control directory and name them:

MY_PROJECT.config
MY_PROJECT.samples
MY_PROJECT.protocol

you may then invoke SyQADA simply as:

>>> syqada auto

MY_PROJECT.protocol should be either a file containing a series of tasks and their attribute information (see Protocol Construction, or see workflows/example/control/Example.protocol for an example) or a list of task names and their references (see workflows/example/control/Example.reference for an example).

MY_PROJECT.config should contain absolute path references to executables and reference files (such as hg19.fasta). They are expressed as colon-separated entries, one per line, such as:

sourcedata        : /a/directory/containing/sample/data

The config file reader tolerates shell environment expressions, but they should be used with caution. I make use of $TEAM_ROOT because that is relatively well fixed and meant to be as consistent across systems as feasible without a dedicated engineering team. Even that can lead you astray in the wrong environment of course, so be warned.
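To make the config format concrete, here is a minimal sketch of a reader for colon-separated entries that also expands shell-style variables. It is not SyQADA's actual reader: the function name, the reference key, and the TEAM_ROOT value are assumptions made for illustration, and the handling of comments or continuation lines is not modeled.

```python
import os


def parse_config(text):
    """Parse 'key : value' lines into a dict, expanding $VARIABLES in values."""
    settings = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and lines without a separator
        key, value = line.split(":", 1)  # split on the first colon only
        settings[key.strip()] = os.path.expandvars(value.strip())
    return settings


os.environ["TEAM_ROOT"] = "/opt/team"  # example value, not a real default

sample = (
    "sourcedata        : /a/directory/containing/sample/data\n"
    "reference         : $TEAM_ROOT/hg19.fasta\n"
)
config = parse_config(sample)
print(config["sourcedata"])
print(config["reference"])
```

Because the expansion depends on the environment in which the reader runs, the same config file can resolve to different paths on different systems, which is the caution raised above.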

MY_PROJECT.samples should contain a list of sample names that will be used in filenames, etc., one name to a line. If the first line is the header of a tab-delimited file, then the sample file can also contain phenotype information that can be used as input parameters to task templates (see The Sample File).
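The two sample-file layouts can be sketched as follows. This is an illustration only: it assumes that a tab in the first line signals a header, and that the first column holds the sample name. Neither rule is confirmed by this page, and the column name phenotype is hypothetical.

```python
import csv
import io


def read_samples(text):
    """Return a list of (sample_name, attributes) pairs."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    if "\t" not in lines[0]:
        # Plain list: one sample name per line, no attributes.
        return [(line.strip(), {}) for line in lines]
    # Tab-delimited table: the first column is taken as the sample name.
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    name_column = reader.fieldnames[0]
    samples = []
    for row in reader:
        name = row.pop(name_column)
        samples.append((name, dict(row)))
    return samples


print(read_samples("rxia\nsample2\n"))
print(read_samples("sample\tphenotype\nrxia\tcase\n"))
```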

--notifications <path> names a file where progress notations will be written. syqada auto defaults this to a path to a file on a webserver running on one of the HAPS. The eventual goal would be to create an RSS feed. In the meantime, this writes the most recent event at the top of the file so that a web retrieval will not need to scroll to see the current state.
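Writing the newest event at the top of a file means rewriting the file on each update. A minimal sketch of that idea follows. The function and file names are hypothetical and the event text is illustrative; SyQADA's own notification format is not shown on this page.

```python
import os
import tempfile


def prepend_event(path, event):
    """Write event as the first line of path, keeping older events below it."""
    previous = ""
    if os.path.exists(path):
        with open(path) as handle:
            previous = handle.read()
    with open(path, "w") as handle:
        handle.write(event + "\n" + previous)


with tempfile.TemporaryDirectory() as workdir:
    log = os.path.join(workdir, "notifications.txt")  # hypothetical file name
    prepend_event(log, "01-count-characters: Batch completed")
    prepend_event(log, "02-demonstrate-failure-handling: Batch in error")
    with open(log) as handle:
        first_line = handle.readline().strip()

print(first_line)
```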

syqada auto reads the protocol file and then figures out what to do. If no task directories exist, it creates them and their METADATA files. If --init is specified, it stops after creating them.

Without --init, syqada auto then proceeds to evaluate each directory in turn from 01 on, running a BatchRunner if the task has not completed. Upon completion of the BatchRunner, SyQADA tests the task for completion and proceeds to the next task if there are no errors.

At this time, syqada auto does nothing to repair a task that shows a failure. However, it does do some simple parsing of stderr to identify common problems and give hints as to their possible causes. Failed tasks must be handled manually with syqada manage and syqada batch. See those sub-commands for details.

syqada auto has three special and mutually exclusive flags.

  • init
  • ignore
  • status

Typical invocation does not require any flags. The meaning of the options follows.

syqada auto --init

simply constructs any unconstructed METADATA for tasks in the protocol file. It is typically followed by syqada batch such-and-such --step init, so that the user can determine manually whether the job scripts for a batch have been correctly created before running syqada auto.

syqada auto --ignore batchdir-prefix1 [batchdir-prefix2 ...]

will ignore errors in steps beginning with any prefix in the list. Note that --ignore 0 will ignore errors in any batch except for an obscenely long protocol.
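The interaction between step order, failures, and ignored prefixes can be sketched as a simple loop. This is a simplified model, not SyQADA's code: run_batch is a caller-supplied stand-in for running one batch, and the third step name is hypothetical.

```python
def run_protocol(batchdirs, run_batch, ignore=()):
    """Run batches in order; stop at the first failure that is not ignored.

    run_batch is a caller-supplied function returning True on success.
    Returns the list of batch directories that were attempted.
    """
    attempted = []
    for batchdir in sorted(batchdirs):
        attempted.append(batchdir)
        succeeded = run_batch(batchdir)
        ignored = any(batchdir.startswith(prefix) for prefix in ignore)
        if not succeeded and not ignored:
            break
    return attempted


def fails_on_02(batchdir):
    # Stand-in runner: pretend that only the 02 step fails.
    return not batchdir.startswith("02")


steps = [
    "01-count-characters",
    "02-demonstrate-failure-handling",
    "03-hypothetical-step",
]
print(run_protocol(steps, fails_on_02))                 # stops after 02
print(run_protocol(steps, fails_on_02, ignore=["02"]))  # continues past 02
print(run_protocol(steps, fails_on_02, ignore=["0"]))   # prefix 0 matches all three
```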

syqada status

runs syqada manage on each step of the protocol in turn to report the current status of the workflow. This is obviously a useful tool for nondestructively determining the state of an existing workflow whose recent history is unfamiliar to you.

syqada init

Will initialize the workflow (the equivalent of syqada auto --init), and upon success, will initialize and create the jobs for the first batch that needs to be run (usually the first!) using syqada batch batchdir --step init. OK, this actually doesn’t work yet, but it would save a bunch of command typing if it did. Check back to see if it’s been added yet.

syqada batch

syqada batch requires two arguments, a batch directory and --step, which accepts the following options.

  • init
  • repend
  • rerun (I don’t know if we’re keeping this or folding it into repend)
  • run

init and repend are mutually exclusive. run can be invoked alone or with one of the other two.

Most typical usages of syqada batch have been replaced by running syqada auto and letting it figure out what to do. There are two special cases detailed here. The first is when a failure has occurred and you wish to reset the batch to re-run:

>>> syqada batch *batchroot* --step repend [run]

This cleans up the logs of the failed jobs in the given task directory and restores the failed scripts to the PENDING directory so the batch can be resumed. Adding run is optional, because you might choose to use syqada auto after re-pending the failed jobs. Examples can be found in the Troubleshooting Guide or the Preface to the Tutorials.

The second usage is to run a batch whose memory and processor configuration would otherwise run more jobs at once than you wish. This can happen if, say, you are sharing the machine with a team member working on a project with a shorter deadline, or if you have a batch whose success depends on serial execution of the jobs (variant_tools sample import, for instance, seems safest run in serial mode).

>>> syqada batch *batchroot* --max_bolus N

will constrain SyQADA to run no more than N jobs at once.

syqada manage

>>> syqada manage *batchroot*

runs batch_tool, which will show the current state of the batch in the given task directory, including any batch management problems. A batch management problem is a circumstance usually caused by premature termination of SyQADA while it is managing the execution of a batch. In this case, the progress of jobs currently in the PBS queue obviously cannot be recorded, and repair work will become necessary.

If there are batch management problems, they may be shown in detail by adding the --details option with one or more of the additional arguments done, failed, or output.

If batch management problems do exist, they can be fixed by adding the --fix option with one or more of the same additional arguments.

Note that jobs left in running (or queued or stuck) that are terminated forcibly after the SyQADA batch manager has quit running will not be recognized by syqada manage and must be moved by hand. Since these jobs need to be re-run, the tactic I use is:

mv *batchroot*/RUNNING/* *batchroot*/ERROR
syqada batch *batchroot* --step repend

Moving them to batchroot/ERROR and then using --step repend, instead of simply moving them to batchroot/PENDING, allows SyQADA to trim off the old process id, which would otherwise confuse SyQADA upon restart.
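As an illustration of what trimming the old process id might involve, the sketch below strips a trailing %digits suffix from a job script name. The suffix convention is inferred from script names that appear in sample output elsewhere on this page (for example, names ending in .sh%39571); it is an assumption, and the function name is hypothetical.

```python
def strip_process_id(script_name):
    """Drop a trailing '%<digits>' suffix, if present, from a job script name."""
    base, separator, suffix = script_name.rpartition("%")
    if separator and suffix.isdigit():
        return base
    return script_name


print(strip_process_id("demonstrate-failure-handling-runner-rxia.sh%39571"))
print(strip_process_id("demonstrate-failure-handling-runner-rxia.sh"))
```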

Examples of output

A batch that has completed successfully will produce results that look like:

> syqada manage  task-directory/
0.9.9

checking control directories... ............................................
checking logs... ...........................................................
syqada-0.9.9: task 0102-varscan
jobs 88, queues  pending 0,  running 0,  done 88,  error 0
                 ,           ,  begun 88,  done 88,  failed 0, outputs 88
88 of 88 required jobs completed.
batch completed

Obviously, there are other conditions. A batch in progress will produce results that look like:

> syqada manage task-directory/
0.9.9

checking control directories... ............................................
checking logs... .......
syqada-0.9.9: task 0802-varscan
jobs 88, queues  pending 81,  running 0,  done 7,  error 0
               ,           ,  begun 7,  done 7,  failed 0, outputs 7
batch can resume

A batch that has failed will produce results that look like:

> syqada manage 01-phase-samples
0.9.8.3

Checking control directories... .........................................
.........................................................................
Checking logs... .......................................
syqada-0.9.8.3: Task 01-phase-samples
Jobs 2948, Queues  PENDING 2615,  RUNNING 0,  DONE 0,  ERROR 333
                 ,           ,  begun 333,  done 0,  failed 333, outputs 0
Batch in error

In certain cases, because of the timing of SyQADA managing the completing jobs, you may see a message saying “batch needs curation” with a description of discrepancies. This is harmless as long as syqada (auto or batch) is still running. If SyQADA has terminated and you see the message “batch needs curation,” you can run with the --details parameter to get more information. For example:

>  syqada manage task-directory/ --details done

This shows you exactly which jobs marked done have not been properly managed. The done argument is not the only option; other options to the --details parameter are explained on the batch_tool.py page.

To curate the batch after syqada batch has terminated, you can run with the --fix parameter. For example:

>  syqada manage task-directory/ --fix done failed output

This will curate all jobs that:

  • were still in state running but had indicated that they had completed (done)
  • were still in state running but had indicated that they had encountered an error (failed)
  • were in state done but did not have the same number of outputs as other completed jobs (output)

syqada status

The commands:

syqada status

and:

syqada auto --status

are identical, and are the terser equivalent of running:

for batch in 0* ; do
  syqada manage $batch
done

An example from the test suite follows:

 syqada status --protocol control/Example.reference
Inferring project name Example from protocol file
Protocol.INFO: Opening protocol version 1.0 file control/Example.reference
Protocol.INFO: Found 4 tasks
syqada-1.0-RC2: Task 01-count-characters 11 of 11 required jobs completed. Batch completed
syqada-1.0-RC2: Task 02-demonstrate-failure-handling 10 of 11 required jobs completed. Batch in error
H00:00:00.018 02-demonstrate-failure-handling: Batch in error

syqada errors

>>> syqada manage *batchroot* [--errors [arguments] | --reset | --purge ]

The manage command provides some useful debugging and control. As usual with SyQADA, this is merely syntactic sugar on tasks you could accomplish with command line tools in the sub-directories of your workflow. In the case of error examination, however, the syntactic sugar is pretty sweet. These are the same:

syqada errors *batchroot* [arguments]

syqada manage *batchroot* --errors classify [arguments]

One of the most common usages is:

syqada manage task-directory/ --errors [ classify ] [NN]

This command typically provides all the information you need to debug problems with tasks that have been reported as failed by syqada manage. Because this option is so useful, it has become the default, so the classify keyword is unnecessary. NN, if provided, is the number of lines of each stderr file to show (default is 4). classify begins with the reasonable assumption that error outputs with the same number of lines probably have the same cause and likely differ only by sample name. It verifies this by comparing all outputs (the Python set object makes this trivial) and counting the unique sets of output, both as generated and with the sample names removed. It then categorizes the results, describes them, and shows one example of each result type.

An example follows:

% syqada manage 0010-overlap/ --error classify
Checking control directories... ............................................................
Checking logs... ....................................................................................................
.....................
 164 errors with   1 lines of stderr output
  15 errors with   2 lines of stderr output

%%%%%%%%%%%%% All 164 1-line outputs differ only by the sample id. %%%%%%%%%%%%%
######################## One example of a 1-line output ########################
 TCGA-2L-AAQI-N:    1 stderr,     0 stdout 0010-overlap/ERROR/build_overlap_sets-runner-TCGA-2L-AAQI-N.sh%26755
********************************************************************************
-----------------------------------  stderr  -----------------------------------
--------------------------------------------------------------------------------
Error: The requested bed file (0010-overlap/output/TCGA-2L-AAQI-N.haploh) could not be opened. Exiting!
--------------------------------------------------------------------------------
********************************************************************************

%%%%% There are 2 kinds of 2-line stderr file after extracting sample ids. %%%%%
################## Example 1 of a 2-line output for 15 samples #################
 TCGA-H6-A45NN-T:    2 stderr,     0 stdout 0010-overlap/ERROR/build_overlap_sets-runner-TCGA-H6-A45NN-T.sh%31645
********************************************************************************
-----------------------------------  stderr  -----------------------------------
--------------------------------------------------------------------------------
tail: cannot open `06-annotate_cnvs/output/TCGA-H6-A45NN-T.txt' for reading: No such file or directory
Error: The requested bed file (0010-overlap/output/TCGA-H6-A45NN-T.haploh) could not be opened. Exiting!
--------------------------------------------------------------------------------
********************************************************************************
################## Example 2 of a 2-line output for 15 samples #################
 TCGA-3A-A9IN-T:    2 stderr,     0 stdout 0010-overlap/ERROR/build_overlap_sets-runner-TCGA-3A-A9IN-T.sh%27056
********************************************************************************
-----------------------------------  stderr  -----------------------------------
--------------------------------------------------------------------------------
tail: cannot open `06-annotate_cnvs/output/TCGA-3A-A9IN-T.txt' for reading: No such file or directory
Error: The requested bed file (0010-overlap/output/TCGA-3A-A9IN-T.cnv) could not be opened. Exiting!
--------------------------------------------------------------------------------
********************************************************************************
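The classification idea can be sketched in a few lines of Python. This is a simplified model and not the batch_tool implementation: it groups with dictionaries where the description above mentions sets, and the sample ids and messages are shortened, hypothetical stand-ins for the output shown above.

```python
from collections import defaultdict


def classify(errors):
    """Group stderr texts by line count, then by content with the sample id masked.

    errors maps a sample id to the stderr text of its failed job.
    Returns {line_count: {masked_text: [sample ids]}}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for sample, text in errors.items():
        masked = text.replace(sample, "<SAMPLE>")
        line_count = len(text.splitlines())
        groups[line_count][masked].append(sample)
    return groups


# Hypothetical, shortened error texts for three samples.
errors = {
    "S1": "Error: S1.haploh could not be opened",
    "S2": "Error: S2.haploh could not be opened",
    "S3": "tail: cannot open S3.txt\nError: S3.cnv could not be opened",
}

for line_count, kinds in sorted(classify(errors).items()):
    for masked, samples in kinds.items():
        print(line_count, "line(s),", len(samples), "sample(s), e.g.", samples[0])
```

Once the sample id is masked, the two 1-line errors collapse into a single kind, so only one example needs to be displayed for them.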

The --errors option takes several other optional arguments:

--errors [first[N] | last[N] | all] [stderr] [stdout] [NN]


--errors first displays information about the first error in the LOGS directory.
It shows the sample name, the length of the stderr and stdout files, and the task runner.

--errors first3 displays information about the first 3 errors (etc)

--errors last shows the same information about the last error.
--errors last4 shows the same information about the last 4 errors.

--errors all shows all errors

--errors stderr adds to the previous information the contents of standard error (elided
to the first two and last two lines of the .err file). To see more, add the number of lines you wish to
see.

--errors stdout performs the same thing for standard output. Both may be requested, but only one
numeric value for number of lines to be displayed will be recognized and used for both stderr and stdout.

An example follows, showing what you would see if you have run the tutorial to the point that task two fails. (Note that syqada manage --errors classify will determine that there is a single error and simply show the same output.) It reports the sample that erred, the length of the stderr and stdout files, and the name of the runner script:

> syqada manage 02-demonstrate-failure-handling/ --errors
Checking control directories... ...........
Checking logs... ......................
********************************************************************************
         rxia:   15 stderr,     1 stdout 02-demonstrate-failure-handling/ERROR/demonstrate-failure-handling-runner-rxia.sh%39571
********************************************************************************
********************************************************************************

The same command, requesting that standard error be displayed for the first error:

> syqada manage 02-demonstrate-failure-handling/ --errors stderr 8
Checking control directories... ...........
Checking logs... ......................
********************************************************************************
         rxia:   15 stderr,     1 stdout 02-demonstrate-failure-handling/ERROR/demonstrate-failure-handling-runner-rxia.sh%39571
********************************************************************************
-----------------------------------  stderr  -----------------------------------
--------------------------------------------------------------------------------

This demonstrates a job failure that might occur due to a
system configuration issue. In this case, the data had
a length of 8 and there was no file called no-length-based-name-bias.
...(7)...
syqada batch 02-demonstrate-failure-handling --step run
will cause it to run without error.
syqada batch 02-demonstrate-failure-handling --step repend run
will accomplish the same thing in one step
--------------------------------------------------------------------------------
********************************************************************************

The parenthesized number in the ellipsis indicates the number of unprinted lines.
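The elision can be sketched as follows, assuming the requested line count is split evenly between the head and the tail of the file. The function name and the treatment of odd counts are assumptions; only the shape of the marker is taken from the output above.

```python
def elide(lines, shown=4):
    """Keep the first and last shown // 2 lines, marking how many were skipped."""
    half = shown // 2
    if half == 0 or len(lines) <= 2 * half:
        return list(lines)
    skipped = len(lines) - 2 * half
    return lines[:half] + ["...(%d)..." % skipped] + lines[-half:]


# A 15-line file displayed with 8 lines requested leaves 7 lines unprinted.
stderr_lines = ["line %d" % number for number in range(1, 16)]
for line in elide(stderr_lines, 8):
    print(line)
```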

Here is what will happen if you run --reset on a task that has produced results:

> syqada manage 02-demonstrate-failure-handling/ --reset
Checking control directories... ...........
Checking logs... ......................
There are 10 successful outputs. Are you sure?

You are prompted to continue. Y or YES or OK (either upper or lower case), followed by the enter key, will cause the tool to remove all contents of the specified batchroot except for the METADATA file, thus making the directory ready to re-run syqada batch --step init on the batchroot.
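The accepted answers can be expressed as a small case-insensitive check. This is a sketch with a hypothetical function name, not the tool's own prompt handling.

```python
def is_confirmed(answer):
    """Return True for Y, YES, or OK in any letter case."""
    return answer.strip().upper() in {"Y", "YES", "OK"}


for reply in ["y", "Yes", "ok", "no", ""]:
    print(repr(reply), is_confirmed(reply))
```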

The following command:

> syqada manage 02-demonstrate-failure-handling/ --purge

will remove the entire batchroot, thus making the directory ready to re-run syqada auto in the working directory.

The Syntactic sugar Commands

syqada repend

The command:

> syqada repend 02-demonstrate-failure-handling

is equivalent to:

> syqada batch 02-demonstrate-failure-handling --step repend

It moves all failed jobs back to the PENDING directory and cleans up the appropriate files in LOGS. If you add rerun to the command, it will also clean up jobs in the RUNNING queue that were left there by a premature termination of syqada, dispatching those that are actually done to the DONE queue and repending the others. This is particularly useful on the cluster, where jobs can survive the termination of syqada.

syqada reset

The command:

> syqada reset 02-demonstrate-failure-handling

is equivalent to:

> syqada manage 02-demonstrate-failure-handling --reset

It removes everything in the batch directory except the METADATA file, ready to re-initialize and run. This is appropriate if jobs need to be regenerated because either the config file or the batch’s template file has changed. This allows you to run syqada batch on the task directory if you wish.

syqada purge

The command:

> syqada purge 02-demonstrate-failure-handling

is equivalent to:

> syqada manage 02-demonstrate-failure-handling --purge

It removes the entire batch directory so that it and the METADATA file can be rebuilt. This is appropriate if jobs need to be regenerated because either the protocol or task file has changed. It necessitates the use of syqada auto again to generate the necessary METADATA file.

syqada errors

The command:

> syqada errors 02-demonstrate-failure-handling 10

is equivalent to:

> syqada manage 02-demonstrate-failure-handling --errors classify 10

It classifies the stderr outputs of failed jobs and displays one example of each kind, showing up to 10 lines of each stderr file.

syqada fix

The command:

> syqada fix 02-demonstrate-failure-handling

is equivalent to:

> syqada manage 02-demonstrate-failure-handling --fix done failed

It cleans up the batch directory in the event that SyQADA died or was killed before the batch was finished.

syqada stderr

The command:

> syqada stderr 02-demonstrate-failure-handling 10

is approximately equivalent to:

> syqada manage 02-demonstrate-failure-handling --errors stderr

It classifies and displays some standard error logs of the batch whether the batch was classified as a success or not.

syqada stdout

The command:

> syqada stdout 02-demonstrate-failure-handling 10

is equivalent to:

> syqada manage 02-demonstrate-failure-handling --errors stdout

It displays some standard output of the batch whether the batch was classified as a success or not.