Automator.py

Main SyQADA Driver. This initializes and runs a series of BatchRunners.

It is invoked as:

>>> syqada auto

The Automator takes a configuration, a sample file, and a protocol, and initializes and then runs the steps of a workflow.

A pipeline is an instance of a protocol. It is configured as a directory that contains a control directory; the control directory holds a config file, a samples file, and a protocol file (or a soft link to one), plus any individual task configurations.

The workflows directory of the SyQADA release contains protocol definitions that list the tasks of the protocol and where to find them. It also contains a protocol task definition with defaults for each task.
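
For orientation, a pipeline directory might look roughly like the sketch below. The directory and file names are hypothetical, chosen only to illustrate the description above, not prescribed by SyQADA:

    myproject/                     # the pipeline directory
        control/                   # the control directory
            project.config         # configuration: template terms, usually executable paths
            project.samples        # sample file
            exome.protocol         # protocol file, or a soft link into the workflows directory
            sometask.task          # any individual task configurations
        0001-first_task/           # per-task working directories, created by syqada auto
        0002-second_task/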

syqada auto:

reads the protocol file and builds a protocol
for each task:
  constructs a working directory and METADATA file
then
for each task:
  determines whether it has been done
  if not,
     determines parameters from defaults and overrides
     constructs the appropriate BatchRunner
     fires it off
     waits for completion
  verifies results
  rinses, moves to the next task to repeat
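
In code, that loop corresponds roughly to the sketch below. It is an illustration only, not the SyQADA implementation; every method name on the protocol, task, and runner objects is hypothetical.

    # Illustrative sketch only -- not the actual SyQADA code.
    # All method names on protocol/task/runner objects are hypothetical.
    def run_protocol(protocol, ignore=()):
        # First pass: give every task a working directory and a METADATA file.
        for task in protocol.tasks:
            task.make_working_directory()
            task.write_metadata()

        # Second pass: run the tasks in order, stopping if one fails.
        for task in protocol.tasks:
            if task.already_done():
                continue
            params = task.defaults()
            params.update(task.overrides())          # defaults, then overrides
            runner = BatchRunner(task, params)       # the appropriate BatchRunner
            runner.start()                           # fire it off
            runner.wait()                            # wait for completion
            if not task.verify_results() and task.prefix not in ignore:
                raise RuntimeError('task %s failed' % task.prefix)
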
The full command-line synopsis is:

>>> Automator.py [--configuration configfile
                  --sample_file samplefile
                  --protocol controlfile]
                 [--project PROJECTNAME]
                 [--parameters parameters]
                 [--init]
                 [--ignore NNNN ...]

--protocol

file that names the tasks to be run

--configuration

file that provides definitions of terms used in templates, usually executable paths

--sample_file

file that names the samples on which tasks are to be run

--parameters

parameter file for various tasks. I dream of creating a standard parameters file for filling certain parameters in the configs, but this option is not yet used.

--ignore

optionally ignore an error in a step and proceed to the next step by listing the step's numeric prefix, for example:
>>> syqada auto --ignore 0004 0007

I have found this useful for cases where the source data has typos in embedded sample names, so that one sample out of 300 failed a step.

Developer Documentation Only Below This Point:

Architecturally, a Protocol is a series of Tasks (both are defined in Protocol.py). All tasks have TASKDEF, NAME, TEMPLATE, and PROCESSORS attributes, plus WALLTIME and other task-specific attributes. The Automator constructs a Protocol from a protocol file and then invokes a BatchRunner for each task of the Protocol until either one fails or the entire Protocol completes successfully.
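
As a rough mental model only (the real classes live in Protocol.py and differ in detail), the data structures could be sketched like this:

    class Task:
        """Rough sketch; the real Task in Protocol.py differs in detail."""
        def __init__(self, taskdef, name, template, walltime, processors, **extra):
            self.TASKDEF = taskdef        # which task definition this came from
            self.NAME = name              # the task's name
            self.TEMPLATE = template      # command template, filled in from the configuration
            self.WALLTIME = walltime      # requested wall-clock time
            self.PROCESSORS = processors  # requested processor count
            self.extra = extra            # other task-specific attributes

    class Protocol:
        """Rough sketch; a Protocol is an ordered series of Tasks."""
        def __init__(self, tasks):
            self.tasks = list(tasks)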

class Automator.Automator(args, taskformat=None)

Automate a series of JobBatches.

Accept a configuration file, a sample file, a task, and numerous other input parameters; build or reload a JobBatch; create the jobs necessary to run the task on the input data; and let the JobBatch manage the jobs.

BatchRunner_main(input, resume, stderr=sys.stderr)

Duplicating BatchRunner.main here step by step will allow eventual improved handling of restarts, he asserted without proof.

initialize(stderr=sys.stderr)

Delegate most of the work to the Protocol.

logit(message, level='INFO', console=False, notify='INFO', stdout=None)

Use the jobbatch logger to record and display the message.

run(stderr=sys.stderr, stdin=sys.stdin)

Execute the tasks in order, stopping if one fails.

Automator.parse(input=None, stream=sys.stderr, stdin=sys.stdin)

Create and return all of the necessary information to get the JobBatch running. Complain about semantic validation issues.
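
Taken together, a driver script could use these documented entry points roughly as follows; the exact form of the arguments passed to parse() is an assumption.

    import sys
    from Automator import Automator, parse    # assumes Automator.py is importable as a module

    # parse() gathers and validates everything needed to get the JobBatch running.
    args = parse(input=sys.argv[1:])          # what 'input' expects is an assumption

    automator = Automator(args)               # taskformat is left at its default of None
    automator.initialize(stderr=sys.stderr)   # delegates most of the work to the Protocol
    automator.run(stderr=sys.stderr, stdin=sys.stdin)   # run tasks in order, stop on failure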