JobGenerator.py

This is a freestanding command, but it is normally invoked behind the scenes from syqada batch.

The function of JobGenerator is to process a job template and generate a series of shell scripts suitable for use in batch.py. Optionally, it can split the work by chromosome or into smaller chunks, or merge several files as input.

JobGenerator works from the command line in the test suite, but since BatchRunner was created it has been invoked only by calls from BatchRunner.py. That is probably the better place to look for an example.

Developer Documentation Only Below This Point:

The important routine is do_split(), which uses the given template to generate job scripts that are then managed by JobBatch.

JobGenerator.common_parsing(parser=None)

Arguments that are shared with programs (i.e., BatchRunner) that invoke JobGenerator.

JobGenerator.configured_parser(parser=None)

Build a parser with the shared arguments and add the local ones.
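
The relationship between common_parsing and configured_parser resembles a familiar argparse layering pattern, sketched below. This is an illustration only: the use of argparse and the option names (--sample, --template, --split) are assumptions, not the actual JobGenerator arguments.

```python
import argparse


def common_parsing(parser=None):
    """Add the arguments shared with callers such as BatchRunner."""
    if parser is None:
        parser = argparse.ArgumentParser(description="hypothetical job generator")
    # Hypothetical shared options, for illustration only.
    parser.add_argument("--sample", help="sample name or sample file")
    parser.add_argument("--template", help="path to a job template")
    return parser


def configured_parser(parser=None):
    """Start from the shared arguments, then add a local option."""
    parser = common_parsing(parser)
    # Hypothetical local option.
    parser.add_argument("--split", default="ignore",
                        help="comma-separated chromosomes, or 'ignore'")
    return parser


args = configured_parser().parse_args(["--sample", "S1", "--split", "chr1,chr2"])
```

Passing an existing parser into common_parsing lets a calling program declare the shared options once and then attach its own.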

JobGenerator.determineSplit(testsplit)

Modify metadata[SPLIT] to reflect the desired chromosomes if any, and return how many jobs each task in this batch will take.

JobGenerator.do_split(qmanager, template, sampledict, filename, splitsize, overlap, job_iterator, outputdir=None, controldir=None, logdir=None, rundir=None, logit=<function logit>, **kwargs)

Generate all the needed run scripts for a given script template based on a chromosome split criterion, sample, and filename or list of filenames.

qmanager instance of a queue manager implementation to be used.

template path to a script template containing keyword substitutions in python str.format() syntax, i.e., names surrounded by braces, OR a string commencing with the word INLINE.

sampledict dict of terms derived from the samplefile, or the name of a sample to use as a label for filenames and interpolation into the template

filename filename or names to be interpolated into the template

splitsize comma-separated list of chromosomes or ignore

overlap amount to overlap the split, default 0 (works, but never yet used as of 2017/05; currently unusable)

job_iterator number of jobs to iterate, with the special term iteration set to an incrementing number; zero means that no iteration variable is incremented or applied. job_iterator can also be a tuple of values passed to range(), which permits iterating from 1 instead of zero, for instance, or from m to n in steps of 2, as in range(m, n, skip).

outputdir, controldir, logdir, and rundir name alternate paths for these directories. If they are not specified, the queue manager chooses standard names.

kwargs contains additional parameters that can be passed to match special terms in the template. It now includes inputdir (formerly a named parameter) and added_input, a list of alternative input directories.
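
The interaction between template and job_iterator can be sketched as follows. This is a minimal illustration, not the do_split implementation: the helper names iteration_values and render are hypothetical, as are the template text and the filename and sample terms. Only the iteration term and the int-or-tuple form of job_iterator come from the description above.

```python
def iteration_values(job_iterator):
    """Expand job_iterator into the values the iteration term would take."""
    if isinstance(job_iterator, tuple):
        return list(range(*job_iterator))   # e.g. (1, 4) or (m, n, skip)
    return list(range(job_iterator))        # 0 yields no iterations


def render(template, job_iterator, **terms):
    """Format the template once per iteration value, or once if there are none.

    With no iterations, the template must not mention {iteration}.
    """
    values = iteration_values(job_iterator)
    if not values:
        return [template.format(**terms)]
    return [template.format(iteration=i, **terms) for i in values]


# Hypothetical template in str.format() syntax.
template = "process {filename} --label {sample} --pass {iteration}"
scripts = render(template, (1, 6, 2), filename="reads.bam", sample="S1")
```

Here the tuple (1, 6, 2) produces three command lines, with iteration set to 1, 3, and 5.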

JobGenerator.main(argv, stderr=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>)

Create job scripts based on a template and a split or merge parameter

This was abandoned as a command-line application when BatchRunner became solid, i.e., with syqada 0.9. It is still used in test_JobGenerator.py, however.

JobGenerator.make_touchoutput_tag(dir)

Reduce the number of inodes used by creating a zero-length target for hard links
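
The inode-saving idea can be sketched as follows, assuming a filesystem that supports hard links. The helper names, the tag filename, and the .done markers are hypothetical and do not reflect the actual make_touchoutput_tag code.

```python
import os
import tempfile


def make_tag(directory, name=".touch_tag"):
    """Create (or reuse) a zero-length file to serve as a hard-link target."""
    tag = os.path.join(directory, name)
    with open(tag, "a"):
        pass
    return tag


def mark_done(tag, marker_path):
    """Record a marker as a hard link to the tag instead of a new file."""
    os.link(tag, marker_path)


workdir = tempfile.mkdtemp()
tag = make_tag(workdir)
for job in ("job1.done", "job2.done"):
    mark_done(tag, os.path.join(workdir, job))

# All three names refer to the same inode.
first_marker = os.path.join(workdir, "job1.done")
same_inode = os.stat(tag).st_ino == os.stat(first_marker).st_ino
link_count = os.stat(tag).st_nlink
```

Each marker created this way adds a directory entry but no new inode, which matters when a batch produces many empty marker files.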

JobGenerator.pair_info(sample, root, pairing_key)

I wish this were not here. It is only useful for bwa sampe, as far as I know right now. Way too specialized to be buried this deep in the toolsuite.

Return the reverse name and a group_id given the forward name, using a pattern chosen by the PAIRED keyword.

Current valid patterns are ‘core’ (what the MDACC core produces for multiplexed runs) and ‘trailing-dash’ (the form found in at least some Broad reads in TCGA).

Obviously this should be extensible, but it ain’t

JobGenerator.parse(input=None, stderr=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>)

Configure a parser and perform semantic validation of its parsed output.

JobGenerator.process_special_terms(line, sampledict, **kwargs)

added_input and the sample file get special treatment before being sent to the queue_manager for formatting.

line is a line of a template to process

sampledict is a dictionary generated from the samplefile

JobGenerator.write_command(cmd, file)

Write cmd to a file named file and make it executable.
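
A minimal sketch of this behavior, assuming a POSIX permission model. The body is an illustration written from the one-line description above, not the actual JobGenerator source; the trailing-newline handling and the choice of permission bits are assumptions.

```python
import os
import stat
import tempfile


def write_command(cmd, file):
    """Write cmd to the path `file` and add execute permission."""
    with open(file, "w") as handle:
        handle.write(cmd)
        if not cmd.endswith("\n"):
            handle.write("\n")
    # Keep the existing mode bits and add execute for user, group, and other.
    mode = os.stat(file).st_mode
    os.chmod(file, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


script = os.path.join(tempfile.mkdtemp(), "job.sh")
write_command("#!/bin/sh\necho hello", script)
```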