Iteration

Iteration tasks permit embarrassingly parallel execution of programs that rely on permutation, such as SparCC, which randomly permutes its inputs to calculate an empirical p-value for its correlation matrix. Iteration is invoked by specifying a number of iterations in a TASKDEF:

iterations = N

N jobs will be created for the task in question. Each time the template is expanded, {output_prefix} will end with a three-digit zero-padded suffix (starting from ‘_000’) to differentiate names in the output. The term {iteration} will be substituted as well, without zero-padding. The default numbering starts at zero because of the SparCC use case, which generates its permutations starting from zero.
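The expansion described above can be sketched in Python. This is only an illustration, not the tool's actual implementation: the function name expand_template, the example command, and the prefix "results/run" are all hypothetical, and the sketch assumes the template uses Python-style {name} placeholders.

```python
def expand_template(template, output_prefix, n_iterations):
    """Return one expanded command per iteration, numbered from zero.

    Hypothetical helper: mimics the documented behavior, not the tool's code.
    """
    commands = []
    for i in range(n_iterations):
        commands.append(
            template.format(
                # {output_prefix} gains a three-digit zero-padded suffix.
                output_prefix=f"{output_prefix}_{i:03d}",
                # {iteration} is substituted without padding.
                iteration=i,
            )
        )
    return commands


# Hypothetical template, for illustration only.
template = "permute --seed {iteration} --out {output_prefix}.txt"
commands = expand_template(template, "results/run", 3)
for command in commands:
    print(command)
```

With iterations = 3, the sketch produces three commands whose output names end in _000, _001 and _002.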

Alternatively, two or three numbers can be given, using syntax similar to the Python range() function:

iterations = start,end,by-increment

This increases the flexibility of use and naming. The expansion differs from the behavior of Python's range(), however, in that start and end are both inclusive, so:

iterations = 1,10

will create ten iterations numbered from _001 to _010.
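The inclusive range semantics can be sketched as follows. The function names parse_iterations and suffixes are hypothetical, and rejecting a non-positive increment is an assumption of this sketch rather than documented behavior.

```python
def parse_iterations(spec):
    """Turn an iterations value such as "10", "1,10" or "1,10,3" into a list.

    Hypothetical helper that follows the documented rules: a single number N
    gives N iterations starting at zero; with two or three numbers, start and
    end are both inclusive (unlike Python's range()).
    """
    parts = [int(p) for p in spec.split(",")]
    if len(parts) == 1:
        start, end, step = 0, parts[0] - 1, 1
    elif len(parts) == 2:
        start, end = parts
        step = 1
    elif len(parts) == 3:
        start, end, step = parts
    else:
        raise ValueError("expected one, two or three numbers")
    if step <= 0:
        # Assumption of this sketch: only positive increments are supported.
        raise ValueError("increment must be positive")
    return list(range(start, end + 1, step))


def suffixes(spec):
    """Three-digit zero-padded suffixes appended to {output_prefix}."""
    return [f"_{i:03d}" for i in parse_iterations(spec)]


print(suffixes("1,10"))
```

Adding one to end before calling range() is what makes the upper bound inclusive.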

A TASKDEF that contains an iterations parameter must refer to a template that makes use of the {iteration} expansion parameter. This requirement may be stricter than necessary, since {output_prefix} alone may be sufficient to distinguish outputs, but for the moment it seems like a good idea.

Tumor-normal tasks are currently incompatible with iteration. A simple workaround is to create an alternative sample_file from the first column of the tumor-normal file (already a common hack in tumor-normal workflows). If for some reason the normalname or tumorname values are needed, it should work to abuse the term “phenotype” in the sample phenotype file format: simply add a header line (Sample normalname tumorname) to the tumor_normal file and declare that file as the sample_file.
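The first workaround might be scripted along these lines. This is a sketch under assumptions: it treats the tumor-normal file as whitespace-delimited with no header or comment lines, and the function name first_column and both file names in the usage comment are hypothetical.

```python
def first_column(lines):
    """Return the first whitespace-delimited field of each non-blank line."""
    samples = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            samples.append(fields[0])
    return samples


# Usage with hypothetical file names:
#
#     with open("tumor_normal.txt") as src, open("samples.txt", "w") as dst:
#         dst.write("\n".join(first_column(src)) + "\n")

example = ["patient1 p1_normal p1_tumor\n", "\n", "patient2 p2_normal p2_tumor\n"]
print(first_column(example))
```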
