panel of normals step always requires e.g. pure_cn config #550

tedil · 2024-11-15T13:38:15Z

… because the get_log_file mechanism does not return a function but actual file paths (and these require config.purecn.enrichment_kit_name and config.purecn.genome_name to be defined) and/or rules are included even though they will definitely not be needed and/or the workflow.get_result_files function reports the incorrect files.
This also holds for other places in snappy_pipeline.

The text was updated successfully, but these errors were encountered:

tedil · 2024-11-15T13:44:33Z

@ericblanc20 here are some ideas to deal with this:

ensure outputs/params/logs are always obtained via input functions, as these are only evaluated when needed
tool specific rules could go into their own {tool}.smk, and be conditionally included in the main Snakefile
skip substep registration for tools that are not configured and not mentioned in the tools: […] section of the config (but this can also lead to issues)

or any combination thereof.

What do you think?

ericblanc20 · 2024-11-27T16:11:43Z

I stumbled on similar issues too.

To me, input functions are the cleanest solution, especially for params.

I like the conditional inclusion of rules in the main Snakefile, but I wonder if it can cover all situations. For example, suppose that there is a cohort with WES & WGS data (such as the DKTK MASTER), and different tools are used for calling variants (wes: [mutect] & wgs: [scalpel]. Depending on the contents of the samplesheet, you may have only WES in your samplesheet, and scalpel doesn't need to be configured. Is that possible with option 2? The same restrictions might also apply if sub-step registration is done conditionally.

For the first option, I just wonder about get_output_files & get_log_file: except for a limited number of complex steps, these methods return dict of strings with wildcards (such as work/{mapper}.thetool.{library_name}/out/{mapper}.thetool.{library_name}.theext). In most cases, the methods don't access the configuration, so they should not suffer if the tool isn't configured. So making functions for those simple operations is not necessary, and it complicates the code (in my opinion).
In the case where the configuration is indeed required, starting the method with a statement such as if self.name not in self.config.tools: return {} should be enough.

I don't know what to think. What I propose is not as clean as forcing input function for everything, but in the end, I am not sure that it is detrimental to the readability of the code.

tedil self-assigned this Nov 15, 2024

tedil added the bug Something isn't working label Nov 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

panel of normals step always requires e.g. pure_cn config #550

panel of normals step always requires e.g. pure_cn config #550

tedil commented Nov 15, 2024 •

edited

Loading

tedil commented Nov 15, 2024

ericblanc20 commented Nov 27, 2024 •

edited

Loading

panel of normals step always requires e.g. pure_cn config #550

panel of normals step always requires e.g. pure_cn config #550

Comments

tedil commented Nov 15, 2024 • edited Loading

tedil commented Nov 15, 2024

ericblanc20 commented Nov 27, 2024 • edited Loading

tedil commented Nov 15, 2024 •

edited

Loading

ericblanc20 commented Nov 27, 2024 •

edited

Loading