… because the get_log_file mechanism does not return a function but actual file paths (and these require config.purecn.enrichment_kit_name and config.purecn.genome_name to be defined), and/or rules are included even though they will definitely not be needed, and/or the workflow.get_result_files function reports incorrect files.
This also holds for other places in snappy_pipeline.
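To make the difference concrete, here is a minimal plain-Python sketch (not snappy_pipeline's or Snakemake's actual API) contrasting a log-file getter that builds paths eagerly with one that defers the work to a returned function. The function names, the config object and the path layout are hypothetical. Only the config.purecn.enrichment_kit_name and config.purecn.genome_name fields come from the issue.

```python
from types import SimpleNamespace
from typing import Callable


# Hypothetical names and paths; illustration only.
def get_log_file_eager(config) -> dict[str, str]:
    # Reads the PureCN config immediately; raises AttributeError if the
    # purecn section is missing (None), even when PureCN is never run.
    kit = config.purecn.enrichment_kit_name
    genome = config.purecn.genome_name
    prefix = f"work/purecn.{kit}.{genome}/log/purecn.{kit}.{genome}"
    return {"log": prefix + ".log", "conda_info": prefix + ".conda_info.txt"}


def get_log_file_lazy(config) -> Callable[[dict], dict[str, str]]:
    # Returns a function instead of paths; the config is only read if and
    # when the returned function is actually called.
    def _log_files(wildcards: dict) -> dict[str, str]:
        return get_log_file_eager(config)

    return _log_files


# Hypothetical config in which PureCN is not configured at all.
unconfigured = SimpleNamespace(purecn=None)
log_getter = get_log_file_lazy(unconfigured)  # no error: nothing evaluated yet
```

Calling get_log_file_eager(unconfigured) fails at parse time, whereas the lazy variant only fails if the returned function is ever invoked, i.e. if a rule that needs it is actually scheduled.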
@ericblanc20, here are some ideas for dealing with this:
1. Ensure outputs/params/logs are always obtained via input functions, as these are only evaluated when needed.
2. Tool-specific rules could go into their own {tool}.smk files and be included conditionally in the main Snakefile.
3. Skip substep registration for tools that are not configured and not mentioned in the tools: […] section of the config (though this can also lead to issues).
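Option 3 could look roughly like the following toy sketch, assuming a step keeps its substeps in a dict and that the config exposes a tools list plus per-tool sections. HypotheticalStep and register_sub_step_classes are invented for illustration and do not mirror snappy_pipeline's real registration code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HypotheticalStep:
    """Toy stand-in for a pipeline step (hypothetical, not snappy_pipeline's class)."""

    tools: list[str]  # contents of the `tools: [...]` config entry
    tool_configs: dict[str, Any]  # per-tool config sections, keyed by tool name
    sub_steps: dict[str, type] = field(default_factory=dict)

    def register_sub_step_classes(self, classes: dict[str, type]) -> None:
        for name, cls in classes.items():
            # Option 3: skip tools that are neither configured nor listed in `tools`.
            if name not in self.tools and name not in self.tool_configs:
                continue
            self.sub_steps[name] = cls
```

With tools=["mutect"] and a config section only for scalpel, both mutect and scalpel would be registered while an unconfigured, unlisted tool would not, which shows how the outcome depends entirely on what the config happens to contain.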
To me, input functions are the cleanest solution, especially for params.
I like the conditional inclusion of rules in the main Snakefile, but I wonder whether it can cover all situations. For example, suppose there is a cohort with both WES and WGS data (such as the DKTK MASTER), and different tools are used for variant calling (wes: [mutect] and wgs: [scalpel]). Depending on its contents, the sample sheet may contain only WES libraries, in which case scalpel doesn't need to be configured. Is that possible with option 2? The same restrictions might also apply if substep registration is done conditionally.
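The point is that the set of tools actually needed is only known once the sample sheet has been read. A minimal sketch, assuming the sample sheet has been reduced to a list of library types and the config maps each library type to its tools (the function name required_tools is hypothetical):

```python
def required_tools(library_types: list[str], tools_by_type: dict[str, list[str]]) -> set[str]:
    """Collect the tools needed for the library types present in the sample sheet."""
    needed: set[str] = set()
    for lib_type in {t.lower() for t in library_types}:
        # Library types without an entry in the config contribute no tools.
        needed.update(tools_by_type.get(lib_type, []))
    return needed


# A WES-only sample sheet: scalpel (WGS) is not needed and need not be configured.
print(required_tools(["WES", "WES"], {"wes": ["mutect"], "wgs": ["scalpel"]}))  # {'mutect'}
```

Any conditional include or conditional registration would have to run after a computation like this one, not just after reading the static config.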
For the first option, I just wonder about get_output_files & get_log_file: except for a limited number of complex steps, these methods return dicts of strings with wildcards (such as work/{mapper}.thetool.{library_name}/out/{mapper}.thetool.{library_name}.theext). In most cases, the methods don't access the configuration, so they should not fail if the tool isn't configured. Making functions for these simple operations is therefore unnecessary and, in my opinion, complicates the code.
In the case where the configuration is indeed required, starting the method with a statement such as if self.name not in self.config.tools: return {} should be enough.
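A sketch of that approach, using a hypothetical step part class (HypotheticalStepPart, with an invented get_params method and a config modeled with SimpleNamespace objects): the output patterns are plain wildcard strings that never touch the configuration, while the config-dependent method returns an empty dict when the tool is not listed in tools.

```python
from types import SimpleNamespace


class HypotheticalStepPart:
    """Toy stand-in for a tool-specific step part (not the snappy_pipeline class)."""

    name = "thetool"

    def __init__(self, config):
        self.config = config

    def get_output_files(self) -> dict[str, str]:
        # Pure wildcard patterns: no configuration access needed.
        prefix = "work/{mapper}.thetool.{library_name}/out/{mapper}.thetool.{library_name}"
        return {"out": prefix + ".theext"}

    def get_params(self) -> dict[str, str]:
        # Needs the tool's own config section, so bail out early if the tool is unused.
        if self.name not in self.config.tools:
            return {}
        return {"genome": getattr(self.config, self.name).genome_name}


unconfigured = HypotheticalStepPart(SimpleNamespace(tools=["mutect"]))
print(unconfigured.get_output_files())  # wildcard patterns, no config needed
print(unconfigured.get_params())  # {}
```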
I'm not sure what to think. What I propose is not as clean as forcing input functions everywhere, but in the end, I am not convinced that it hurts the readability of the code.