Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panel of normals step always requires e.g. pure_cn config #550

Open
tedil opened this issue Nov 15, 2024 · 2 comments
Open

panel of normals step always requires e.g. pure_cn config #550

tedil opened this issue Nov 15, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@tedil
Copy link
Member

tedil commented Nov 15, 2024

… because the get_log_file mechanism does not return a function but actual file paths (and these require config.purecn.enrichment_kit_name and config.purecn.genome_name to be defined) and/or rules are included even though they will definitely not be needed and/or the workflow.get_result_files function reports the incorrect files.
This also holds for other places in snappy_pipeline.

@tedil tedil self-assigned this Nov 15, 2024
@tedil tedil added the bug Something isn't working label Nov 15, 2024
@tedil
Copy link
Member Author

tedil commented Nov 15, 2024

@ericblanc20 here are some ideas to deal with this:

  • ensure outputs/params/logs are always obtained via input functions, as these are only evaluated when needed
  • tool specific rules could go into their own {tool}.smk, and be conditionally included in the main Snakefile
  • skip substep registration for tools that are not configured and not mentioned in the tools: […] section of the config (but this can also lead to issues)

or any combination thereof.

What do you think?

@ericblanc20
Copy link
Contributor

ericblanc20 commented Nov 27, 2024

I stumbled on similar issues too.

To me, input functions are the cleanest solution, especially for params.

I like the conditional inclusion of rules in the main Snakefile, but I wonder if it can cover all situations. For example, suppose that there is a cohort with WES & WGS data (such as the DKTK MASTER), and different tools are used for calling variants (wes: [mutect] & wgs: [scalpel]. Depending on the contents of the samplesheet, you may have only WES in your samplesheet, and scalpel doesn't need to be configured. Is that possible with option 2? The same restrictions might also apply if sub-step registration is done conditionally.

For the first option, I just wonder about get_output_files & get_log_file: except for a limited number of complex steps, these methods return dict of strings with wildcards (such as work/{mapper}.thetool.{library_name}/out/{mapper}.thetool.{library_name}.theext). In most cases, the methods don't access the configuration, so they should not suffer if the tool isn't configured. So making functions for those simple operations is not necessary, and it complicates the code (in my opinion).
In the case where the configuration is indeed required, starting the method with a statement such as if self.name not in self.config.tools: return {} should be enough.

I don't know what to think. What I propose is not as clean as forcing input function for everything, but in the end, I am not sure that it is detrimental to the readability of the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants