Combination of 'storeDir' with optional output results in process always being skipped #4123
Comments
Note that I'm aware of |
Hello, @mcallaway. You can set Snippet below:
process foo {
    publishDir '/data/chunks', mode: 'copy', overwrite: false
    output:
    path 'chunk_*'
    '''
    printf 'Hola' | split -b 1 - chunk_
    '''
}
@bentsherman and @pditommaso - |
I think it is a fundamental limitation of storeDir, because there is no cache metadata to verify whether the optional output should be there from a previous run. The same is true for an output with a variable number of files, as there is no way to verify the number of files produced by a previous run. For now I think it's worth documenting these limitations. We are investigating some ideas that I think could replace storeDir in the long term.
The best workaround that I can think of is that a process that uses storeDir should always have at least one non-optional file output. This will guarantee that the process is executed at least once. You might have to store a dummy output file to make it work. I believe there are also cases where a process has multiple optional outputs, but in practice at least one of those outputs is expected to be present. That is more of a modeling problem that we need to solve in the language. For example, instead of having two optional outputs for BAM and CRAM, there should be one required output that is somehow modeled as "either BAM or CRAM". Invalid states should be unrepresentable.
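A minimal sketch of the dummy-output workaround described above, in case it helps. The process name `maybe_report`, the marker file `run.done`, the `/data/reports` path, and the `MAKE_REPORT` variable are all hypothetical and only serve the illustration; this is not taken from the original report.

```nextflow
// Sketch only: all names and paths below are assumptions for illustration.
process maybe_report {
    storeDir '/data/reports'

    output:
    path 'run.done'                    // required marker file, always written
    path 'report.txt', optional: true  // may or may not be produced

    script:
    '''
    # Write the optional file only under some condition (hypothetical variable)
    if [ -n "${MAKE_REPORT:-}" ]; then
        echo 'Hola' > report.txt
    fi
    # Always write the required marker so the process has a non-optional output
    touch run.done
    '''
}

workflow {
    maybe_report()
}
```

The idea is that the required `run.done` output gives storeDir something concrete to look for, so the process should run at least once when the marker is absent. It does not address the limitation noted above: there is still no metadata recording whether `report.txt` was expected from a previous run.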
Bug report
Expected behavior and actual behavior
Given a process with an optional output and the storeDir directive, the process should run if the output file is not present in the storeDir. If the script produces no output file, it should not be an error.
Actual behavior is that the process is skipped if the output file is not present.
Steps to reproduce the problem
Here is a process definition:
Program output
.nextflow.log shows:
Environment
Additional context
None