Multiple stageInMode directives per process #4199

Closed
mwhamgenomics opened this issue Aug 17, 2023 · 2 comments

Comments

@mwhamgenomics

New feature

At the moment, stageInMode sets a single value for the entire process, which controls how all of its input files are staged. It would be useful in some cases to be able to specify stageInMode multiple times and stage different input files in different modes, similar to #256.
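For reference, a minimal sketch of the current process-wide behaviour. The process, tool and input names are the placeholders used elsewhere in this issue, not real ones:

```nextflow
process SOME_PROCESS {
  // Applies to every input file of this process:
  // both the small batch file and the large BAM are copied.
  stageInMode 'copy'

  input:
  path(batch_file)
  path(bam_file)

  script:
  """
  some_tool --batch ${batch_file} --bam ${bam_file}
  """
}
```

With this declaration there is no way to copy batch_file while leaving bam_file as a symlink.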

Usage scenario

This would be useful in situations where a tool doesn't accept symlinks for some of its inputs, but it would be wasteful to stage all input files in copy mode, e.g. if the inputs are a mix of small config/batch files and large bam files. This would eliminate the need for workarounds with cp -L in the 'script' block.
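The cp -L workaround mentioned above looks roughly like this. It is only a sketch: some_tool and the input names are placeholders, and the local_ prefix is an arbitrary name chosen for illustration:

```nextflow
process SOME_PROCESS {
  input:
  path(batch_file)  // staged as a symlink by default
  path(bam_file)    // staged as a symlink by default

  script:
  """
  # Dereference the symlink and make a real copy of the small file only.
  cp -L "${batch_file}" "local_${batch_file}"

  # The BAM is still used through its symlink, so it is never copied.
  some_tool --batch "local_${batch_file}" --bam "${bam_file}"
  """
}
```

This works, but every process that hits the problem has to repeat the copy step in its script, which is what a per-input stageInMode would avoid.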

Suggest implementation

Specify stageInMode multiple times in the process declaration:

process SOME_PROCESS {
  input:
  path(batch_file)  // tool crashes if this is a symlink
  path(bam_file)  // but bam files are fine

  // specify input channel names or file patterns to stageInMode
  stageInMode batch_file, mode: "copy"
  stageInMode "*.bam", mode: "symlink"

  script:
  """
  some_tool --batch ${batch_file} --bam ${bam_file}
  """
}

Or as a list of directives in nextflow.config:

process {
  withName: 'SOME_PROCESS' {
    stageInMode = [
      [pattern: '*.batch', mode: 'copy'],
      [pattern: '*.bam', mode: 'symlink']
    ]
  }
}

Maybe it's possible to use closures with nextflow.config to get at the variable names?

process {
  withName: 'SOME_PROCESS' {
    stageInMode = {
      return [
        [pattern: batch_file, mode: 'copy'],
        [pattern: bam_file, mode: 'symlink']
      ]
    }
  }
}

The system would need to be able to handle conflicts (unlike publishDir, it wouldn't make sense to set multiple stageInModes on a single input file):

stageInMode "*.bam", mode: "copy"
stageInMode "*.bam", mode: "symlink"  // could throw an error, or just have subsequent declarations for the same pattern overwrite any previous ones
@pditommaso
Member

In the rare case that a tool does not support symlinks, the problem can be managed directly in the task command.

@stevekm
Contributor

stevekm commented Nov 10, 2023

Seems like it should instead be specified like this:

publishDir "${params.outdir}/bcl-convert", mode: 'copy'

input:
    tuple val(meta), path(input_dir), path(samplesheet, stageInMode: "copy")

output:
    path(samplesheet)
    path(outputDir)

This does not actually work; it fails with ERROR ~ No such variable: stageInMode

But if it could be handled like this, you would be able to specify a different stage-in mode for each input file. It might also help with some problems that seem to be related to Fusion filesystem file handling:

#4348
#4309
