
fasterq-dump => disk: local-ssd #332

Open · nick-youngblut opened this issue Jan 11, 2025 · 3 comments
Labels: enhancement (Improvement for existing functionality)

Comments

@nick-youngblut commented Jan 11, 2025

Description of feature

As far as I can tell, the fasterqdump module does not set the disk directive.

Setting the disk to local-ssd for use with google-batch could be quite helpful for speeding up the writing of temp files (docs).

More generally, if disk is not set (and dynamically expanded for large SRA files), what would prevent the job from running out of disk space -- at least for cloud jobs?
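
For reference, roughly the kind of override this would enable, as an untested sketch: the process selector, machine type, and sizes are placeholders rather than values from the module, and the disk-option syntax follows my reading of Nextflow's Google Batch docs.

```groovy
// Untested sketch: request a local SSD for the fasterq-dump process on
// Google Batch via a custom config. Selector, machine type, and sizes are
// placeholders, not values taken from the nf-core module.
process {
    withName: 'SRATOOLS_FASTERQDUMP' {
        machineType = 'n2-standard-8'                        // a family that supports attached local SSDs
        disk        = [request: 375.GB, type: 'local-ssd']   // GCP local SSDs come in fixed 375 GB units
    }
}
```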

nick-youngblut added the enhancement label on Jan 11, 2025
@Midnighter (Contributor) commented

You can configure the module args to change, for example, the temporary file path; the available options are documented at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump. If you have enough memory and the files are not that big, you can even use an in-memory location for the temporary files.

Unless I'm forgetting something, there is nothing preventing the processes from running out of disk space. Setting a disk limit also doesn't prevent that; it just determines more explicitly when it happens. Note that prefetch has a default maximum download size, so that's also an option for avoiding large files.
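
For example, something along these lines in a custom config (untested sketch; it assumes the nf-core module forwards ext.args to fasterq-dump, and /dev/shm only makes sense when the process memory comfortably exceeds the temp-file footprint):

```groovy
// Untested sketch: redirect fasterq-dump's scratch directory.
// Assumes the nf-core module appends task.ext.args to the fasterq-dump call.
process {
    withName: 'SRATOOLS_FASTERQDUMP' {
        // -t/--temp sets where temporary files are written; /dev/shm is an
        // in-memory filesystem, so size the process memory accordingly.
        ext.args = '--temp /dev/shm'
    }
}
```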

@nick-youngblut (Author) commented

> If you have enough memory and the files are not that big, you can even use an in-memory location for the temporary files.

As far as I can tell, the fasterq-dump nf-core module doesn't provide a way to request local-ssd, so how could I (adaptively) provision enough temp disk space? Moreover, local-ssd expands the space at /tmp, but /tmp is out of scope for output: paths, so the final reads must be written to the process working directory. These final reads could be very large, so I need to (adaptively) provision enough SSD and boot disk space for each GCP Batch job running fasterq-dump.
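
Right now the closest thing to "adaptive" I can come up with is growing the disk request on retry, e.g. (untested sketch; the selector, sizes, and retry count are placeholders, not a sizing derived from the actual SRA file):

```groovy
// Untested sketch: not truly adaptive to the SRA file size, but reruns the
// task with a larger disk request if it fails (e.g. after running out of space).
process {
    withName: 'SRATOOLS_FASTERQDUMP' {
        errorStrategy = 'retry'
        maxRetries    = 2
        disk          = { 200.GB * task.attempt }   // 200 GB, 400 GB, 600 GB across attempts
    }
}
```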

@nick-youngblut (Author) commented

> Setting a disk limit also doesn't prevent that; it just determines more explicitly when it happens. Note that prefetch has a default maximum download size, so that's also an option for avoiding large files.

Yeah, the disk limit will just result in an explicit non-zero exit. It doesn't seem that useful.

I'm using a max file size of 300 GB for prefetch, but that can still result in some very large fastq files.
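
For reference, the cap I mean (untested sketch; it assumes the nf-core prefetch module forwards ext.args to prefetch):

```groovy
// Untested sketch: cap prefetch downloads at ~300 GB via module args.
process {
    withName: 'SRATOOLS_PREFETCH' {
        ext.args = '--max-size 300g'   // size format per `prefetch --help`
    }
}
```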
