Add an option to subsample the input #58

Open
TCLamnidis opened this issue Jan 17, 2023 · 1 comment

@TCLamnidis

Currently, there is no way to randomly subsample an input BAM within DamageProfiler. When the input BAMs are very large, this can increase the runtime considerably, even though the damage-rate estimates hardly change when only a subset of the reads is used.

It would be very useful to have an option that lets the user specify the number of reads to use for the damage calculation, similar to the existing functionality in mapDamage.

Proposed functionality

A user can specify either a number of reads (e.g. 10 000 000), or a fraction of reads (e.g. 0.5).
If an integer is given, use up to that number of randomly subsampled reads for the damage calculation. If the BAM file contains fewer reads than requested, simply use all available reads.
If a float is given, randomly subsample that fraction of reads for the calculation.
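A minimal sketch of the proposed semantics in Java, assuming reads arrive as a plain `Iterator` (in practice they might be records from an htsjdk `SamReader`, which is an assumption about how this would be wired in). The class `ReadSubsampler` and its `parse`/`sample` methods are hypothetical names, not DamageProfiler API. An integer argument uses reservoir sampling to keep up to N reads uniformly at random; a decimal argument keeps each read independently with probability p.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical helper sketching the proposed option (not DamageProfiler code).
 * An integer argument keeps up to N reads chosen uniformly at random; a
 * decimal argument keeps each read independently with that probability.
 */
public final class ReadSubsampler {

    private final long maxReads;   // requested read count, or -1 in fraction mode
    private final double fraction; // requested fraction in (0, 1], or -1 in count mode
    private final Random rng;      // seeded so runs are reproducible

    private ReadSubsampler(long maxReads, double fraction, long seed) {
        this.maxReads = maxReads;
        this.fraction = fraction;
        this.rng = new Random(seed);
    }

    /** Parses "10000000" as a read count and "0.5" as a fraction of reads. */
    public static ReadSubsampler parse(String arg, long seed) {
        String s = arg.trim();
        try {
            long n = Long.parseLong(s);
            if (n <= 0 || n > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("read count out of range: " + arg);
            }
            return new ReadSubsampler(n, -1, seed);
        } catch (NumberFormatException notAnInteger) {
            double p = Double.parseDouble(s); // throws if s is not a number at all
            if (!(p > 0 && p <= 1)) {
                throw new IllegalArgumentException("fraction must be in (0, 1]: " + arg);
            }
            return new ReadSubsampler(-1, p, seed);
        }
    }

    /** Returns the sampled reads; in count mode, all reads if fewer than requested exist. */
    public <T> List<T> sample(Iterator<T> reads) {
        List<T> kept = new ArrayList<>();
        if (maxReads > 0) {
            // Reservoir sampling (Algorithm R): one pass, no need to know the total.
            long seen = 0;
            while (reads.hasNext()) {
                T read = reads.next();
                seen++;
                if (kept.size() < maxReads) {
                    kept.add(read); // fill the reservoir first
                } else {
                    long j = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
                    if (j < maxReads) {
                        kept.set((int) j, read); // replace a random slot
                    }
                }
            }
        } else {
            // Bernoulli sampling: the kept count is only approximately fraction * total.
            while (reads.hasNext()) {
                T read = reads.next();
                if (rng.nextDouble() < fraction) {
                    kept.add(read);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> reads = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            reads.add(i);
        }
        System.out.println(parse("100", 42L).sample(reads.iterator()).size());  // 100
        System.out.println(parse("5000", 42L).sample(reads.iterator()).size()); // 1000 (all reads)
        System.out.println(parse("0.5", 42L).sample(reads.iterator()).size());  // about 500, depends on the seed
    }
}
```

Reservoir sampling returns exactly min(N, total) reads in a single pass but holds up to N records in memory; the fraction mode keeps roughly p × total reads. Hitting an exact fraction would require knowing the total read count first (e.g. via a counting pass), which is a separate design choice.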

@JudithNeukamm
Collaborator

JudithNeukamm commented Jan 19, 2023

Thanks for this input. I'll have a look at it and will consider this option for a future version!
