SvPileup should ignore reads marked as duplicates #41
Comments
I would add an option to the tool to ignore duplicates, with default |
I'm going to disagree slightly and suggest that we add: |
A major version bump would be fine by me.
I didn't see #42 until also attempting a solution: |
I tested the two branches #42 and #45 that were written to address this issue and they had identical results (yay!). I used a sample with 71,184 read pairs (with UMIs) that could potentially support a rearrangement, i.e. they either spanned an expected breakpoint or had a split read around an expected breakpoint.
The behaviour appears to be as expected. If duplicate reads are allowed, the first two input files have identical output: they have the same content, with duplicate reads marked only in the "Picard MarkDuplicates only" file. If duplicate reads are not allowed, the number of SV pileups drops from 771 to 262, but only for the file with marked duplicates. I confirmed that the coordinates of all 262 pileups are found in the set of 771. If UMI de-duplication is used, duplicates are removed entirely because Note - the use of the |
The number of pileups changes when duplicate reads are removed.
In test examples, the count went from 575 to 283 pileups and from 2030 to 550 pileups.
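
For readers unfamiliar with how duplicate marking works, the filtering discussed in this issue can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SvPileup's implementation: the `Read` tuple, the `filter_reads` helper, and the `include_duplicates` parameter are hypothetical names made up for this example. The only thing taken from outside the thread is the SAM duplicate flag bit `0x400`, which tools such as Picard MarkDuplicates set on reads they consider duplicates.

```python
from typing import List, NamedTuple

# SAM FLAG bit meaning "PCR or optical duplicate" (from the SAM specification).
DUPLICATE_FLAG = 0x400


class Read(NamedTuple):
    """Hypothetical minimal read record: just a name and the SAM FLAG field."""
    name: str
    flag: int


def is_duplicate(read: Read) -> bool:
    """True if the duplicate bit is set in the read's FLAG."""
    return bool(read.flag & DUPLICATE_FLAG)


def filter_reads(reads: List[Read], include_duplicates: bool = False) -> List[Read]:
    """Keep all reads, or drop those flagged as duplicates.

    `include_duplicates` is an assumed parameter name for this sketch only.
    """
    return [r for r in reads if include_duplicates or not is_duplicate(r)]


# The same three reads, once without and once with duplicate marking.
unmarked = [Read("q1", 0x63), Read("q2", 0x63), Read("q3", 0x93)]
marked = [Read("q1", 0x63), Read("q2", 0x63 | DUPLICATE_FLAG), Read("q3", 0x93)]

kept_unmarked = [r.name for r in filter_reads(unmarked)]
kept_marked = [r.name for r in filter_reads(marked)]
kept_marked_all = [r.name for r in filter_reads(marked, include_duplicates=True)]
```

With duplicates included, both inputs yield the same reads. With duplicates excluded, only the input carrying duplicate flags loses reads, which matches the pattern reported in the comments above.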