-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of CRAM 3.1 codecs. #1714
base: master
Are you sure you want to change the base?
Conversation
b92597b
to
bfb60ea
Compare
Hello. Thank you very much for this PR! I learned this fix is critical for an IGV upgrade to support CRAM v3.1, which is very important to our projects. I am wondering if there’s an estimated timeline for merging. Much appreciated! |
@jerryliu2005 This branch is nearly done, but there are a few TODOs left, plus I want to do some more large-scale round trip testing. Those should get done over the next couple of weeks. However, this branch doesn't actually turn on CRAM 3.1 support - I have a separate, smaller branch for that which I'm using for testing. Once those are merged, we'll need to do an htsjdk release, and then IGV can upgrade. I'd hope/expect that all to happen in the Jan/Feb timeframe. |
Hi @cmnbroad, thanks very much for the info! I will let the team know about the rough timeframe to be ready :) |
…safe, precompile the name Tok regex, eliminate unnecessary code.
…ns, fix the read name separator.
07828c3
to
70665d1
Compare
ce90f4d
to
2b4ffd8
Compare
df62f59
to
282322a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad I have a few very minor comments. I did not do a deep review.
One thing that I would really like to have is a clear indication of what exact commit of hts-specs the data came from. Maybe a top level README in the test data folder?
Comments on the compressors describing what sort of data they're intended for would be useful for the future. Things like FQZComp are non obvious.
One thing I noticed that I hadn't though about before, it looks like there is a lot of array creation and copying going on during compression/ decompression. If we ever want to speed things up that might be a good place to take a look. I can imagine many of the buffers could probably be reused.
return i; | ||
} | ||
|
||
public static ByteBuffer encodePack( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment describing what this method is for would be helpful
} | ||
|
||
// returns a new LITTLE_ENDIAN ByteBuffer of size = bufferSize | ||
//TODO: rename this to allocateLittleEndianByteBuffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you want to rename these?
len |= model.getLength()[1].modelDecode(inBuffer, rangeCoder) << 8; | ||
len |= model.getLength()[2].modelDecode(inBuffer, rangeCoder) << 16; | ||
len |= model.getLength()[3].modelDecode(inBuffer, rangeCoder) << 24; | ||
//TODO: it is entirely f'd up that fixedlen appears to be used as a flag elsewhere, but as an actual length here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entirely fucked up doesn't sound ideal. 🙈
|
||
import java.nio.ByteBuffer; | ||
|
||
public class FQZCompDecode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good if each decompressor had a comment describing it's reason for existence, like what sort of data it is good at compressing. For the ones that were implemented off the C or Javascript a link to that code might be useful for future reference.
This is a squashed and rebased branch containing all of the changes from Yash's CRAM 3.1 codec branches. Replacement for #1618, 1644, 1663 and 1704.