
Implementation of CRAM 3.1 codecs. #1714

Open
wants to merge 31 commits into base: master
Conversation

cmnbroad
Collaborator

@cmnbroad cmnbroad commented Sep 4, 2024

This is a squashed and rebased branch containing all of the changes from Yash's CRAM 3.1 codec branches. Replacement for #1618, #1644, #1663, and #1704.

@lbergelson lbergelson added the cram label Oct 1, 2024
@cmnbroad cmnbroad force-pushed the cn_cram_3_1_codecs branch 2 times, most recently from b92597b to bfb60ea Compare November 25, 2024 20:28
@jerryliu2005

Hello. Thank you very much for this PR! I've learned that this fix is critical for an IGV upgrade to support CRAM v3.1, which is very important to our projects. I am wondering if there's an estimated timeline for merging. Much appreciated!

@cmnbroad
Collaborator Author

@jerryliu2005 This branch is nearly done, but there are a few TODOs left, plus I want to do some more large-scale round trip testing. Those should get done over the next couple of weeks. However, this branch doesn't actually turn on CRAM 3.1 support - I have a separate, smaller branch for that which I'm using for testing. Once those are merged, we'll need to do an htsjdk release, and then IGV can upgrade. I'd hope/expect all of that to happen in the Jan/Feb timeframe.

@jerryliu2005

Hi @cmnbroad, thanks very much for the info! I will let the team know about the rough timeframe to be ready :)

@cmnbroad cmnbroad force-pushed the cn_cram_3_1_codecs branch from ce90f4d to 2b4ffd8 Compare January 7, 2025 22:30
@cmnbroad cmnbroad marked this pull request as ready for review January 22, 2025 14:55
Member

@lbergelson lbergelson left a comment


@cmnbroad I have a few very minor comments. I did not do a deep review.

One thing that I would really like to have is a clear indication of what exact commit of hts-specs the data came from. Maybe a top level README in the test data folder?

Comments on the compressors describing what sort of data they're intended for would be useful for the future. Things like FQZComp are non-obvious.

One thing I noticed that I hadn't thought about before: it looks like there is a lot of array creation and copying going on during compression/decompression. If we ever want to speed things up, that might be a good place to take a look. I can imagine many of the buffers could probably be reused.
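To illustrate the buffer-reuse idea, here is a minimal sketch of a reusable scratch buffer that is cleared and grown on demand rather than allocated per call. The class and method names are hypothetical, not part of htsjdk:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: a scratch buffer reused across repeated
// compression/decompression calls instead of allocating a fresh
// array for every block.
public final class ScratchBuffer {
    private ByteBuffer buffer = ByteBuffer.allocate(0).order(ByteOrder.LITTLE_ENDIAN);

    // Returns a cleared buffer with at least the requested capacity,
    // reallocating only when the current one is too small.
    public ByteBuffer ensureCapacity(final int required) {
        if (buffer.capacity() < required) {
            buffer = ByteBuffer.allocate(required).order(ByteOrder.LITTLE_ENDIAN);
        }
        buffer.clear();
        return buffer;
    }
}
```

A pattern like this would need care around thread safety (e.g. one scratch buffer per decompressor instance, or a ThreadLocal), since the returned buffer is shared mutable state.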

return i;
}

public static ByteBuffer encodePack(
Member


A comment describing what this method is for would be helpful

}

// returns a new LITTLE_ENDIAN ByteBuffer of size = bufferSize
//TODO: rename this to allocateLittleEndianByteBuffer
Member


did you want to rename these?
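For reference, a helper along the lines the TODO describes would be a one-liner; this is a sketch under the TODO's proposed name, not the actual htsjdk code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical rendering of the rename the TODO suggests.
public final class ByteBufferUtils {
    // Returns a new ByteBuffer of the given size, explicitly set to
    // LITTLE_ENDIAN (ByteBuffer.allocate defaults to BIG_ENDIAN).
    public static ByteBuffer allocateLittleEndianByteBuffer(final int bufferSize) {
        return ByteBuffer.allocate(bufferSize).order(ByteOrder.LITTLE_ENDIAN);
    }
}
```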

len |= model.getLength()[1].modelDecode(inBuffer, rangeCoder) << 8;
len |= model.getLength()[2].modelDecode(inBuffer, rangeCoder) << 16;
len |= model.getLength()[3].modelDecode(inBuffer, rangeCoder) << 24;
//TODO: it is entirely f'd up that fixedlen appears to be used as a flag elsewhere, but as an actual length here
Member


Entirely fucked up doesn't sound ideal. 🙈
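The length decoding in the diff above assembles a 32-bit little-endian value from four decoded bytes. A standalone illustration of just that bit-assembly, with the range-coder `modelDecode` calls replaced by a plain array for clarity:

```java
// Illustration only: combines four bytes (least-significant first)
// into one 32-bit little-endian integer, mirroring the shifts in
// the decoder above.
public final class LittleEndianLength {
    public static int decodeLength(final int[] bytes) {
        int len = bytes[0];
        len |= bytes[1] << 8;
        len |= bytes[2] << 16;
        len |= bytes[3] << 24;
        return len;
    }
}
```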


import java.nio.ByteBuffer;

public class FQZCompDecode {
Member


It would be good if each decompressor had a comment describing its reason for existence, like what sort of data it is good at compressing. For the ones that were implemented from the C or JavaScript code, a link to that code might be useful for future reference.
