-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update murmur3 length / family to match current uses #236
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rvagg @willscott I don't know what the rush was to have this PR merged, but as mentioned both this and the newer
x64-128
codes are not particularly well specced especially in relation to each other.mumur3 implementations are a rats nest of unspecified confusion, I described the situation in multiformats/go-multihash#135 and linked here. This lack of spec (i.e. relying on behaviors of one particular library) has also been described in ipld/specs#131 (comment).
A concrete example of this is what are the
0x22
and0x1022
hashes forThe quick brown fox jumps over the lazy dog
? This thread (aappleby/smhasher#6) seems to indicate that it should be6C1B07BC7BBC4BE347939AC4A93C437A
however, as described in spaolacci/murmur3#21 (and verified by a simple test) the library proposed in multiformats/go-multihash#150 givese34bbc7bbc071b6c7a433ca9c49a9347
.So which one is
murmur3-x64-128
supposed to be and what is 0x22 supposed to be? It seems possible they could be different endianness/that x64-128 is not a longer version of 0x22.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann merging isn't ruling out additional tweaks here, but I'm not sure what that would involve because the murmur3 "ecosystem" is such a mess. In JS we've had compatibility problems too, some of them JS specific, but most recently we landed on https://cimi.io/murmurhash3js-revisited/, which for
The quick brown fox jumps over the lazy dog
getse34bbc7bbc071b6c7a433ca9c49a9347
also.I think the real problem relates to the conversion to a byte array and the lack of an agreed standard in doing so. If I run applyby/smhasher against
The quick brown fox jumps over the lazy dog
I get16378391709484522348
+8809951995912426311
as my 128 bit output. Writing those to a byte array in that order as little-endian I get6c1b07bc7bbc4be347939ac4a93c437a
, but writing them as big-endian I gete34bbc7bbc071b6c7a433ca9c49a9347
. So the output integers are correct, but the lack of agreed encoding of those integers to a byte array seems to explain the descrepancy. It seems that in both the JS and Go implementations we're using that we're doing this as big-endian. But is that something we want to document here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'll also note that aappleby/smhasher#6 has the erroneous output for Guava as
4cae51b5316602c01c7c5642843e5fe7
, which doesn't figure in any of the above, it seems to be an actual error in that implementation).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rvagg my understanding is that generally the that a major point of the multicodec table is to be able to have different groups reuse and implement the same schemes and identify their data the same way such that they can build compatible systems. In the case of multihash I'd generally assume that we're specifying data of the form
<hash code><hash len><hash bytes>
and how a user could take in a set of bytes and compute the same multihash output.As a result, it seems like knowing how to order the output bytes to create the codec seems important for reproducibility. Am I missing something?
In general I'd hope that we could have some criteria for most codec entries which is to be able to point people at a spec (even if it's a living one) for how to implement these things and what the human meaning behind any particular 0xABCD actually means.
My main objections to what was merged here were: