Identity encoding decoding doesn't produce the same data #122
You're hitting limitations of JavaScript's UTF-8 handling. Some bytes just won't survive a bytes → string → bytes round trip in JavaScript. The built-in assumption is that converting bytes to a string means decoding actual UTF-8 characters, unlike some languages such as Go, whose strings can hold arbitrary bytes. To illustrate, take your 3rd byte, which isn't valid UTF-8 on its own (note how the first 2 bytes are preserved in the round trip):

```
> new TextDecoder().decode(new Uint8Array([184]))
'�'
> new TextDecoder().decode(new Uint8Array([184])).charCodeAt(0)
65533
> new TextEncoder().encode(new TextDecoder().decode(new Uint8Array([184])))
Uint8Array(3) [ 239, 191, 189 ]
```

So you can see that invalid UTF-8 bytes get converted to U+FFFD, the Unicode replacement character (code 65533), which re-encodes as the three bytes `239, 191, 189`. The identity multibase doesn't have much choice here: it's only safe to use with bytes that JavaScript can convert to a string and back without loss. Otherwise, use a multibase that maps bytes to characters to avoid this problem (which is one of the points of using a base encoding!).

I hope that helps explain the situation, even if it probably doesn't give you an easy solution.
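The REPL session above can be reproduced as a small standalone script. This is a minimal sketch assuming a runtime with the standard `TextDecoder`/`TextEncoder` globals (modern browsers, Node.js 11+); the variable names are illustrative.

```typescript
// Demonstrates why a bytes -> UTF-8 string -> bytes round trip is lossy.

// 0xB8 (184) has the bit pattern 10xxxxxx: a lone continuation byte, not valid UTF-8.
const original = new Uint8Array([184]);

// Decoding replaces the invalid byte with U+FFFD (REPLACEMENT CHARACTER).
const decoded = new TextDecoder().decode(original);
const codeUnit = decoded.charCodeAt(0); // 65533 === 0xFFFD

// Re-encoding yields the UTF-8 encoding of U+FFFD, not the original byte.
const reencoded = new TextEncoder().encode(decoded); // [239, 191, 189]

console.log(codeUnit, Array.from(reencoded));
```

The original byte value is gone after the first step, so no choice of encoder can recover it.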
I used `codePointAt` to convert to JS binary strings and back: https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary Maybe that could be used instead?
Hm, that might not be a bad idea, since code point addressing is now standard across runtimes.
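The binary-string approach from the MDN page linked above can be sketched as follows. This is a minimal sketch, not code from multibase: `bytesToBinaryString` and `binaryStringToBytes` are hypothetical helper names. It stores one byte per UTF-16 code unit, so all 256 byte values survive, and it rejects code units above 255 because an arbitrary JavaScript string is not guaranteed to be a binary string.

```typescript
// A "binary string" holds one byte per UTF-16 code unit (values 0-255).

function bytesToBinaryString(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]); // code unit equals the byte value
  }
  return out;
}

function binaryStringToBytes(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    // For code units 0-255, codePointAt(i) would return the same value.
    const c = s.charCodeAt(i);
    if (c > 0xff) {
      throw new RangeError(`code unit ${c} at index ${i} is not a byte`);
    }
    bytes[i] = c;
  }
  return bytes;
}

// Check every byte value, including 184 (0xB8) from the REPL example.
const allBytes = new Uint8Array(256);
for (let i = 0; i < 256; i++) allBytes[i] = i;
const roundTripped = binaryStringToBytes(bytesToBinaryString(allBytes));
console.log(roundTripped.every((b, i) => b === allBytes[i]));
```

Throwing on out-of-range code units is a design choice: silently masking with `& 0xff` would hide corrupted input instead of reporting it.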
Yeah, I used it for the above example and compared it against multibase to see if there were any differences: https://github.com/MatrixAI/js-id/blob/4ea34f2b50e8f259576fc2f8bb9f80d9a167e1a1/src/utils.ts#L75-L85

```typescript
// `Id` and `IdInternal` are defined elsewhere in js-id (not shown here).

// Map each byte to the UTF-16 code unit with the same value (0-255).
function toString(id: Uint8Array): string {
  return String.fromCharCode(...id);
}

// Read 16 code units back as bytes; assumes a 16-character binary string.
function fromString(idString: string): Id | undefined {
  const id = IdInternal.create(16);
  for (let i = 0; i < 16; i++) {
    id[i] = idString.charCodeAt(i);
  }
  return id;
}
```

And it worked, whereas multibase failed.
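Applying the same idea to the identity multibase could look like the sketch below. It is hypothetical: `identityEncode` and `identityDecode` are illustrative names, not the js-multibase or multiformats API, and the only thing taken from the multibase table is that the identity prefix is the character with code 0x00.

```typescript
// Hypothetical identity-multibase-style codec built on binary strings.
// NOT a real library API; names are illustrative only.

const IDENTITY_PREFIX = "\x00"; // identity prefix from the multibase table (0x00)

function identityEncode(bytes: Uint8Array): string {
  let body = "";
  for (let i = 0; i < bytes.length; i++) {
    body += String.fromCharCode(bytes[i]); // one byte per code unit
  }
  return IDENTITY_PREFIX + body;
}

function identityDecode(encoded: string): Uint8Array {
  if (encoded[0] !== IDENTITY_PREFIX) {
    throw new Error("missing identity multibase prefix");
  }
  const bytes = new Uint8Array(encoded.length - 1);
  for (let i = 1; i < encoded.length; i++) {
    const c = encoded.charCodeAt(i);
    if (c > 0xff) throw new RangeError(`non-byte code unit ${c} at index ${i}`);
    bytes[i - 1] = c;
  }
  return bytes;
}

// Arbitrary sample input that includes 184, the byte from the REPL example.
const input = new Uint8Array([1, 2, 184, 255]);
const decodedBack = identityDecode(identityEncode(input));
console.log(Array.from(decodedBack)); // [ 1, 2, 184, 255 ]
```

One caveat of this design: the encoded value is a binary string, not UTF-8 text. If it is later written out through `TextEncoder` (or any UTF-8 serializer), code units 128 to 255 expand to two bytes each (`new TextEncoder().encode("\xb8")` gives `[194, 184]`), so it must be turned back into bytes with `charCodeAt`, as above, rather than re-encoded as UTF-8.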
I'm not sure if identity encoding is meant to be used like this, but I noticed that after decoding, you don't get the same data: