
Reduce bundle size of ens_normalize function #28

Open
dawsbot opened this issue Feb 18, 2024 · 6 comments

dawsbot commented Feb 18, 2024

Hey @adraffy, I'm not sure if there are any easy wins on the bundle size, but I'm happy to help if so! Do you have any suggested areas to work on?

Seeing as both ethers and viem rely on this library and the ens_normalize function carries somewhere around 25kb into these packages, it might be possible to reduce that? You're the expert, let me know if or how I can help!

adraffy (Owner) commented Feb 18, 2024

There are some thoughts here: #21 (comment)

With browser detection, it would be possible to use engine features (like Unicode regex patterns).

I'll update this when I get some time.
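
For illustration, a minimal sketch of that kind of feature detection (an assumption about how it could be done, not code from this library): an engine without Unicode property escapes throws a SyntaxError when compiling such a pattern, so support can be probed at runtime before deciding which code path or payload to use.

// Hypothetical sketch: probe for Unicode property escape support.
// Engines without \p{...} support throw a SyntaxError here.
let hasUnicodeProps = false
try {
  new RegExp('\\p{Script=Latin}', 'u')
  hasUnicodeProps = true
} catch (err) {
  // fall back to the bundled character tables
}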

adraffy (Owner) commented Feb 20, 2024

Possible ideas:

  1. Compress spec.json using ANY technique

    • My current implementation takes this from 2.99 MB down to 14673 bytes for ENS data and 5588 bytes for Unicode NF data. There is also some overhead for the decompression code.
    • It's already using a bunch of tricks like arithmetic coding, various forms of run-length compression, and many domain-specific things (emoji encoded into a trie), etc. For reference, just the raw list of valid emoji as a string is larger than this entire library, yet this library has a function that produces that list.
    • To make any progress, you'll likely need to understand the structure of spec.json and how it's used. make.js is responsible for turning spec.json into the compressed data.
  2. Compress my compressed data using ANY technique. I don't include these files in the repo, but if you uncomment that file and run npm run make, you'll get two JSON files of byte[]. Those are the bytes that get turned into base64 (4/3 expansion factor) and fed into the decoder (a quick experiment along these lines is sketched after this list).

  3. Compress my uncompressed data using ANY technique. Same as above, but instead of writing out data which corresponds to the arithmetic coder format, write out enc.values instead, which will be int[]. These are the symbols that are fed into the arithmetic coder. They are biased towards low values (see the histogram in the link above), but they also include large values like a codepoint or Δcodepoint in situations where I had to encode a one-off value. My compressor deals with this by encoding [0-60ish] verbatim and then a separate symbol to imply that you should read the next value as a "large" value.

  4. It would be pretty easy for me to build other variants of ens-normalize that make assumptions about client features, like dynamically loading a version for browsers that are sufficiently modern. The bulk of the data is in the script data. If the browser is using a sufficient version of Unicode, this data can be derived from \p (see the sketch after this list).

  5. If the bundle is being served with compression, one layer of compression can be removed, although from my calculations, my compressor code + compressed data was still smaller than gzipped output.

  6. Some size is related to producing correct error messages. If no error message is required beyond "not normalized", additional reductions can be made.

  7. ens_tokenize() should be tree-shaken from your bundle, assuming you only need ens_normalize (see the import example at the end of this comment).
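
For (2) and (3), a hypothetical way to experiment, assuming you've written the byte[] JSON output of npm run make to a file (the file name below is a placeholder, not an actual repo path): feed the bytes to an off-the-shelf compressor and compare sizes against the library's own coder.

// Hypothetical experiment: compare the emitted bytes against generic compressors.
import { readFileSync } from 'node:fs'
import { gzipSync, brotliCompressSync } from 'node:zlib'

const bytes = Buffer.from(JSON.parse(readFileSync('ens-bytes.json', 'utf8'))) // placeholder file name
console.log('raw:   ', bytes.length)
console.log('gzip:  ', gzipSync(bytes).length)
console.log('brotli:', brotliCompressSync(bytes).length)

For (4), the idea of deriving script data from the engine itself relies on Unicode property escapes. A minimal sketch (illustrative only, and only valid if the engine's Unicode version matches the one the library targets):

// Hypothetical sketch: script membership via the engine's own Unicode tables.
const LATIN = /^\p{Script=Latin}+$/u
const GREEK = /^\p{Script=Greek}+$/u

console.log(LATIN.test('hello')) // true
console.log(GREEK.test('αβγ'))   // true
console.log(LATIN.test('αβγ'))   // false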

I'd estimate (4) is the easiest and could trim the entire NF payload + the bulk of the character data.

Also, there might be something I missed in (1) w/r/t compression, as I wrote the compressor at the beginning of this project, when I had to confirm that it was feasible to jam the entire Unicode character data into the library. Whereas now, the structure of spec.json is stable.
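
As a consumer-side note on (7): with a bundler that tree-shakes ES modules, importing only the named export should allow ens_tokenize and its dependencies to be dropped.

// Import only what you need so bundlers can drop ens_tokenize.
import { ens_normalize } from '@adraffy/ens-normalize'

console.log(ens_normalize('Nick.ETH')) // "nick.eth"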

dawsbot (Author) commented Feb 20, 2024

Incredible; I appreciate the breakdown, @adraffy! I'm going to be bold here and claim that I likely cannot make the in-the-weeds fixes you've recommended as well as you could, but I'm happy to take a stab at (4) if it seems high-value and you don't have time.

Again, my attempt likely won't come close to yours, seeing as I don't have this type of character encoding experience, but I'm happy to learn!

tmm commented Jun 4, 2024

> Seeing as both ethers and viem rely on this library and the ens_normalize function carries somewhere around 25kb into these packages, it might be possible to reduce that? You're the expert, let me know if or how I can help!

@dawsbot Worth noting that Viem inverts control so you can use another normalize function (or skip normalization if you know what you are doing). This also means that if you aren't using ENS with Viem, ens_normalize won't impact the final bundle!

import { normalize } from 'viem/ens' // proxies `ens_normalize` export

const ensAddress = await client.getEnsAddress({
  name: normalize('wevm.eth'),
})

import { custom_normalize } from 'custom-normalize' // use whatever normalize you like!

const ensAddress = await client.getEnsAddress({
  name: custom_normalize('wevm.eth'),
})

adraffy (Owner) commented Jun 5, 2024

I would be very careful with using non-standard normalization. While the bulk of ENS names are currently ASCII, I would expect that trend to change in the future.

Spoofing names is a serious attack vector—look how many people get scammed by address poisoning.

Is the current library size an actual problem? Relative to nearly every site I see, asset and code bloat dwarfs actual library code by a huge margin.

The goal of this library is to produce the correct result for ALL inputs across any engine that can run the library. It accomplishes that by internalizing everything (which includes the full Unicode spec).

FYI: The major browser vendors can't even agree on URL parsing.

When I update the library to Unicode 16 for the September release, I'll revisit the compression logic.

adraffy (Owner) commented Jun 6, 2024

I can supply a stub function if you want to async import(...) the rest of the library when the fast path isn't sufficient? But it would make the call site async.
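
For illustration, a rough sketch of that idea; everything here is hypothetical (the function name, the fast-path check, and the wiring are placeholders, not an API of this library), and a real fast path would have to enforce the actual ENSIP-15 rules for ASCII labels before skipping the normalizer.

// Hypothetical stub: handle an obviously-safe fast path inline and lazily
// import the full library for everything else. The ASCII check below is a
// placeholder; real code must match the ASCII rules of ENSIP-15.
const SIMPLE_ASCII = /^[a-z0-9]+(\.[a-z0-9]+)*$/

export async function normalizeLazy(name) {
  if (SIMPLE_ASCII.test(name)) return name
  const { ens_normalize } = await import('@adraffy/ens-normalize')
  return ens_normalize(name)
}

Note the trade-off mentioned above: every call site becomes async, even for names that never trigger the dynamic import.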
