feat: download browsers as TAR #34033

Open. Wants to merge 34 commits into base: main

Member

@Skn0tt commented Dec 16, 2024

Some of our browsers are already available as .tar.br. Compared to the current .zip archives, the brotli tarballs are ~10-30% smaller. This PR makes us download brotli files for chromium and webkit.

@Skn0tt self-assigned this Dec 16, 2024

@Skn0tt marked this pull request as ready for review December 18, 2024 14:28

@@ -10,6 +10,7 @@
   },
   "dependencies": {
     "extract-zip": "2.0.1",
+    "tar-fs": "^3.0.6",
Member

I understand the library is popular, but its dependency list seems a little excessive for what it does. Did we consider alternatives?

Member

Oh wow, "tar" is even more...

Member Author

@Skn0tt Dec 19, 2024

Yes, we considered tar-fs, tar and writing our own. Writing our own turned out more complex than imagined, because webkit has very long path names and the format becomes tricky when that's involved. Of the three, tar-fs seemed the most focused.
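
For context on why long paths complicate a hand-written parser: the classic ustar header stores a path in a 100-byte name field plus an optional 155-byte prefix field, so anything longer has to be carried in extension records such as PAX extended headers. The sketch below checks whether a path fits in a plain ustar header. `fitsInUstarHeader` is a hypothetical helper written for illustration, not code from this PR.

```javascript
// Field sizes from the POSIX ustar header layout.
const USTAR_NAME_BYTES = 100;
const USTAR_PREFIX_BYTES = 155;

// Hypothetical helper: returns true if `path` can be stored in a plain ustar
// header, either whole in `name` or split at a '/' into `prefix` + `name`.
function fitsInUstarHeader(path) {
  if (Buffer.byteLength(path, 'utf8') <= USTAR_NAME_BYTES) return true;
  // Try every '/' as the split point; the slash itself is not stored.
  for (let i = path.indexOf('/'); i !== -1; i = path.indexOf('/', i + 1)) {
    const prefixBytes = Buffer.byteLength(path.slice(0, i), 'utf8');
    const nameBytes = Buffer.byteLength(path.slice(i + 1), 'utf8');
    const prefixOk = prefixBytes > 0 && prefixBytes <= USTAR_PREFIX_BYTES;
    const nameOk = nameBytes > 0 && nameBytes <= USTAR_NAME_BYTES;
    if (prefixOk && nameOk) return true;
  }
  return false;
}
```

Paths for which this returns false need a PAX (or GNU long-name) record, which is the part of the format a minimal parser would also have to handle.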

@@ -48,8 +48,8 @@
       "revision": "1011",
       "installByDefault": true,
       "revisionOverrides": {
-        "mac12": "1010",
-        "mac12-arm64": "1010"
+        "mac12": "1011",
Member

What's the motivation for changing this?

Member Author

Revision 1010 doesn't have a .tar.br archive, and 1010 is functionally identical to 1011.

@@ -1229,6 +2813,6 @@ END OF [email protected] AND INFORMATION

SUMMARY BEGIN HERE
=========================================
-Total Packages: 48
+Total Packages: 60
Member

We are paying a 25% bump in the number of deps (48 to 60) for a feature that isn't linked to a user report. Usually not a very good sign.

Member Author

True. It also increases the zip bundle size from 112 KB to 202 KB. We previously attempted to write our own tar parser; maybe we should give it another try.

@pavelfeldman
Member

Just as an idea, can we utilize extract-zip's non-compression mode for tar? That way we use zip for tar and don't need all this extra code? i.e. the files will be .zip.br, not .tar.br.

@Skn0tt
Member Author

Skn0tt commented Jan 6, 2025

> Just as an idea, can we utilize extract-zip's non-compression mode for tar?

That'd save some dependencies, but it would result in slightly larger bundles [1] and prevent streaming extraction. I'd prefer to stick with TAR; I'm going to take a stab at reducing the bundle size for that.

Footnotes

  1. Tested on firefox-mac: .zip is 92 MB, .zip.br is 67.2 MB, and .tar.br is 65.36 MB.
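
To put the footnote's numbers in relative terms, here is a small worked calculation. Only the three sizes come from the footnote; the helper name `savingsPercent` is made up for this sketch.

```javascript
// Sizes in MB as reported in the footnote for firefox-mac.
const sizes = { zip: 92, zipBr: 67.2, tarBr: 65.36 };

// Percentage by which `smaller` undercuts `larger`, rounded to one decimal.
function savingsPercent(larger, smaller) {
  return Math.round((1 - smaller / larger) * 1000) / 10;
}

const tarBrVsZip = savingsPercent(sizes.zip, sizes.tarBr);     // 29
const zipBrVsZip = savingsPercent(sizes.zip, sizes.zipBr);     // 27
const tarBrVsZipBr = savingsPercent(sizes.zipBr, sizes.tarBr); // 2.7
```

On this one sample, both brotli variants are roughly 27-29% smaller than the plain .zip, and .tar.br is about 2.7% smaller than .zip.br.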

@Skn0tt
Member Author

Skn0tt commented Jan 6, 2025

Alright, I've vendored tar-fs and replaced all of its dependencies with the built-in equivalents. (Fun fact: pump, one of the deps, ended up landing in Node.js core as stream.pipeline.)

That way we're using a tried & tested tar parser, but don't pay the download price. Let me know if you like that approach, and where's the best place for a vendored module to live / any license specifics we need to follow.
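
As an illustration of the built-in equivalents mentioned above, the following sketch streams brotli-compressed bytes through `zlib.createBrotliDecompress()` using the promise form of `stream.pipeline`. It uses only Node.js core modules; `brotliDecompressStreaming` is a hypothetical name, and the in-memory source and sink are stand-ins chosen so the example runs on its own. It is not the code from this PR.

```javascript
const { pipeline } = require('stream/promises');
const { Readable, Writable } = require('stream');
const zlib = require('zlib');

// Decompresses a brotli-compressed buffer by streaming it through
// zlib.createBrotliDecompress() and collecting the output chunks.
async function brotliDecompressStreaming(compressed) {
  const chunks = [];
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      chunks.push(chunk);
      callback();
    },
  });
  // pipeline() connects the stages, rejects if any stage fails, and
  // destroys the remaining streams, which is the job pump used to do.
  await pipeline(Readable.from([compressed]), zlib.createBrotliDecompress(), sink);
  return Buffer.concat(chunks);
}
```

In an actual download path the source would presumably be the HTTP response and the sink a tar extractor, so nothing needs to be buffered in full; both of those are assumptions here.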

if (!downloadPathTemplate)
return [];
// old webkit versions don't have brotli
Member

Curious why only old webkit revisions don't have brotli.

Member Author

@Skn0tt Jan 13, 2025

webkit is the only browser for which we have revisionOverrides that point to old revisions, and the CI script that built those revisions didn't yet produce brotli archives.

}
log(`SUCCESS downloading and extracting ${options.title}`);
} else {
await downloadFile(options);
Member

I'm seeing different error handling code in this branch, including explicit checks for ECONNRESET. Is walking away from them intended? Should we make both changes at the same time? I'd be more comfortable with leaving the download code as is and swapping piping into a file with piping into brotli.

Member Author

The new branch is intended to be as similar as possible, while also making the code a little more linear. The ECONNRESET check only changed the error message, so I didn't include that.

Let me see if I can refactor it to make the change less spooky.

Member Author

@Skn0tt Jan 13, 2025

I've refactored it so we can reuse the existing download function. Good pointer, thanks!

@@ -0,0 +1 @@
This directory contains a modified copy of the `tar-stream` library that's used exclusively to extract TAR files.
Member

Make sure all third-party files are under the third_party folder, with the corresponding license files provided beside them. Also make sure they end up in the third-party list or in a distributed bundle.

Member Author

Done! See diff.patch for all my changes.

}

shiftFirst (size) {
return this._buffered === 0 ? null : this._next(size)
Member

Is this a bug on the 21st line of this library? (I don't see this._buffered defined.)

Member Author

@@ -0,0 +1,311 @@
const { Writable, Readable, getStreamError } = require('stream')
Member

getStreamError is not a thing. How is it supposed to work?

Member Author

Good find! Removed the usage of it by moving _predestroy into _destroy. Once I add the diff this will make more sense.

const len = parseInt(buf.toString('ascii', 0, i), 10)
if (!len) return result

const b = buf.subarray('ascii', i + 1, len - 1)
Member

ChatGPT thinks it is a bug, since the value called len is used as the end argument of subarray(start, end). Given that the start is i + 1, which points right after the parsed len, len - 1 can't be a valid end. Did they mean i + len here?

Member Author

Member Author

This seems to be some sort of special case checking. Maybe in some implementations of TAR/PAX, len doesn't contain a length, but an index?
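
For reference, the POSIX pax format defines each extended-header record as `<len> <key>=<value>\n`, where `<len>` is the decimal byte length of the whole record, including its own digits, the separating space, and the trailing newline. Measured from the start of the record, the `key=value` text therefore spans `i + 1` up to (but not including) `len - 1`, which suggests the end index in the snippet is intentional. The sketch below is a standalone illustration using `Buffer.toString`; `parsePaxRecords` is a hypothetical name and this is not the vendored code.

```javascript
// Parses PAX extended-header records of the form "<len> <key>=<value>\n".
// <len> counts every byte of the record: its own digits, the space,
// the key=value text and the trailing newline.
function parsePaxRecords(buf) {
  const result = {};
  while (buf.length) {
    let i = 0;
    while (i < buf.length && buf[i] !== 0x20) i++; // find the space after <len>
    const len = parseInt(buf.toString('ascii', 0, i), 10);
    if (!len) return result;
    // Bytes i + 1 .. len - 2 hold "key=value"; byte len - 1 is the newline.
    const record = buf.toString('utf8', i + 1, len - 1);
    const eq = record.indexOf('=');
    if (eq === -1) return result;
    result[record.slice(0, eq)] = record.slice(eq + 1);
    buf = buf.subarray(len); // advance to the next record
  }
  return result;
}

// Two records: 18 bytes and 11 bytes long, newline included.
const sample = Buffer.from('18 path=dir/a.txt\n11 size=42\n');
const parsed = parsePaxRecords(sample); // { path: 'dir/a.txt', size: '42' }
```

Because each iteration re-slices the buffer so the current record starts at offset 0, `len` acts as both a length and an end offset, which may be the source of the confusion.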

Contributor

Test results for "tests 1"

4 flaky
⚠️ [firefox-library] › tests/library/inspector/cli-codegen-aria.spec.ts:76:7 › should update aria snapshot highlight @firefox-ubuntu-22.04-node18
⚠️ [firefox-page] › tests/page/page-evaluate.spec.ts:403:3 › should throw for too deep reference chain @firefox-ubuntu-22.04-node18
⚠️ [webkit-library] › tests/library/browsercontext-clearcookies.spec.ts:92:3 › should remove cookies by domain @webkit-ubuntu-22.04-node18
⚠️ [webkit-page] › tests/page/page-set-input-files.spec.ts:147:3 › should upload large file @webkit-ubuntu-22.04-node18

37599 passed, 648 skipped
✔️✔️✔️

Merge workflow run.
