Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
perf(cargo-package): match certain path prefix with pathspec
`check_repo_state` checks the entire git repo status. This is usually fine if you have only a few packages in a workspace. For huge monorepos, it may hit performance issues. For example, on awslabs/aws-sdk-rust@2cbd34d the workspace has roughly 434 members to publish. `git ls-files` reported us 204379 files in this Git repository. That means git may need to check status of all files 434 times. That would be `204379 * 434 = 88,700,486` checks! Moreover, the current algorithm is finding the intersection of `PathSource::list_files` and `git status`. It is an `O(n^2)` check. Let's assume files are evenly distributed into each package, so roughly 470 files per package. If we're unlucky to have some dirty files, say 100 files. We will have to do `470 * 100 = 47,000` times of path comparisons. Even worse, because we `git status` everything in the repo, we'll have to it for all members, even when those dirty files are not part of the current package in question. So it becomes `470 * 100 * 434 = 20,398,000`! Instead of comparing with the status of the entire repository, this patch use the magic pathspec[1] to tell git only reports paths that match a certain path prefix. This wouldn't help the `O(n^2)` algorithm, but at least it won't check dirty files outside the current package. [1]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
- Loading branch information