Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo package VCS dirty check corner cases #14967

Open
2 of 3 tasks
weihanglo opened this issue Dec 19, 2024 · 4 comments
Open
2 of 3 tasks

cargo package VCS dirty check corner cases #14967

weihanglo opened this issue Dec 19, 2024 · 4 comments
Assignees
Labels
A-git Area: anything dealing with git C-bug Category: bug Command-package S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@weihanglo
Copy link
Member

weihanglo commented Dec 19, 2024

Problem

During the investigation of #14955 and #14962, we found more corner cases that cargo package doesn't consider dirty. You don't even need to pass --allow-dirty flag to explicitly allow it. .cargo_vcs_info.json shows no dirty at all. However, the actual packaged .crate file contains dirty files.

Let's consider this virtual workspace:

./
├── foo/
│   ├── src/
│   ├── Cargo.toml
│   └── README.md -> ../README.md
├── Cargo.toml
├── LICENSE
└── README.md
# foo/Cargo.toml
[package]
name = "foo"
edition.workspace = true
license-file = "../LICENSE" # link to parent LICENSE
# Cargo.toml
[workspace]
members = ["foo"]
[workspace.package]
edition = "2015"

There are at least three corner cases:

Steps

See #14966

Possible Solution(s)

No response

Notes

No response

Version

cargo 1.85.0-nightly (769f622e1 2024-12-14)

9c17646

@weihanglo weihanglo added A-git Area: anything dealing with git C-bug Category: bug Command-package S-triage Status: This issue is waiting on initial triage. S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. and removed S-triage Status: This issue is waiting on initial triage. labels Dec 19, 2024
@epage
Copy link
Contributor

epage commented Dec 19, 2024

Options

  • We manually patch up each case
  • We check if the whole repo is dirty, instead of only the package

I re-implement this check in cargo-release so I catch it before the release process, not just publish. It check if the workspace is clean (instead it should check if the whole repo is clean). I implemented it this way because (1) its simpler and (2) people can hook into cargo-release and do their own thing so I can't exhaustively know all the files.

If we check the whole repo, potential broken workflows

  • An interactive user has a file edited for local development purposes that is unrelated to publishing
    • As its interactive, we have fewer guarantees.
  • A programmatic user (script) manually releases multiple packages in a workspace (so works on stable) and has to make files dirty for the release process of some but not all and so they selectively pass in --allow-dirty
    • I would assume that if they need to edit one, they need to edit all so unsure how likely this is to happen

@weihanglo
Copy link
Member Author

weihanglo commented Dec 19, 2024

BTW the doc says this:

`dirty` indicates that the Git worktree was dirty when the package
was built.

and this:

<dd class="option-desc">Allow working directories with uncommitted VCS changes to be packaged.</dd>

So it seems to look at the repository as a whole when it was designed, but #2781 made it check only files from PathSource::list_files.

#2063 has some discussion around this. Haven't read through all of them.

@epage
Copy link
Contributor

epage commented Dec 20, 2024

Looking through that, it seems there was a larger problem being dealt with and there was some discussion of details but not too much on this point (except one person not wanting cargo to be opinionated).

github-merge-queue bot pushed a commit that referenced this issue Dec 24, 2024
### What does this PR try to resolve?

This adds a special case for `package.{readme,license-file}`
to Git VCS status check.
If they were specified with paths outside the current package root,
but still under git workdir, Cargo checks git status of those files
to determine if they were dirty.

We don't need to take care of other fields with path values because

* `PathSource` only list files under the package root.
  Things like `target.path` works for `cargo build`, but won't be
  included in `.crate` file from `cargo publish`.
* The only exceptions are `package.readme`/`package.license-file`.
  Cargo would copy files over if they are outside package root.

### How should we test and review this PR?

While this doesn't fix ever case listed in #14967,
it at least fixes one of them.
@weihanglo
Copy link
Member Author

weihanglo commented Dec 28, 2024

Regarding potential solutions to fixing workspace inheritance, there are a couple of ideas

  1. If workspace Cargo.toml is dirty, we consider the member's Cargo.toml dirty.
    • May have lots of false positives.
    • Easy to implement and maintain.
  2. If workspace Cargo.toml is dirty, we check whether member's Cargot.toml has any inheritannce fields. If any, consider dirty.
    • Have less false positive. Irrelevant changes may still fail the dirtiness check.
    • May need to hardcode a list of inheritable fields. Easy to get out of sync.
  3. Swap the workspace Cargo.toml from worktree version to index version. Do a member manifest normalization against it. And then compare two normalized manifests
  4. Like the previous one, but do publish-normalization manifests for both worktree and index.
    • The most precise option.
    • Has the same drawback as the previous one.

github-merge-queue bot pushed a commit that referenced this issue Dec 31, 2024
### What does this PR try to resolve?

This adds a special case for checking source files are symlinks
and have been modified when under a VCS control

This is required because those paths may link to a file outside the
current package root, but still under the git workdir, affecting the
final packaged `.crate` file.

### How should we test and review this PR?

Pretty similar to #14966, as a part of #14967.

This may have potential performance issue. If a package contains
thousands of symlinks, Cargo will fire `git status` for each of them.
Not sure if we want to do anything proactively now.

The introduction of the `PathEntry` struct gives us more room for
storing file metadata to satisfiy use cases in the future. For
instances,

* Knowing a source file is a symlink and resolving it when packaging on
Windows
  * #5664
  * #14965
* Knowing more about a file's metadata (e.g. permission bits from Git)
  * #4413
  * #8006
* Provide richer information for `cargo package --list`, for example
JSON output mode
  * #11666
  * #13331
  * #13953
@weihanglo weihanglo self-assigned this Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git C-bug Category: bug Command-package S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Projects
None yet
Development

No branches or pull requests

2 participants