Utilize dep-info for all dependencies #7034
See #7030 (comment) for context. We shouldn't land this before that PR, but I think the analysis of whether this is the right change, or whether something more limited is needed, can be done in parallel. |
Can you also delete the first two sentences in the "Special considerations" section of the documentation at the top of the file (the ones that talk about registry dependencies not using mtimes)? |
OK, I think this probably isn't going to have any adverse effects. The change looks very local in the code: it simply uses the dep-info file in more locations. I am worried, however, about the performance impact of this change. Especially on slower filesystems, this is likely to make Cargo issue an order of magnitude more stat calls than before. Previously you were only stat'ing files that you are locally modifying, while all your upstream dependencies (on the order of hundreds) were largely ignored. Now, however, every file that could be part of the build is being stat'd. This is probably fine for rustc, where the vast majority of files are local anyway, but I'd be worried about builds like Servo or Cargo itself, where the vast majority of files come from crates.io or git dependencies. Would it be possible to do some analysis to evaluate the performance impact and/or the stat system call impact of this PR? |
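To make the concern concrete, here is a minimal sketch of the kind of freshness check under discussion: every path recorded in a dep-info file is stat'd and its mtime is compared against a reference time. This is not Cargo's implementation. The function name `first_stale` and the returned stat counter are hypothetical and exist only for illustration.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical helper (not a Cargo API): returns the first path whose mtime
/// is newer than `reference`, together with the number of stat calls issued.
fn first_stale(
    reference: SystemTime,
    paths: &[PathBuf],
) -> io::Result<(Option<PathBuf>, usize)> {
    let mut stat_calls = 0;
    for path in paths {
        // One `fs::metadata` call corresponds to one stat on the filesystem.
        stat_calls += 1;
        let mtime = fs::metadata(path)?.modified()?;
        if mtime > reference {
            // A newer input means the unit is dirty; stop at the first hit.
            return Ok((Some(path.clone()), stat_calls));
        }
    }
    // Nothing is newer, so every listed path had to be stat'd.
    Ok((None, stat_calls))
}
```

Under these assumptions, a fresh (no-op) build is the worst case: no path is newer than the reference, so the loop stats every listed file, and the cost grows with the number of paths tracked in dep-info.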
Now that sysroot dependencies are encoded, we want to know about them regardless of whether the crate itself is unlikely to change contents.
38d4132 to 3edef32
I measured building Cargo itself on macOS (I won't have access to a Linux machine until approximately Wednesday next week). Performance looks roughly equal, with only about 500 more stat calls than before on a no-op build of Cargo. If there's some other scenario that'd be good to test then I'm happy to do so, though I might wait till I have access to Linux as I'm more familiar with the tooling there.
master: cargo check (full build) full build syscall stats:
cargo check (no-op build) no-op build stats:
This PR: cargo check (full build) full build stats:
cargo check (no-op build) no-op build stats:
|
I haven't looked into the test failures deeply but they seem rather odd, perhaps exposing prior bugs? Not sure. |
This also causes some sporadic failures on HFS. I'd like to see that resolved before merging this. Some that seem to fail are:
I can try to help look at them soonish. |
Hm ok, I'm not too concerned about any particular project, but more so about how this scales. If we're required to always stat everything that your project depends on, then that could get quite unwieldy for larger projects while being effectively invisible in smaller projects (like Cargo). |
Even on a project of Servo's size it's only ~5000 additional syscalls on a no-op debug build (out of 60,000 syscalls total). That leads me to believe that there's not much reason to be concerned here -- the time impact and scale of this change is fairly minimal. It's definitely not an order of magnitude increase. Is there some concrete number/project I can gather? I do think this is adding work, but for the most part not that much: I'd expect the number of dependency files that need to be stat'd to be smaller by about an order of magnitude than the number of local source files that we need to stat anyway, so this doesn't feel like too big a concern to me. |
I can't reproduce the failures on HFS with |
Oh, crap, I'm sorry. I ran without the CI environment variable, I forgot. Ignore my note about it, then. And I've been intending to get back to looking at this, I just haven't made the time. |
I've been digging into this and rust-lang/rust#61727 for a while today, but I've run out of time and energy.
There's a bug with #7030 in that it doesn't handle host-vs-target directories correctly. I hacked together a fix, but it ended up changing quite a lot. The target directory handling code is too brittle and the naming conventions are a bit confusing. I'm not sure what ultimately will be the best way to fix it. I'm tempted to do something more radical to make it easier to work with, but I'm not sure.
If we want to mitigate the concern about performance, the dep-info parser could skip package-relative files for non-path dependencies. Perhaps something like this:
diff --git a/src/cargo/core/compiler/fingerprint.rs b/src/cargo/core/compiler/fingerprint.rs
index 1804be9a..9eabb106 100644
--- a/src/cargo/core/compiler/fingerprint.rs
+++ b/src/cargo/core/compiler/fingerprint.rs
@@ -1415,11 +1415,11 @@ pub fn parse_dep_info(
}
})
.collect::<Result<Vec<_>, _>>()?;
- if paths.is_empty() {
- Ok(None)
- } else {
- Ok(Some(paths))
- }
+ // if paths.is_empty() {
+ // Ok(None)
+ // } else {
+ Ok(Some(paths))
+ // }
}
fn pkg_fingerprint(bcx: &BuildContext<'_, '_>, pkg: &Package) -> CargoResult<String> {
@@ -1541,6 +1541,7 @@ pub fn translate_dep_info(
rustc_cwd: &Path,
pkg_root: &Path,
target_root: &Path,
+ allow_package: bool,
) -> CargoResult<()> {
let target = parse_rustc_dep_info(rustc_dep_info)?;
let deps = &target
@@ -1552,6 +1553,10 @@ pub fn translate_dep_info(
for file in deps {
let file = rustc_cwd.join(file);
let (ty, path) = if let Ok(stripped) = file.strip_prefix(pkg_root) {
+ if !allow_package {
+ // eprintln!("skipping {:?} {:?}", pkg_root, stripped);
+ continue;
+ }
(DepInfoPathType::PackageRootRelative, stripped)
} else if let Ok(stripped) = file.strip_prefix(target_root) {
(DepInfoPathType::TargetRootRelative, stripped)
diff --git a/src/cargo/core/compiler/mod.rs b/src/cargo/core/compiler/mod.rs
index 628ac8bb..9d85874a 100644
--- a/src/cargo/core/compiler/mod.rs
+++ b/src/cargo/core/compiler/mod.rs
@@ -321,6 +321,7 @@ fn rustc<'a, 'cfg>(
&cwd,
&pkg_root,
&root_output,
+ current_id.source_id().is_path()
)
.chain_err(|| {
internal(format!(
I don't know why empty dep-info files are treated as "missing". It was changed in #4788, but there aren't any notes as to why. It would still end up |
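The path filtering proposed in the diff above can be sketched in isolation as follows. This is a simplified, standalone illustration, not Cargo's actual code. `PathKind`, `classify`, and the `Other` variant are hypothetical names. The diff shows only the package-relative and target-relative branches, so the fallback branch here is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the path categories used in the diff.
#[derive(Debug, PartialEq)]
enum PathKind {
    PackageRootRelative,
    TargetRootRelative,
    /// Assumed fallback for paths outside both roots (for example, sysroot files).
    Other,
}

/// Classifies one dep-info path. Returns `None` when the path is
/// package-relative and `allow_package` is false, mirroring the `continue`
/// in the proposed change.
fn classify(
    file: &Path,
    pkg_root: &Path,
    target_root: &Path,
    allow_package: bool,
) -> Option<(PathKind, PathBuf)> {
    if let Ok(stripped) = file.strip_prefix(pkg_root) {
        if !allow_package {
            // Non-path dependency: skip its own source files entirely.
            return None;
        }
        Some((PathKind::PackageRootRelative, stripped.to_path_buf()))
    } else if let Ok(stripped) = file.strip_prefix(target_root) {
        Some((PathKind::TargetRootRelative, stripped.to_path_buf()))
    } else {
        Some((PathKind::Other, file.to_path_buf()))
    }
}
```

With `allow_package` set to false for registry and git dependencies, their source files are never recorded and so are never stat'd. Paths outside the package root, such as files in the sysroot, are still tracked.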
It's not clear to me what you mean by this -- do you have a test case? That code (IIRC) didn't actually root paths at the target directory, so I'm somewhat surprised that it is buggy in that way. The target_root in code is I believe the workspace root, though the naming could be better. |
Using it in rust-lang/rust with rust-lang/rust#61727. Or, make a workspace with a member that has a dependency on serde_derive and uses it, and build with
I also think there is some issue with it being in a directory inside a workspace vs not (since this only happens when the project is a subdirectory in a workspace).
I think the crux of the problem is
My hack involved changing it to return
Let me know what you think. I haven't quite finished reviewing all the different relative paths. I probably won't have much time to work on it this weekend.
Here is a diff of my hack. IIRC, this passes all tests (after removing the debugging prints), and fixes the rebuild issue. I don't think it should be used as-is, though. I know Alex doesn't like the use of |
I think this bit is what that PR tries to fix, but I didn't consider that I'm obviously less knowledgeable in this area -- and not sure what the next step should be. If you have suggestions I'm happy to implement them or try to make progress; otherwise I consider this somewhat blocked on a "what to do next." I can attempt to fix the |
Closing in favor of b1b9b79. |
Fix some issues with absolute paths in dep-info files.
There were some issues with how #7030 was handling translating paths in dep-info files. The main consequence is that when coupled with rust-lang/rust#61727 (which has not yet merged), the fingerprint would fail and be considered dirty when it should be fresh.
It was joining [`target_root`](https://github.com/rust-lang/cargo/blame/1140c527c4c3b3e2533b9771d67f88509ef7fc16/src/cargo/core/compiler/fingerprint.rs#L1352-L1360) which had 3 different values, but stripping [only one](https://github.com/rust-lang/cargo/blame/1140c527c4c3b3e2533b9771d67f88509ef7fc16/src/cargo/core/compiler/mod.rs#L323). This means for different scenarios (like using `--target`), it was creating incorrect paths in some cases. For example, a target having a proc-macro dependency which would be in the host directory.
The solution here is to always use CARGO_TARGET_DIR as the base that all relative paths are checked against. This should handle all host/target differences.
The tests are marked with `#[ignore]` because 61727 has not yet merged.
This includes a second commit (which I can open as a separate PR if needed) which is an alternate solution to #7034. It adds dep-info tracking for registry dependencies. However, it tries to limit which kinds of paths it tracks. It will not track package-relative paths (like source files). It also adds an mtime cache to significantly reduce the number of stat calls (because every dependency was stat'ing every file in sysroot).
Finally, I've run some tests using this PR with 61727 in rust-lang/rust. I can confirm that a second build is fresh (it doesn't erroneously rebuild). I can also confirm that the problem in rust-lang/rust#59105 has *finally* been fixed!
My confidence in all this is pretty low, but I've been unable to think of any specific ways to make it fail. If anyone has ideas on how to better test this, let me know.
Also, feel free to ask questions since I've been thinking about this a lot for the past few weeks, and there is quite a bit of subtle stuff going on.
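The mtime cache mentioned above can be illustrated with a minimal sketch. It assumes a plain map from path to modification time. `MtimeCache` and its `stat_calls` counter are hypothetical names, not the types used in the actual commit.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical cache: remembers each path's mtime so that repeated lookups
/// of the same file (for example, a sysroot rlib shared by every dependency)
/// cost only one stat call.
#[derive(Default)]
struct MtimeCache {
    seen: HashMap<PathBuf, SystemTime>,
    /// Number of real `fs::metadata` calls, kept for illustration only.
    stat_calls: usize,
}

impl MtimeCache {
    fn mtime(&mut self, path: &Path) -> io::Result<SystemTime> {
        if let Some(cached) = self.seen.get(path) {
            // Cache hit: no filesystem access needed.
            return Ok(*cached);
        }
        self.stat_calls += 1;
        let mtime = fs::metadata(path)?.modified()?;
        self.seen.insert(path.to_path_buf(), mtime);
        Ok(mtime)
    }
}
```

If one such cache is shared across all units in a build, the number of stat calls is bounded by the number of distinct files rather than by the number of (dependency, file) pairs. One trade-off of this sketch is that a file modified during the build would not be noticed after its first lookup.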
Now that sysroot dependencies are encoded, we want to know about them
regardless of whether the crate itself is unlikely to change contents.
r? @alexcrichton