-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tests time out non-deterministically #909
Comments
Problem started with this PR: #899 although that never got merged. |
The first merged PR that suffers from the problem is the following. Going back up to ten PRs before that does not show the timeout problem. Before that was this PR: Before that was: However, there was also an unmerged PR that suffers from the problem:
Given that the unmerged PR is the first to run into timeouts, it seems that the issue is unrelated to any changes in Boogie. The PR before the unmerged problematic one is #898, but that itself has no issue, and neither does another PR after it, so that PR does not seem to be at fault. The issue seems to come from the image version upgrade from 20240624.1.0 to 20240630.1.0, since all runs with the problem are in 20240630.1.0 and none are on 20240624.1.0. |
One thing that's a shame is that the timeout comes from lit, not from the Boogie process. It would be good to configure timeouts at the Boogie level that are shorter than the lit timeout, so that when the timeout occurs, the Boogie process can give some feedback as to where it occurred. |
Desperate move but maybe upgrading dotnet will help: #909 Update: nope, timeouts still occur there |
The diff between those Ubuntu image versions is here, but it unfortunately gives me no ideas as to what might be going wrong. I can't imagine how any of those changes would have impacted anything. One interesting data point is that the normal runtime for the tests that are hanging is extremely short. Verification time for each is a few milliseconds, and the Z3 resource count is usually under 1000, and always under 3000. My best hypothesis is that this is a deadlock. Maybe the changes to the underlying system are somehow enough to make the non-determinism of concurrency more likely to go wrong? It could be worth seeing what happens if we try reverting concurrency-related PRs and seeing whether that makes any difference (not that we'd want to revert them permanently, because I think they all fix important things, but it might help us understand what's going on). The ones I see in the recent past include #841, #854, #888, and maybe #898? |
I agree. I increased the timeout by a huge amount and the timeouts still occurred. |
Different tests seems to fail non-deterministically:
The text was updated successfully, but these errors were encountered: