BlockCannotReturn when terminating a time-restricted process #108
Comments
Hi Christoph, here's what's happening:
I'm not sure what would be the best way to fix it, but a suggestion addressing your example is in Chronology-Core-jar.89. Let me know what you think... Thanks for this delightful example ;) The original version:
Hi Christoph,
Hi Jaromir, thanks a lot for the investigation! Both of your fixes look good to me, but the second one (Chronology-Core-jar.90) seems simpler as it avoids the complexity of testing process states. I have tested it in my image for one day, and the number of unexpected BCRs has dropped significantly. :-) (Still, I keep seeing some rare BCRs with a different stack that unfortunately freeze my image so I cannot investigate them, but I don't even know whether they are related to
By the way, initially I assumed that when the watchdog is preempted right before signaling the timeout, there could be another synchronization issue ... but this will never happen because it has a higher priority than theProcess. Could there be any other synchronization issues with this method? Right now I can't find any. :-)
Hi Christoph,
It's a tricky neighborhood :) #signalException: may have unexpected side effects because it resumes the receiver process. If this process is waiting on a condition variable, e.g. on a semaphore or a delay, it will be taken out of this wait and will continue after signaling the exception. See these examples (edit: wrong examples; see the next post):
I'm not sure what should be the right expectation... to do what it currently does or continue waiting? What would you expect? Is this scenario something you use in the Trace debugger or the Simulation studio?
Sorry, wrong examples in my previous post - they don't demonstrate the point. Try these instead:
Something doesn't feel right here... In this example the process won't leave the semaphore:
Hi Christoph, if you find that the changesets correctly fix the bug in #valueWithin:onTimeout:, could you please merge them? I'd prefer Chronology-Core-jar.91, but Chronology-Core-jar.90 is also fine. (And please remove the remaining two from the Inbox.) Thanks,
Hi Jaromir, sorry for not replying earlier ...
Yes, that's probably cleaner! Nice find! I have loaded it into my image and will report after a few days ... (By the way, with Chronology-Core-jar.90, I still saw a number of errors from this corner, though a much smaller one, but I unfortunately did not record them. Will do so when I see them again.) Regarding your other examples: I agree with your conclusion that this is expected behavior. Just like
Hi Jaromir, I have merged Chronology-Core-jar.91 (and Chronology-Core-jar.90, because you based jar.91 on its ancestry) to the trunk and added a test from my above example. By the way, the two are not fully equivalent - jar.90 did not pass my test because, surprise, Apart from that, I noticed another strange thing: If you try to step over
Expected: The stack of the process is fully unwound, and the process is terminated correctly.
Actual: The following debugger appears (BlockCannotReturn):
This is not a regression and failed in 5.3, too (but with a different error).
I have no idea yet why this happens, but it is blowing up every few minutes in my current multiprocessing-intensive project. A fix or workaround would be highly appreciated.
Cc'ing @isCzech in case this bug interests you :)