GDB Checkpoint Issue #3678
theIDinside added a commit to theIDinside/rr that referenced this issue Jan 15, 2024
Refactor so that marks_with_checkpoints is changed in just one place instead of being accessed arbitrarily. Ref counts got the same change in a previous commit. Fixes a bug for loaded persistent checkpoints where the re-created checkpoints did not get their reference counting correct. This closes rr-debugger#3678
theIDinside added a commit to theIDinside/rr that referenced this issue Jun 21, 2024
- Moved into util.cc
- Added forward_to to skip trace data to some arbitrary point in time

Getters required to expose data
We need to be able to expose this data so it can be serialized.

Find original exe for ReplayTask
Digs out the original executable image that this task was forked from, or in the case of exec, exec'd on. This is required for persistent checkpointing, so that the names in the proc fs correspond to a correct name at replay time (i.e. has the same behavior/looks the same in proc fs as a normal replay). The thread name is not what should be showing up in /proc/tid/comm, but the actual executable. So we need to be able to find this "original exe" of the task.

Check if Event is checkpointable
Required for the create checkpoints command, etc. to determine what events in the trace are checkpointable when there is no live session. In future commits/PRs, remove the static function in ReplaySession.cc that does the same thing and use this member function on Event instead.

Additional proc fs query paths
Gets additional proc fs paths for a task, in this case /mem. Required for persistent checkpointing to figure out how to handle mappings and what to serialize (and what not to serialize).

Lifted CloneCompletion out of Session
The function extract_name will also be required for setting up syscall buffer stuff in coming commits.

Getters/setters required for PCP
Need to be able to set this data when restoring an address space.

Persistent checkpointing
Added a persistent checkpoint schema for capnproto, rr_pcp.capnp, as well as a compile command for it in CMakeLists.txt that works like the other one (rr_trace.capnp).
The CheckpointInfo and MarkData types work as intermediaries between a serialized checkpoint and a deserialized "live" one. MarkData holds a copy of the contents of Mark, InternalMark, ProtoMark and their various data, both for serialization and, when deserializing, for reconstructing those types.
The reasoning for adding MarkData is to not intrude on the Mark/InternalMark/ProtoMark interface and possibly break some guarantees or invariants they provide. If something goes wrong now, it's constrained to persistent checkpointing not reconstituting a session properly.
GDB spawned by RR now has 2 additional commands: write-checkpoints, which serializes any checkpoints set by the `checkpoint` command, and load-checkpoints.
Added the rr create-checkpoints command, which creates persistent checkpoints on a specified interval that it attempts to honor as closely as possible.
RerunCommand and ReplayCommand are now aware of PCPs. Replay sessions get spawned from persistent checkpoints if they exist on disk when using `-g <evt>`, or when using `-f <pid>` and that "task" was created some time after a persistent checkpoint.
Added the --ignore-pcp flag to these commands, which ignores PCPs and spawns sessions normally.

fixup for can_checkpoint_at
Restored comments that existed in the static function in ReplaySession.cc. Changed all uses of it to Event::can_checkpoint_at. Removed the static can_checkpoint_at in ReplaySession.cc.

Fix preferred include & unnecessary check for partial init
Since checkpoints are partially initialized, checking that they are is pointless.

Added cmake command looping over trace files per request by @khuey

remove init check of member variables.

Move extract_name from Session into util.h.

Removed stream_util, moved contents to util.h

make ignore-pcp not take up '-i'

Moved responsibility of de/ser into FdTable and FileMonitor
Deserializing and serializing an FdTable is now performed by the class itself instead of in a free function. FileMonitor has a public member function that is used for serialization. Each derived type that requires special/additional logic extends the virtual member function serialize_type.

Remove skipMonitoringMappedFd
Not necessary for serialization, as FdTable is separately restored.

Refactor task OS-name setting
Task::copy_state sets the OS name of a task in the same fashion that persistent checkpointing sets the name. Refactored this functionality into Task::set_name. Also removed the unnecessary `update_prname` from Task::copy_state. update_prname is not a "write to tracee" operation but a "read from tracee" operation, and since we already know what name we want to set Task::prname to, we skip this read from the tracee in Task::copy_state and just set it to the parameter passed in to Task::set_name.

Add const qualifier

Fixes rr-debugger#3678
Refactor so that marks_with_checkpoints is changed in just one place instead of being accessed arbitrarily. Ref counts got the same change in a previous commit. Fixes a bug for loaded persistent checkpoints where the re-created checkpoints did not get their reference counting correct. This closes rr-debugger#3678

Changes required to rebase
There's currently an issue with GDB checkpoints making them behave in (probably) unintended ways.
Here's a quick rundown of the behavior:
`checkpoint` command)

During the continue until the next stop, the internal checkpoint at (T-n) might have been cleaned up and removed by the supervisor. If there's an internal checkpoint before it, the GDB checkpoint will restart from that instead (and if there is none before T-n, restarting that checkpoint essentially amounts to restarting the replay from the beginning).

This is probably not the intended behavior.
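To make the fallback concrete, here is a toy model of the lookup as I understand it from the description above. It is not rr code: `restart_point`, the list of surviving checkpoint times, and the use of 0 for "start of the trace" are all made up for illustration.

```python
from bisect import bisect_right


def restart_point(internal_checkpoints, target_time):
    """Return the time a restart would actually resume from.

    internal_checkpoints: sorted times at which an internal checkpoint
    still exists (hypothetical representation, not rr's data structure).
    Returns the latest surviving time <= target_time, or 0 (standing in
    for the start of the trace) when nothing earlier survives.
    """
    i = bisect_right(internal_checkpoints, target_time)
    return internal_checkpoints[i - 1] if i > 0 else 0


# GDB checkpoint set at T = 100, backed by an internal checkpoint at T-n = 90.
alive = [20, 50, 90]
print(restart_point(alive, 100))  # 90

# The supervisor cleans up the internal checkpoint at 90 during the continue.
alive.remove(90)
print(restart_point(alive, 100))  # 50

# With no earlier internal checkpoint at all, we land at the beginning.
print(restart_point([], 100))     # 0
```

In the second and third calls the restart silently lands much earlier than the point where the `checkpoint` command was issued, which is the surprising part.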
This issue will be fixed by the Persistent Checkpoint PR, because that PR requires this "searching backwards for internal checkpoints" functionality (and has also refactored out the checkpoint refcount management, which is what keeps "internal checkpoints" alive).

So this issue will be closed by #3406 once it's done.
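As a rough illustration of why the refcount management matters, the sketch below models cleanup that only discards internal checkpoints nobody holds a reference to. The class and method names (`CheckpointTable`, `acquire`, `release`, `cleanup`) are hypothetical and only loosely inspired by the description here; rr's real bookkeeping is C++ and will differ.

```python
class CheckpointTable:
    """Toy refcounted table of internal checkpoints, keyed by time."""

    def __init__(self):
        self._refs = {}  # time -> number of holders (e.g. GDB checkpoints)

    def add_internal(self, time):
        # A freshly created internal checkpoint has no holders yet.
        self._refs.setdefault(time, 0)

    def acquire(self, time):
        # A GDB checkpoint takes a reference so cleanup cannot drop it.
        if time not in self._refs:
            raise KeyError(f"no internal checkpoint at {time}")
        self._refs[time] += 1

    def release(self, time):
        if self._refs[time] == 0:
            raise ValueError(f"checkpoint at {time} has no references")
        self._refs[time] -= 1

    def cleanup(self):
        # Supervisor-style cleanup: discard only unreferenced checkpoints.
        self._refs = {t: n for t, n in self._refs.items() if n > 0}

    def alive(self):
        return sorted(self._refs)


table = CheckpointTable()
table.add_internal(50)
table.add_internal(90)
table.acquire(90)      # GDB checkpoint pins the internal checkpoint at 90
table.cleanup()
print(table.alive())   # [90]

table.release(90)      # GDB checkpoint deleted
table.cleanup()
print(table.alive())   # []
```

If the reference is never taken (or the count is wrong after reloading), the cleanup step is free to drop the checkpoint, which matches the symptom described in this issue.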
I came across this bug when finishing up that PR. So once I can get an "ok" that we can solve it by pulling in that PR when it's done, I'll move forward with updating the PR (rebasing onto master) - otherwise I'll have to fix this issue first (which is fine too).