From 4adec90409b634ab26a0c1b387e0b7274a42b0e2 Mon Sep 17 00:00:00 2001
From: Simon Farre
Date: Tue, 29 Nov 2022 11:22:28 +0100
Subject: [PATCH] Functionality that will be shared, moved from TraceStream.cc

- Moved into util.cc
- Added forward_to to skip trace data to some arbitrary point in time

Getters required to expose data

We need to be able to expose this data so it can be serialized.

Find original exe for ReplayTask

Digs out the original executable image that this task was forked from or,
in the case of exec, exec'd on. This is required for persistent
checkpointing, so that the names in the proc fs correspond to the correct
names at replay time (i.e. they behave and look the same in proc fs as in a
normal replay).

The thread name is not what should show up in /proc/tid/comm; the actual
executable is. So we need to be able to find this "original exe" of the
task.

Check if Event is checkpointable

Required for the create-checkpoints command, etc., to determine which events
in the trace are checkpointable when there is no live session.

In future commits/PRs, remove the static function in ReplaySession.cc that
does the same thing and use this member function on Event instead.

Additional proc fs query paths

Gets additional proc fs paths for a task, in this case /mem. Required for
persistent checkpointing to figure out how to handle mappings and what to
serialize (and what not to serialize).

Lifted CloneCompletion out of Session

The function extract_name will also be required for setting up syscall
buffer stuff in coming commits.

Getters/setters required for PCP

Need to be able to set this data when restoring an address space.

Persistent checkpointing

Added the persistent checkpoint schema for capnproto, rr_pcp.capnp, as well
as a compile command for it in CMakeLists.txt that works like the other one
(rr_trace.capnp).

CheckpointInfo and MarkData types work as intermediaries between a
serialized checkpoint and a deserialized "live" one.
MarkData holds a copy of the contents of Mark, InternalMark and ProtoMark
and their various data; it is used for serialization and, when
deserializing, to reconstruct those types.

The reason for adding MarkData is to avoid intruding on the
Mark/InternalMark/ProtoMark interfaces and possibly breaking guarantees or
invariants they provide. If something goes wrong now, the damage is confined
to persistent checkpointing failing to reconstitute a session properly.

GDB spawned by RR now has two additional commands: write-checkpoints, which
serializes any checkpoints set by the `checkpoint` command, and
load-checkpoints.

Added the rr create-checkpoints command, which creates persistent
checkpoints at a specified interval that it attempts to honor as closely as
possible.

RerunCommand and ReplayCommand are now aware of PCPs. Replay sessions get
spawned from persistent checkpoints if they exist on disk when using `-g `
or when using `-f ` and that "task" was created some time after a persistent
checkpoint. Added the --ignore-pcp flag to these commands, which ignores
PCPs and spawns sessions normally.

fixup for can_checkpoint_at

Restored comments that existed in the static function in ReplaySession.cc
Changed all uses of this to Event::can_checkpoint_at
Removed static can_checkpoint_at in ReplaySession.cc

Fix preferred include & unnecessary check for partial init

Since checkpoints are partially initialized, checking that they are is
pointless.

Added cmake command looping over the capnp files, per request by @khuey

Remove init check of member variables.

Move extract_name from Session into util.h.

Removed stream_util, moved contents to util.h

Make ignore-pcp not take up '-i'

Moved responsibility of de/ser into FdTable and FileMonitor

Deserializing and serializing an FdTable is now performed by the class
itself instead of in a free function.

FileMonitor has a public member function that is used for serialization.
Each derived type that requires special or additional logic overrides the
virtual member function serialize_type.

Remove skipMonitoringMappedFd

Not necessary for serialization, as FdTable is separately restored.

Refactor task OS-name setting

Task::copy_state sets the OS name of a task in the same fashion that
persistent checkpointing sets the name. Refactored this functionality into
Task::set_name.

Also removed the unnecessary `update_prname` from Task::copy_state.
update_prname is not a "write to tracee" operation but a "read from tracee"
operation; since we already know what name we want to set Task::prname to,
we skip this read from the tracee in Task::copy_state and just set it to
the parameter passed in to Task::set_name.

Add const qualifier

Fixes #3678

Refactor so that marks_with_checkpoints is changed in just one place rather
than being accessed arbitrarily. Ref counts had the same changes in a
previous commit.

Fixes a bug for loaded persistent checkpoints where the re-created
checkpoints did not get their reference counting correct.
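
For reviewers, the FileMonitor serialization hook described above can be
sketched roughly as follows. This is a minimal illustration, not the code in
this patch: `Builder` is a hypothetical stand-in for the generated capnproto
builder (pcp::FileMonitor::Builder), the public function name `serialize` is
an assumption, and `noexcept` is omitted because the stand-in may allocate.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the generated capnproto builder; it only
// records named integer fields so the example stays self-contained.
struct Builder {
  std::map<std::string, uint64_t> fields;
  void set(const std::string& key, uint64_t value) { fields[key] = value; }
};

class FileMonitor {
public:
  enum Type { Base, BpfMap };
  virtual ~FileMonitor() = default;
  virtual Type type() const { return Base; }

  // Public entry point (name assumed): writes what every monitor shares,
  // then lets the derived type append its own data.
  void serialize(Builder& builder) const {
    builder.set("type", static_cast<uint64_t>(type()));
    serialize_type(builder);
  }

private:
  // Default hook: most monitors have nothing extra to write.
  virtual void serialize_type(Builder&) const {}
};

class BpfMapMonitor : public FileMonitor {
public:
  BpfMapMonitor(uint64_t key_size, uint64_t value_size)
      : key_size_(key_size), value_size_(value_size) {}
  Type type() const override { return BpfMap; }

private:
  // Only types with extra state override the hook.
  void serialize_type(Builder& builder) const override {
    builder.set("key_size", key_size_);
    builder.set("value_size", value_size_);
  }

  uint64_t key_size_;
  uint64_t value_size_;
};
```

Keeping the hook private and virtual means callers only ever go through the
one public function, so the shared fields cannot be skipped by a derived
type.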
This closes #3678

Changes required to rebase
---
 CMakeLists.txt | 35 +-
 src/AddressSpace.cc | 5 +
 src/AddressSpace.h | 20 +
 src/BpfMapMonitor.h | 8 +-
 src/CheckpointInfo.cc | 230 ++++++++++++
 src/CheckpointInfo.h | 127 +++++++
 src/CreateCheckpointsCommand.cc | 183 +++++++++
 src/CreateCheckpointsCommand.h | 37 ++
 src/DebuggerExtensionCommand.cc | 100 +++--
 src/DebuggerExtensionCommand.h | 3 +
 src/Event.cc | 20 +
 src/Event.h | 6 +
 src/FdTable.cc | 95 +++++
 src/FdTable.h | 4 +
 src/FileMonitor.cc | 6 +
 src/FileMonitor.h | 10 +-
 src/GdbServer.cc | 100 +++--
 src/GdbServer.h | 47 ++-
 src/GdbServerConnection.cc | 5 +
 src/GdbServerConnection.h | 2 +
 src/MagicSaveDataMonitor.h | 2 +-
 src/MmappedFileMonitor.cc | 9 +
 src/MmappedFileMonitor.h | 4 +-
 src/NonvirtualPerfCounterMonitor.h | 3 +-
 src/ODirectFileMonitor.h | 2 +-
 src/PersistentCheckpointing.cc | 570 +++++++++++++++++++++++++++++
 src/PersistentCheckpointing.h | 81 ++++
 src/PidFdMonitor.h | 2 +-
 src/PreserveFileMonitor.h | 2 +-
 src/ProcFdDirMonitor.cc | 8 +
 src/ProcFdDirMonitor.h | 4 +-
 src/ProcMemMonitor.cc | 10 +
 src/ProcMemMonitor.h | 4 +-
 src/ProcStatMonitor.cc | 4 +
 src/ProcStatMonitor.h | 4 +-
 src/RRPageMonitor.h | 2 +-
 src/ReplayCommand.cc | 56 ++-
 src/ReplaySession.cc | 363 ++++++++++++++++--
 src/ReplaySession.h | 26 ++
 src/ReplayTask.cc | 23 ++
 src/ReplayTask.h | 3 +
 src/ReplayTimeline.cc | 105 +++++-
 src/ReplayTimeline.h | 80 +++-
 src/RerunCommand.cc | 29 +-
 src/Session.cc | 43 ---
 src/Session.h | 15 +-
 src/StdioMonitor.cc | 4 +
 src/StdioMonitor.h | 4 +-
 src/SysCpuMonitor.h | 2 +-
 src/Task.cc | 6 +
 src/Task.h | 9 +
 src/TraceStream.cc | 148 +++-----
 src/TraceStream.h | 7 +
 src/VirtualPerfCounterMonitor.cc | 4 +
 src/VirtualPerfCounterMonitor.h | 3 +-
 src/rr_pcp.capnp | 195 ++++++++++
 src/util.cc | 166 +++++++++
 src/util.h | 31 ++
 58 files changed, 2799 insertions(+), 277 deletions(-)
 create mode 100644 src/CheckpointInfo.cc
 create mode 100644 src/CheckpointInfo.h
 create mode 100644 src/CreateCheckpointsCommand.cc
create mode 100644 src/CreateCheckpointsCommand.h create mode 100644 src/PersistentCheckpointing.cc create mode 100644 src/PersistentCheckpointing.h create mode 100644 src/rr_pcp.capnp diff --git a/CMakeLists.txt b/CMakeLists.txt index 0327d605641..bb11a505d86 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -539,17 +539,26 @@ endforeach(generated_file) add_custom_target(Generated DEPENDS ${GENERATED_FILES}) -add_custom_command(OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.c++" - "${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.h" - COMMAND capnp compile - "--src-prefix=${CMAKE_CURRENT_SOURCE_DIR}/src" - "-oc++:${CMAKE_CURRENT_BINARY_DIR}" - "${CMAKE_CURRENT_SOURCE_DIR}/src/rr_trace.capnp" - DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/src/rr_trace.capnp") -set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.c++" - PROPERTIES GENERATED true) -set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.h" - PROPERTIES GENERATED true HEADER_FILE_ONLY true) + +set(CAPNP_FILES + rr_trace + rr_pcp +) + +# Compile capnproto files +foreach(capnp_file ${CAPNP_FILES}) +add_custom_command(OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.c++" + "${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.h" + COMMAND capnp compile + "--src-prefix=${CMAKE_CURRENT_SOURCE_DIR}/src" + "-oc++:${CMAKE_CURRENT_BINARY_DIR}" + "${CMAKE_CURRENT_SOURCE_DIR}/src/${capnp_file}.capnp" + DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/src/${capnp_file}.capnp") + set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.c++" + PROPERTIES GENERATED true) + set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.h" + PROPERTIES GENERATED true HEADER_FILE_ONLY true) +endforeach() if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64") set(BLAKE_ARCH_DIR third-party/blake2/neon) @@ -561,11 +570,13 @@ set(RR_SOURCES src/AddressSpace.cc src/AutoRemoteSyscalls.cc src/BuildidCommand.cc + src/CheckpointInfo.cc src/Command.cc src/CompressedReader.cc 
src/CompressedWriter.cc src/CPUFeaturesCommand.cc src/CPUIDBugDetector.cc + src/CreateCheckpointsCommand.cc src/DiversionSession.cc src/DumpCommand.cc src/Dwarf.cc @@ -602,6 +613,7 @@ set(RR_SOURCES src/MvCommand.cc src/PackCommand.cc src/PerfCounters.cc + src/PersistentCheckpointing.cc src/PidFdMonitor.cc src/processor_trace_check.cc src/ProcFdDirMonitor.cc @@ -640,6 +652,7 @@ set(RR_SOURCES src/WaitManager.cc src/WaitStatus.cc ${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.c++ + ${CMAKE_CURRENT_BINARY_DIR}/rr_pcp.capnp.c++ ${BLAKE_ARCH_DIR}/blake2b.c ) diff --git a/src/AddressSpace.cc b/src/AddressSpace.cc index 86b00445190..b1263776367 100644 --- a/src/AddressSpace.cc +++ b/src/AddressSpace.cc @@ -550,6 +550,11 @@ void AddressSpace::save_auxv(Task* t) { save_interpreter_base(t, saved_auxv()); } +void AddressSpace::restore_auxv(Task* t, std::vector&& auxv) { + saved_auxv_ = std::move(auxv); + save_interpreter_base(t, saved_auxv()); +} + void AddressSpace::save_interpreter_base(Task* t, std::vector auxv) { saved_interpreter_base_ = read_interpreter_base(auxv); save_ld_path(t, saved_interpreter_base()); diff --git a/src/AddressSpace.h b/src/AddressSpace.h index 1af47e580ba..4bdf097db29 100644 --- a/src/AddressSpace.h +++ b/src/AddressSpace.h @@ -660,6 +660,14 @@ class AddressSpace : public HasTaskSet { * Dies if no shm size is registered for the address. */ size_t get_shm_size(remote_ptr addr) { return shm_sizes[addr]; } + + /** + * Check if `map` is shared memory + */ + bool has_shm_at(const KernelMapping& map) const { + return shm_sizes.find(map.start()) != std::cend(shm_sizes); + } + void remove_shm_size(remote_ptr addr) { shm_sizes.erase(addr); } /** @@ -793,6 +801,9 @@ class AddressSpace : public HasTaskSet { const std::vector& saved_auxv() { return saved_auxv_; } void save_auxv(Task* t); + /* Used when restoring persistent checkpoints. 
*/ + void restore_auxv(Task* t, std::vector&& auxv); + remote_ptr saved_interpreter_base() { return saved_interpreter_base_; } void save_interpreter_base(Task* t, std::vector auxv); @@ -871,6 +882,15 @@ class AddressSpace : public HasTaskSet { bool legacy_breakpoint_mode() { return stopping_breakpoint_table_ != nullptr; } remote_code_ptr do_breakpoint_fault_addr() { return do_breakpoint_fault_addr_; } + + void set_breakpoint_fault_addr(remote_code_ptr addr) { + do_breakpoint_fault_addr_ = addr; + } + + void set_uses_syscall_buffer(bool uses_syscall_buffer = true) { + syscallbuf_enabled_ = uses_syscall_buffer; + } + remote_code_ptr stopping_breakpoint_table() { return stopping_breakpoint_table_; } int stopping_breakpoint_table_entry_size() { return stopping_breakpoint_table_entry_size_; } diff --git a/src/BpfMapMonitor.h b/src/BpfMapMonitor.h index 9ee1e1c67f5..d10f6128c4a 100644 --- a/src/BpfMapMonitor.h +++ b/src/BpfMapMonitor.h @@ -14,12 +14,18 @@ class BpfMapMonitor : public FileMonitor { public: BpfMapMonitor(uint64_t key_size, uint64_t value_size) : key_size_(key_size), value_size_(value_size) {} - virtual Type type() override { return BpfMap; } + virtual Type type() const override { return BpfMap; } uint64_t key_size() const { return key_size_; } uint64_t value_size() const { return value_size_; } private: + virtual void serialize_type(pcp::FileMonitor::Builder& builder) const noexcept override { + auto bpf = builder.initBpf(); + bpf.setKeySize(key_size_); + bpf.setValueSize(value_size_); + } + uint64_t key_size_; uint64_t value_size_; }; diff --git a/src/CheckpointInfo.cc b/src/CheckpointInfo.cc new file mode 100644 index 00000000000..dde99b3e495 --- /dev/null +++ b/src/CheckpointInfo.cc @@ -0,0 +1,230 @@ +#include "CheckpointInfo.h" +#include "GdbServerConnection.h" +#include "ReplayTimeline.h" +#include "ScopedFd.h" +#include +#include +#include +#include +#include + +namespace rr { + +MarkData::MarkData(const ReplayTimeline::Mark& m) + : 
time(m.get_key().trace_time), + ticks(m.get_key().ticks), + step_key(m.get_key().step_key.as_int()), + ticks_at_event_start(m.get_internal()->ticks_at_event_start), + regs(m.regs()), + extra_regs(m.extra_regs()), + return_addresses(m.get_internal()->proto.return_addresses), + singlestep_to_next_mark_no_signal( + m.get_internal()->singlestep_to_next_mark_no_signal) {} + +MarkData::MarkData(rr::pcp::MarkData::Reader reader, SupportedArch arch, + const CPUIDRecords& cpuid_recs) + : time(reader.getTime()), + ticks(reader.getTicks()), + step_key(reader.getStepKey()), + ticks_at_event_start(reader.getTicksAtEventStart()), + regs(), + extra_regs(), + return_addresses(), + singlestep_to_next_mark_no_signal( + reader.getSinglestepToNextMarkNoSignal()) { + regs.set_arch(arch); + regs.set_from_trace(arch, reader.getRegs().getRaw().begin(), + reader.getRegs().getRaw().size()); + auto eregs = reader.getExtraRegs().getRaw(); + set_extra_regs_from_raw(arch, cpuid_recs, eregs, extra_regs); + auto i = 0; + for (auto rs : reader.getReturnAddresses()) { + return_addresses.addresses[i++] = rs; + } +} + +std::vector get_checkpoint_infos(const std::string& trace_dir, SupportedArch arch, const CPUIDRecords& cpuid_recs) { + // the trace's main checkpoint file, containing the list of all persistent + // checkpoints. 
+ const auto path = checkpoints_index_file(trace_dir); + ScopedFd fd(path.c_str(), O_RDONLY); + std::vector checkpoints; + if (!fd.is_open()) { + return checkpoints; + } + + capnp::PackedFdMessageReader reader(fd); + auto checkpointsInfoReader = reader.getRoot(); + auto cps = checkpointsInfoReader.getCheckpoints(); + for (const auto& cp : cps) { + auto info = CheckpointInfo{ cp, arch, cpuid_recs }; + if (info.exists_on_disk()) { + checkpoints.push_back(info); + } + } + std::sort(checkpoints.begin(), checkpoints.end(), + [](CheckpointInfo& a, CheckpointInfo& b) { + return a.clone_data.time < b.clone_data.time; + }); + return checkpoints; +} + +bool CheckpointInfo::exists_on_disk() const { + struct stat buf; + return stat(capnp_cp_file.c_str(), &buf) == 0 && + stat((capnp_cp_file + std::to_string(clone_data.time)).c_str(), &buf) == 0; +} + +CheckpointInfo::CheckpointInfo(const Checkpoint& c) + : unique_id(CheckpointInfo::generate_unique_id(c.unique_id)), + last_continue_task(c.last_continue_task), + where(c.where), + clone_data(c.mark), + non_explicit_mark_data(nullptr) + { + DEBUG_ASSERT(c.is_explicit == Checkpoint::EXPLICIT && c.mark.has_rr_checkpoint()); + // can't assert before ctor, set these values here. 
+ next_serial = c.mark.get_checkpoint()->current_task_serial(); + stats = c.mark.get_checkpoint()->statistics(); + LOG(debug) << "checkpoint clone at " << clone_data.time + << "; GDB checkpoint at " << clone_data.time; + capnp_cp_file = c.mark.get_checkpoint()->trace_reader().dir() + + "/checkpoint-" + std::to_string(unique_id); +} + +CheckpointInfo::CheckpointInfo(ExtendedTaskId last_continue, + const ReplayTimeline::Mark& mark_with_checkpoint) + : unique_id(CheckpointInfo::generate_unique_id()), + last_continue_task(last_continue), + where("Unknown"), + next_serial(mark_with_checkpoint.get_checkpoint()->current_task_serial()), + clone_data(mark_with_checkpoint), + non_explicit_mark_data(nullptr), + stats(mark_with_checkpoint.get_checkpoint()->statistics()) +{ + LOG(debug) << "checkpoint clone at " << clone_data.time + << "; GDB checkpoint at " << clone_data.time; + capnp_cp_file = mark_with_checkpoint.get_checkpoint()->trace_reader().dir() + + "/checkpoint-" + std::to_string(unique_id); +} + +CheckpointInfo::CheckpointInfo(const Checkpoint& non_explicit_cp, + const ReplayTimeline::Mark& mark_with_clone) + : unique_id(CheckpointInfo::generate_unique_id(non_explicit_cp.unique_id)), + last_continue_task(non_explicit_cp.last_continue_task), + where(non_explicit_cp.where), + next_serial( + mark_with_clone.get_checkpoint()->current_task_serial()), + clone_data(mark_with_clone), + non_explicit_mark_data(new MarkData{ non_explicit_cp.mark }), + stats(mark_with_clone.get_checkpoint()->statistics()) { + DEBUG_ASSERT(non_explicit_cp.is_explicit == Checkpoint::NOT_EXPLICIT && + !non_explicit_cp.mark.has_rr_checkpoint() && + "Constructor meant for non explicit checkpoints"); + // XXX we give this checkpoint the id (and name/path) of the actual cloned session + // data, so that multiple non explicit checkpoints later on, can reference the + // same clone data (not yet implemented) + LOG(debug) << "checkpoint clone at " << clone_data.time << "; GDB checkpoint at " << 
non_explicit_mark_data->time; + capnp_cp_file = mark_with_clone.get_checkpoint()->trace_reader().dir() + + "/checkpoint-" + std::to_string(unique_id); +} + +CheckpointInfo::CheckpointInfo(rr::pcp::CheckpointInfo::Reader reader, + SupportedArch arch, + const CPUIDRecords& cpuid_recs) + : capnp_cp_file(data_to_str(reader.getCloneCompletionFile())), + unique_id(reader.getId()), + where(data_to_str(reader.getWhere())), + next_serial(reader.getNextSerial()), + clone_data(reader.isExplicit() ? reader.getExplicit() + : reader.getNonExplicit().getCloneMark(), + arch, cpuid_recs), + non_explicit_mark_data( + reader.isNonExplicit() + ? new MarkData{ reader.getNonExplicit().getCheckpointMark(), arch, + cpuid_recs } + : nullptr), + stats() { + auto t = reader.getLastContinueTask(); + last_continue_task = ExtendedTaskId{{t.getGroupId(), t.getGroupSerial()}, {t.getTaskId(), t.getTaskSerial()}}; + auto s = reader.getStatistics(); + stats.bytes_written = s.getBytesWritten(); + stats.syscalls_performed = s.getSyscallsPerformed(); + stats.ticks_processed = s.getTicksProcessed(); +} + +void CheckpointInfo::delete_from_disk() { + const auto remove_file = [](auto path_data) { + const auto path = data_to_str(path_data); + if (remove(path.c_str()) != 0) { + LOG(error) << "Failed to remove " << path; + } + }; + ScopedFd fd(capnp_cp_file.c_str(), O_RDONLY); + capnp::PackedFdMessageReader datum(fd); + pcp::CloneCompletionInfo::Reader cc_reader = + datum.getRoot(); + const auto addr_spaces = cc_reader.getAddressSpaces(); + for (const auto& as : addr_spaces) { + const auto mappings_data = as.getProcessSpace().getVirtualAddressSpace(); + for (const auto& m : mappings_data) { + switch (m.getMapType().which()) { + case pcp::KernelMapping::MapType::FILE: + remove_file(m.getMapType().getFile().getContentsPath()); + break; + case pcp::KernelMapping::MapType::SHARED_ANON: + remove_file(m.getMapType().getSharedAnon().getContentsPath()); + break; + case pcp::KernelMapping::MapType::PRIVATE_ANON: + 
remove_file(m.getMapType().getPrivateAnon().getContentsPath()); + break; + case pcp::KernelMapping::MapType::GUARD_SEGMENT: + break; + case pcp::KernelMapping::MapType::SYSCALL_BUFFER: + remove_file(m.getMapType().getSyscallBuffer().getContentsPath()); + break; + case pcp::KernelMapping::MapType::RR_PAGE: + remove_file(m.getMapType().getRrPage().getContentsPath()); + break; + } + } + } + + remove(capnp_cp_file.c_str()); + remove(data_directory().c_str()); + if (exists_on_disk()) { + LOG(error) << "Couldn't remove persistent checkpoint data (or directory)"; + } +} + +ScopedFd CheckpointInfo::open_for_read() const { + DEBUG_ASSERT(exists_on_disk() && "This checkpoint has not been serialized; or the index file has been removed."); + auto file = ScopedFd(capnp_cp_file.c_str(), O_RDONLY); + if (!file.is_open()) FATAL() << "Couldn't open checkpoint data " << file; + return file; +} + +ScopedFd CheckpointInfo::open_for_write() const { + DEBUG_ASSERT(!exists_on_disk() && "Already serialized checkpoints shouldn't be re-written"); + auto file = ScopedFd(capnp_cp_file.c_str(), O_EXCL | O_CREAT | O_RDWR, 0700); + if (!file.is_open()) FATAL() << "Couldn't open checkpoint file for writing " << file; + return file; +} + +std::string CheckpointInfo::data_directory() const { + return capnp_cp_file + std::to_string(clone_data.time); +} + +/*static*/ size_t CheckpointInfo::generate_unique_id(size_t id) { + // if we haven't been set already, generate a unique "random" id + if (id == 0) { + timeval t; + gettimeofday(&t, nullptr); + auto cp_id = (t.tv_sec * 1000 + t.tv_usec / 1000); + return cp_id; + } else { + return id; + } + } + +} // namespace rr \ No newline at end of file diff --git a/src/CheckpointInfo.h b/src/CheckpointInfo.h new file mode 100644 index 00000000000..5001322de04 --- /dev/null +++ b/src/CheckpointInfo.h @@ -0,0 +1,127 @@ +#pragma once + +#include "ExtraRegisters.h" +#include "GdbServer.h" +#include "GdbServerConnection.h" +#include "ReturnAddressList.h" +#include 
"rr_pcp.capnp.h" +#include "util.h" +#include + +namespace rr { + +using CPUIDRecords = std::vector; + +/** + * CheckpointInfo and MarkData are intermediary types between de/serialization + * of checkpoints and marks. These types are added to not intrude in Checkpoint, + * Mark, InternalMarks, ProtoMark etc, to make sure that the implementation of + * persistent checkpoints do not break any guarantees or invariants provided by + * those types in normal record/replay. + */ + +/** + * `MarkData` flattens that "hierarchy" representing `Mark`, `InternalMark` and + * `ProtoMark` required for de/serialization. When deserializing this hierarchy + * is rebuilt from `MarkData` + */ +struct MarkData { + // Constructor when serializing + MarkData(const ReplayTimeline::Mark& m); + // Constructor when de-serializing + MarkData(rr::pcp::MarkData::Reader reader, SupportedArch arch, + const CPUIDRecords& cpuid_recs); + + FrameTime time; + Ticks ticks; + int step_key; + Ticks ticks_at_event_start; + Registers regs; + ExtraRegisters extra_regs; + ReturnAddressList return_addresses; + bool singlestep_to_next_mark_no_signal; +}; + +class CheckpointInfo { +public: + /** + * For `GDBServer` users of explicit checkpoints. + */ + CheckpointInfo(const Checkpoint& checkpoint); + + /** + * For `GDBServer` users where a non explicit checkpoint was set. + * `mark_with_clone` is the mark which holds the actual checkpoint / clone, + * which is some arbitrary event time before actual GDB checkpoint. 
+ */ + CheckpointInfo(const Checkpoint& checkpoint, + const ReplayTimeline::Mark& mark_with_clone); + + /* For `CreateCheckpointsCommand` users (rr create-checkpoints command) */ + CheckpointInfo(ExtendedTaskId last_continue_task, + const ReplayTimeline::Mark& mark_with_checkpoint); + // When deserializing from capnproto stream + CheckpointInfo(rr::pcp::CheckpointInfo::Reader reader, SupportedArch arch, + const CPUIDRecords& cpuid_recs); + + bool exists_on_disk() const; + void delete_from_disk(); + + ScopedFd open_for_read() const; + ScopedFd open_for_write() const; + + /* Returns directory where the checkpoints memory mappings gets written to */ + std::string data_directory() const; + + /** + * Returns event time for this checkpoint + */ + FrameTime event_time() const { return clone_data.time; } + + static size_t generate_unique_id(size_t id = 0); + + friend bool operator==(const CheckpointInfo& lhs, const CheckpointInfo& rhs) { + return lhs.capnp_cp_file == rhs.capnp_cp_file; + } + + bool is_explicit() const { return non_explicit_mark_data == nullptr; } + + // Path to file containing filled out capnproto schema for this checkpoint + std::string capnp_cp_file; + size_t unique_id; + ExtendedTaskId last_continue_task; + std::string where; + uint32_t next_serial; + // MarkData collected from a Mark with a clone (either an explicit checkpoint, + // or the first found clone before a non-explicit checkpoint) + MarkData clone_data; + // (optional) MarkData collected from a Mark without a clone (in the case of non explicit checkpoints) + std::shared_ptr non_explicit_mark_data; + Session::Statistics stats; +}; + +/** + * Returns the path of checkpoint index file, given the dir `trace_dir` + */ +std::string checkpoints_index_file(const std::string& trace_dir); + +/** + * Retrieve list of persistent checkpoints in `trace_dir` sorted in ascending + * order by event time. 
+ */ +std::vector get_checkpoint_infos( + const std::string& trace_dir, SupportedArch arch, + const CPUIDRecords& cpuid_recs); + +/** + * Updates the index for serialized checkpoints on disk to contain the + * `checkpoints`. Removes any checkpoints on disk, not found in `checkpoints`. + * One can clear all persistent checkpoints on disk by calling this with an + * empty `checkpoints` + */ +void update_persistent_checkpoint_index( + const std::string& trace_dir, SupportedArch arch, + const CPUIDRecords& cpuid_recs, + const std::vector& checkpoints); + +} // namespace rr \ No newline at end of file diff --git a/src/CreateCheckpointsCommand.cc b/src/CreateCheckpointsCommand.cc new file mode 100644 index 00000000000..25fc9461e7b --- /dev/null +++ b/src/CreateCheckpointsCommand.cc @@ -0,0 +1,183 @@ +#include "CheckpointInfo.h" +#include "Command.h" +#include "CreateCheckpointsCommand.h" +#include "GdbServerConnection.h" +#include "ReplayTimeline.h" +#include "TraceStream.h" +#include "log.h" +#include "main.h" +#include + +namespace rr { + +CreateCheckpointsCommand CreateCheckpointsCommand::singleton( + "create-checkpoints", + " rr create-checkpoints [OPTION]... []\n" + " -i, --interval= Create persistent checkpoints on an interval " + "of \n" + " events.\n" + " -s, --start= Start setting checkpoints at event \n" + " -e, --end= Stop setting checkpoints at event \n" + "\n" + "Creates a checkpoint at an interval of N events. 
" + "The command will attempt to\n" + "honor this interval as closely as possible.\n"); + +static bool parse_options(std::vector& args, + CreateCheckpointsFlags& options) { + if (parse_global_option(args)) { + return true; + } + static const OptionSpec op_spec[] = { { 'i', "--interval", HAS_PARAMETER }, + { 's', "--start", HAS_PARAMETER }, + { 'e', "--end", HAS_PARAMETER } }; + + ParsedOption opt; + if (!Command::parse_option(args, op_spec, &opt)) { + return false; + } + switch (opt.short_name) { + case 'i': + options.events_interval = static_cast(std::abs(opt.int_value)); + break; + case 's': + options.start_event = static_cast(std::abs(opt.int_value)); + break; + case 'e': + options.end_event = static_cast(std::abs(opt.int_value)); + break; + default: + DEBUG_ASSERT(0 && "Unknown option"); + return false; + } + return true; +} + +bool CreateCheckpointsCommand::verify_params_ok( + const CreateCheckpointsFlags& flags) { + if (flags.events_interval == 0) { + std::cout << "You need to provide an interval to set checkpoints at.\n"; + return false; + } + if (flags.end_event < flags.start_event) { + std::cout << "start & end has invalid values"; + return false; + } + if ((flags.end_event == UINT64_MAX && flags.start_event == 0) || + (flags.start_event != 0 && flags.end_event == UINT64_MAX)) { + return true; + } + + if ((flags.end_event - flags.start_event) < flags.events_interval) { + std::cout << "interval too large, can't fit between start & end"; + return false; + } + return true; +} + +int CreateCheckpointsCommand::run(std::vector& args) { + CreateCheckpointsFlags flags; + bool found_dir = false; + std::string trace_dir{}; + while (!args.empty()) { + if (parse_options(args, flags)) { + continue; + } + if (!found_dir && parse_optional_trace_dir(args, &trace_dir)) { + found_dir = true; + continue; + } + print_help(stderr); + return 1; + } + + if (!verify_params_ok(flags)) { + print_help(stderr); + return 1; + } + + auto verified_frames_to_checkpoint_at = + 
CreateCheckpointsCommand::find_events_to_checkpoint(trace_dir, flags); + if (verified_frames_to_checkpoint_at.empty()) { + std::cout << "No checkpointable events found.\n"; + return 2; + } + return run_main(trace_dir, verified_frames_to_checkpoint_at); +} + +int CreateCheckpointsCommand::run_main( + const std::string& trace_dir, + const std::vector& verified_events) { + DEBUG_ASSERT(!verified_events.empty() && + "No events provided to checkpoint at."); + ReplaySession::Flags session_flags{}; + ReplayTimeline timeline{ ReplaySession::create(trace_dir, session_flags) }; + std::vector cp_infos; + auto& reader = timeline.current_session().trace_reader(); + for (const auto evt : verified_events) { + RunCommand cmd = RUN_CONTINUE; + while (reader.time() < evt) { + auto r = timeline.replay_step_forward(cmd); + } + auto& session = timeline.current_session(); + if (session.trace_reader().time() == evt) { + ASSERT(session.current_task(), session.can_clone()) + << "could not clone at frame " << evt; + auto mark = timeline.add_explicit_checkpoint(); + CheckpointInfo cp_info{ ExtendedTaskId::from(*session.current_task()), mark }; + cp_infos.push_back(cp_info); + mark.get_checkpoint()->serialize_checkpoint(cp_info); + timeline.remove_explicit_checkpoint(mark); + LOG(debug) << "Serialized checkpoint at event " << evt; + } else { + FATAL() << "Stopped at wrong event"; + } + } + + update_persistent_checkpoint_index( + timeline.current_session().trace_reader().dir(), + timeline.current_session().arch(), + timeline.current_session().trace_reader().cpuid_records(), cp_infos); + std::cout << "Create checkpoints run successfully finished: " + << cp_infos.size() << " checkpoints created." 
<< std::endl; + return 0; +} + +std::vector CreateCheckpointsCommand::find_events_to_checkpoint( + const std::string& trace_dir, const CreateCheckpointsFlags& flags) { + TraceReader reader{ trace_dir }; + std::vector events; + auto total = 0ul; + + if (flags.start_event != 0) { + while (!reader.at_end()) { + total++; + const auto f = reader.read_frame(); + if (f.time() >= static_cast(flags.start_event) && f.event().can_checkpoint_at()) { + LOG(debug) << "Checkpointable event: " << f.event() << " " << f.time(); + events.push_back(f.time()); + break; + } + } + if (reader.at_end()) { + std::cout << "Trace is shorter than " << flags.start_event + << " (total trace events: " << total << ")" + << "Aborting." << std::endl; + return {}; + } + } + const auto add = flags.start_event == 0 ? 1 : 0; + while (!reader.at_end() && total <= flags.end_event) { + const auto f = reader.read_frame(); + const auto next = + (events.size() + add) * flags.events_interval + flags.start_event; + if (f.time() >= static_cast(next) && f.event().can_checkpoint_at()) { + LOG(debug) << "Checkpointable event: " << f.event() << " " << f.time(); + events.push_back(f.time()); + } + total++; + } + return events; +} + +}; // namespace rr \ No newline at end of file diff --git a/src/CreateCheckpointsCommand.h b/src/CreateCheckpointsCommand.h new file mode 100644 index 00000000000..fa012c5fe70 --- /dev/null +++ b/src/CreateCheckpointsCommand.h @@ -0,0 +1,37 @@ +#pragma once + +#include "Command.h" +#include + +namespace rr { + +using FrameTime = int64_t; + +struct CreateCheckpointsFlags { + uint64_t events_interval = 0; + uint64_t start_event = 0; + uint64_t end_event = UINT64_MAX; +}; + +class CreateCheckpointsCommand : Command { +public: + virtual int run(std::vector& args) override; + + static CreateCheckpointsCommand* get() { return &singleton; } + +protected: + CreateCheckpointsCommand(const char* name, const char* help) : Command(name, help) {} + + static CreateCheckpointsCommand singleton; + 
+private: + /* Runs the actual replay, creating checkpoints at events `frames_to_checkpoint_at`. */ + int run_main(const std::string& trace_dir, const std::vector& frames_to_checkpoint_at); + + /* Returns events to checkpoint at given an `interval`. If `report_total` as + * an out parameter, will report total event count of trace. */ + static std::vector find_events_to_checkpoint(const std::string& trace_dir, const CreateCheckpointsFlags& interval); + bool verify_params_ok(const CreateCheckpointsFlags& cp); +}; + +} diff --git a/src/DebuggerExtensionCommand.cc b/src/DebuggerExtensionCommand.cc index ef2ecc552d3..c423c3d6c99 100644 --- a/src/DebuggerExtensionCommand.cc +++ b/src/DebuggerExtensionCommand.cc @@ -1,6 +1,7 @@ /* -*- Mode: C++; tab-width: 8; c-basic-offset: 2; indent-tabs-mode: nil; -*- */ #include "DebuggerExtensionCommand.h" +#include "CheckpointInfo.h" #include "ReplayTask.h" #include "log.h" @@ -76,12 +77,10 @@ static SimpleDebuggerExtensionCommand rr_history_push( forward_stack.clear(); return string(); }); + static SimpleDebuggerExtensionCommand back( "back", "Go back one entry in the rr history.", [](GdbServer& gdb_server, Task* t, const vector&) { - if (!gdb_server.timeline()) { - return string("Command requires a full debugging session."); - } if (!t->session().is_replaying()) { return DebuggerExtensionCommandHandler::cmd_end_diversion(); } @@ -97,9 +96,6 @@ static SimpleDebuggerExtensionCommand back( static SimpleDebuggerExtensionCommand forward( "forward", "Go forward one entry in the rr history.", [](GdbServer& gdb_server, Task* t, const vector&) { - if (!gdb_server.timeline()) { - return string("Command requires a full debugging session."); - } if (!t->session().is_replaying()) { return DebuggerExtensionCommandHandler::cmd_end_diversion(); } @@ -117,22 +113,17 @@ static int gNextCheckpointId = 0; string invoke_checkpoint(GdbServer& gdb_server, Task*, const vector& args) { - if (!gdb_server.timeline()) { - return string("Command requires a 
full debugging session."); - } const string& where = args[0]; if (gdb_server.in_debuggee_end_state) { return string("The program is not being run."); } + auto& timeline = *gdb_server.timeline(); int checkpoint_id = ++gNextCheckpointId; - GdbServer::Checkpoint::Explicit e; - if (gdb_server.timeline()->can_add_checkpoint()) { - e = GdbServer::Checkpoint::EXPLICIT; - } else { - e = GdbServer::Checkpoint::NOT_EXPLICIT; - } - gdb_server.checkpoints[checkpoint_id] = GdbServer::Checkpoint( - *gdb_server.timeline(), gdb_server.last_continue_task, e, where); + const Checkpoint::Explicit e = timeline.can_add_checkpoint() + ? Checkpoint::EXPLICIT + : Checkpoint::NOT_EXPLICIT; + gdb_server.checkpoints[checkpoint_id] = Checkpoint( + timeline, gdb_server.last_continue_task, e, where); return string("Checkpoint ") + to_string(checkpoint_id) + " at " + where; } static SimpleDebuggerExtensionCommand checkpoint( @@ -146,9 +137,7 @@ string invoke_delete_checkpoint(GdbServer& gdb_server, Task*, if (args.size() < 1) { return "'delete checkpoint' requires an argument"; } - if (!gdb_server.timeline()) { - return string("Command requires a full debugging session."); - } + char* endptr; long id = strtol(args[0].c_str(), &endptr, 10); if (*endptr) { @@ -156,7 +145,7 @@ string invoke_delete_checkpoint(GdbServer& gdb_server, Task*, } auto it = gdb_server.checkpoints.find(id); if (it != gdb_server.checkpoints.end()) { - if (it->second.is_explicit == GdbServer::Checkpoint::EXPLICIT) { + if (it->second.is_explicit == Checkpoint::EXPLICIT) { gdb_server.timeline()->remove_explicit_checkpoint(it->second.mark); } gdb_server.checkpoints.erase(it); @@ -178,7 +167,7 @@ string invoke_info_checkpoints(GdbServer& gdb_server, Task*, string out = "ID\tWhen\tWhere"; for (auto& c : gdb_server.checkpoints) { out += string("\n") + to_string(c.first) + "\t" + - to_string(c.second.mark.time()) + "\t" + c.second.where; + to_string(c.second.mark.get_key().trace_time) + "\t" + c.second.where; } return out; } @@ 
-187,6 +176,73 @@ static SimpleDebuggerExtensionCommand info_checkpoints( "list all checkpoints created with the 'checkpoint' command", invoke_info_checkpoints); +string invoke_load_checkpoint(GdbServer& server, Task*, const vector&) { + auto existing_checkpoints = server.current_session().as_replay()->get_persistent_checkpoints(); + auto cp_deserialized = 0; + for (const auto& cp : existing_checkpoints) { + if(server.persistent_checkpoint_is_loaded(cp.unique_id)) { + continue; + } + ScopedFd fd = cp.open_for_read(); + auto session = ReplaySession::create(server.current_session().as_replay()->trace_reader().dir(), server.timeline()->current_session().flags()); + int checkpoint_id = ++gNextCheckpointId; + session->load_checkpoint(cp); + + server.checkpoints[checkpoint_id] = Checkpoint(*server.timeline(), cp, session); + cp_deserialized++; + } + return "loaded " + std::to_string(cp_deserialized) + " checkpoints from disk"; +} + +static SimpleDebuggerExtensionCommand load_checkpoint( + "load-checkpoints", + "loads persistent checkpoints", + invoke_load_checkpoint); + +string invoke_write_checkpoints(GdbServer& server, Task* t, + const vector&) { + auto new_cps = 0; + const auto& trace_dir = t->session().as_replay()->trace_reader().dir(); + std::vector existing_checkpoints; + + for (auto& kvp : server.checkpoints) { + auto& cp = kvp.second; + if (!cp.persistent()) { + if (cp.mark.has_rr_checkpoint()) { + LOG(debug) << "Checkpoint has clone at " << cp.mark.get_key().trace_time; + existing_checkpoints.push_back(CheckpointInfo{cp}); + cp.mark.get_checkpoint()->serialize_checkpoint(existing_checkpoints.back()); + new_cps++; + } else { + auto mark_with_clone = server.timeline()->find_closest_mark_with_clone(cp.mark); + if (!mark_with_clone) { + std::cout + << "Could not find a session clone to serialize for checkpoint " + << kvp.first << '\n'; + } else { + LOG(debug) << "Current event for checkpoint " << cp.mark.get_key().trace_time + << "; closest clone found at event " + 
<< mark_with_clone->get_key().trace_time; + existing_checkpoints.push_back(CheckpointInfo{cp, *mark_with_clone}); + mark_with_clone->get_checkpoint()->serialize_checkpoint(existing_checkpoints.back()); + new_cps++; + } + } + } else { + // checkpoint has already been serialized. + existing_checkpoints.emplace_back(cp); + } + } + + update_persistent_checkpoint_index(trace_dir, t->arch(), ((ReplayTask*)t)->trace_reader().cpuid_records(), existing_checkpoints); + return std::to_string(new_cps) + " new checkpoints serialized. (total: " + std::to_string(existing_checkpoints.size()) + ")"; +} + +static SimpleDebuggerExtensionCommand write_checkpoints( + "write-checkpoints", + "make checkpoints persist on disk.", + invoke_write_checkpoints); + void DebuggerExtensionCommand::init_auto_args() { static __attribute__((unused)) int dummy = []() { checkpoint.add_auto_arg("rr-where"); diff --git a/src/DebuggerExtensionCommand.h b/src/DebuggerExtensionCommand.h index 82308656666..d38dcc9e8f5 100644 --- a/src/DebuggerExtensionCommand.h +++ b/src/DebuggerExtensionCommand.h @@ -66,6 +66,9 @@ class SimpleDebuggerExtensionCommand : public DebuggerExtensionCommand { virtual std::string invoke(GdbServer& gdb_server, Task* t, const std::vector& args) override { + if (!gdb_server.timeline()) { + return "Command requires a full debugging session."; + } return invoker(gdb_server, t, args); } diff --git a/src/Event.cc b/src/Event.cc index a855533f7aa..a9c3be02f24 100644 --- a/src/Event.cc +++ b/src/Event.cc @@ -243,4 +243,24 @@ const char* state_name(SyscallState state) { } } +bool Event::can_checkpoint_at() const { + if (has_ticks_slop()) { + return false; + } + switch (type()) { + case EV_EXIT: + // At exits, we can't clone the exiting tasks, so + // don't even bother trying to checkpoint. + case EV_SYSCALLBUF_RESET: + // RESETs are usually inserted in between syscall + // entry/exit. Do not attempt to checkpoint at + // RESETs. Users would never want to do that anyway.
+ case EV_TRACE_TERMINATION: + // There's nothing to checkpoint at the end of a trace. + return false; + default: + return true; + } +} + } // namespace rr diff --git a/src/Event.h b/src/Event.h index ca4e0494c84..4f1a72ed38d 100644 --- a/src/Event.h +++ b/src/Event.h @@ -391,6 +391,12 @@ struct Event { /** Return a string naming |ev|'s type. */ std::string type_name() const; + /** + * Return true if it's possible/meaningful to make a checkpoint at the + * |frame| that |t| will replay. + */ + bool can_checkpoint_at() const; + static Event noop() { return Event(EV_NOOP); } static Event trace_termination() { return Event(EV_TRACE_TERMINATION); } static Event instruction_trap() { return Event(EV_INSTRUCTION_TRAP); } diff --git a/src/FdTable.cc b/src/FdTable.cc index c2d02a75a83..e276188a899 100644 --- a/src/FdTable.cc +++ b/src/FdTable.cc @@ -7,6 +7,19 @@ #include #include +#include "BpfMapMonitor.h" +#include "FileMonitor.h" +#include "MagicSaveDataMonitor.h" +#include "MmappedFileMonitor.h" +#include "NonvirtualPerfCounterMonitor.h" +#include "ODirectFileMonitor.h" +#include "PreserveFileMonitor.h" +#include "ProcFdDirMonitor.h" +#include "ProcMemMonitor.h" +#include "ProcStatMonitor.h" +#include "RRPageMonitor.h" +#include "StdioMonitor.h" +#include "SysCpuMonitor.h" #include "rr/rr.h" #include "AddressSpace.h" @@ -283,4 +296,86 @@ vector FdTable::fds_to_close_after_exec(RecordTask* t) { return fds_to_close; } +void FdTable::deserialize(Task* leader, + const pcp::ProcessSpace::Reader& leader_reader) { + auto monitors = leader_reader.getMonitors(); + for (auto m : monitors) { + FileMonitor::Type t = (FileMonitor::Type)m.getType(); + auto fd = m.getFd(); + if (!is_monitoring(m.getFd())) { + switch (t) { + case FileMonitor::Base: + FATAL() << "Can't add abstract type"; + break; + case FileMonitor::MagicSaveData: + add_monitor(leader, fd, new MagicSaveDataMonitor()); + break; + case FileMonitor::Mmapped: { + const auto mmap = m.getMmap(); + add_monitor(leader, fd, + 
new MmappedFileMonitor(mmap.getDead(), mmap.getDevice(), + mmap.getInode())); + } break; + case FileMonitor::Preserve: + add_monitor(leader, fd, new PreserveFileMonitor()); + break; + case FileMonitor::ProcFd: { + const auto p_fd = m.getProcFd(); + const auto tuid = TaskUid(p_fd.getTid(), p_fd.getSerial()); + add_monitor(leader, fd, new ProcFdDirMonitor(tuid)); + break; + } + case FileMonitor::ProcMem: { + const auto pmem = m.getProcMem(); + add_monitor(leader, fd, + new ProcMemMonitor(AddressSpaceUid(pmem.getTid(), + pmem.getSerial(), + pmem.getExecCount()))); + } break; + case FileMonitor::Stdio: + add_monitor(leader, fd, new StdioMonitor(m.getStdio())); + break; + case FileMonitor::VirtualPerfCounter: + FATAL() << "VirtualPerCounter Monitor deserializing unimplemented!\n"; + break; + case FileMonitor::NonvirtualPerfCounter: + add_monitor(leader, fd, new NonvirtualPerfCounterMonitor()); + break; + case FileMonitor::SysCpu: + add_monitor(leader, fd, new SysCpuMonitor(leader, "")); + break; + case FileMonitor::ProcStat: + add_monitor( + leader, fd, + new ProcStatMonitor(leader, data_to_str(m.getProcStat()))); + break; + case FileMonitor::RRPage: + add_monitor(leader, fd, new RRPageMonitor()); + break; + case FileMonitor::ODirect: + add_monitor(leader, fd, new ODirectFileMonitor()); + break; + case FileMonitor::BpfMap: + add_monitor(leader, fd, + new BpfMapMonitor(m.getBpf().getKeySize(), + m.getBpf().getValueSize())); + break; + default: + FATAL() << "unhandled FileMonitor: " << file_monitor_type_name(t); + } + } + } +} + +void FdTable::serialize(pcp::ProcessSpace::Builder& leader_builder) const { + auto serialized_fd_mons = leader_builder.initMonitors(fds.size()); + auto mon_index = 0; + for (const auto& mon : fds) { + const auto fd = mon.first; + const auto& monitor = mon.second; + auto builder = serialized_fd_mons[mon_index++]; + monitor->serialize(fd, builder); + } +} + } // namespace rr diff --git a/src/FdTable.h b/src/FdTable.h index 02b9f233cd8..4a95fb8a61c 
100644 --- a/src/FdTable.h +++ b/src/FdTable.h @@ -9,6 +9,7 @@ #include "FileMonitor.h" #include "HasTaskSet.h" +#include "rr_pcp.capnp.h" namespace rr { @@ -74,6 +75,9 @@ class FdTable final : public HasTaskSet { int last_free_fd() const { return last_free_fd_; } void set_last_free_fd(int last_free_fd) { last_free_fd_ = last_free_fd; } + void serialize(pcp::ProcessSpace::Builder& leader_builder) const; + void deserialize(Task* leader, const pcp::ProcessSpace::Reader& leader_reader); + void insert_task(Task* t) override; void erase_task(Task* t) override; diff --git a/src/FileMonitor.cc b/src/FileMonitor.cc index 340d55b1a4c..0faa69e4338 100644 --- a/src/FileMonitor.cc +++ b/src/FileMonitor.cc @@ -114,4 +114,10 @@ std::string file_monitor_type_name(FileMonitor::Type t) { } } +void FileMonitor::serialize(int fd, pcp::FileMonitor::Builder& builder) const noexcept { + builder.setFd(fd); + builder.setType(type()); + serialize_type(builder); +} + } diff --git a/src/FileMonitor.h b/src/FileMonitor.h index a0ed3dfbdba..81789ebec86 100644 --- a/src/FileMonitor.h +++ b/src/FileMonitor.h @@ -13,6 +13,7 @@ class Task; #include "preload/preload_interface.h" #include "util.h" +#include "rr_pcp.capnp.h" namespace rr { @@ -43,7 +44,7 @@ class FileMonitor { PidFd, }; - virtual Type type() { return Base; } + virtual Type type() const { return Base; } /** * Overriding this to return true will cause close() (and related fd-smashing @@ -129,6 +130,13 @@ class FileMonitor { virtual enum syscallbuf_fd_classes get_syscallbuf_class() { return FD_CLASS_TRACED; } + + /** Serialize this file monitor for persistent checkpoints. 
*/ + void serialize(int fd, pcp::FileMonitor::Builder& builder) const noexcept; + +private: + // default serialize_type does nothing + virtual void serialize_type(pcp::FileMonitor::Builder&) const noexcept {} }; std::string file_monitor_type_name(FileMonitor::Type t); diff --git a/src/GdbServer.cc b/src/GdbServer.cc index f13046da943..22532a06fea 100644 --- a/src/GdbServer.cc +++ b/src/GdbServer.cc @@ -21,6 +21,7 @@ #include "ElfReader.h" #include "Event.h" #include "DebuggerExtensionCommandHandler.h" +#include "GdbServerConnection.h" #include "GdbServerExpression.h" #include "ReplaySession.h" #include "ReplayTask.h" @@ -34,6 +35,8 @@ #include "log.h" #include "util.h" +#include "CheckpointInfo.h" + using namespace std; namespace rr { @@ -50,18 +53,14 @@ GdbServer::ConnectionFlags::ConnectionFlags() serve_files(false), debugger_params_write_pipe(nullptr) {} -static ExtendedTaskId extended_task_id(Task* t) { - return ExtendedTaskId(t->thread_group()->tguid(), t->tuid()); -} - GdbServer::GdbServer(std::unique_ptr& connection, Task* t, ReplayTimeline* timeline, const Target& target) : dbg(std::move(connection)), debuggee_tguid(t->thread_group()->tguid()), target(target), - last_continue_task(extended_task_id(t)), - last_query_task(extended_task_id(t)), + last_continue_task(ExtendedTaskId::from(*t)), + last_query_task(ExtendedTaskId::from(*t)), final_event(UINT32_MAX), in_debuggee_end_state(false), failed_restart(false), @@ -139,7 +138,7 @@ static bool matches_threadid(const ExtendedTaskId& tid, } static bool matches_threadid(Task* t, const GdbThreadId& target) { - return matches_threadid(extended_task_id(t), target); + return matches_threadid(ExtendedTaskId::from(*t), target); } static WatchType watchpoint_type(GdbRequestType req) { @@ -180,7 +179,7 @@ static void maybe_singlestep_for_event(Task* t, GdbRequest* req) { req->suppress_debugger_stop = true; req->cont().actions.push_back( GdbContAction(ACTION_STEP, - extended_task_id(t).to_debugger_thread_id())); + 
ExtendedTaskId::from(*t).to_debugger_thread_id())); } } @@ -304,7 +303,7 @@ static vector thread_info(const Session& sessio vector threads; for (auto& kv : session.tasks()) { threads.push_back({ - extended_task_id(kv.second), + ExtendedTaskId::from(*kv.second), kv.second->regs().ip().register_value() }); } @@ -338,7 +337,7 @@ void GdbServer::dispatch_debugger_request(Session& session, vector tids; if (state != REPORT_THREADS_DEAD && !failed_restart) { for (auto& kv : session.tasks()) { - tids.push_back(extended_task_id(kv.second)); + tids.push_back(ExtendedTaskId::from(*kv.second)); } } dbg->reply_get_thread_list(tids); @@ -470,9 +469,9 @@ void GdbServer::dispatch_debugger_request(Session& session, : session.find_task(is_query ? last_query_task.tuid : last_continue_task.tuid); if (target) { if (is_query) { - last_query_task = extended_task_id(target); + last_query_task = ExtendedTaskId::from(*target); } else { - last_continue_task = extended_task_id(target); + last_continue_task = ExtendedTaskId::from(*target); } } // These requests query or manipulate which task is the @@ -902,7 +901,7 @@ bool GdbServer::diverter_process_debugger_requests( Task* task = find_first_task_matching_target(diversion_session, actions); DEBUG_ASSERT(task != nullptr); - last_continue_task = extended_task_id(task); + last_continue_task = ExtendedTaskId::from(*task); } return diversion_refcount > 0; } @@ -940,7 +939,7 @@ bool GdbServer::diverter_process_debugger_requests( if (req->target.tid) { Task* next = diversion_session.find_task(req->target.tid); if (next) { - last_query_task = extended_task_id(next); + last_query_task = ExtendedTaskId::from(*next); } } break; @@ -1086,9 +1085,9 @@ void GdbServer::maybe_notify_stop(const Session& session, if (do_stop && t->thread_group()->tguid() == debuggee_tguid) { /* Notify the debugger and process any new requests * that might have triggered before resuming. 
*/ - notify_stop_internal(session, extended_task_id(t), stop_siginfo.si_signo, + notify_stop_internal(session, ExtendedTaskId::from(*t), stop_siginfo.si_signo, watch); - last_query_task = last_continue_task = extended_task_id(t); + last_query_task = last_continue_task = ExtendedTaskId::from(*t); } } @@ -1445,7 +1444,7 @@ GdbServer::ContinueOrStop GdbServer::debug_one_step( if (t->thread_group()->tguid() == debuggee_tguid) { interrupt_pending = false; notify_stop_internal(timeline_->current_session(), - extended_task_id(t), in_debuggee_end_state ? SIGKILL : 0); + ExtendedTaskId::from(*t), in_debuggee_end_state ? SIGKILL : 0); memset(&stop_siginfo, 0, sizeof(stop_siginfo)); return CONTINUE_DEBUGGING; } @@ -1457,7 +1456,7 @@ GdbServer::ContinueOrStop GdbServer::debug_one_step( exit_sigkill_pending = false; if (req.cont().run_direction == RUN_FORWARD) { notify_stop_internal(timeline_->current_session(), - extended_task_id(t), SIGKILL); + ExtendedTaskId::from(*t), SIGKILL); memset(&stop_siginfo, 0, sizeof(stop_siginfo)); return CONTINUE_DEBUGGING; } @@ -1637,7 +1636,7 @@ void GdbServer::activate_debugger() { target.require_exec = false; target.event = completed_event; - last_query_task = last_continue_task = extended_task_id(t); + last_query_task = last_continue_task = ExtendedTaskId::from(*t); // Have the "checkpoint" be the original replay // session, and then switch over to using the cloned @@ -1755,8 +1754,29 @@ void GdbServer::restart_session(const GdbRequest& req) { if (checkpoint_to_restore.mark) { timeline_->seek_to_mark(checkpoint_to_restore.mark); - last_query_task = last_continue_task = - checkpoint_to_restore.last_continue_task; + const auto at_followed_process = [&](const auto& target) { + return timeline()->current_session().current_task()->tgid() == target.pid; + }; + if(at_followed_process(target)) { + // normal checkpoint restart branch, because checkpoint was created via GDB. + // last_continue_tuid is therefore serialized, so we can set it from that. 
+ DEBUG_ASSERT(timeline()->current_session().current_task()->tuid() == checkpoint_to_restore.last_continue_task.tuid); + last_query_task = last_continue_task = checkpoint_to_restore.last_continue_task; + } else { + // Persistent checkpoints might have been created during another process' + // execution which GDB is not "following" thus, we need to tell + // ReplayTimeline to play until it reaches |Target.pid|. + while (!at_followed_process(target)) { + ReplayResult result = timeline()->replay_step_forward(RUN_CONTINUE); + // We should never reach the end of the trace without hitting the stop + // condition below. + DEBUG_ASSERT(result.status != REPLAY_EXITED); + } + auto t = timeline()->current_session().current_task(); + ASSERT(t, t != nullptr) << "Could not find current task at checkpoint restore"; + last_query_task = last_continue_task = ExtendedTaskId::from(*t); + } + if (debugger_restart_checkpoint.is_explicit == Checkpoint::EXPLICIT) { timeline_->remove_explicit_checkpoint(debugger_restart_checkpoint.mark); } @@ -2105,6 +2125,44 @@ static void remove_trailing_guard_pages(ReplaySession::MemoryRanges& ranges) { } } +bool GdbServer::persistent_checkpoint_is_loaded(size_t unique_id) { + for(const auto& cp : checkpoints) { + if(cp.second.unique_id == unique_id) return true; + } + return false; +} + +Checkpoint::Checkpoint(ReplayTimeline& timeline, ExtendedTaskId last_continue_task, + Explicit e, const std::string& where) + : last_continue_task(last_continue_task), is_explicit(e), where(where) { + if (e == EXPLICIT) { + mark = timeline.add_explicit_checkpoint(); + } else { + mark = timeline.mark(); + const auto prior = timeline.find_closest_mark_with_clone(mark); + if(prior) { + prior->get_internal()->inc_refcount(); + } + } +} + +// Used when deserializing persistent checkpoints +Checkpoint::Checkpoint(ReplayTimeline& timeline, const CheckpointInfo& cp, + ReplaySession::shr_ptr session) + : last_continue_task(cp.last_continue_task), + is_explicit(EXPLICIT), + 
where(cp.where), + unique_id(cp.unique_id) { + if (cp.non_explicit_mark_data) { + LOG(debug) << "checkpoint clone at " << cp.clone_data.time + << "; GDB checkpoint at " << cp.non_explicit_mark_data->time; + mark = timeline.recreate_marks_for_non_explicit(cp, session); + } else { + mark = timeline.recreate_mark_from_data(cp.clone_data, session); + timeline.register_mark_as_checkpoint(mark); + } +} + remote_ptr GdbServer::allocate_debugger_mem(ThreadGroupUid tguid, size_t size, int prot) { if (!timeline_) { diff --git a/src/GdbServer.h b/src/GdbServer.h index d639c2787b0..e065b68c7ae 100644 --- a/src/GdbServer.h +++ b/src/GdbServer.h @@ -22,6 +22,25 @@ namespace rr { +struct Checkpoint { + enum Explicit { EXPLICIT, NOT_EXPLICIT }; + Checkpoint(ReplayTimeline& timeline, ExtendedTaskId last_continue_task, Explicit e, + const std::string& where); + Checkpoint() : is_explicit(NOT_EXPLICIT) {} + // Used when creating deserialized checkpoints + Checkpoint(ReplayTimeline& timeline, const CheckpointInfo& cp, + ReplaySession::shr_ptr session); + + bool persistent() const { return unique_id != 0; } + + ReplayTimeline::Mark mark; + ExtendedTaskId last_continue_task; + Explicit is_explicit; + std::string where; + // Only persistent checkpoints have unique id's. 
+ size_t unique_id = 0; +}; + class GdbServer { // Not ideal but we can't inherit friend from DebuggerExtensionCommand friend std::string invoke_checkpoint(GdbServer&, Task*, @@ -30,6 +49,12 @@ class GdbServer { const std::vector&); friend std::string invoke_info_checkpoints(GdbServer&, Task*, const std::vector&); + friend std::string invoke_write_checkpoints(GdbServer&, Task*, + const std::vector&); + friend std::string invoke_info_written_checkpoints(GdbServer&, Task*, + const std::vector&); + friend std::string invoke_load_checkpoint(GdbServer&, Task*, + const std::vector&); public: struct Target { @@ -178,6 +203,11 @@ class GdbServer { */ int open_file(Session& session, Task *continue_task, const std::string& file_name); + /** + * Check if persistent checkpoint with id `unique_id` has been loaded in this session. + */ + bool persistent_checkpoint_is_loaded(size_t unique_id); + /** * Allocates debugger-owned memory region. * We pretend this memory exists in all sessions, but it actually only @@ -257,23 +287,6 @@ class GdbServer { ReplayTimeline* timeline_; Session* emergency_debug_session; - struct Checkpoint { - enum Explicit { EXPLICIT, NOT_EXPLICIT }; - Checkpoint(ReplayTimeline& timeline, ExtendedTaskId last_continue_task, Explicit e, - const std::string& where) - : last_continue_task(last_continue_task), is_explicit(e), where(where) { - if (e == EXPLICIT) { - mark = timeline.add_explicit_checkpoint(); - } else { - mark = timeline.mark(); - } - } - Checkpoint() : is_explicit(NOT_EXPLICIT) {} - ReplayTimeline::Mark mark; - ExtendedTaskId last_continue_task; - Explicit is_explicit; - std::string where; - }; // |debugger_restart_mark| is the point where we will restart from with // a no-op debugger "run" command. 
Checkpoint debugger_restart_checkpoint; diff --git a/src/GdbServerConnection.cc b/src/GdbServerConnection.cc index ca26e932e09..41c1996cb1f 100644 --- a/src/GdbServerConnection.cc +++ b/src/GdbServerConnection.cc @@ -58,6 +58,11 @@ static bool request_needs_immediate_response(const GdbRequest* req) { } #endif +/*static*/ +ExtendedTaskId ExtendedTaskId::from(const Task& t) noexcept { + return ExtendedTaskId(t.thread_group()->tguid(), t.tuid()); +} + GdbServerConnection::GdbServerConnection(ThreadGroupUid tguid, const Features& features) : tguid(tguid), cpu_features_(0), diff --git a/src/GdbServerConnection.h b/src/GdbServerConnection.h index 94a05f9a1c9..78689ea4f5b 100644 --- a/src/GdbServerConnection.h +++ b/src/GdbServerConnection.h @@ -69,6 +69,8 @@ struct ExtendedTaskId { GdbThreadId to_debugger_thread_id() const { return GdbThreadId(tguid.tid(), tuid.tid()); } + + static ExtendedTaskId from(const Task& t) noexcept; }; inline std::ostream& operator<<(std::ostream& o, const ExtendedTaskId& t) { diff --git a/src/MagicSaveDataMonitor.h b/src/MagicSaveDataMonitor.h index 6ac2c1bf2c6..1fd99c37192 100644 --- a/src/MagicSaveDataMonitor.h +++ b/src/MagicSaveDataMonitor.h @@ -14,7 +14,7 @@ class MagicSaveDataMonitor : public FileMonitor { public: MagicSaveDataMonitor() {} - virtual Type type() override { return MagicSaveData; } + virtual Type type() const override { return MagicSaveData; } virtual void did_write(Task* t, const std::vector& ranges, LazyOffset& offset) override; diff --git a/src/MmappedFileMonitor.cc b/src/MmappedFileMonitor.cc index 5a28a4739b4..1cf0e6e5939 100644 --- a/src/MmappedFileMonitor.cc +++ b/src/MmappedFileMonitor.cc @@ -27,6 +27,8 @@ MmappedFileMonitor::MmappedFileMonitor(Task* t, EmuFile::shr_ptr f) { inode_ = f->inode(); } +MmappedFileMonitor::MmappedFileMonitor(bool dead, dev_t device, ino_t inode) noexcept : dead_(dead), device_(device), inode_(inode) {} + void MmappedFileMonitor::did_write(Task* t, const std::vector& ranges, LazyOffset& 
offset) { // If there are no remaining mappings that we care about, those can't reappear @@ -108,4 +110,11 @@ void MmappedFileMonitor::did_write(Task* t, const std::vector& ranges, } } +void MmappedFileMonitor::serialize_type(pcp::FileMonitor::Builder& builder) const noexcept { + auto mmap = builder.initMmap(); + mmap.setDead(dead_); + mmap.setDevice(device_); + mmap.setInode(inode_); +} + } // namespace rr diff --git a/src/MmappedFileMonitor.h b/src/MmappedFileMonitor.h index 3f19f004407..cfa50936c58 100644 --- a/src/MmappedFileMonitor.h +++ b/src/MmappedFileMonitor.h @@ -18,8 +18,9 @@ class MmappedFileMonitor : public FileMonitor { public: MmappedFileMonitor(Task* t, int fd); MmappedFileMonitor(Task* t, EmuFile::shr_ptr f); + MmappedFileMonitor(bool dead, dev_t device, ino_t inode) noexcept; - virtual Type type() override { return Mmapped; } + virtual Type type() const override { return Mmapped; } void revive() { dead_ = false; } // If this write could potentially affect memory we need to PREVENT_SWITCH, // since the timing of the write is otherwise unpredictable from our @@ -35,6 +36,7 @@ class MmappedFileMonitor : public FileMonitor { LazyOffset& offset) override; private: + void serialize_type(pcp::FileMonitor::Builder& builder) const noexcept override; // Whether this monitor is still actively monitoring bool dead_; dev_t device_; diff --git a/src/NonvirtualPerfCounterMonitor.h b/src/NonvirtualPerfCounterMonitor.h index 92282e4329d..a9fc5c75eac 100644 --- a/src/NonvirtualPerfCounterMonitor.h +++ b/src/NonvirtualPerfCounterMonitor.h @@ -15,7 +15,8 @@ class NonvirtualPerfCounterMonitor : public FileMonitor { public: NonvirtualPerfCounterMonitor() {} - virtual Type type() override { return NonvirtualPerfCounter; } + virtual Type type() const override { return NonvirtualPerfCounter; } + }; } // namespace rr diff --git a/src/ODirectFileMonitor.h b/src/ODirectFileMonitor.h index 2532e02f81e..be53d780070 100644 --- a/src/ODirectFileMonitor.h +++ 
b/src/ODirectFileMonitor.h @@ -17,7 +17,7 @@ class ODirectFileMonitor : public FileMonitor { public: ODirectFileMonitor() : FileMonitor() {}; - virtual Type type() override { return ODirect; } + virtual Type type() const override { return ODirect; } }; } // namespace rr diff --git a/src/PersistentCheckpointing.cc b/src/PersistentCheckpointing.cc new file mode 100644 index 00000000000..5085055b581 --- /dev/null +++ b/src/PersistentCheckpointing.cc @@ -0,0 +1,570 @@ +#include "AutoRemoteSyscalls.h" +#include "BpfMapMonitor.h" +#include "CheckpointInfo.h" +#include "EmuFs.h" +#include "FileMonitor.h" +#include "MagicSaveDataMonitor.h" +#include "MmappedFileMonitor.h" +#include "NonvirtualPerfCounterMonitor.h" +#include "ODirectFileMonitor.h" +#include "PersistentCheckpointing.h" +#include "PidFdMonitor.h" +#include "PreserveFileMonitor.h" +#include "ProcFdDirMonitor.h" +#include "ProcMemMonitor.h" +#include "ProcStatMonitor.h" +#include "RRPageMonitor.h" +#include "ReplayTask.h" +#include "ScopedFd.h" +#include "Session.h" +#include "StdioMonitor.h" +#include "SysCpuMonitor.h" +#include "Task.h" +#include "TaskishUid.h" +#include "TraceFrame.h" +#include "TraceStream.h" +#include "VirtualPerfCounterMonitor.h" +#include "log.h" +#include "replay_syscall.h" +#include "rr_pcp.capnp.h" +#include "util.h" +#include +#include +#include +#include +#include +#include + +namespace rr { + +#define PAGE_PRESENT(page_map_entry) page_map_entry&(1ul << 63) +#define PAGE_SWAPPED(page_map_entry) page_map_entry&(1ul << 62) +#define PAGE_FILE_OR_SHARED_ANON(page_map_entry) page_map_entry&(1ul << 61) +#define FILE_OP_FATAL(file) \ + FATAL() << "write_map failed for " << std::string{ file } << " " +constexpr auto PRIVATE_ANON = MAP_ANONYMOUS | MAP_PRIVATE; + +static std::string file_name_of(const std::string& path) { + auto pos = path.rfind("/"); + // means we're an "ok" filename (ok, means we have no path components - we're + // either empty or just a file name) + if (pos == 
std::string::npos) { + return path; + } + return path.substr(pos + 1); +} + +WriteVmConfig::WriteVmConfig(Task* clone_leader, const char* data_dir, + size_t buffer_size) noexcept + : clone_leader(clone_leader), cp_data_dir(data_dir) { + const auto procfs_mem = clone_leader->proc_mem_path(); + const auto procfs_pagemap = clone_leader->proc_pagemap_path(); + proc_mem_fd = ScopedFd{ procfs_mem.c_str(), O_RDONLY }; + ASSERT(clone_leader, proc_mem_fd.is_open()) + << "Serializing VM for " << clone_leader->rec_tid + << " failed. Couldn't open " << procfs_mem; + proc_pagemap_fd = ScopedFd{ procfs_pagemap.c_str(), O_RDONLY }; + ASSERT(clone_leader, proc_pagemap_fd.is_open()) + << "Serializing VM for " << clone_leader->rec_tid + << " failed. Couldn't open " << procfs_pagemap; + buffer = { .ptr = ::mmap(nullptr, buffer_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0), + .size = buffer_size }; + ASSERT(clone_leader, buffer.ptr != MAP_FAILED) + << "Failed to mmap buffer with capacity " << buffer_size; +} + +std::string checkpoints_index_file(const std::string& trace_dir) { + return trace_dir + "/checkpoints"; +} + + +static void write_map(const WriteVmConfig& cfg, + rr::pcp::KernelMapping::Builder builder, + const AddressSpace::Mapping& map) { + LOG(debug) << "serializing " << map.map.str(); + builder.setStart(map.map.start().as_int()); + builder.setEnd(map.map.end().as_int()); + builder.setFsname(str_to_data(map.recorded_map.fsname())); + builder.setDevice(map.map.device()); + builder.setInode(map.recorded_map.inode()); + builder.setProtection(map.map.prot()); + builder.setFlags(map.map.flags()); + // This will be interpreted as 0 on restore, since we create files for + // individual mappings.
+ builder.setOffset(map.map.file_offset_bytes()); + + std::vector pagemap_entries{}; + + const auto page_count = map.map.size() / page_size(); + pagemap_entries.resize(page_count); + + const auto read_idx_start = (map.map.start().as_int() / page_size()) * 8; + DEBUG_ASSERT(read_idx_start % 8 == 0); + + // walk the page map entries for the mapping and determine how we represent (or + // do not represent) its data in the capnproto file + auto entries_read_sz = ::pread(cfg.proc_pagemap_fd, pagemap_entries.data(), + page_count * sizeof(uint64_t), read_idx_start); + if (entries_read_sz == -1) + FATAL() << "Failed to read page map"; + auto pages_present = 0; + bool all_not_file_or_shared = true; + for (auto pme : pagemap_entries) { + if (PAGE_PRESENT(pme)) + pages_present++; + // probably don't have to check _all_ of the mappings for this, since we + // know the entire segment up front. + if (PAGE_FILE_OR_SHARED_ANON(pme)) + all_not_file_or_shared = false; + } + + // "guard segment": untouched, uninitialized memory, we don't write its + // contents + if ((map.map.flags() & PRIVATE_ANON) == PRIVATE_ANON && pages_present == 0 && + map.map.prot() == PROT_NONE && all_not_file_or_shared) { + builder.initMapType().setGuardSegment(); + } else { + auto map_type = builder.initMapType(); + + const auto pid = cfg.clone_leader->tid; + const auto fname = file_name_of(map.map.fsname()); + // XXX when/if RR moves to c++20, use std::format.
+ const auto len = std::snprintf( + nullptr, 0, "%s/%d-%s-%p-%p", cfg.cp_data_dir, pid, fname.c_str(), + (void*)map.map.start().as_int(), (void*)map.map.end().as_int()); + char file[len + 1]; + if (map.map.fsname().empty()) { + std::snprintf(file, len + 1, "%s/%d-%p-%p", cfg.cp_data_dir, pid, + (void*)map.map.start().as_int(), + (void*)map.map.end().as_int()); + } else { + std::snprintf(file, len + 1, "%s/%d-%s-%p-%p", cfg.cp_data_dir, pid, + fname.c_str(), (void*)map.map.start().as_int(), + (void*)map.map.end().as_int()); + } + ScopedFd f{ file, O_EXCL | O_CREAT | O_RDWR, 0777 }; + if (!f.is_open()) + FILE_OP_FATAL(file) << "Couldn't open file"; + + const auto sz = ::ftruncate(f, map.map.size()); + if (sz == -1) + FILE_OP_FATAL(file) << "couldn't truncate file to size " + << map.map.size(); + + const auto bytes_read = ::pread(cfg.proc_mem_fd, cfg.buffer.ptr, + map.map.size(), map.map.start().as_int()); + if (bytes_read == -1) + FILE_OP_FATAL(file) << " couldn't read contents of " << map.map.str(); + ASSERT(cfg.clone_leader, + static_cast(bytes_read) == map.map.size()) + << " data read from /proc/" << cfg.clone_leader->tid + << "/mem did not match kernel mapping metadata" + << " read " << bytes_read << " expected: " << map.map.size() << " of " + << map.map.str(); + + const auto written_bytes = ::write(f, cfg.buffer.ptr, map.map.size()); + if (written_bytes == -1) + FILE_OP_FATAL(file) << " couldn't write contents of " << map.map.str(); + + const std::string data_fname{ file }; + const auto contents_path = str_to_data(data_fname); + if (map.flags == AddressSpace::Mapping::IS_RR_PAGE || + map.flags == AddressSpace::Mapping::IS_THREAD_LOCALS) { + map_type.initRrPage().setContentsPath(contents_path); + } else if (map.flags == AddressSpace::Mapping::IS_SYSCALLBUF) { + map_type.initSyscallBuffer().setContentsPath(contents_path); + } else if (map.emu_file) { + // XXX simon(optimization): we should not need to write to shared + // memory multiple times (once for each leader - just
once?). + auto shared_anon = map_type.initSharedAnon(); + const auto isSysVSegment = + cfg.clone_leader->vm()->has_shm_at(map.map) || + cfg.clone_leader->vm()->has_shm_at(map.recorded_map); + shared_anon.setContentsPath(contents_path); + shared_anon.setIsSysVSegment(isSysVSegment); + } else { + if (map.map.fsname().empty() || map.map.is_stack() || map.map.is_heap()) { + map_type.initPrivateAnon().setContentsPath(contents_path); + } else { + map_type.initFile().setContentsPath(contents_path); + } + } + } +} + +void +write_vm(Task* clone_leader, rr::pcp::ProcessSpace::Builder builder, + const std::string& checkpoint_data_dir) { + LOG(debug) << "writing VM for " << clone_leader->rec_tid << " to " + << checkpoint_data_dir; + if (::mkdir(checkpoint_data_dir.c_str(), 0700) != 0) { + LOG(info) << " directory " << checkpoint_data_dir << " already exists."; + } + + std::vector mappings; + auto copy_buffer_size = 0ul; + // any stack mapping will do. It has to be mapped first, mimicking + // `process_execve` at restore + const AddressSpace::Mapping* stack_mapping = nullptr; + for (const auto& m : clone_leader->vm()->maps()) { + // linux has exclusive control over this mapping. + if (m.map.is_vsyscall()) { + continue; + } + if (m.recorded_map.is_stack() && stack_mapping == nullptr) { + stack_mapping = &m; + } else { + mappings.push_back(&m); + } + // largest mapping in the vm; use that as buffer size + copy_buffer_size = std::max(copy_buffer_size, m.map.size()); + } + + ASSERT(clone_leader, !mappings.empty()) << "No mappings found to serialize"; + copy_buffer_size = ceil_page_size(copy_buffer_size); + WriteVmConfig cfg{ clone_leader, checkpoint_data_dir.c_str(), + copy_buffer_size }; + + auto kernel_mappings = builder.initVirtualAddressSpace(mappings.size() + 1); + builder.setBreakpointFaultAddress(clone_leader->vm()->do_breakpoint_fault_addr().register_value()); + auto idx = 0; + // write the/a stack mapping first. 
We're mimicking process_execve, therefore + // we need a stack segment first + + write_map(cfg, kernel_mappings[idx++], *stack_mapping); + for (auto m : mappings) { + write_map(cfg, kernel_mappings[idx++], *m); + } +} + +// Reads serialized map contents from |path|, mmaps a read buffer in the +// supervisor, then writes its contents to mapping |km| in ReplayTask |t|. +void restore_map_contents(ReplayTask* t, const std::string& path, const KernelMapping& km) { + LOG(debug) << "restoring contents of " << km << " from " << path + << " for task " << t->rec_tid; + auto fd = ScopedFd(path.c_str(), O_RDONLY); + ASSERT(t, fd.is_open()) << "Failed to open mapping contents file for " + << km.str() << " at " << path; + + auto addr = ::mmap(nullptr, km.size(), PROT_READ, MAP_PRIVATE, fd, 0); + ASSERT(t, addr != MAP_FAILED) + << "Could not load mapping contents of " << km.str() << " from " << path; + + bool write_ok = true; + auto bytes_written = t->write_bytes_helper_no_notifications( + km.start(), km.size(), addr, &write_ok); + ASSERT(t, write_ok) << "Failed to restore contents of mapping from file for " + << km.str(); + ASSERT(t, static_cast(bytes_written) == km.size()) + << "Failed to restore contents of mapping from file.
Wrote " + << bytes_written << "; expected " << km.size(); + if (::munmap(addr, km.size()) == -1) { + FATAL() << "munmap failed for temporary buffer"; + } +} + +void map_region_file(AutoRemoteSyscalls& remote, const KernelMapping& km, + const std::string& file_path) { + struct stat real_file; + std::string real_file_name; + LOG(debug) << "directly mmap'ing " << km.size() << " bytes of " << file_path + << " at offset " << HEX(km.file_offset_bytes()) << "(" << km.str() + << ")"; + remote.finish_direct_mmap(km.start(), km.size(), km.prot(), + ((km.flags() & ~MAP_GROWSDOWN) | MAP_PRIVATE), + file_path.c_str(), O_RDONLY, 0, real_file, + real_file_name); + remote.task()->vm()->map(remote.task(), km.start(), km.size(), km.prot(), + km.flags(), km.file_offset_bytes(), km.fsname(), + km.device(), km.inode(), nullptr, &km); +} + +void map_private_anonymous(AutoRemoteSyscalls& remote, + const KernelMapping& km) { + LOG(debug) << "map region no file: " << km.str(); + remote.infallible_mmap_syscall_if_alive( + km.start(), km.size(), km.prot(), + (km.flags() & ~MAP_GROWSDOWN) | MAP_FIXED | MAP_ANONYMOUS, -1, 0); + remote.task()->vm()->map(remote.task(), km.start(), km.size(), km.prot(), + km.flags(), km.file_offset_bytes(), km.fsname(), + km.device(), km.inode(), nullptr, &km); +} + +Task::CapturedState reconstitute_captured_state( + ReplaySession& s, pcp::CapturedState::Reader reader) { + Task::CapturedState res; + res.ticks = reader.getTicks(); + { + auto register_raw = reader.getRegs().getRaw(); + res.regs = Registers{ s.arch() }; + res.regs.set_from_trace(s.arch(), register_raw.begin(), + register_raw.size()); + } + { + auto raw = reader.getExtraRegs().getRaw(); + set_extra_regs_from_raw(s.arch(), s.trace_reader().cpuid_records(), raw, + res.extra_regs); + } + + res.prname = data_to_str(reader.getPrname()); + res.fdtable_identity = reader.getFdtableIdentity(); + res.syscallbuf_child = reader.getSyscallbufChild(); + res.syscallbuf_size = reader.getSyscallbufSize(); + 
res.num_syscallbuf_bytes = reader.getNumSyscallbufBytes(); + res.preload_globals = reader.getPreloadGlobals(); + res.scratch_ptr = reader.getScratchPtr(); + res.scratch_size = reader.getScratchSize(); + res.top_of_stack = reader.getTopOfStack(); + auto rs = reader.getRseqState(); + res.rseq_state = std::make_unique(remote_ptr(rs.getPtr()), + rs.getAbortPrefixSignature()); + res.cloned_file_data_offset = reader.getClonedFileDataOffset(); + memcpy(res.thread_locals, reader.getThreadLocals().asBytes().begin(), + PRELOAD_THREAD_LOCALS_SIZE); + + res.rec_tid = reader.getRecTid(); + res.own_namespace_rec_tid = reader.getOwnNamespaceRecTid(); + res.serial = reader.getSerial(); + res.tguid = ThreadGroupUid{ reader.getTguid().getTid(), + reader.getTguid().getSerial() }; + res.desched_fd_child = reader.getDeschedFdChild(); + res.cloned_file_data_fd_child = reader.getClonedFileDataFdChild(); + res.cloned_file_data_fname = data_to_str(reader.getClonedFileDataFname()); + res.wait_status = WaitStatus{ reader.getWaitStatus() }; + res.tls_register = reader.getTlsRegister(); + + res.thread_areas = {}; + for (const auto& ta : reader.getThreadAreas()) { + const X86Arch::user_desc item = *(X86Arch::user_desc*)ta.begin(); + res.thread_areas.push_back(item); + } + + return res; +} + +void init_scratch_memory(ReplayTask* t, const KernelMapping& km) { + + t->scratch_ptr = km.start(); + t->scratch_size = km.size(); + size_t sz = t->scratch_size; + + ASSERT(t, (km.prot() & (PROT_READ | PROT_WRITE)) == (PROT_READ | PROT_WRITE)); + ASSERT(t, (km.flags() & (MAP_PRIVATE | MAP_ANONYMOUS)) == + (MAP_PRIVATE | MAP_ANONYMOUS)); + + { + AutoRemoteSyscalls remote(t); + remote.infallible_mmap_syscall_if_alive(t->scratch_ptr, sz, km.prot(), + km.flags() | MAP_FIXED, -1, 0); + t->vm()->map(t, t->scratch_ptr, sz, km.prot(), km.flags(), 0, std::string(), + KernelMapping::NO_DEVICE, KernelMapping::NO_INODE, nullptr, + &km); + } +} + +void write_capture_state(pcp::CapturedState::Builder& sb, + const 
Task::CapturedState& state) { + sb.setTicks(state.ticks); + sb.initRegs().setRaw(regs_to_raw(state.regs)); + sb.initExtraRegs().setRaw(extra_regs_to_raw(state.extra_regs)); + sb.setPrname(str_to_data(state.prname)); + sb.setFdtableIdentity(state.fdtable_identity); + sb.setSyscallbufChild(state.syscallbuf_child.as_int()); + sb.setSyscallbufSize(state.syscallbuf_size); + sb.setNumSyscallbufBytes(state.num_syscallbuf_bytes); + sb.setPreloadGlobals(state.preload_globals.as_int()); + sb.setScratchPtr(state.scratch_ptr.as_int()); + sb.setScratchSize(state.scratch_size); + sb.setTopOfStack(state.top_of_stack.as_int()); + auto rseq = sb.initRseqState(); + if (state.rseq_state) { + rseq.setPtr(state.rseq_state->ptr.as_int()); + rseq.setAbortPrefixSignature(state.rseq_state->abort_prefix_signature); + } else { + rseq.setPtr(0); + rseq.setAbortPrefixSignature(0); + } + + sb.setClonedFileDataOffset(state.cloned_file_data_offset); + auto tl = kj::ArrayPtr( + reinterpret_cast(state.thread_locals), 104); + sb.setThreadLocals(tl); + sb.setRecTid(state.rec_tid); + sb.setOwnNamespaceRecTid(state.own_namespace_rec_tid); + sb.setSerial(state.serial); + auto tguid = sb.initTguid(); + tguid.setTid(state.tguid.tid()); + tguid.setSerial(state.tguid.serial()); + sb.setDeschedFdChild(state.desched_fd_child); + sb.setClonedFileDataFdChild(state.cloned_file_data_fd_child); + sb.setClonedFileDataFname(str_to_data(state.cloned_file_data_fname)); + sb.setWaitStatus(state.wait_status.get()); + sb.setTlsRegister(state.tls_register); + auto thread_areas = sb.initThreadAreas(state.thread_areas.size()); + auto i = 0; + for (const auto& ta : state.thread_areas) { + thread_areas[i++] = capnp::Data::Builder{ (std::uint8_t*)&ta, sizeof(ta) }; + } +} + +void update_persistent_checkpoint_index( + const std::string& trace_dir, SupportedArch arch, + const std::vector& cpuid_recs, + const std::vector& checkpoints) { + + auto checkpoints_index = trace_dir + "/checkpoints"; + auto cps = 
get_checkpoint_infos(trace_dir, arch, cpuid_recs); + + // remove checkpoints on disk, which are not represented in |checkpoints| + for (auto& cp : cps) { + auto find = std::find_if(checkpoints.cbegin(), checkpoints.cend(), + [&](auto& cp_) { return cp == cp_; }); + if (find == std::cend(checkpoints)) { + cp.delete_from_disk(); + } + } + + // remove old file + remove(checkpoints_index.c_str()); + + if (checkpoints.empty()) + return; + + // and write a new one + auto fd = + ScopedFd(checkpoints_index.c_str(), O_EXCL | O_CREAT | O_RDWR, 0700); + if (!fd.is_open()) + FATAL() << "failed to open file " << checkpoints_index; + + capnp::MallocMessageBuilder message; + auto cc = message.initRoot(); + auto list = cc.initCheckpoints(checkpoints.size()); + auto idx = 0; + for (const auto& cp : checkpoints) { + auto cp_entry = list[idx++]; + cp_entry.setCloneCompletionFile(str_to_data(cp.capnp_cp_file)); + cp_entry.setId(cp.unique_id); + auto tuid = cp_entry.initLastContinueTask(); + tuid.setGroupId(cp.last_continue_task.tguid.tid()); + tuid.setGroupSerial(cp.last_continue_task.tguid.serial()); + tuid.setTaskId(cp.last_continue_task.tuid.tid()); + tuid.setTaskSerial(cp.last_continue_task.tuid.serial()); + + cp_entry.setWhere(str_to_data(cp.where)); + cp_entry.setNextSerial(cp.next_serial); + auto stats = cp_entry.getStatistics(); + stats.setBytesWritten(cp.stats.bytes_written); + stats.setSyscallsPerformed(cp.stats.syscalls_performed); + stats.setTicksProcessed(cp.stats.ticks_processed); + + const auto mark_data_serializer = [](const MarkData& mark_data, + auto& builder) { + builder.setTime(mark_data.time); + builder.setStepKey(mark_data.step_key); + builder.setTicks(mark_data.ticks); + builder.initRegs().setRaw(regs_to_raw(mark_data.regs)); + auto ras = builder.initReturnAddresses(8); + for (auto i = 0; i < 8; i++) { + ras.set(i, mark_data.return_addresses.addresses[i].as_int()); + } + builder.initExtraRegs().setRaw(extra_regs_to_raw(mark_data.extra_regs)); + 
builder.setTicksAtEventStart(mark_data.ticks_at_event_start); + builder.setSinglestepToNextMarkNoSignal( + mark_data.singlestep_to_next_mark_no_signal); + }; + + if (cp.is_explicit()) { + auto explicit_builder = cp_entry.initExplicit(); + mark_data_serializer(cp.clone_data, explicit_builder); + } else { + auto non_explicit = cp_entry.initNonExplicit(); + // mark that holds _actual_ session clone + auto mark_with_clone = non_explicit.initCloneMark(); + mark_data_serializer(cp.clone_data, mark_with_clone); + // mark that only holds a mark. It gets very messy quickly, wrt to Marks, + // Clones, Checkpoints. + auto mark_with_gdb_checkpoint = non_explicit.initCheckpointMark(); + mark_data_serializer(*cp.non_explicit_mark_data, + mark_with_gdb_checkpoint); + } + } + capnp::writePackedMessageToFd(fd, message); +} + +void deserialize_fdtable( + Task* leader, const rr::pcp::ProcessSpace::Reader& clone_leader_reader) { + auto table = leader->fd_table(); + auto monitors = clone_leader_reader.getMonitors(); + for (auto m : monitors) { + FileMonitor::Type t = (FileMonitor::Type)m.getType(); + auto fd = m.getFd(); + if (!table->is_monitoring(m.getFd())) { + switch (t) { + case FileMonitor::Base: + FATAL() << "Can't add abstract type"; + break; + case FileMonitor::MagicSaveData: + table->add_monitor(leader, fd, new MagicSaveDataMonitor()); + break; + case FileMonitor::Mmapped: { + const auto mmap = m.getMmap(); + table->add_monitor(leader, fd, + new MmappedFileMonitor(mmap.getDead(), + mmap.getDevice(), + mmap.getInode())); + } break; + case FileMonitor::Preserve: + table->add_monitor(leader, fd, new PreserveFileMonitor()); + break; + case FileMonitor::ProcFd: { + const auto p_fd = m.getProcFd(); + const auto tuid = TaskUid(p_fd.getTid(), p_fd.getSerial()); + table->add_monitor(leader, fd, new ProcFdDirMonitor(tuid)); + break; + } + case FileMonitor::ProcMem: { + const auto pmem = m.getProcMem(); + table->add_monitor( + leader, fd, + new ProcMemMonitor(AddressSpaceUid( + 
pmem.getTid(), pmem.getSerial(), pmem.getExecCount()))); + break; + } + case FileMonitor::Stdio: + table->add_monitor(leader, fd, new StdioMonitor(m.getStdio())); + break; + case FileMonitor::VirtualPerfCounter: + FATAL() << "VirtualPerfCounter monitor deserialization unimplemented!\n"; + break; + case FileMonitor::NonvirtualPerfCounter: + table->add_monitor(leader, fd, new NonvirtualPerfCounterMonitor()); + break; + case FileMonitor::SysCpu: + table->add_monitor(leader, fd, new SysCpuMonitor(leader, "")); + break; + case FileMonitor::ProcStat: + table->add_monitor( + leader, fd, + new ProcStatMonitor(leader, data_to_str(m.getProcStat()))); + break; + case FileMonitor::RRPage: + table->add_monitor(leader, fd, new RRPageMonitor()); + break; + case FileMonitor::ODirect: + table->add_monitor(leader, fd, new ODirectFileMonitor()); + break; + case FileMonitor::BpfMap: + table->add_monitor(leader, fd, + new BpfMapMonitor(m.getBpf().getKeySize(), + m.getBpf().getValueSize())); + break; + case FileMonitor::PidFd: + FATAL() << "PidFd monitor deserialization is not supported yet"; + break; + } + } + } +} + +} // namespace rr \ No newline at end of file diff --git a/src/PersistentCheckpointing.h b/src/PersistentCheckpointing.h new file mode 100644 index 00000000000..50065ab5176 --- /dev/null +++ b/src/PersistentCheckpointing.h @@ -0,0 +1,81 @@ +#pragma once + +#include "AddressSpace.h" +#include "CheckpointInfo.h" +#include "log.h" +#include "rr_pcp.capnp.h" +#include +#include +#include +#include +namespace rr { + +using FrameTime = int64_t; + +// Persistent checkpointing related utilities + +/** Passed from write_vm to each write_map call.
Configures the buffer that + * mappings are copied into and opens the relevant proc fs files. */ +class WriteVmConfig { +public: + WriteVmConfig(Task* clone_leader, const char* data_dir, size_t buffer_size) noexcept; + ~WriteVmConfig() { ::munmap(buffer.ptr, buffer.size); } + + Task* clone_leader; + ScopedFd proc_mem_fd; + ScopedFd proc_pagemap_fd; + const char* cp_data_dir; + + struct { + void* ptr; + size_t size; + } buffer; +}; + +/* Writes captured `state` to state builder `sb`. */ +void write_capture_state(pcp::CapturedState::Builder& sb, + const Task::CapturedState& state); + +/** + * Writes the VM of |clone_leader| using the capnproto |builder|. Checkpoint- + * specific data, such as the serialized segments, is stored in + * |checkpoint_data_dir|. + */ +void write_vm(Task* clone_leader, rr::pcp::ProcessSpace::Builder builder, const std::string& checkpoint_data_dir); + +/** + * Writes file |monitor| information to capnproto |builder|. + */ +void write_monitor(rr::pcp::FileMonitor::Builder& builder, int fd, + FileMonitor* monitor); + +/** + * Restores Task::CapturedState from capnproto data. + */ +Task::CapturedState reconstitute_captured_state( + ReplaySession& s, pcp::CapturedState::Reader reader); + +void map_private_anonymous(AutoRemoteSyscalls& remote, const KernelMapping& km); + +/** + * Restores contents of `km` by copying contents from a file at `path` into it. + */ +void restore_map_contents(ReplayTask* t, const std::string& path, const KernelMapping& km); + +/** + * Maps a file-backed (read-only) segment in `remote.task()`. + */ +void map_region_file(AutoRemoteSyscalls& remote, const KernelMapping& km, + const std::string& file_path); + +// XXX re-factor this from `replay_syscall.cc` so that we don't duplicate code +// like this. It's identical, but without assertion. Need input from maintainers +// on where to put this.
+void init_scratch_memory(ReplayTask* t, const KernelMapping& km); + +using CapturedMemory = + std::vector, std::vector>>; + +void deserialize_fdtable(Task* t, const rr::pcp::ProcessSpace::Reader& reader); + +} // namespace rr \ No newline at end of file diff --git a/src/PidFdMonitor.h b/src/PidFdMonitor.h index 20c2019220e..794dad55898 100644 --- a/src/PidFdMonitor.h +++ b/src/PidFdMonitor.h @@ -21,7 +21,7 @@ class PidFdMonitor : public FileMonitor { PidFdMonitor(TaskUid tuid) : tuid(tuid) {} - virtual Type type() override { return PidFd; } + virtual Type type() const override { return PidFd; } static PidFdMonitor* get(FdTable* fd_table, int fd); diff --git a/src/PreserveFileMonitor.h b/src/PreserveFileMonitor.h index f3b01da13ca..84bd16cd333 100644 --- a/src/PreserveFileMonitor.h +++ b/src/PreserveFileMonitor.h @@ -20,7 +20,7 @@ namespace rr { class PreserveFileMonitor : public FileMonitor { public: PreserveFileMonitor() {} - virtual Type type() override { return Preserve; } + virtual Type type() const override { return Preserve; } virtual bool is_rr_fd() override { return true; } }; diff --git a/src/ProcFdDirMonitor.cc b/src/ProcFdDirMonitor.cc index 02b847203ec..7946e498730 100644 --- a/src/ProcFdDirMonitor.cc +++ b/src/ProcFdDirMonitor.cc @@ -26,6 +26,8 @@ ProcFdDirMonitor::ProcFdDirMonitor(Task* t, const string& pathname) { } } +ProcFdDirMonitor::ProcFdDirMonitor(TaskUid tuid) noexcept : tuid(tuid) {} + // returns the number of valid dirent structs left in the buffer template static int filter_dirent_structs(RecordTask* t, uint8_t* buf, size_t size) { @@ -124,4 +126,10 @@ void ProcFdDirMonitor::filter_getdents(RecordTask* t) { filter_dirents(t); } +void ProcFdDirMonitor::serialize_type(pcp::FileMonitor::Builder &builder) const noexcept { + auto pfd = builder.initProcFd(); + pfd.setTid(tuid.tid()); + pfd.setSerial(tuid.serial()); +} + } // namespace rr diff --git a/src/ProcFdDirMonitor.h b/src/ProcFdDirMonitor.h index 0757094c95d..a603d79f565 100644 --- 
a/src/ProcFdDirMonitor.h +++ b/src/ProcFdDirMonitor.h @@ -15,12 +15,14 @@ namespace rr { class ProcFdDirMonitor : public FileMonitor { public: ProcFdDirMonitor(Task* t, const std::string& pathname); + ProcFdDirMonitor(TaskUid tuid) noexcept; - virtual Type type() override { return ProcFd; } + virtual Type type() const override { return ProcFd; } virtual void filter_getdents(RecordTask* t) override; private: + void serialize_type(pcp::FileMonitor::Builder& builder) const noexcept override; // 0 if this doesn't object doesn't refer to a tracee's proc-mem. TaskUid tuid; }; diff --git a/src/ProcMemMonitor.cc b/src/ProcMemMonitor.cc index 27727e6c6a6..16d9347cdd1 100644 --- a/src/ProcMemMonitor.cc +++ b/src/ProcMemMonitor.cc @@ -8,6 +8,7 @@ #include "RecordSession.h" #include "ReplaySession.h" #include "ReplayTask.h" +#include "TaskishUid.h" #include "log.h" using namespace std; @@ -26,6 +27,8 @@ ProcMemMonitor::ProcMemMonitor(Task* t, const string& pathname) { } } +ProcMemMonitor::ProcMemMonitor(AddressSpaceUid auid) noexcept : auid(auid) {} + void ProcMemMonitor::did_write(Task* t, const std::vector& ranges, LazyOffset& lazy_offset) { if (ranges.empty()) { @@ -68,4 +71,11 @@ bool ProcMemMonitor::target_is_vm(AddressSpace *vm) { return auid == vm->uid(); } +void ProcMemMonitor::serialize_type(pcp::FileMonitor::Builder& builder) const noexcept { + auto pm = builder.initProcMem(); + pm.setExecCount(auid.exec_count()); + pm.setTid(auid.tid()); + pm.setSerial(auid.serial()); +} + } // namespace rr diff --git a/src/ProcMemMonitor.h b/src/ProcMemMonitor.h index e17b91571d9..66952a4f64d 100644 --- a/src/ProcMemMonitor.h +++ b/src/ProcMemMonitor.h @@ -15,8 +15,9 @@ namespace rr { class ProcMemMonitor : public FileMonitor { public: ProcMemMonitor(Task* t, const std::string& pathname); + ProcMemMonitor(AddressSpaceUid auid) noexcept; - virtual Type type() override { return ProcMem; } + virtual Type type() const override { return ProcMem; } // We need to PREVENT_SWITCH, since the 
timing of the write is otherwise // unpredictable from our perspective. @@ -32,6 +33,7 @@ class ProcMemMonitor : public FileMonitor { bool target_is_vm(AddressSpace *t); private: + void serialize_type(pcp::FileMonitor::Builder& builder) const noexcept override; // 0 if this doesn't object doesn't refer to a tracee's proc-mem. AddressSpaceUid auid; }; diff --git a/src/ProcStatMonitor.cc b/src/ProcStatMonitor.cc index 3aa4093fec7..f3fc3144824 100644 --- a/src/ProcStatMonitor.cc +++ b/src/ProcStatMonitor.cc @@ -61,4 +61,8 @@ bool ProcStatMonitor::emulate_read( return true; } +void ProcStatMonitor::serialize_type(pcp::FileMonitor::Builder &builder) const noexcept { + builder.setProcStat(str_to_data(data)); +} + } // namespace rr diff --git a/src/ProcStatMonitor.h b/src/ProcStatMonitor.h index 1ae97a0270e..8219d25fa12 100644 --- a/src/ProcStatMonitor.h +++ b/src/ProcStatMonitor.h @@ -18,11 +18,13 @@ class ProcStatMonitor : public FileMonitor { public: ProcStatMonitor(Task* t, const std::string& pathname); - virtual Type type() override { return ProcStat; } + virtual Type type() const override { return ProcStat; } bool emulate_read(RecordTask* t, const std::vector& ranges, LazyOffset&, uint64_t* result) override; + private: + void serialize_type(pcp::FileMonitor::Builder& builder) const noexcept override; std::string data; }; diff --git a/src/RRPageMonitor.h b/src/RRPageMonitor.h index d2f7fc98502..74dc1008376 100644 --- a/src/RRPageMonitor.h +++ b/src/RRPageMonitor.h @@ -17,7 +17,7 @@ class RRPageMonitor : public FileMonitor { public: RRPageMonitor() : FileMonitor() {}; - virtual Type type() override { return RRPage; } + virtual Type type() const override { return RRPage; } }; static_assert(TraceReader::SpecialLibRRpage != 0, diff --git a/src/ReplayCommand.cc b/src/ReplayCommand.cc index d3acf00dfdf..cf2e539882c 100644 --- a/src/ReplayCommand.cc +++ b/src/ReplayCommand.cc @@ -7,20 +7,24 @@ #include #include +#include #include +#include "CheckpointInfo.h" #include 
"Command.h" #include "Flags.h" #include "GdbServer.h" #include "ReplaySession.h" #include "ScopedFd.h" #include "WaitManager.h" +#include "TraceTaskEvent.h" #include "core.h" #include "kernel_metadata.h" #include "launch_debugger.h" #include "log.h" #include "main.h" + using namespace std; namespace rr { @@ -79,7 +83,8 @@ ReplayCommand ReplayCommand::singleton( " --serve-files Serve all files from the trace rather than\n" " assuming they exist on disk. Debugging will\n" " be slower, but be able to tolerate missing files\n" - " --tty Redirect tracee replay output to \n"); + " --tty Redirect tracee replay output to \n" + " --ignore-pcp Don't spawn session from persistent checkpoint\n"); struct ReplayFlags { // Start a debug server for the task scheduled at the first @@ -87,6 +92,9 @@ struct ReplayFlags { // been "created". FrameTime goto_event; + // Ignore persistent checkpoints and start session from beginning + bool ignore_persistent_cp; + FrameTime singlestep_to_event; pid_t target_process; @@ -146,6 +154,7 @@ struct ReplayFlags { ReplayFlags() : goto_event(0), + ignore_persistent_cp(false), singlestep_to_event(0), target_process(0), process_created_how(CREATED_NONE), @@ -189,7 +198,8 @@ static bool parse_replay_arg(vector& args, ReplayFlags& flags) { { 3, "serve-files", NO_PARAMETER }, { 4, "tty", HAS_PARAMETER }, { 5, "intel-pt-start-checking-event", HAS_PARAMETER }, - { 6, "retry-transient-errors", NO_PARAMETER } + { 6, "retry-transient-errors", NO_PARAMETER }, + { 7, "ignore-pcp", NO_PARAMETER }, }; ParsedOption opt; if (!Command::parse_option(args, options, &opt)) { @@ -295,12 +305,31 @@ static bool parse_replay_arg(vector& args, ReplayFlags& flags) { case 6: flags.retry_transient_errors = true; break; + case 7: + flags.ignore_persistent_cp = true; + break; + break; default: DEBUG_ASSERT(0 && "Unknown option"); } return true; } +/** + * Return event time when `pid` is created or first found in trace. 
+ */ +static FrameTime process_spawn_time(const string& trace_dir, pid_t pid) { + TraceReader trace(trace_dir); + FrameTime time = -1; + for(auto e = trace.read_task_event(&time); e.type() != TraceTaskEvent::NONE; e = trace.read_task_event(&time)) { + if((e.type() == TraceTaskEvent::CLONE || e.type() == TraceTaskEvent::EXEC) && e.tid() == pid) { + LOG(debug) << "Process " << pid << " created at " << time; + return time; + } + } + return -1; +} + static int find_pid_for_command(const string& trace_dir, const string& command) { TraceReader trace(trace_dir); @@ -503,6 +532,16 @@ static int replay(const string& trace_dir, const ReplayFlags& flags) { serve_replay_no_debugger(trace_dir, flags); } else { auto session = ReplaySession::create(trace_dir, session_flags(flags, false)); + if (!flags.ignore_persistent_cp) { + const auto cps = session->get_persistent_checkpoints(); + const auto cp = find_if(rbegin(cps), rend(cps), [&](const auto& cp) { + return target.event >= cp.event_time(); + }); + if (cp != rend(cps)) { + LOG(debug) << "Spawning from persistent checkpoint at " << cp->event_time(); + session->load_checkpoint(*cp); + } + } GdbServer::ConnectionFlags conn_flags; conn_flags.dbg_port = flags.dbg_port; conn_flags.dbg_host = flags.dbg_host; @@ -531,6 +570,19 @@ static int replay(const string& trace_dir, const ReplayFlags& flags) { ScopedFd debugger_params_write_pipe(debugger_params_pipe[1]); auto session = ReplaySession::create(trace_dir, session_flags(flags, false)); + if (!flags.ignore_persistent_cp) { + const auto event_when_created = process_spawn_time(trace_dir, flags.target_process); + const auto pcps = session->get_persistent_checkpoints(); + const auto cp = find_if(rbegin(pcps), rend(pcps), [&](const auto& cp) { + return target.event >= cp.event_time() || event_when_created > cp.event_time(); + }); + + if (cp != rend(pcps)) { + LOG(debug) << "Spawning session from persistent checkpoint at " << cp->event_time(); + session->load_checkpoint(*cp); + } + } + 
GdbServer::ConnectionFlags conn_flags; conn_flags.dbg_port = flags.dbg_port; conn_flags.dbg_host = flags.dbg_host; diff --git a/src/ReplaySession.cc b/src/ReplaySession.cc index 40da2d5d009..5e164a7d800 100644 --- a/src/ReplaySession.cc +++ b/src/ReplaySession.cc @@ -27,6 +27,10 @@ #include "replay_syscall.h" #include "util.h" +#include "PersistentCheckpointing.h" +#include "PreserveFileMonitor.h" +#include + using namespace std; namespace rr { @@ -309,35 +313,11 @@ ReplaySession::shr_ptr ReplaySession::clone() { return session; } -/** - * Return true if it's possible/meaningful to make a checkpoint at the - * |frame| that |t| will replay. - */ -static bool can_checkpoint_at(const Event& ev) { - if (ev.has_ticks_slop()) { - return false; - } - switch (ev.type()) { - case EV_EXIT: - // At exits, we can't clone the exiting tasks, so - // don't event bother trying to checkpoint. - case EV_SYSCALLBUF_RESET: - // RESETs are usually inserted in between syscall - // entry/exit. Do not attempting to checkpoint at - // RESETs. Users would never want to do that anyway. - case EV_TRACE_TERMINATION: - // There's nothing to checkpoint at the end of a trace. 
- return false; - default: - return true; - } -} - bool ReplaySession::can_clone() { finish_initializing(); ReplayTask* t = current_task(); - return t && done_initial_exec() && can_checkpoint_at(current_trace_frame().event()); + return t && done_initial_exec() && current_trace_frame().event().can_checkpoint_at(); } DiversionSession::shr_ptr ReplaySession::clone_diversion() { @@ -2220,4 +2200,337 @@ bool ReplaySession::echo_stdio() const { current_frame_time() >= suppress_stdio_before_event_; } +void ReplaySession::serialize_checkpoint(CheckpointInfo& cp_info) { + DEBUG_ASSERT(clone_completion != nullptr); + capnp::MallocMessageBuilder message; + ScopedFd fd = cp_info.open_for_write(); + + auto cc = message.initRoot(); + + auto addr_space_count = clone_completion->address_spaces.size(); + auto& as_data = clone_completion->address_spaces; + auto addr_space_builders = cc.initAddressSpaces(addr_space_count); + + for (auto i = 0u; i < addr_space_count; i++) { + const auto& as = as_data[i]; + const auto leader = static_cast(as.clone_leader); + + auto addr_space_clone = addr_space_builders[i]; + addr_space_clone.setAuxv(kj::ArrayPtr{ + leader->vm()->saved_auxv().data(), leader->vm()->saved_auxv().size() }); + auto cls = addr_space_clone.initCloneLeaderState(); + write_capture_state(cls, as.clone_leader_state); + auto pspace = addr_space_builders[i].initProcessSpace(); + pspace.setTaskFirstRunEvent(leader->tg->first_run_event()); + pspace.setVmFirstRunEvent(leader->vm()->first_run_event()); + pspace.setExe(str_to_data(leader->vm()->exe_image())); + const auto orig_exe = leader->original_exe(); + pspace.setOriginalExe(str_to_data(orig_exe)); + + write_vm(as.clone_leader, pspace, cp_info.data_directory()); + auto captured_mem_list = + addr_space_clone.initCapturedMemory(as.captured_memory.size()); + auto captured_idx = 0; + for (const auto& mem : as.captured_memory) { + auto cm = captured_mem_list[captured_idx++]; + cm.setStartAddress(mem.first.as_int()); + 
cm.setData(kj::ArrayPtr(mem.second.data(), + mem.second.size())); + } + + auto member_states = + addr_space_clone.initMemberState(as.member_states.size()); + auto cs_idx = 0; + for (const auto& state : as.member_states) { + auto ms = member_states[cs_idx++]; + write_capture_state(ms, state); + } + clone_completion->cloned_fd_tables[as.clone_leader_state.fdtable_identity]->serialize(pspace); + cc.setUsesSyscallBuffering(leader->vm()->syscallbuf_enabled()); + } + + auto step = + capnp::Data::Reader{ (std::uint8_t*)&current_step, sizeof(ReplayTraceStep) }; + cc.setSessionCurrentStep(step); + + auto siginfo = + capnp::Data::Reader{ (std::uint8_t*)&last_siginfo_, sizeof(siginfo_t) }; + cc.setLastSigInfo(siginfo); + capnp::writePackedMessageToFd(fd.get(), message); +} + +void ReplaySession::load_checkpoint( + const CheckpointInfo& cp_info) { + DEBUG_ASSERT(!partially_initialized()); + ScopedFd checkpoint_fd = cp_info.open_for_read(); + + capnp::PackedFdMessageReader datum(checkpoint_fd); + pcp::CloneCompletionInfo::Reader cc_reader = + datum.getRoot(); + const auto addr_spaces = cc_reader.getAddressSpaces(); + + std::vector partial_init_addr_spaces{}; + Task::ClonedFdTables cloned_fd_tables{}; + + std::vector cloned_leaders{}; + auto zygote = current_task(); + for (const auto& as : addr_spaces) { + const auto taskInfo = as.getCloneLeaderState(); + AutoRemoteSyscalls remote(zygote, + AutoRemoteSyscalls::DISABLE_MEMORY_PARAMS); + Task* child = Task::os_clone(Task::SESSION_CLONE_LEADER, this, remote, + taskInfo.getRecTid(), taskInfo.getSerial(), + SIGCHLD, nullptr); + cloned_leaders.push_back(static_cast(child)); + } + + auto clone_leader_index = 0; + LOG(debug) << "Restoring " << addr_spaces.size() << " clone leaders"; + for (const auto& as : addr_spaces) { + ReplayTask* leader = cloned_leaders[clone_leader_index++]; + const auto proc_space = as.getProcessSpace(); + const auto cleader_captured_state = as.getCloneLeaderState(); + + leader->is_stopped_ = true; +
leader->os_exec_stub(arch()); + std::string exe_name = data_to_str(proc_space.getExe()); + std::string original_exe_name = data_to_str(proc_space.getOriginalExe()); + leader->post_exec(original_exe_name); + static_cast(leader)->post_exec_syscall(original_exe_name); + + // set up the/a stack mapping in which we can make remote syscalls in + // afterwards + auto mappings_data = proc_space.getVirtualAddressSpace(); + auto mappings_it = mappings_data.begin(); + + // Map an executable mapping first, so we can use that memory for remote sys + // calls + { + AutoRemoteSyscalls remote(leader, AutoRemoteSyscalls::DISABLE_MEMORY_PARAMS); + leader->vm()->unmap_all_but_rr_mappings(remote); + DEBUG_ASSERT(mappings_it->getMapType().isPrivateAnon() && (mappings_it->getProtection() & (PROT_READ | PROT_WRITE)) == (PROT_READ|PROT_WRITE)); + KernelMapping stack_mapping{ mappings_it->getStart(), + mappings_it->getEnd(), + data_to_str(mappings_it->getFsname()), + mappings_it->getDevice(), + mappings_it->getInode(), + mappings_it->getProtection(), + mappings_it->getFlags(), + static_cast( + mappings_it->getOffset()) }; + map_private_anonymous(remote, stack_mapping); + restore_map_contents(leader, data_to_str(mappings_it->getMapType().getPrivateAnon().getContentsPath()), + stack_mapping); + mappings_it++; + } + + auto scratchPointer = remote_ptr(cleader_captured_state.getScratchPtr()); + ASSERT(leader, scratchPointer != nullptr) << "No scratch pointer found!"; + if (proc_space.getBreakpointFaultAddress() != 0) { + leader->vm()->set_breakpoint_fault_addr( + proc_space.getBreakpointFaultAddress()); + } + + leader->thread_group()->set_first_run_event( + proc_space.getTaskFirstRunEvent()); + leader->vm()->set_first_run_event(proc_space.getVmFirstRunEvent()); + + std::vector> syscallbuf_mappings; + std::unique_ptr> scratch_mem = + nullptr; + { + AutoRemoteSyscalls remote(leader); + for (; mappings_it != std::end(mappings_data); mappings_it++) { + const auto& km_data = *mappings_it; + auto map 
= km_data.getMapType(); + KernelMapping km( + remote_ptr(km_data.getStart()), km_data.getEnd(), + km_data.hasFsname() ? data_to_str(km_data.getFsname()) : "", + km_data.getDevice(), km_data.getInode(), km_data.getProtection(), + km_data.getFlags(), km_data.getOffset()); + if (km.contains(scratchPointer)) { + scratch_mem = std::make_unique>( + std::make_pair( + km, data_to_str(map.getPrivateAnon().getContentsPath()))); + } else if (map.isGuardSegment()) { + // Guard segments: empty private anon mappings, where no data has been + // written. + map_private_anonymous(remote, km); + } else if (map.isFile()) { + auto p = data_to_str(map.getFile().getContentsPath()); + map_region_file(remote, km, p); + } else if (map.isSharedAnon()) { + auto sa = map.getSharedAnon(); + auto emufile = leader->session().emufs().get_or_create(km); + struct stat real_file; + std::string real_file_name; + remote.finish_direct_mmap( + km.start(), km.size(), km.prot(), + (km.flags() | MAP_FIXED) & ~MAP_ANONYMOUS, emufile->proc_path(), + O_RDWR, km.file_offset_bytes(), real_file, real_file_name); + leader->vm()->map(leader, km.start(), km.size(), km.prot(), + km.flags(), km.file_offset_bytes(), real_file_name, + real_file.st_dev, real_file.st_ino, nullptr, &km, + emufile); + restore_map_contents(leader, data_to_str(sa.getContentsPath()), km); + if (sa.getIsSysVSegment()) { + leader->vm()->set_shm_size(km.start(), km.size()); + } + } else if (map.isPrivateAnon()) { + auto f = map.getPrivateAnon(); + auto path = data_to_str(f.getContentsPath()); + map_private_anonymous(remote, km); + restore_map_contents(leader, path, km); + } else if (map.isRrPage()) { + const auto path = data_to_str(map.getRrPage().getContentsPath()); + restore_map_contents(leader, path, km); + } else if (map.isSyscallBuffer()) { + const auto path = + data_to_str(map.getSyscallBuffer().getContentsPath()); + syscallbuf_mappings.push_back(std::make_pair(km, path)); + } else { + FATAL() << "Unknown serialized map type"; + } + } + + 
auto index = original_exe_name.rfind('/'); + auto name = "rr:" + original_exe_name.substr( + index == std::string::npos ? 0 : index + 1); + leader->set_name(remote, name); + } + + ASSERT(leader, scratch_mem != nullptr) + << "Scratch memory mapping could not be restored."; + { + auto& km = scratch_mem->first; + auto& path = scratch_mem->second; + init_scratch_memory(leader, km); + restore_map_contents(leader, path, km); + } + + std::vector auxv{}; + auto auxv_data = as.getAuxv().asChars(); + std::copy(auxv_data.begin(), auxv_data.end(), std::back_inserter(auxv)); + + leader->vm()->restore_auxv(leader, std::move(auxv)); + syscall(SYS_rrcall_reload_auxv, leader->tid); + std::vector member_states; + + for (const auto& member_state : as.getMemberState()) { + member_states.push_back(reconstitute_captured_state(*this, member_state)); + } + + CapturedMemory captured_memory; + for (const auto& captured_mem : as.getCapturedMemory()) { + std::vector mem; + auto mem_reader = captured_mem.getData(); + std::copy(mem_reader.begin(), mem_reader.end(), std::back_inserter(mem)); + captured_memory.push_back( + std::make_pair(captured_mem.getStartAddress(), std::move(mem))); + } + + Task::CapturedState cloneLeaderCaptureState = + reconstitute_captured_state(*this, as.getCloneLeaderState()); + auto fd_table_key = cloneLeaderCaptureState.fdtable_identity; + leader->preload_globals = cloneLeaderCaptureState.preload_globals; + partial_init_addr_spaces.push_back(CloneCompletion::AddressSpaceClone{ + .clone_leader = leader, + .clone_leader_state = std::move(cloneLeaderCaptureState), + .member_states = std::move(member_states), + .captured_memory = std::move(captured_memory) }); + on_create(leader); + deserialize_fdtable(leader, proc_space); + + if (cc_reader.getUsesSyscallBuffering()) { + leader->vm()->set_uses_syscall_buffer(); + for (const auto& sysbuf : syscallbuf_mappings) { + const auto& km = sysbuf.first; + const auto& path = sysbuf.second; + AutoRemoteSyscalls remote(leader); + if 
(km.contains(cleader_captured_state.getSyscallbufChild())) { + const auto map_hint = km.start(); + leader->syscallbuf_size = cleader_captured_state.getSyscallbufSize(); + leader->init_syscall_buffer(remote, map_hint); + leader->desched_fd_child = cleader_captured_state.getDeschedFdChild(); + if (!leader->fd_table()->get_monitor(leader->desched_fd_child)) { + leader->fd_table()->add_monitor(leader, leader->desched_fd_child, + new PreserveFileMonitor()); + } + if (cleader_captured_state.getClonedFileDataFdChild() >= 0) { + leader->cloned_file_data_fd_child = + cleader_captured_state.getClonedFileDataFdChild(); + leader->cloned_file_data_fname = + trace_reader().file_data_clone_file_name(leader->tuid()); + ScopedFd clone_file(leader->cloned_file_data_fname.c_str(), + O_RDONLY); + ASSERT(leader, clone_file.is_open()); + remote.infallible_send_fd_dup( + clone_file, leader->cloned_file_data_fd_child, O_CLOEXEC); + leader->fd_table()->replace_monitor( + leader, leader->cloned_file_data_fd_child, + new PreserveFileMonitor()); + } + for (const auto& mem : + partial_init_addr_spaces.back().captured_memory) { + if (km.contains(mem.first)) { + leader->write_bytes_helper(mem.first, mem.second.size(), + mem.second.data()); + } + } + restore_map_contents(leader, path, km); + } else { + // recreate shared map, i.e. some _other_ task's (A) syscall buffer + // for this task (B), the mappings that just "float" due to being + // inherited after a fork, but from what I understood, isn't ever + // actually used. It's just "there". To keep the process' address + // space identical with normal execution, it is therefore mapped here. 
+ char name[4096]; + strncpy(name, km.fsname().c_str(), sizeof(name) - 1); + name[sizeof(name) - 1] = 0; + create_shared_mmap(remote, km.size(), km.start(), + extract_name(name, sizeof(name)), km.prot(), 0, + nullptr); + remote.task()->vm()->mapping_flags_of(km.start()) |= + AddressSpace::Mapping::IS_SYSCALLBUF; + restore_map_contents(leader, path, km); + } + } + ASSERT(leader, leader->vm()->syscallbuf_enabled()) + << "syscall buffering should be enabled at this point"; + // Fool Task::copy_state that syscall buf has not been initialized. For + // pcp we need to since we never hit the events where syscall buffers get + // initialized like a normal executed tracee-replay would. + leader->syscallbuf_child = nullptr; + } + leader->ticks = cleader_captured_state.getTicks(); + + cloned_fd_tables[fd_table_key] = leader->fd_table(); + } // end of 1 clone leader setup iteration + + clone_completion = std::make_unique(); + clone_completion->address_spaces = std::move(partial_init_addr_spaces); + clone_completion->cloned_fd_tables = std::move(cloned_fd_tables); + + memcpy(&current_step, cc_reader.getSessionCurrentStep().begin(), + sizeof(ReplayTraceStep)); + + trace_reader().rewind(); + trace_reader().forward_to(cp_info.clone_data.time); + + trace_frame = trace_reader().read_frame(); + memcpy(&last_siginfo_, cc_reader.getLastSigInfo().begin(), sizeof(siginfo_t)); + restore_session_info(cp_info); +} + +std::vector ReplaySession::get_persistent_checkpoints() { + return rr::get_checkpoint_infos(resolve_trace_name(trace_reader().dir()), + arch(), trace_reader().cpuid_records()); +} + +void ReplaySession::restore_session_info(const CheckpointInfo& cp) { + ticks_at_start_of_event = cp.clone_data.ticks_at_event_start; + next_task_serial_ = cp.next_serial; + statistics_ = cp.stats; +} + } // namespace rr diff --git a/src/ReplaySession.h b/src/ReplaySession.h index 74433739883..f3b5590d207 100644 --- a/src/ReplaySession.h +++ b/src/ReplaySession.h @@ -21,6 +21,7 @@ struct syscallbuf_hdr; 
namespace rr { class ReplayTask; +class CheckpointInfo; /** * ReplayFlushBufferedSyscallState is saved in Session and cloned with its @@ -185,6 +186,8 @@ class ReplaySession final : public Session { */ shr_ptr clone(); + bool partially_initialized() const { return clone_completion != nullptr; } + /** * Return true if we're in a state where it's OK to clone. For example, * we can't clone in some syscalls. @@ -376,6 +379,26 @@ class ReplaySession final : public Session { bool mark_stdio() const override; bool echo_stdio() const; + /** + * Serializes this session to disk and associates it with the + * checkpoint described by |cp_info|, which records the point in time (key, + * ticks, etc.) found in the `Mark` data type. The caller is responsible for + * ensuring that these actually belong together. + */ + void serialize_checkpoint(CheckpointInfo& cp_info); + + /** + * Deserializes into `this` session the session data described by + * CheckpointInfo `cp`, restoring the process from disk. + */ + void load_checkpoint(const CheckpointInfo& cp); + + /** + * Returns persistent checkpoints stored in this trace. The returned + * CheckpointInfo list is sorted by event. 
+ */ + std::vector get_persistent_checkpoints(); + private: ReplaySession(const std::string& dir, const Flags& flags); ReplaySession(const ReplaySession& other); @@ -420,6 +443,9 @@ class ReplaySession final : public Session { void clear_syscall_bp(); + // load `ReplaySession` session state from serialized checkpoint + void restore_session_info(const CheckpointInfo&); + std::shared_ptr emu_fs; std::shared_ptr tracee_output_fd_; TraceReader trace_in; diff --git a/src/ReplayTask.cc b/src/ReplayTask.cc index b6fb569f105..a01bef91f14 100644 --- a/src/ReplayTask.cc +++ b/src/ReplayTask.cc @@ -232,4 +232,27 @@ bool ReplayTask::post_vm_clone(CloneReason reason, int flags, Task* origin) { return false; } +std::string ReplayTask::original_exe() const { + TraceReader task_original_exe_reader = trace_reader(); + task_original_exe_reader.rewind(); + auto tid = rec_tid; + for (;;) { + auto tte = task_original_exe_reader.read_task_event(); + if (tte.type() == TraceTaskEvent::NONE) { + FATAL() + << "Could not find process of origin to grab original exe name from"; + } + if (tte.tid() == tid) { + if (tte.type() == TraceTaskEvent::CLONE) { + tid = tte.parent_tid(); + task_original_exe_reader.rewind(); + } else if (tte.type() == TraceTaskEvent::EXEC) { + return tte.file_name(); + } + } + } + FATAL() << "Could not find process of origin to grab original exe name from"; + return ""; +} + } // namespace rr diff --git a/src/ReplayTask.h b/src/ReplayTask.h index b127a36be47..8a3c1316db4 100644 --- a/src/ReplayTask.h +++ b/src/ReplayTask.h @@ -85,6 +85,9 @@ class ReplayTask final : public Task { seen_sched_in_syscallbuf_syscall_hook = true; } + /* Digs out the original executable image from the trace. 
*/ + std::string original_exe() const; + std::string name() const override { return name_; } diff --git a/src/ReplayTimeline.cc b/src/ReplayTimeline.cc index 108cd9691d2..889dc86cda7 100644 --- a/src/ReplayTimeline.cc +++ b/src/ReplayTimeline.cc @@ -4,7 +4,11 @@ #include #include +#include +#include +#include "CheckpointInfo.h" +#include "EmuFs.h" #include "core.h" #include "fast_forward.h" #include "log.h" @@ -218,6 +222,23 @@ ReplayTimeline::Mark ReplayTimeline::mark() { return result; } +ReplayTimeline::Mark ReplayTimeline::recreate_mark_from_data(const MarkData& mark_data, ReplaySession::shr_ptr session) { + Mark result; + auto m = make_shared(this, mark_data, session); + m->inc_refcount(); + auto& mark_vector = marks[m->proto.key]; + mark_vector.push_back(m); + result.ptr = mark_vector.back(); + return result; +} + +void ReplayTimeline::register_mark_as_checkpoint(Mark& m) { + DEBUG_ASSERT(m.ptr && m.ptr->checkpoint && "Can't register mark as checkpoint if no checkpoint exists"); + auto key = m.ptr->proto.key; + increase_mark_with_checkpoints(key); + m.ptr->inc_refcount(); +} + void ReplayTimeline::mark_after_singlestep(const Mark& from, const ReplayResult& result) { DEBUG_ASSERT(result.break_status.singlestep_complete); @@ -288,6 +309,15 @@ ReplayTimeline::Mark ReplayTimeline::lazy_reverse_singlestep(const Mark& from, return Mark(); } +void ReplayTimeline::increase_mark_with_checkpoints( + const MarkKey& key) noexcept { + if (marks_with_checkpoints.find(key) == marks_with_checkpoints.end()) { + marks_with_checkpoints[key] = 1; + } else { + marks_with_checkpoints[key]++; + } +} + ReplayTimeline::Mark ReplayTimeline::add_explicit_checkpoint() { DEBUG_ASSERT(current->can_clone()); @@ -296,13 +326,9 @@ ReplayTimeline::Mark ReplayTimeline::add_explicit_checkpoint() { unapply_breakpoints_and_watchpoints(); m.ptr->checkpoint = current->clone(); auto key = m.ptr->proto.key; - if (marks_with_checkpoints.find(key) == marks_with_checkpoints.end()) { - 
marks_with_checkpoints[key] = 1; - } else { - marks_with_checkpoints[key]++; - } + increase_mark_with_checkpoints(key); } - ++m.ptr->checkpoint_refcount; + m.ptr->inc_refcount(); return m; } @@ -314,8 +340,7 @@ void ReplayTimeline::remove_mark_with_checkpoint(const MarkKey& key) { } void ReplayTimeline::remove_explicit_checkpoint(const Mark& mark) { - DEBUG_ASSERT(mark.ptr->checkpoint_refcount > 0); - if (--mark.ptr->checkpoint_refcount == 0) { + if (mark.ptr->dec_refcount() == 0) { mark.ptr->checkpoint = nullptr; remove_mark_with_checkpoint(mark.ptr->proto.key); } @@ -769,6 +794,17 @@ void ReplayTimeline::apply_breakpoints_and_watchpoints() { } } +ReplayTimeline::Mark ReplayTimeline::recreate_marks_for_non_explicit(const CheckpointInfo& cp, std::shared_ptr clone) { + DEBUG_ASSERT(cp.non_explicit_mark_data && clone.get() != nullptr); + // first add the mark with an actual clone, this is not a GDB checkpoint, but an RR checkpoint + auto mark = recreate_mark_from_data(cp.clone_data, clone); + reverse_exec_checkpoints[mark] = estimate_progress_of(*clone); + register_mark_as_checkpoint(mark); + // then add the mark with no clone, the one that will be visible to GDB, i.e. 
non explicit checkpoint + Mark result = recreate_mark_from_data(*cp.non_explicit_mark_data, nullptr); + return result; +} + void ReplayTimeline::unapply_breakpoints_internal() { for (auto& bp : breakpoints) { AddressSpace* vm = current->find_address_space(get<0>(bp)); @@ -1493,7 +1529,12 @@ ReplayResult ReplayTimeline::reverse_singlestep( } ReplayTimeline::Progress ReplayTimeline::estimate_progress() { - Session::Statistics stats = current->statistics(); + return estimate_progress_of(*current); +} + +/* static */ +ReplayTimeline::Progress ReplayTimeline::estimate_progress_of(ReplaySession& session) { + Session::Statistics stats = session.statistics(); // The following parameters were estimated by running Firefox startup // and shutdown in an opt build on a Lenovo W530 laptop, replaying with // DUMP_STATS_PERIOD set to 100 (twice, and using only values from the @@ -1668,4 +1709,50 @@ ReplayTimeline::Mark ReplayTimeline::set_short_checkpoint() { return reverse_exec_short_checkpoint; } +ReplayTimeline::ProtoMark ReplayTimeline::ProtoMark::from_serialized_markdata(const MarkData& md) { + auto proto_mark = ProtoMark{ MarkKey{ md.time, md.ticks, ReplayStepKey{ (ReplayTraceStepType)md.step_key } } }; + proto_mark.regs = md.regs; + proto_mark.return_addresses = md.return_addresses; + return proto_mark; +} + +std::shared_ptr ReplayTimeline::find_closest_mark_with_clone(const Mark &mark) { + if(marks_with_checkpoints.empty()) return nullptr; + + const MarkKey* k = &marks_with_checkpoints.begin()->first; + for(const auto& kvp : marks_with_checkpoints) { + if(kvp.first < mark.ptr->proto.key && kvp.first > *k) { + k = &kvp.first; + } + } + + auto marks_found = std::find_if(std::cbegin(marks), std::cend(marks), [&](auto& kvp){ + return kvp.first == *k; + }); + + if(marks_found == std::end(marks)) return nullptr; + + for(auto it = std::rbegin(marks_found->second); it != std::rend(marks_found->second); it++) { + DEBUG_ASSERT(it->get() != nullptr); + if((*it)->checkpoint) { + auto 
result = std::make_shared(); + result->ptr = *(it); + return result; + } + } + return nullptr; +} + +ReplayTimeline::InternalMark::InternalMark(ReplayTimeline* owner, + const MarkData& serialized, + ReplaySession::shr_ptr session) + : owner(owner), + proto(ProtoMark::from_serialized_markdata(serialized)), + extra_regs(serialized.extra_regs), + checkpoint(session), + ticks_at_event_start(serialized.ticks_at_event_start), + singlestep_to_next_mark_no_signal( + serialized.singlestep_to_next_mark_no_signal), + checkpoint_refcount(0) {} + } // namespace rr diff --git a/src/ReplayTimeline.h b/src/ReplayTimeline.h index f3961d7d7e5..1ee9a7de97f 100644 --- a/src/ReplayTimeline.h +++ b/src/ReplayTimeline.h @@ -3,6 +3,7 @@ #ifndef RR_REPLAY_TIMELINE_H_ #define RR_REPLAY_TIMELINE_H_ +#include #include #include #include @@ -18,6 +19,9 @@ namespace rr { +class CheckpointInfo; +struct MarkData; + enum RunDirection { RUN_FORWARD, RUN_BACKWARD }; /** @@ -26,9 +30,8 @@ enum RunDirection { RUN_FORWARD, RUN_BACKWARD }; * checkpoints along this timeline and navigating to specific events. */ class ReplayTimeline { -private: struct InternalMark; - + struct MarkKey; public: ReplayTimeline(std::shared_ptr session); ~ReplayTimeline(); @@ -69,6 +72,26 @@ class ReplayTimeline { const Registers& regs() const { return ptr->proto.regs; } const ExtraRegisters& extra_regs() const { return ptr->extra_regs; } + const MarkKey& get_key() const { + DEBUG_ASSERT(ptr != nullptr && "Mark has no data"); + return ptr->proto.key; + } + + // XXX refactor and possibly move Mark and its internal hierarchy + // into its own file, making them public, or something like that. 
+ std::shared_ptr get_internal() const { + if(!ptr) FATAL() << "Mark has no data"; + return ptr; + } + + bool has_rr_checkpoint() const { return ptr != nullptr && ptr->checkpoint != nullptr; } + + ReplaySession::shr_ptr get_checkpoint() const { + DEBUG_ASSERT(ptr && "Mark has no data"); + DEBUG_ASSERT(ptr->checkpoint && "Mark has no checkpoint"); + return ptr->checkpoint; + } + FrameTime time() const { return ptr->proto.key.trace_time; } private: @@ -121,6 +144,9 @@ class ReplayTimeline { */ void remove_explicit_checkpoint(const Mark& mark); + /** Find mark that has `key` and increase the checkpoint count for that mark. */ + void increase_mark_with_checkpoints(const MarkKey& key) noexcept; + /** * Return true if we're currently at the given mark. */ @@ -256,6 +282,30 @@ class ReplayTimeline { */ void apply_breakpoints_and_watchpoints(); + /** + * Creates the two marks associated with a non-explicit GDB checkpoint. The returned `Mark` is the mark + * that the non-explicit checkpoint references, i.e. the one without an actual checkpoint / session. + */ + Mark recreate_marks_for_non_explicit(const CheckpointInfo& cp, std::shared_ptr clone); + + /** + * Re-create a mark from serialized MarkData `cp` and associate that mark with `session` which can be null + * in the case of for instance non-explicit checkpoints. + */ + Mark recreate_mark_from_data(const MarkData& cp, ReplaySession::shr_ptr session); + + /* + * Registers a free-formed Mark with this ReplayTimeline. Used when deserializing + * checkpoints. + */ + void register_mark_as_checkpoint(Mark& m); + + /** + * Find a session clone before `mark`. + * Returns the mark associated with that clone or nullptr if not found. + */ + std::shared_ptr find_closest_mark_with_clone(const Mark& mark); + private: /** * A MarkKey consists of FrameTime + Ticks + ReplayStepKey. 
These values @@ -320,6 +370,7 @@ class ReplayTimeline { MarkKey key; Registers regs; ReturnAddressList return_addresses; + static ProtoMark from_serialized_markdata(const MarkData& md); }; /** @@ -328,13 +379,16 @@ class ReplayTimeline { * of two Marks. */ struct InternalMark { + // Construct InternalMark from serialized mark data, for `owner` timeline with deserialized `session` + InternalMark(ReplayTimeline* owner, const MarkData& serialized, ReplaySession::shr_ptr session); + InternalMark(ReplayTimeline* owner, ReplaySession& session, const MarkKey& key) : owner(owner), proto(key), ticks_at_event_start(session.ticks_at_start_of_current_event()), - checkpoint_refcount(0), - singlestep_to_next_mark_no_signal(false) { + singlestep_to_next_mark_no_signal(false), + checkpoint_refcount(0) { ReplayTask* t = session.current_task(); if (t) { proto = ProtoMark(key, t); @@ -355,12 +409,24 @@ class ReplayTimeline { // Optional checkpoint for this Mark. ReplaySession::shr_ptr checkpoint; Ticks ticks_at_event_start; - // Number of users of `checkpoint`. - uint32_t checkpoint_refcount; // The next InternalMark in the ReplayTimeline's Mark vector is the result // of singlestepping from this mark *and* no signal is reported in the // break_status when doing such a singlestep. bool singlestep_to_next_mark_no_signal; + + // Increment refcount and return incremented value + [[maybe_unused]] uint32_t inc_refcount() noexcept { + return ++checkpoint_refcount; + } + + // Decrement refcount and return decremented value + [[maybe_unused]] uint32_t dec_refcount() noexcept { + DEBUG_ASSERT(checkpoint_refcount > 0); + return --checkpoint_refcount; + } + private: + // Number of users of `checkpoint`. 
+ uint32_t checkpoint_refcount; }; friend struct InternalMark; friend std::ostream& operator<<(std::ostream& s, const InternalMark& o); @@ -430,6 +496,8 @@ class ReplayTimeline { Progress estimate_progress(); + static Progress estimate_progress_of(ReplaySession& session); + /** * Called when the current session has moved forward to a new execution * point and we might want to make a checkpoint to support reverse-execution. diff --git a/src/RerunCommand.cc b/src/RerunCommand.cc index 7bc751118a9..4ef43ef2a1e 100644 --- a/src/RerunCommand.cc +++ b/src/RerunCommand.cc @@ -4,8 +4,10 @@ #include #include +#include #include +#include "CheckpointInfo.h" #include "Command.h" #include "ExportImportCheckpoints.h" #include "Flags.h" @@ -55,7 +57,9 @@ RerunCommand RerunCommand::singleton( " another rr instance exporting checkpoints at\n" " \n" " -r, --raw dump registers in raw format\n" - " -s, --trace-start= start tracing at \n" + " -s, --trace-start= start tracing at . If a persistent checkpoint exists\n" + " before the session will spawn from that point.\n" + " --ignore-pcp Ignore persistent checkpoints when running command.\n" " -u, --cpu-unbound allow replay to run on any CPU. 
Default is\n" " to run on the CPU stored in the trace.\n" " Note that this may diverge from the recording\n" @@ -107,6 +111,7 @@ struct RerunFlags { int export_checkpoints_count; bool raw; bool cpu_unbound; + bool ignore_pcp; RerunFlags() : trace_start(0), @@ -114,7 +119,8 @@ struct RerunFlags { export_checkpoints_event(0), export_checkpoints_count(0), raw(false), - cpu_unbound(false) {} + cpu_unbound(false), + ignore_pcp(false) {} }; #ifdef __x86_64__ @@ -491,11 +497,12 @@ static bool parse_rerun_arg(vector& args, RerunFlags& flags) { { 2, "event-regs", HAS_PARAMETER }, { 3, "export-checkpoints", HAS_PARAMETER }, { 4, "import-checkpoint", HAS_PARAMETER }, + { 5, "ignore-pcp", NO_PARAMETER }, { 'e', "trace-end", HAS_PARAMETER }, { 'f', "function", HAS_PARAMETER }, { 'r', "raw", NO_PARAMETER }, { 's', "trace-start", HAS_PARAMETER }, - { 'u', "cpu-unbound", NO_PARAMETER } + { 'u', "cpu-unbound", NO_PARAMETER }, }; ParsedOption opt; if (!Command::parse_option(args, options, &opt)) { @@ -521,6 +528,9 @@ static bool parse_rerun_arg(vector& args, RerunFlags& flags) { case 4: flags.import_checkpoint_socket = opt.value; break; + case 5: + flags.ignore_pcp = true; + break; case 'e': if (!opt.verify_valid_int(1, UINT32_MAX)) { return false; @@ -548,6 +558,7 @@ static bool parse_rerun_arg(vector& args, RerunFlags& flags) { case 'u': flags.cpu_unbound = true; break; + break; default: DEBUG_ASSERT(0 && "Unknown option"); } @@ -649,6 +660,18 @@ static int rerun(const string& trace_dir, const RerunFlags& flags, CommandForChe // Now that we've spawned the replay, raise our resource limits if // possible. 
raise_resource_limits(); + if(!flags.ignore_pcp) { + const auto pcps = replay_session->get_persistent_checkpoints(); + const auto cp = find_if(rbegin(pcps), rend(pcps), [&](const auto& cp) { + return flags.trace_start >= cp.event_time(); + }); + + if (cp != rend(pcps)) { + LOG(info) << "Spawning from persistent checkpoint at time " + << cp->event_time(); + replay_session->load_checkpoint(*cp); + } + } } else { vector fds; if (export_checkpoints_socket.is_open()) { diff --git a/src/Session.cc b/src/Session.cc index 6c7a845d47e..13cb44e104e 100644 --- a/src/Session.cc +++ b/src/Session.cc @@ -30,17 +30,6 @@ using namespace std; namespace rr { -struct Session::CloneCompletion { - struct AddressSpaceClone { - Task* clone_leader; - Task::CapturedState clone_leader_state; - vector member_states; - vector, vector>> captured_memory; - }; - vector address_spaces; - Task::ClonedFdTables cloned_fd_tables; -}; - Session::Session() : tracee_socket(make_shared()), tracee_socket_receiver(make_shared()), @@ -519,38 +508,6 @@ KernelMapping Session::create_shared_mmap( return km; } -static char* extract_name(char* name_buffer, size_t buffer_size) { - // Recover the name that was originally chosen by finding the part of the - // name between rr_mapping_prefix and the -%d-%d at the end. 
- char* path_start = strstr(name_buffer, Session::rr_mapping_prefix()); - DEBUG_ASSERT(path_start && - "Passed something to create_shared_mmap that" - " wasn't a mapping shared between rr and the tracee?"); - size_t prefix_len = path_start - name_buffer; - buffer_size -= prefix_len; - name_buffer += prefix_len; - - char* name_end = name_buffer + strnlen(name_buffer, buffer_size); - char* name_start = name_buffer + strlen(Session::rr_mapping_prefix()); - int hyphens_seen = 0; - while (name_end > name_start) { - --name_end; - if (*name_end == '-') { - ++hyphens_seen; - } else if (*name_end == '/') { - DEBUG_ASSERT(false && - "Passed something to create_shared_mmap that" - " wasn't a mapping shared between rr and the tracee?"); - } - if (hyphens_seen == 2) { - break; - } - } - DEBUG_ASSERT(hyphens_seen == 2); - *name_end = '\0'; - return name_start; -} - const AddressSpace::Mapping Session::recreate_shared_mmap( AutoRemoteSyscalls& remote, const AddressSpace::Mapping& m, PreserveContents preserve, MonitoredSharedMemory::shr_ptr monitored) { diff --git a/src/Session.h b/src/Session.h index 367d6c8c341..be61440ed76 100644 --- a/src/Session.h +++ b/src/Session.h @@ -29,6 +29,17 @@ class Task; class ThreadGroup; class AutoRemoteSyscalls; +struct CloneCompletion { + struct AddressSpaceClone { + Task* clone_leader; + Task::CapturedState clone_leader_state; + std::vector member_states; + std::vector, std::vector>> captured_memory; + }; + std::vector address_spaces; + Task::ClonedFdTables cloned_fd_tables; +}; + // The following types are used by step() APIs in Session subclasses. /** @@ -230,6 +241,8 @@ class Session { uint32_t next_task_serial() { return next_task_serial_++; } + uint32_t current_task_serial() const { return next_task_serial_; } + /** * Return the task created with |rec_tid|, or nullptr if no such * task exists. 
@@ -443,8 +456,6 @@ class Session { void copy_state_to(Session& dest, EmuFs& emu_fs, EmuFs& dest_emu_fs); - // XXX Move CloneCompletion/CaptureState etc to ReplayTask/ReplaySession - struct CloneCompletion; // Call this before doing anything that requires access to the full set // of tasks (i.e., almost anything!). Not really const! void finish_initializing() const; diff --git a/src/StdioMonitor.cc b/src/StdioMonitor.cc index 64cba35830f..738c9685821 100644 --- a/src/StdioMonitor.cc +++ b/src/StdioMonitor.cc @@ -38,4 +38,8 @@ void StdioMonitor::did_write(Task* t, const std::vector& ranges, } } +void StdioMonitor::serialize_type(pcp::FileMonitor::Builder& builder) const noexcept { + builder.setStdio(original_fd); +} + } // namespace rr diff --git a/src/StdioMonitor.h b/src/StdioMonitor.h index 4f67fc9f3ae..eeb1431095d 100644 --- a/src/StdioMonitor.h +++ b/src/StdioMonitor.h @@ -22,7 +22,7 @@ class StdioMonitor : public FileMonitor { */ StdioMonitor(int original_fd) : original_fd(original_fd) {} - virtual Type type() override { return Stdio; } + virtual Type type() const override { return Stdio; } /** * Make writes to stdout/stderr blocking, to avoid nondeterminism in the @@ -43,7 +43,9 @@ class StdioMonitor : public FileMonitor { virtual void did_write(Task* t, const std::vector& ranges, LazyOffset&) override; + private: + void serialize_type(pcp::FileMonitor::Builder& builder) const noexcept override; int original_fd; }; diff --git a/src/SysCpuMonitor.h b/src/SysCpuMonitor.h index 98546aaaaa9..f780525bdd0 100644 --- a/src/SysCpuMonitor.h +++ b/src/SysCpuMonitor.h @@ -17,7 +17,7 @@ class SysCpuMonitor : public FileMonitor { public: SysCpuMonitor(Task* t, const std::string& pathname); - virtual Type type() override { return SysCpu; } + virtual Type type() const override { return SysCpu; } bool emulate_read(RecordTask* t, const std::vector& ranges, LazyOffset&, uint64_t* result) override; diff --git a/src/Task.cc b/src/Task.cc index 5ffae6915ea..e63ce7bd9d6 100644 --- 
a/src/Task.cc +++ b/src/Task.cc @@ -365,6 +365,12 @@ std::string Task::proc_exe_path() { return path; } +std::string Task::proc_mem_path() const { + char path[PATH_MAX]; + snprintf(path, sizeof(path) - 1, "/proc/%d/mem", tid); + return path; +} + std::string Task::exe_path() { char proc_exe[PATH_MAX]; snprintf(proc_exe, sizeof(proc_exe), "/proc/%d/exe", tid); diff --git a/src/Task.h b/src/Task.h index 0705cb98404..80163d6bf3a 100644 --- a/src/Task.h +++ b/src/Task.h @@ -223,6 +223,11 @@ class Task { */ std::string proc_exe_path(); + /** + * Return the path of /proc//mem + */ + std::string proc_mem_path() const; + /** * Return the path of the executable (i.e. what * /proc//exe points to). @@ -457,6 +462,10 @@ class Task { */ virtual std::string name() const; + /** + * Sets the OS-name of this task by injecting system call for PR_SET_NAME. + * Also updates |prname| to |name|. + */ virtual void set_name(AutoRemoteSyscalls& remote, const std::string& name); /** diff --git a/src/TraceStream.cc b/src/TraceStream.cc index 4b3d67d2a40..51b0ef53cda 100644 --- a/src/TraceStream.cc +++ b/src/TraceStream.cc @@ -80,78 +80,6 @@ static TraceStream::Substream operator++(TraceStream::Substream& s) { return s; } -static bool dir_exists(const string& dir) { - struct stat dummy; - return !dir.empty() && stat(dir.c_str(), &dummy) == 0; -} - -static string default_rr_trace_dir() { - static string cached_dir; - - if (!cached_dir.empty()) { - return cached_dir; - } - - string dot_dir; - const char* home = getenv("HOME"); - if (home) { - dot_dir = string(home) + "/.rr"; - } - string xdg_dir; - const char* xdg_data_home = getenv("XDG_DATA_HOME"); - if (xdg_data_home) { - xdg_dir = string(xdg_data_home) + "/rr"; - } else if (home) { - xdg_dir = string(home) + "/.local/share/rr"; - } - - // If XDG dir does not exist but ~/.rr does, prefer ~/.rr for backwards - // compatibility. 
- if (dir_exists(xdg_dir)) { - cached_dir = xdg_dir; - } else if (dir_exists(dot_dir)) { - cached_dir = dot_dir; - } else if (!xdg_dir.empty()) { - cached_dir = xdg_dir; - } else { - cached_dir = "/tmp/rr"; - } - - return cached_dir; -} - -string trace_save_dir() { - const char* output_dir = getenv("_RR_TRACE_DIR"); - return output_dir ? output_dir : default_rr_trace_dir(); -} - -string latest_trace_symlink() { - return trace_save_dir() + "/latest-trace"; -} - -string resolve_trace_name(const string& trace_name) -{ - if (trace_name.empty()) { - return latest_trace_symlink(); - } - - // Single-component paths are looked up first in the current directory, next - // in the default trace dir. - - if (trace_name.find('/') == string::npos) { - if (dir_exists(trace_name)) { - return trace_name; - } - - string resolved_trace_name = trace_save_dir() + "/" + trace_name; - if (dir_exists(resolved_trace_name)) { - return resolved_trace_name; - } - } - - return trace_name; -} - class CompressedWriterOutputStream : public kj::OutputStream { public: CompressedWriterOutputStream(CompressedWriter& writer) : writer(writer) {} @@ -230,35 +158,6 @@ bool TraceReader::good() const { return true; } -static kj::ArrayPtr str_to_data(const string& str) { - return kj::ArrayPtr( - reinterpret_cast(str.data()), str.size()); -} - -static string data_to_str(const kj::ArrayPtr& data) { - if (!data.begin()) { - return string(); - } - if (memchr(data.begin(), 0, data.size())) { - FATAL() << "Invalid string: contains null character"; - } - return string(reinterpret_cast(data.begin()), data.size()); -} - -static trace::Arch to_trace_arch(SupportedArch arch) { - switch (arch) { - case x86: - return trace::Arch::X86; - case x86_64: - return trace::Arch::X8664; - case aarch64: - return trace::Arch::AARCH64; - default: - FATAL() << "Unknown arch"; - return trace::Arch::X86; - } -} - static trace::CpuTriState to_tristate(bool value) { return value ? 
trace::CpuTriState::KNOWN_TRUE : trace::CpuTriState::KNOWN_FALSE; } @@ -1780,4 +1679,51 @@ uint64_t TraceReader::xcr0() const { return (uint64_t(record->out.edx) << 32) | record->out.eax; } +// the dump command repurposed to a `forward_to` API. +void TraceReader::forward_to(FrameTime next_event_to_start_consuming) { + const auto stop_at = next_event_to_start_consuming-1; + while (!at_end()) { + const auto frame = read_frame(); + // means the EVENTS stream is at the correct time, now RAW_DATA and MMAPS must catch up 1 "step" + if (frame.time() == stop_at) { + auto& mmaps = reader(MMAPS); + auto mmaps_pos_found = false; + while(!mmaps.at_end() && !mmaps_pos_found) { + // save state, if we find the MMAP record _after_ the one we're looking for + // we need to restore it to this point. + mmaps.save_state(); + CompressedReaderInputStream stream(mmaps); + PackedMessageReader map_msg(stream); + trace::MMap::Reader map = map_msg.getRoot(); + if (map.getFrameTime() > frame.time()) { + mmaps.restore_state(); + mmaps_pos_found = true; + } else { + mmaps.discard_state(); + } + } + + // consume RawData for frame (next_event_to_start_at - 1) + TraceReader::RawDataMetadata data; + TraceReader::RawData raw; + while (read_raw_data_metadata_for_frame(data)) { + read_raw_data_for_frame(raw); + } + return; + } else { + while (true) { + TraceReader::MappedData data; + bool found; + KernelMapping km = read_mapped_region(&data, &found, TraceReader::DONT_VALIDATE, TimeConstraint::CURRENT_TIME_ONLY); + if (!found) { + break; + } + } + TraceReader::RawDataMetadata data; + while (read_raw_data_metadata_for_frame(data)) { } + } + } + FATAL() << "Could not forward stream(s) to event " << next_event_to_start_consuming; +} + } // namespace rr diff --git a/src/TraceStream.h b/src/TraceStream.h index f7a0306651a..1e2628fe89b 100644 --- a/src/TraceStream.h +++ b/src/TraceStream.h @@ -503,6 +503,13 @@ class TraceReader : public TraceStream { const TraceUtsName& uname() const { return uname_; } + /** 
+ * Forwards this reader to `event_number`, so that the next call to read_frame() returns that event. + * This also forwards the mmaps and raw_data streams, but leaves the task event stream as is, since it can be read + * "arbitrarily": each entry contains its own time information. + */ + void forward_to(FrameTime event_number); + private: CompressedReader& reader(Substream s) { return *readers[s]; } const CompressedReader& reader(Substream s) const { return *readers[s]; } diff --git a/src/VirtualPerfCounterMonitor.cc b/src/VirtualPerfCounterMonitor.cc index 1364688f615..778edd9317b 100644 --- a/src/VirtualPerfCounterMonitor.cc +++ b/src/VirtualPerfCounterMonitor.cc @@ -168,4 +168,8 @@ VirtualPerfCounterMonitor::interrupting_virtual_pmc_for_task(Task* t) { return found->second; } +void VirtualPerfCounterMonitor::serialize_type(pcp::FileMonitor::Builder&) const noexcept { + FATAL() << "VirtualPerfCounter not implemented or supported"; +} + } // namespace rr diff --git a/src/VirtualPerfCounterMonitor.h b/src/VirtualPerfCounterMonitor.h index cdd85e8d3f4..c0c49237dbe 100644 --- a/src/VirtualPerfCounterMonitor.h +++ b/src/VirtualPerfCounterMonitor.h @@ -23,7 +23,7 @@ class VirtualPerfCounterMonitor : public FileMonitor { const struct perf_event_attr& attr); virtual ~VirtualPerfCounterMonitor(); - virtual Type type() override { return VirtualPerfCounter; } + virtual Type type() const override { return VirtualPerfCounter; } virtual bool emulate_ioctl(RecordTask* t, uint64_t* result) override; virtual bool emulate_fcntl(RecordTask* t, uint64_t* result) override; @@ -44,6 +44,7 @@ class VirtualPerfCounterMonitor : public FileMonitor { static VirtualPerfCounterMonitor* interrupting_virtual_pmc_for_task(Task* t); private: + virtual void serialize_type(pcp::FileMonitor::Builder&) const noexcept override; void maybe_enable_interrupt(Task* t, uint64_t after); void disable_interrupt() const; diff --git a/src/rr_pcp.capnp b/src/rr_pcp.capnp new file mode 100644 index
00000000000..72e076cfa67 --- /dev/null +++ b/src/rr_pcp.capnp @@ -0,0 +1,195 @@ +# rr ReplaySession schema + +@0xf55676ebd869d6c1; + +using Cxx = import "/capnp/c++.capnp"; +$Cxx.namespace("rr::pcp"); + +using import "rr_trace.capnp".Registers; +using import "rr_trace.capnp".ExtraRegisters; +using import "rr_trace.capnp".Arch; +using import "rr_trace.capnp".RemoteFd; +using import "rr_trace.capnp".CString; +using import "rr_trace.capnp".Device; +using import "rr_trace.capnp".Inode; +using import "rr_trace.capnp".RemotePtr; +using import "rr_trace.capnp".FrameTime; +using import "rr_trace.capnp".Tid; +using import "rr_trace.capnp".Fd; +using import "rr_trace.capnp".Path; +using import "rr_trace.capnp".Ticks; + +struct ExtendedTaskId { + groupId @0 :Tid; + groupSerial @1: UInt32; + taskId @2 :Tid; + taskSerial @3: UInt32; +} + +using FileMonitorType = Int32; +struct FileMonitor { + fd @0 :Fd; + type @1 :FileMonitorType; + union { + mmap :group { + dead @2 :Bool; + device @3 :Device; + inode @4 :Inode; + } + procFd :group { + tid @5 :Tid; + serial @6 :UInt32; + } + procMem :group { + tid @7 :Tid; + serial @8 :UInt32; + execCount @9 :UInt32; + } + stdio @10 :Fd; + procStat @11 :Data; + bpf :group { + keySize @12: UInt64; + valueSize @13 :UInt64; + } + } +} + +struct KernelMapping { + start @0 :RemotePtr; + end @1 :RemotePtr; + fsname @2 :CString; + device @3 :Device; + inode @4 :Inode; + protection @5 :Int32; + flags @6 :Int32; + offset @7 :UInt64; + mapType :union { + file :group { # mapping of a file + contentsPath @8 :Path; + } + guardSegment @9 :Void; # Empty map segment, PROT NONE, no pages in physical memory, no fsname + # Mapping types below can all be compressed, as they need to be copied into the mapping anyhow + sharedAnon :group { + contentsPath @10 :Path; + isSysVSegment @11 :Bool; # if we're a SysV, we need to set AddressSpace::shm_sizes[start] = size; + } + privateAnon :group { # e.g. 
stack, heap, etc + contentsPath @12 :Path; + } + syscallBuffer :group { + contentsPath @13 :Path; + } + rrPage :group { + contentsPath @14 :Path; + } + } +} + +# For lack of a better name. +struct ProcessSpace { + virtualAddressSpace @0 :List(KernelMapping); + breakpointFaultAddress @1 :RemotePtr; + exe @2 :Data; # actual binary image exec'ed. + originalExe @3 :Data; # original binary image executed during record + monitors @4 :List(FileMonitor); + taskFirstRunEvent @5 :FrameTime; + vmFirstRunEvent @6 :FrameTime; +} + +struct CapturedState { + ticks @0 :Ticks; + regs @1 :Registers; + extraRegs @2 :ExtraRegisters; + prname @3 :Data; + fdtableIdentity @4 :UInt64; + syscallbufChild @5 :RemotePtr; + syscallbufSize @6 :UInt64; + numSyscallbufBytes @7 :UInt64; + preloadGlobals @8 :RemotePtr; + scratchPtr @9 :RemotePtr; + scratchSize @10 :UInt64; + topOfStack @11 :RemotePtr; + rseqState :group { + ptr @12 :RemotePtr; + abortPrefixSignature @13 :UInt32; + } + clonedFileDataOffset @14 :UInt64; + threadLocals @15 :Data; + recTid @16 :Tid; + ownNamespaceRecTid @17 :Tid; + serial @18 :UInt32; + tguid :group { + tid @19 :Tid; + serial @20 :UInt32; + } + deschedFdChild @21 :Int32; + clonedFileDataFdChild @22 :Int32; + clonedFileDataFname @23 :Data; + waitStatus @24 :Int32; + tlsRegister @25 :UInt64; + threadAreas @26 :List(Data); # std::vector +} + +struct CapturedMemory { + startAddress @0 :RemotePtr; + data @1 :Data; +} + +struct AddressSpaceClone { + processSpace @0 :ProcessSpace; + cloneLeaderState @1 :CapturedState; + memberState @2 :List(CapturedState); + capturedMemory @3 :List(CapturedMemory); + auxv @4 :Data; +} + +struct CloneCompletionInfo { + addressSpaces @0 :List(AddressSpaceClone); + sessionCurrentStep @1: Data; + lastSigInfo @2 :Data; + usesSyscallBuffering @3 :Bool; +} + +# Marks are kind of tricky to represent as serialized data, but this amounts to +# a flattened Mark / InternalMark / ProtoMark +struct MarkData { + time @0 :FrameTime; + ticks @1 :Ticks; +
ticksAtEventStart @2 :Ticks; + stepKey @3 :Int32; + regs @4: Registers; + returnAddresses @5 :List(RemotePtr); + extraRegs @6: ExtraRegisters; + singlestepToNextMarkNoSignal @7 :Bool; +} + + +# Information about an explicit "GDB Checkpoint" +struct CheckpointInfo { + # points to CloneCompletionInfo header file, containing the CloneCompletionInfo + cloneCompletionFile @0 :Path; + id @1 :UInt64; + lastContinueTask @2 :ExtendedTaskId; + where @3 :Data; + nextSerial @4 :UInt32; # next_serial_ value in Session. + union { + nonExplicit :group { + # The mark which has the actual clone data we have serialized + cloneMark @5 :MarkData; + # The actual mark for the checkpoint, to which we replay-seek-to + checkpointMark @6 :MarkData; + } + explicit @7 :MarkData; + } + # we need this data, to determine Progress, to be able to use them as reverse-exec + statistics :group { + bytesWritten @8 :UInt64; + ticksProcessed @9 :Ticks; + syscallsPerformed @10 :UInt32; + } +} + +# The checkpoint index file +struct PersistentCheckpoints { + checkpoints @0 :List(CheckpointInfo); +} \ No newline at end of file diff --git a/src/util.cc b/src/util.cc index 854b9154334..08ceab1082d 100644 --- a/src/util.cc +++ b/src/util.cc @@ -2585,4 +2585,170 @@ void base_name(string& s) { } } +char* extract_name(char* name_buffer, size_t buffer_size) { + // Recover the name that was originally chosen by finding the part of the + // name between rr_mapping_prefix and the -%d-%d at the end. 
+ char* path_start = strstr(name_buffer, Session::rr_mapping_prefix()); + DEBUG_ASSERT(path_start && + "Passed something to create_shared_mmap that" + " wasn't a mapping shared between rr and the tracee?"); + size_t prefix_len = path_start - name_buffer; + buffer_size -= prefix_len; + name_buffer += prefix_len; + + char* name_end = name_buffer + strnlen(name_buffer, buffer_size); + char* name_start = name_buffer + strlen(Session::rr_mapping_prefix()); + int hyphens_seen = 0; + while (name_end > name_start) { + --name_end; + if (*name_end == '-') { + ++hyphens_seen; + } else if (*name_end == '/') { + DEBUG_ASSERT(false && + "Passed something to create_shared_mmap that" + " wasn't a mapping shared between rr and the tracee?"); + } + if (hyphens_seen == 2) { + break; + } + } + DEBUG_ASSERT(hyphens_seen == 2); + *name_end = '\0'; + return name_start; +} + +static bool dir_exists(const std::string& dir) { + struct stat dummy; + return !dir.empty() && stat(dir.c_str(), &dummy) == 0; +} + +std::string latest_trace_symlink() { + return trace_save_dir() + "/latest-trace"; +} + +std::string trace_save_dir() { + const char* output_dir = getenv("_RR_TRACE_DIR"); + return output_dir ? output_dir : default_rr_trace_dir(); +} + +std::string resolve_trace_name(const std::string& trace_name) { + if (trace_name.empty()) { + return latest_trace_symlink(); + } + + // Single-component paths are looked up first in the current directory, next + // in the default trace dir. 
+ + if (trace_name.find('/') == std::string::npos) { + if (dir_exists(trace_name)) { + return trace_name; + } + + std::string resolved_trace_name = trace_save_dir() + "/" + trace_name; + if (dir_exists(resolved_trace_name)) { + return resolved_trace_name; + } + } + + return trace_name; +} + +std::string default_rr_trace_dir() { + static std::string cached_dir; + + if (!cached_dir.empty()) { + return cached_dir; + } + + std::string dot_dir; + const char* home = getenv("HOME"); + if (home) { + dot_dir = std::string(home) + "/.rr"; + } + std::string xdg_dir; + const char* xdg_data_home = getenv("XDG_DATA_HOME"); + if (xdg_data_home) { + xdg_dir = std::string(xdg_data_home) + "/rr"; + } else if (home) { + xdg_dir = std::string(home) + "/.local/share/rr"; + } + + // If XDG dir does not exist but ~/.rr does, prefer ~/.rr for backwards + // compatibility. + if (dir_exists(xdg_dir)) { + cached_dir = xdg_dir; + } else if (dir_exists(dot_dir)) { + cached_dir = dot_dir; + } else if (!xdg_dir.empty()) { + cached_dir = xdg_dir; + } else { + cached_dir = "/tmp/rr"; + } + + return cached_dir; +} + +trace::Arch to_trace_arch(SupportedArch arch) { + switch (arch) { + case x86: + return trace::Arch::X86; + case x86_64: + return trace::Arch::X8664; + case aarch64: + return trace::Arch::AARCH64; + default: + FATAL() << "Unknown arch"; + return trace::Arch::X86; + } +} + +capnp::Data::Reader regs_to_raw(const Registers& regs) { + DEBUG_ASSERT(regs.get_ptrace_for_self_arch().size == sizeof(NativeArch::user_regs_struct)); + return { regs.get_ptrace_for_self_arch().data, + regs.get_ptrace_for_self_arch().size }; +} + +kj::ArrayPtr str_to_data(const std::string& str) { + return kj::ArrayPtr( + reinterpret_cast(str.data()), str.size()); +} + +// XXX move to trace_utils +capnp::Data::Reader extra_regs_to_raw(const ExtraRegisters& regs) { + return { regs.data_bytes(), static_cast(regs.data_size()) }; +}; + +std::string data_to_str(const kj::ArrayPtr& data) { + if (memchr(data.begin(), 0, 
data.size())) { + FATAL() << "Invalid string: contains null character"; + } + return std::string(reinterpret_cast(data.begin()), data.size()); +} + +void set_extra_regs_from_raw(SupportedArch arch, + const std::vector& records, + capnp::Data::Reader& raw, ExtraRegisters& out) { + if (raw.size()) { + ExtraRegisters::Format fmt; + switch (arch) { + default: + FATAL() << "Unknown architecture"; + RR_FALLTHROUGH; + case x86: + case x86_64: + fmt = ExtraRegisters::XSAVE; + break; + case aarch64: + fmt = ExtraRegisters::NT_FPR; + break; + } + auto success = out.set_to_raw_data(arch, fmt, raw.begin(), raw.size(), + xsave_layout_from_trace(records)); + if (!success) { + FATAL() << "Invalid extended register data in trace"; + } + } else { + out = ExtraRegisters(arch); + } +} } // namespace rr diff --git a/src/util.h b/src/util.h index 9a065d37aa4..f3b7fca52a7 100644 --- a/src/util.h +++ b/src/util.h @@ -22,6 +22,8 @@ #include "TraceFrame.h" #include "remote_ptr.h" #include "kernel_supplement.h" +#include +#include "rr_trace.capnp.h" /* This is pretty arbitrary. On Linux SIGPWR is sent to PID 1 (init) on * power failure, and it's unlikely rr will be recording that. @@ -664,6 +666,35 @@ void replace_in_buffer(MemoryRange src, const uint8_t* src_data, // Strip any directory part from the filename `s` void base_name(std::string& s); +char* extract_name(char* name_buffer, size_t buffer_size); + +std::string default_rr_trace_dir(); + +std::string resolve_trace_name(const std::string& trace_name); + +std::string trace_save_dir(); + +std::string latest_trace_symlink(); + +/** Convert `Registers` to data blob used in capnp */ +capnp::Data::Reader regs_to_raw(const Registers&); + +/** Write `ExtraRegisters` using the data from data blob reader `raw` */ +void set_extra_regs_from_raw(SupportedArch arch, + const std::vector& records, + capnp::Data::Reader& raw, ExtraRegisters& out); + +/** Convert `ExtraRegisters` to data blob used in capnp. 
*/ +capnp::Data::Reader extra_regs_to_raw(const ExtraRegisters&); + +trace::Arch to_trace_arch(SupportedArch arch); + +/** Convert rr's capnp string representation into std::string. */ +std::string data_to_str(const kj::ArrayPtr<const kj::byte>& data); + +/** Convert std::string into rr's capnp string representation. */ +kj::ArrayPtr<const kj::byte> str_to_data(const std::string& str); + +} // namespace rr #endif /* RR_UTIL_H_ */
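Reviewer note on the `default_rr_trace_dir()` move: the selection order it implements (an existing XDG directory first, then an existing `~/.rr`, then the XDG path even if it does not exist yet, and finally `/tmp/rr`) can be checked in isolation. The following is only a minimal sketch under stated assumptions, not code from this patch: `pick_trace_dir` and its `exists` callback are hypothetical names introduced here, the callback stands in for the `stat()`-based `dir_exists`, and an empty string stands in for an unset environment variable.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper, not part of rr: mirrors the selection order of
// default_rr_trace_dir() without touching the environment or filesystem.
// Assumption: an empty `home` / `xdg_data_home` means "variable not set".
std::string pick_trace_dir(
    const std::string& home, const std::string& xdg_data_home,
    const std::function<bool(const std::string&)>& exists) {
  std::string dot_dir;
  if (!home.empty()) {
    dot_dir = home + "/.rr";
  }
  std::string xdg_dir;
  if (!xdg_data_home.empty()) {
    xdg_dir = xdg_data_home + "/rr";
  } else if (!home.empty()) {
    xdg_dir = home + "/.local/share/rr";
  }

  // Same guard as dir_exists(): an empty path never exists.
  auto dir_exists = [&](const std::string& dir) {
    return !dir.empty() && exists(dir);
  };

  // If the XDG dir does not exist but ~/.rr does, prefer ~/.rr for
  // backwards compatibility.
  if (dir_exists(xdg_dir)) {
    return xdg_dir;
  }
  if (dir_exists(dot_dir)) {
    return dot_dir;
  }
  if (!xdg_dir.empty()) {
    return xdg_dir;
  }
  return "/tmp/rr";
}
```

Passing the existence check as a callback is a choice made for this sketch so the ordering can be exercised without creating directories; the function in the patch calls `getenv` and `stat` directly and caches its result.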