Functionality that will be shared, moved from TraceStream.cc
- Moved into util.cc
- Added forward_to to skip trace data to some arbitrary point in time
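As a rough illustration of what forwarding a trace stream to a point in time means, the sketch below skips records in a time-ordered sequence until it reaches a target time. FrameSketch and FrameCursor are hypothetical names, not rr's TraceReader API; the real forward_to in util.cc may have a different signature and behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trace frame; only its event time matters here.
struct FrameSketch {
  uint64_t time;
};

// Hypothetical cursor over frames that are sorted by time.
class FrameCursor {
public:
  explicit FrameCursor(std::vector<FrameSketch> frames)
      : frames_(std::move(frames)) {}

  // Skips every frame whose time is strictly less than `target` and returns
  // how many frames were skipped. The next frame (if any) has time >= target.
  size_t forward_to(uint64_t target) {
    size_t skipped = 0;
    while (pos_ < frames_.size() && frames_[pos_].time < target) {
      ++pos_;
      ++skipped;
    }
    return skipped;
  }

  bool at_end() const { return pos_ == frames_.size(); }
  const FrameSketch& peek() const { return frames_[pos_]; }

private:
  std::vector<FrameSketch> frames_;
  size_t pos_ = 0;
};
```

Forwarding is idempotent for the same target: a second call with the same time skips nothing.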

Getters required to expose data

We need to be able to expose this data so it can
be serialized.

Find original exe for ReplayTask

Digs out the original executable image that this task was forked
from or, in the case of exec, the image it exec'd.

This is required for persistent checkpointing, so that the names in the
proc fs correspond to the correct names at replay time (i.e. they behave
and look the same in proc fs as in a normal replay). The thread name is
not what should show up in /proc/tid/comm; the actual executable's name
is. So we need to be able to find this "original exe" of the
task.
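One way to picture the "original exe" lookup: if a task exec'd, use that image; otherwise inherit the image of the nearest ancestor that did. The sketch below is a hypothetical model (TaskRecord and original_exe are illustrative names, not rr's ReplayTask code). It assumes the parent links are acyclic and ignores when each exec happened.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical per-task record, e.g. built while scanning a trace.
struct TaskRecord {
  int parent_tid;                         // 0 when the task has no known parent
  std::optional<std::string> exec_image;  // set when the task itself exec'd
};

// Returns the image the task exec'd on, or else the image of the nearest
// ancestor that exec'd (a forked task inherits its parent's image).
// Assumes the parent links form no cycles.
std::optional<std::string> original_exe(
    const std::unordered_map<int, TaskRecord>& tasks, int tid) {
  while (tid != 0) {
    auto it = tasks.find(tid);
    if (it == tasks.end()) {
      return std::nullopt;  // unknown task: give up
    }
    if (it->second.exec_image) {
      return it->second.exec_image;
    }
    tid = it->second.parent_tid;
  }
  return std::nullopt;
}
```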

Check if Event is checkpointable

Required by the create-checkpoints command, among others, to determine which
events in the trace are checkpointable when there is no live session.

In future commits/PRs, remove the static function in ReplaySession.cc
that does the same thing and use this member function on Event instead.

Additional proc fs query paths

Gets additional proc fs paths for a task, in this case
/mem. Required for persistent checkpointing to figure out
how to handle mappings and what to serialize (and what not to
serialize).
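For context, /proc/<tid>/mem exposes a task's address space as a file whose offsets are virtual addresses. The sketch below shows that access pattern with standard POSIX calls; proc_mem_path and read_task_memory are hypothetical helpers, not the Task method this commit adds.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <string>

// Builds the proc fs path for a task's memory file.
std::string proc_mem_path(pid_t tid) {
  return "/proc/" + std::to_string(tid) + "/mem";
}

// Reads `len` bytes at address `addr` of task `tid` via /proc/<tid>/mem.
// Returns the byte count from pread(), or -1 on failure. Reading another
// process requires ptrace-level access; reading our own process works.
ssize_t read_task_memory(pid_t tid, uintptr_t addr, void* buf, size_t len) {
  int fd = open(proc_mem_path(tid).c_str(), O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    return -1;
  }
  // The file offset is the virtual address to read from.
  ssize_t n = pread(fd, buf, len, static_cast<off_t>(addr));
  close(fd);
  return n;
}
```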

Lifted CloneCompletion out of Session

The function extract_name will also be required for setting up the syscall
buffer in upcoming commits.

Getters/setters required for PCP

Need to be able to set this data when restoring an address space.

Persistent checkpointing

Added the persistent checkpoint schema rr_pcp.capnp for capnproto,
as well as a compile command for it in CMakeLists.txt that works like
the existing one for rr_trace.capnp.

The CheckpointInfo and MarkData types work as intermediaries between a
serialized checkpoint and a deserialized "live" one. MarkData is used for
copying the contents of Mark, InternalMark, ProtoMark and their various
data into, both for serialization and, when deserializing, to reconstruct
those types.
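The intermediary idea can be sketched with a plain-data struct that is copied out of a live object through its public accessors and turned back into one only through its public constructor, so the live type's invariants are never bypassed. LiveMark and MarkDataSketch are hypothetical, heavily simplified stand-ins for Mark and MarkData.

```cpp
#include <cstdint>

// Hypothetical "live" type whose invariants are guarded by its interface.
class LiveMark {
public:
  LiveMark(uint64_t time, uint64_t ticks) : time_(time), ticks_(ticks) {}
  uint64_t time() const { return time_; }
  uint64_t ticks() const { return ticks_; }

private:
  uint64_t time_;
  uint64_t ticks_;
};

// Plain-data intermediary: filled from a LiveMark before serialization, or
// from a serialized record after deserialization, without touching
// LiveMark's internals.
struct MarkDataSketch {
  uint64_t time = 0;
  uint64_t ticks = 0;

  MarkDataSketch() = default;
  explicit MarkDataSketch(const LiveMark& m)
      : time(m.time()), ticks(m.ticks()) {}

  // Rebuilds a live object only through LiveMark's public constructor.
  LiveMark reconstruct() const { return LiveMark(time, ticks); }
};
```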

The reasoning for adding MarkData is to not intrude on the Mark/InternalMark/ProtoMark
interfaces and possibly break some guarantees or invariants they provide.
If something goes wrong now, the damage is confined to persistent
checkpointing failing to reconstitute a session properly.

GDB spawned by rr now has two additional commands: write-checkpoints, which
serializes any checkpoints set by the `checkpoint` command, and
load-checkpoints.

Added the rr create-checkpoints command, which creates persistent checkpoints
at a specified interval and attempts to honor that interval as closely as possible.
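Since only some events are checkpointable, an interval can only be honored approximately. A simple policy, assumed here and not necessarily the one CreateCheckpointsCommand uses, is to take the first checkpointable event at or after each multiple of the interval. choose_checkpoint_events is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

// Given the trace times of checkpointable events (ascending) and an interval,
// picks the first checkpointable event at or after each multiple of the
// interval. Non-checkpointable events are never chosen, so the interval is
// honored only approximately.
std::vector<uint64_t> choose_checkpoint_events(
    const std::vector<uint64_t>& checkpointable, uint64_t interval) {
  std::vector<uint64_t> chosen;
  if (interval == 0) {
    return chosen;
  }
  uint64_t next_target = interval;
  for (uint64_t t : checkpointable) {
    if (t >= next_target) {
      chosen.push_back(t);
      // Move to the first multiple strictly after t, so a long gap yields
      // one checkpoint rather than several at the same event.
      next_target = (t / interval + 1) * interval;
    }
  }
  return chosen;
}
```

For example, checkpointable events at times 3, 9, 12, 25, 26 and 41 with an interval of 10 yield checkpoints at 12, 25 and 41.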

RerunCommand and ReplayCommand are now aware of PCPs.

Replay sessions get spawned from persistent checkpoints if they
exist on disk when using `-g <evt>` or when using `-f <pid>` and that
"task" was created some time after a persistent checkpoint.

Added the --ignore-pcp flag to these commands, which ignores pcps
and spawns sessions normally.
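Choosing which persistent checkpoint to start from for a target event amounts to finding the latest checkpoint at or before that event, with --ignore-pcp short-circuiting the search. The sketch below shows this with std::upper_bound over ascending checkpoint times; pick_checkpoint and its parameters are hypothetical, not rr's ReplayCommand code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Returns the index of the latest checkpoint whose event time is <= target,
// or nullopt if none qualifies or persistent checkpoints are ignored.
// `sorted_times` must be in ascending order.
std::optional<size_t> pick_checkpoint(const std::vector<uint64_t>& sorted_times,
                                      uint64_t target, bool ignore_pcp) {
  if (ignore_pcp) {
    return std::nullopt;  // spawn the session normally
  }
  // upper_bound finds the first time > target; the entry before it is the
  // latest time <= target.
  auto it = std::upper_bound(sorted_times.begin(), sorted_times.end(), target);
  if (it == sorted_times.begin()) {
    return std::nullopt;
  }
  return static_cast<size_t>(std::distance(sorted_times.begin(), it) - 1);
}
```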

fixup for can_checkpoint_at

Restored the comments that existed in the static function in ReplaySession.cc.
Changed all uses of it to Event::can_checkpoint_at.
Removed the static can_checkpoint_at in ReplaySession.cc.

Fix preferred include & unnecessary check for partial init

Since checkpoints are partially initialized, checking that they are is pointless.

Added a cmake command looping over the capnp files, at the request of @khuey

remove init check of member variables.

Move extract_name from Session into util.h.

Removed stream_util, moved contents to util.h

make ignore-pcp not take up '-i'

Moved responsibility of de/ser into FdTable and FileMonitor

Deserializing and serializing an FdTable is now performed by the class itself instead of in a free function

FileMonitor has a public member function that is used for serialization.
Each derived type that requires special or additional logic overrides
the virtual member function serialize_type.
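This split between a public serialization entry point and a private virtual hook is the template method pattern. Below is a self-contained sketch: RecordBuilder is a hypothetical stand-in for the generated capnp builder (pcp::FileMonitor::Builder), the public function name serialize is assumed, and the Sketch classes are not rr's real FileMonitor/BpfMapMonitor.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a capnp builder: a flat key/value record.
struct RecordBuilder {
  std::map<std::string, uint64_t> fields;
  void set(const std::string& key, uint64_t value) { fields[key] = value; }
};

class FileMonitorSketch {
public:
  enum Type : uint64_t { Base = 0, BpfMap = 1 };
  virtual ~FileMonitorSketch() = default;
  virtual Type type() const { return Base; }

  // Non-virtual entry point: write the common fields, then let the
  // derived type append whatever it needs.
  void serialize(RecordBuilder& b) const {
    b.set("type", type());
    serialize_type(b);
  }

private:
  // Default: no extra data. Only types with additional state override this.
  virtual void serialize_type(RecordBuilder&) const {}
};

class BpfMapMonitorSketch : public FileMonitorSketch {
public:
  BpfMapMonitorSketch(uint64_t key_size, uint64_t value_size)
      : key_size_(key_size), value_size_(value_size) {}
  Type type() const override { return BpfMap; }

private:
  void serialize_type(RecordBuilder& b) const override {
    b.set("key_size", key_size_);
    b.set("value_size", value_size_);
  }
  uint64_t key_size_;
  uint64_t value_size_;
};
```

Keeping serialize non-virtual means the common fields are always written in one place; derived types cannot forget them.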

Remove skipMonitoringMappedFd

not necessary for serialization, as FdTable is separately restored.

Refactor task OS-name setting

Task::copy_state sets the OS name of a task in the same fashion that
persistent checkpointing sets names. Refactored this functionality into
Task::set_name.

Also removed the unnecessary `update_prname` from Task::copy_state.

update_prname is not a "write to tracee" operation but a "read from tracee" operation. Since
we already know what name we want to set Task::prname to, we skip reading from the tracee
in Task::copy_state and just set prname to the parameter passed in to Task::set_name.
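The "set it, then cache what we set instead of reading it back" idea can be shown locally. This hypothetical NamedThread class renames only the calling thread with prctl(PR_SET_NAME); applying a name to a tracee, as rr does, is a different mechanism that this sketch does not model.

```cpp
#include <sys/prctl.h>

#include <string>

// Hypothetical class: apply a name to the OS, then cache the value we
// already know instead of reading it back (the analogue of skipping
// update_prname).
class NamedThread {
public:
  bool set_name(const std::string& name) {
    // The kernel stores at most 15 bytes plus a NUL (TASK_COMM_LEN is 16),
    // so truncate up front to keep the cached copy identical.
    std::string truncated = name.substr(0, 15);
    if (prctl(PR_SET_NAME, truncated.c_str(), 0, 0, 0) != 0) {
      return false;
    }
    prname_ = truncated;
    return true;
  }

  const std::string& prname() const { return prname_; }

private:
  std::string prname_;
};
```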

Add const qualifier

Fixes rr-debugger#3678

Refactor so that marks_with_checkpoints is modified in only one place instead of being accessed arbitrarily. Ref counts received the same change in a previous commit.

Fixes a bug for loaded persistent checkpoints where the re-created checkpoints did not have their reference counts set correctly.
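Funnelling every change through a single pair of functions is what keeps reference counts consistent, including for checkpoints re-created from disk. A minimal sketch, assuming integer keys stand in for ReplayTimeline mark keys; CheckpointRegistry is hypothetical and not rr's data structure.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical registry: the only code that touches the map is add_ref and
// release, so counts cannot drift out of sync elsewhere.
class CheckpointRegistry {
public:
  void add_ref(uint64_t key) { ++refs_[key]; }

  // Drops one reference. Returns true when the last reference was dropped
  // and the entry was erased.
  bool release(uint64_t key) {
    auto it = refs_.find(key);
    if (it == refs_.end()) {
      return false;
    }
    if (--it->second == 0) {
      refs_.erase(it);
      return true;
    }
    return false;
  }

  size_t count(uint64_t key) const {
    auto it = refs_.find(key);
    return it == refs_.end() ? 0 : it->second;
  }

private:
  std::map<uint64_t, size_t> refs_;
};
```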

This closes rr-debugger#3678

Changes required to rebase
theIDinside committed Jun 21, 2024
1 parent 1628175 commit 4adec90
Showing 58 changed files with 2,799 additions and 277 deletions.
35 changes: 24 additions & 11 deletions CMakeLists.txt
@@ -539,17 +539,26 @@ endforeach(generated_file)

add_custom_target(Generated DEPENDS ${GENERATED_FILES})

add_custom_command(OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.c++"
"${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.h"
COMMAND capnp compile
"--src-prefix=${CMAKE_CURRENT_SOURCE_DIR}/src"
"-oc++:${CMAKE_CURRENT_BINARY_DIR}"
"${CMAKE_CURRENT_SOURCE_DIR}/src/rr_trace.capnp"
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/src/rr_trace.capnp")
set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.c++"
PROPERTIES GENERATED true)
set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.h"
PROPERTIES GENERATED true HEADER_FILE_ONLY true)

set(CAPNP_FILES
rr_trace
rr_pcp
)

# Compile capnproto files
foreach(capnp_file ${CAPNP_FILES})
add_custom_command(OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.c++"
"${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.h"
COMMAND capnp compile
"--src-prefix=${CMAKE_CURRENT_SOURCE_DIR}/src"
"-oc++:${CMAKE_CURRENT_BINARY_DIR}"
"${CMAKE_CURRENT_SOURCE_DIR}/src/${capnp_file}.capnp"
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/src/${capnp_file}.capnp")
set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.c++"
PROPERTIES GENERATED true)
set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/${capnp_file}.capnp.h"
PROPERTIES GENERATED true HEADER_FILE_ONLY true)
endforeach()

if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "aarch64")
set(BLAKE_ARCH_DIR third-party/blake2/neon)
@@ -561,11 +570,13 @@ set(RR_SOURCES
src/AddressSpace.cc
src/AutoRemoteSyscalls.cc
src/BuildidCommand.cc
src/CheckpointInfo.cc
src/Command.cc
src/CompressedReader.cc
src/CompressedWriter.cc
src/CPUFeaturesCommand.cc
src/CPUIDBugDetector.cc
src/CreateCheckpointsCommand.cc
src/DiversionSession.cc
src/DumpCommand.cc
src/Dwarf.cc
@@ -602,6 +613,7 @@ set(RR_SOURCES
src/MvCommand.cc
src/PackCommand.cc
src/PerfCounters.cc
src/PersistentCheckpointing.cc
src/PidFdMonitor.cc
src/processor_trace_check.cc
src/ProcFdDirMonitor.cc
@@ -640,6 +652,7 @@ set(RR_SOURCES
src/WaitManager.cc
src/WaitStatus.cc
${CMAKE_CURRENT_BINARY_DIR}/rr_trace.capnp.c++
${CMAKE_CURRENT_BINARY_DIR}/rr_pcp.capnp.c++
${BLAKE_ARCH_DIR}/blake2b.c
)

5 changes: 5 additions & 0 deletions src/AddressSpace.cc
@@ -550,6 +550,11 @@ void AddressSpace::save_auxv(Task* t) {
save_interpreter_base(t, saved_auxv());
}

void AddressSpace::restore_auxv(Task* t, std::vector<uint8_t>&& auxv) {
saved_auxv_ = std::move(auxv);
save_interpreter_base(t, saved_auxv());
}

void AddressSpace::save_interpreter_base(Task* t, std::vector<uint8_t> auxv) {
saved_interpreter_base_ = read_interpreter_base(auxv);
save_ld_path(t, saved_interpreter_base());
20 changes: 20 additions & 0 deletions src/AddressSpace.h
@@ -660,6 +660,14 @@ class AddressSpace : public HasTaskSet {
* Dies if no shm size is registered for the address.
*/
size_t get_shm_size(remote_ptr<void> addr) { return shm_sizes[addr]; }

/**
* Check if `map` is shared memory
*/
bool has_shm_at(const KernelMapping& map) const {
return shm_sizes.find(map.start()) != std::cend(shm_sizes);
}

void remove_shm_size(remote_ptr<void> addr) { shm_sizes.erase(addr); }

/**
@@ -793,6 +801,9 @@ class AddressSpace : public HasTaskSet {
const std::vector<uint8_t>& saved_auxv() { return saved_auxv_; }
void save_auxv(Task* t);

/* Used when restoring persistent checkpoints. */
void restore_auxv(Task* t, std::vector<uint8_t>&& auxv);

remote_ptr<void> saved_interpreter_base() { return saved_interpreter_base_; }
void save_interpreter_base(Task* t, std::vector<uint8_t> auxv);

@@ -871,6 +882,15 @@ class AddressSpace : public HasTaskSet {

bool legacy_breakpoint_mode() { return stopping_breakpoint_table_ != nullptr; }
remote_code_ptr do_breakpoint_fault_addr() { return do_breakpoint_fault_addr_; }

void set_breakpoint_fault_addr(remote_code_ptr addr) {
do_breakpoint_fault_addr_ = addr;
}

void set_uses_syscall_buffer(bool uses_syscall_buffer = true) {
syscallbuf_enabled_ = uses_syscall_buffer;
}

remote_code_ptr stopping_breakpoint_table() { return stopping_breakpoint_table_; }
int stopping_breakpoint_table_entry_size() { return stopping_breakpoint_table_entry_size_; }

8 changes: 7 additions & 1 deletion src/BpfMapMonitor.h
@@ -14,12 +14,18 @@ class BpfMapMonitor : public FileMonitor {
public:
BpfMapMonitor(uint64_t key_size, uint64_t value_size) : key_size_(key_size), value_size_(value_size) {}

virtual Type type() override { return BpfMap; }
virtual Type type() const override { return BpfMap; }

uint64_t key_size() const { return key_size_; }
uint64_t value_size() const { return value_size_; }

private:
virtual void serialize_type(pcp::FileMonitor::Builder& builder) const noexcept override {
auto bpf = builder.initBpf();
bpf.setKeySize(key_size_);
bpf.setValueSize(value_size_);
}

uint64_t key_size_;
uint64_t value_size_;
};
230 changes: 230 additions & 0 deletions src/CheckpointInfo.cc
@@ -0,0 +1,230 @@
#include "CheckpointInfo.h"
#include "GdbServerConnection.h"
#include "ReplayTimeline.h"
#include "ScopedFd.h"
#include <algorithm>
#include <capnp/blob.h>
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <cstddef>

namespace rr {

MarkData::MarkData(const ReplayTimeline::Mark& m)
: time(m.get_key().trace_time),
ticks(m.get_key().ticks),
step_key(m.get_key().step_key.as_int()),
ticks_at_event_start(m.get_internal()->ticks_at_event_start),
regs(m.regs()),
extra_regs(m.extra_regs()),
return_addresses(m.get_internal()->proto.return_addresses),
singlestep_to_next_mark_no_signal(
m.get_internal()->singlestep_to_next_mark_no_signal) {}

MarkData::MarkData(rr::pcp::MarkData::Reader reader, SupportedArch arch,
const CPUIDRecords& cpuid_recs)
: time(reader.getTime()),
ticks(reader.getTicks()),
step_key(reader.getStepKey()),
ticks_at_event_start(reader.getTicksAtEventStart()),
regs(),
extra_regs(),
return_addresses(),
singlestep_to_next_mark_no_signal(
reader.getSinglestepToNextMarkNoSignal()) {
regs.set_arch(arch);
regs.set_from_trace(arch, reader.getRegs().getRaw().begin(),
reader.getRegs().getRaw().size());
auto eregs = reader.getExtraRegs().getRaw();
set_extra_regs_from_raw(arch, cpuid_recs, eregs, extra_regs);
auto i = 0;
for (auto rs : reader.getReturnAddresses()) {
return_addresses.addresses[i++] = rs;
}
}

std::vector<CheckpointInfo> get_checkpoint_infos(const std::string& trace_dir, SupportedArch arch, const CPUIDRecords& cpuid_recs) {
// the trace's main checkpoint file, containing the list of all persistent
// checkpoints.
const auto path = checkpoints_index_file(trace_dir);
ScopedFd fd(path.c_str(), O_RDONLY);
std::vector<CheckpointInfo> checkpoints;
if (!fd.is_open()) {
return checkpoints;
}

capnp::PackedFdMessageReader reader(fd);
auto checkpointsInfoReader = reader.getRoot<pcp::PersistentCheckpoints>();
auto cps = checkpointsInfoReader.getCheckpoints();
for (const auto& cp : cps) {
auto info = CheckpointInfo{ cp, arch, cpuid_recs };
if (info.exists_on_disk()) {
checkpoints.push_back(info);
}
}
std::sort(checkpoints.begin(), checkpoints.end(),
[](const CheckpointInfo& a, const CheckpointInfo& b) {
// std::sort requires a strict weak ordering, so compare with < (not <=).
return a.clone_data.time < b.clone_data.time;
});
return checkpoints;
}

bool CheckpointInfo::exists_on_disk() const {
struct stat buf;
return stat(capnp_cp_file.c_str(), &buf) == 0 &&
stat((capnp_cp_file + std::to_string(clone_data.time)).c_str(), &buf) == 0;
}

CheckpointInfo::CheckpointInfo(const Checkpoint& c)
: unique_id(CheckpointInfo::generate_unique_id(c.unique_id)),
last_continue_task(c.last_continue_task),
where(c.where),
clone_data(c.mark),
non_explicit_mark_data(nullptr)
{
DEBUG_ASSERT(c.is_explicit == Checkpoint::EXPLICIT && c.mark.has_rr_checkpoint());
// can't assert before ctor, set these values here.
next_serial = c.mark.get_checkpoint()->current_task_serial();
stats = c.mark.get_checkpoint()->statistics();
LOG(debug) << "checkpoint clone at " << clone_data.time
<< "; GDB checkpoint at " << clone_data.time;
capnp_cp_file = c.mark.get_checkpoint()->trace_reader().dir() +
"/checkpoint-" + std::to_string(unique_id);
}

CheckpointInfo::CheckpointInfo(ExtendedTaskId last_continue,
const ReplayTimeline::Mark& mark_with_checkpoint)
: unique_id(CheckpointInfo::generate_unique_id()),
last_continue_task(last_continue),
where("Unknown"),
next_serial(mark_with_checkpoint.get_checkpoint()->current_task_serial()),
clone_data(mark_with_checkpoint),
non_explicit_mark_data(nullptr),
stats(mark_with_checkpoint.get_checkpoint()->statistics())
{
LOG(debug) << "checkpoint clone at " << clone_data.time
<< "; GDB checkpoint at " << clone_data.time;
capnp_cp_file = mark_with_checkpoint.get_checkpoint()->trace_reader().dir() +
"/checkpoint-" + std::to_string(unique_id);
}

CheckpointInfo::CheckpointInfo(const Checkpoint& non_explicit_cp,
const ReplayTimeline::Mark& mark_with_clone)
: unique_id(CheckpointInfo::generate_unique_id(non_explicit_cp.unique_id)),
last_continue_task(non_explicit_cp.last_continue_task),
where(non_explicit_cp.where),
next_serial(
mark_with_clone.get_checkpoint()->current_task_serial()),
clone_data(mark_with_clone),
non_explicit_mark_data(new MarkData{ non_explicit_cp.mark }),
stats(mark_with_clone.get_checkpoint()->statistics()) {
DEBUG_ASSERT(non_explicit_cp.is_explicit == Checkpoint::NOT_EXPLICIT &&
!non_explicit_cp.mark.has_rr_checkpoint() &&
"Constructor meant for non explicit checkpoints");
// XXX we give this checkpoint the id (and name/path) of the actual cloned session
// data, so that multiple non explicit checkpoints later on, can reference the
// same clone data (not yet implemented)
LOG(debug) << "checkpoint clone at " << clone_data.time << "; GDB checkpoint at " << non_explicit_mark_data->time;
capnp_cp_file = mark_with_clone.get_checkpoint()->trace_reader().dir() +
"/checkpoint-" + std::to_string(unique_id);
}

CheckpointInfo::CheckpointInfo(rr::pcp::CheckpointInfo::Reader reader,
SupportedArch arch,
const CPUIDRecords& cpuid_recs)
: capnp_cp_file(data_to_str(reader.getCloneCompletionFile())),
unique_id(reader.getId()),
where(data_to_str(reader.getWhere())),
next_serial(reader.getNextSerial()),
clone_data(reader.isExplicit() ? reader.getExplicit()
: reader.getNonExplicit().getCloneMark(),
arch, cpuid_recs),
non_explicit_mark_data(
reader.isNonExplicit()
? new MarkData{ reader.getNonExplicit().getCheckpointMark(), arch,
cpuid_recs }
: nullptr),
stats() {
auto t = reader.getLastContinueTask();
last_continue_task = ExtendedTaskId{{t.getGroupId(), t.getGroupSerial()}, {t.getTaskId(), t.getTaskSerial()}};
auto s = reader.getStatistics();
stats.bytes_written = s.getBytesWritten();
stats.syscalls_performed = s.getSyscallsPerformed();
stats.ticks_processed = s.getTicksProcessed();
}

void CheckpointInfo::delete_from_disk() {
const auto remove_file = [](auto path_data) {
const auto path = data_to_str(path_data);
if (remove(path.c_str()) != 0) {
LOG(error) << "Failed to remove " << path;
}
};
ScopedFd fd(capnp_cp_file.c_str(), O_RDONLY);
capnp::PackedFdMessageReader datum(fd);
pcp::CloneCompletionInfo::Reader cc_reader =
datum.getRoot<pcp::CloneCompletionInfo>();
const auto addr_spaces = cc_reader.getAddressSpaces();
for (const auto& as : addr_spaces) {
const auto mappings_data = as.getProcessSpace().getVirtualAddressSpace();
for (const auto& m : mappings_data) {
switch (m.getMapType().which()) {
case pcp::KernelMapping::MapType::FILE:
remove_file(m.getMapType().getFile().getContentsPath());
break;
case pcp::KernelMapping::MapType::SHARED_ANON:
remove_file(m.getMapType().getSharedAnon().getContentsPath());
break;
case pcp::KernelMapping::MapType::PRIVATE_ANON:
remove_file(m.getMapType().getPrivateAnon().getContentsPath());
break;
case pcp::KernelMapping::MapType::GUARD_SEGMENT:
break;
case pcp::KernelMapping::MapType::SYSCALL_BUFFER:
remove_file(m.getMapType().getSyscallBuffer().getContentsPath());
break;
case pcp::KernelMapping::MapType::RR_PAGE:
remove_file(m.getMapType().getRrPage().getContentsPath());
break;
}
}
}

remove(capnp_cp_file.c_str());
remove(data_directory().c_str());
if (exists_on_disk()) {
LOG(error) << "Couldn't remove persistent checkpoint data (or directory)";
}
}

ScopedFd CheckpointInfo::open_for_read() const {
DEBUG_ASSERT(exists_on_disk() && "This checkpoint has not been serialized; or the index file has been removed.");
auto file = ScopedFd(capnp_cp_file.c_str(), O_RDONLY);
if (!file.is_open()) FATAL() << "Couldn't open checkpoint data " << capnp_cp_file;
return file;
}

ScopedFd CheckpointInfo::open_for_write() const {
DEBUG_ASSERT(!exists_on_disk() && "Already serialized checkpoints shouldn't be re-written");
auto file = ScopedFd(capnp_cp_file.c_str(), O_EXCL | O_CREAT | O_RDWR, 0700);
if (!file.is_open()) FATAL() << "Couldn't open checkpoint file for writing " << capnp_cp_file;
return file;
}

std::string CheckpointInfo::data_directory() const {
return capnp_cp_file + std::to_string(clone_data.time);
}

/*static*/ size_t CheckpointInfo::generate_unique_id(size_t id) {
// if we haven't been set already, generate a unique "random" id
if (id == 0) {
timeval t;
gettimeofday(&t, nullptr);
auto cp_id = (t.tv_sec * 1000 + t.tv_usec / 1000);
return cp_id;
} else {
return id;
}
}

} // namespace rr