Add Linux ptrace plugin #1288

Merged
merged 7 commits into from
Oct 22, 2024
39 changes: 38 additions & 1 deletion volatility3/framework/constants/linux/__init__.py
@@ -5,7 +5,7 @@
Linux-specific values that aren't found in debug symbols
"""
-from enum import IntEnum
+from enum import IntEnum, Flag

KERNEL_NAME = "__kernel__"

@@ -302,3 +302,40 @@ class ELF_CLASS(IntEnum):
    ELFCLASSNONE = 0
    ELFCLASS32 = 1
    ELFCLASS64 = 2


PT_OPT_FLAG_SHIFT = 3

PTRACE_EVENT_FORK = 1
PTRACE_EVENT_VFORK = 2
PTRACE_EVENT_CLONE = 3
PTRACE_EVENT_EXEC = 4
PTRACE_EVENT_VFORK_DONE = 5
PTRACE_EVENT_EXIT = 6
PTRACE_EVENT_SECCOMP = 7

PTRACE_O_EXITKILL = 1 << 20
PTRACE_O_SUSPEND_SECCOMP = 1 << 21


class PT_FLAGS(Flag):
    "PTrace flags"
    PT_PTRACED = 0x00001
    PT_SEIZED = 0x10000

    PT_TRACESYSGOOD = 1 << (PT_OPT_FLAG_SHIFT + 0)
    PT_TRACE_FORK = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_FORK)
    PT_TRACE_VFORK = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_VFORK)
    PT_TRACE_CLONE = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_CLONE)
    PT_TRACE_EXEC = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_EXEC)
    PT_TRACE_VFORK_DONE = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_VFORK_DONE)
    PT_TRACE_EXIT = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_EXIT)
    PT_TRACE_SECCOMP = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_SECCOMP)

    PT_EXITKILL = PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT
    PT_SUSPEND_SECCOMP = PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT

    @property
    def flags(self) -> str:
        """Returns the ptrace flags string"""
        return str(self).replace(self.__class__.__name__ + ".", "")
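To illustrate how a raw `task_struct.ptrace` value maps onto these flags, here is a minimal sketch. `PtFlagsSketch` is a trimmed, hypothetical copy of `PT_FLAGS` and `raw_ptrace` is an invented value, so neither comes from this PR. The sketch lists member names by membership test rather than relying on `str()` of a `Flag`, whose exact text differs between Python versions.

```python
from enum import Flag

PT_OPT_FLAG_SHIFT = 3
PTRACE_EVENT_EXEC = 4


class PtFlagsSketch(Flag):
    """Trimmed, hypothetical copy of PT_FLAGS, for illustration only."""

    PT_PTRACED = 0x00001
    PT_SEIZED = 0x10000
    PT_TRACESYSGOOD = 1 << (PT_OPT_FLAG_SHIFT + 0)
    PT_TRACE_EXEC = 1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_EXEC)


# Invented raw value, as if it had been read from task_struct.ptrace:
# PT_PTRACED | PT_SEIZED | PT_TRACE_EXEC
raw_ptrace = 0x00001 | 0x10000 | (1 << (PT_OPT_FLAG_SHIFT + PTRACE_EVENT_EXEC))

decoded = PtFlagsSketch(raw_ptrace)

# Collect the names of every defined flag that is set in the decoded value.
names = [flag.name for flag in PtFlagsSketch if flag in decoded]
print(names)
```

Note that constructing a `Flag` from a value carrying bits the enum does not define may raise `ValueError`, which is one reason the full enum mirrors every kernel option bit.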
120 changes: 120 additions & 0 deletions volatility3/framework/plugins/linux/ptrace.py
@@ -0,0 +1,120 @@
# This file is Copyright 2024 Volatility Foundation and licensed under the Volatility Software License 1.0
# which is available at https://www.volatilityfoundation.org/license/vsl-v1.0
#

import logging
from typing import List

from volatility3.framework import renderers, interfaces, constants, objects
from volatility3.framework.constants.linux import PT_FLAGS
from volatility3.framework.constants.architectures import LINUX_ARCHS
from volatility3.framework.objects import utility
from volatility3.framework.configuration import requirements
from volatility3.framework.interfaces import plugins
from volatility3.plugins.linux import pslist

vollog = logging.getLogger(__name__)


class Ptrace(plugins.PluginInterface):
    """Enumerates tracer and tracee tasks"""

    _required_framework_version = (2, 10, 0)
    _version = (1, 0, 0)

    @classmethod
    def get_requirements(cls) -> List[interfaces.configuration.RequirementInterface]:
        return [
            requirements.ModuleRequirement(
                name="kernel",
                description="Linux kernel",
                architectures=LINUX_ARCHS,
            ),
            requirements.PluginRequirement(
                name="pslist", plugin=pslist.PsList, version=(2, 2, 1)
Member

Is it important that this is 2.2.1, rather than 2.2.0? I just ask because a) I don't think our version verification checks PATCH numbers, and b) any change in API should alter the MINOR version at minimum, PATCH numbers should be transparent to the outside world. It's fine as long as it doesn't matter, I just wanted to make sure that 2.2.0 would be ok...

Contributor Author

Sure, yeah I know the patch number is not verified... 2.2.0 works too. Do you want me to change it?

Member

You can if you want for clarity/correctness, but it's not required. Just trying to avoid confusion given the PATCH number won't get checked.
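The rule discussed in this thread (MAJOR must match, MINOR must be at least the required value, PATCH is ignored) can be sketched as follows. `version_satisfies` is a hypothetical helper written for illustration; it is not Volatility's actual verification code.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def version_satisfies(required: Version, available: Version) -> bool:
    """Semantic-version style check that deliberately ignores PATCH.

    MAJOR must match exactly and MINOR must be at least the required MINOR.
    """
    return available[0] == required[0] and available[1] >= required[1]


# A requirement of 2.2.1 is satisfied by 2.2.0 because PATCH is not compared.
print(version_satisfies((2, 2, 1), (2, 2, 0)))
```

Under such a check, requesting `(2, 2, 1)` and `(2, 2, 0)` behaves identically, which is why the reviewer suggests the latter for clarity.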

            ),
        ]

    @classmethod
    def enumerate_ptraced_tasks(
        cls,
        context: interfaces.context.ContextInterface,
        symbol_table: str,
Member

So it turns out this is a module_name, not a symbol_table, please name it appropriately (and ideally provide a docstring explaining what this function does).

    ):
Member

Could we please get typing information on the return value, since it seems to be a generator?
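A rough sketch of what the two requests above (a parameter named for the module, plus a docstring and a return annotation on the generator) might look like. To stay self-contained it uses plain dictionaries in place of Volatility objects; `PtraceRow`, the `modules` mapping and the task dictionaries are all hypothetical stand-ins.

```python
from typing import Iterator, Mapping, NamedTuple, Sequence


class PtraceRow(NamedTuple):
    """Hypothetical row type; the field names are illustrative."""

    comm: str
    tgid: int
    tid: int


def enumerate_ptraced_tasks(
    modules: Mapping[str, Sequence[dict]],
    module_name: str,
) -> Iterator[PtraceRow]:
    """Yield one row for every task that is being traced.

    `module_name` is the name of a kernel module (the value a plugin would
    read from config["kernel"]), not the name of a symbol table.
    """
    for task in modules[module_name]:
        if task["ptrace"] == 0:
            continue
        yield PtraceRow(task["comm"], task["tgid"], task["pid"])


# Invented data standing in for the tasks of a memory image.
fake_modules = {
    "kernel": [
        {"comm": "bash", "tgid": 10, "pid": 10, "ptrace": 1},
        {"comm": "sh", "tgid": 11, "pid": 11, "ptrace": 0},
    ]
}
rows = list(enumerate_ptraced_tasks(fake_modules, "kernel"))
print(rows)
```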

        vmlinux = context.modules[symbol_table]

        tsk_struct_symname = vmlinux.symbol_table_name + constants.BANG + "task_struct"

        tasks = pslist.PsList.list_tasks(
            context,
            symbol_table,
            filter_func=pslist.PsList.create_pid_filter(),
            include_threads=True,
        )

        for task in tasks:
            tracing_tid_list = [
                int(task_being_traced.pid)
                for task_being_traced in task.ptraced.to_list(
                    tsk_struct_symname, "ptrace_entry"
                )
            ]

            if task.ptrace == 0 and not tracing_tid_list:
                continue

            flags = (
                PT_FLAGS(task.ptrace).flags
                if task.ptrace != 0
                else renderers.NotAvailableValue()
Member

In general, I'd prefer results being None, so that other plugins using them can easily differentiate without having to import renderers, and then just before it's output to the treegrid, converting it. Since these always seem to be NotAvailableValue it seems like this should be pretty easy to do?
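One way to read this suggestion, as a minimal sketch: the reusable generator yields `None` for missing values, and only the output step swaps in the renderer's absent-value marker. `NotAvailable` below is a hypothetical stand-in for `renderers.NotAvailableValue`, and the row contents are invented.

```python
from typing import Iterator, Optional, Tuple


class NotAvailable:
    """Hypothetical stand-in for a renderer's absent-value marker."""


def produce_rows() -> Iterator[Tuple[str, Optional[int]]]:
    """Reusable data source: missing values are plain None."""
    yield "tracer", None
    yield "tracee", 1234


def render_rows(
    rows: Iterator[Tuple[str, Optional[int]]],
) -> Iterator[Tuple[str, object]]:
    """Output step: convert None to the absent-value marker just before display."""
    for name, traced_by in rows:
        yield name, traced_by if traced_by is not None else NotAvailable()


rendered = list(render_rows(produce_rows()))
```

Other code can then call `produce_rows()` directly and test for `None` without importing anything from the rendering layer.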

            )

            traced_by_tid = (
                task.parent.pid
                if task.real_parent != task.parent
                else renderers.NotAvailableValue()
            )

            tracing_tids = ",".join(map(str, tracing_tid_list))
Member

This seems like a bad idea. It's turning data into text form that then won't be possible to consume or work with if it's consumed by another plugin or a program. I'd prefer this output separate lines with one entry per line please. This might make the output quite verbose, I'm happy for you to use the Tree part of the TreeGrid to help organize these, but the point of having simple types returned is so that the data can be reused, not so it can be smushed into a human readable format and become more difficult for anything else to consume.
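The alternative described here could be sketched roughly as below: instead of joining tracee TIDs into one string, emit one row per tracee and use the tree level to group them under their tracer. The function name and the single-column row layout are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def rows_for_tracer(
    tracer_tid: int, tracee_tids: List[int]
) -> Iterator[Tuple[int, Tuple[int]]]:
    """Yield (level, fields) pairs: the tracer at level 0, each tracee at level 1."""
    yield 0, (tracer_tid,)
    for tid in tracee_tids:
        # Each tracee stays an int, so consumers need no string parsing.
        yield 1, (tid,)


tree_rows = list(rows_for_tracer(100, [101, 102]))
print(tree_rows)
```

Every value remains a simple typed field, so another plugin or program can consume the rows without splitting a comma-separated string.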


            yield task.comm, task.tgid, task.pid, traced_by_tid, tracing_tids, flags

    def _generator(self, symbol_table):
        for fields in self.enumerate_ptraced_tasks(self.context, symbol_table):
            yield (0, fields)

    @staticmethod
    def format_fields_with_headers(headers, generator):
        """Uses the headers type to cast the fields obtained from the generator"""
        for level, fields in generator:
            formatted_fields = []
            for header, field in zip(headers, fields):
                header_type = header[1]

                if isinstance(
                    field, (header_type, interfaces.renderers.BaseAbsentValue)
                ):
                    formatted_field = field
                elif isinstance(field, objects.Array) and header_type is str:
                    formatted_field = utility.array_to_string(field)
Member

This whole method is trying to circumvent the restrictions rather than working within them. Don't pass arrays through the field, pass individual numbers with repeated data instead. There really shouldn't be any need to reformat the response from the generator before feeding it back out.

Contributor Author

The array is task.comm

Member

Ok, sorry, I misunderstood. I thought this was for something that was a set of numbers like the tracing_tid_list. Why wouldn't we do the conversion to string during the output, rather than yielding an array? We already know which field is going to be returned as an array, it would lose volatility information (like its layer and offset) but it's likely to be far more usable? It's also strange that knowing the field isn't an instance of header_type, you apply it like a cast? What situations is that trying to catch?
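As a plain-Python analogue of converting the `comm` array to a string at the point where the row is produced, consider the sketch below. `comm_to_str` is a hypothetical helper operating on raw bytes; it is not `utility.array_to_string`, and the 16-byte buffer is invented sample data.

```python
def comm_to_str(raw: bytes) -> str:
    """Decode a fixed-size, NUL-padded name buffer into a Python string."""
    # Keep only the bytes before the first NUL, then decode leniently.
    return raw.split(b"\x00", 1)[0].decode("utf-8", errors="replace")


# Invented 16-byte buffer, padded with NULs like a fixed-size char array.
raw_comm = b"bash" + b"\x00" * 12
process_name = comm_to_str(raw_comm)
print(process_name)
```

Doing the conversion where the field is known to be an array removes the need for a generic per-field cast later, at the cost the reviewer mentions: the resulting `str` no longer carries layer and offset information.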

                else:
                    formatted_field = header_type(field)

                formatted_fields.append(formatted_field)
            yield level, formatted_fields

    def run(self):
        symbol_table = self.config["kernel"]
Member

config['kernel'] is the name of a module, not the name of a symbol table. Please just rename it to kernel or kernel_module or something similar.


        headers = [
            ("Process", str),
            ("PID", int),
            ("TID", int),
            ("Traced by TID", int),
            ("Tracing TIDs", str),
            ("Flags", str),
        ]
        return renderers.TreeGrid(
            headers,
            self.format_fields_with_headers(headers, self._generator(symbol_table)),
        )