Linux: Add support for threads in both lsof and sockstat plugins. #1263

231 changes: 137 additions & 94 deletions volatility3/framework/plugins/linux/lsof.py
@@ -1,12 +1,12 @@
# This file is Copyright 2024 Volatility Foundation and licensed under the Volatility Software License 1.0
# which is available at https://www.volatilityfoundation.org/license/vsl-v1.0
#
"""A module containing a collection of plugins that produce data typically
found in Linux's /proc file system."""
import logging, datetime
from typing import List, Callable
import logging
from datetime import datetime
from dataclasses import dataclass, astuple, field
Member

I know these are built-in, but please can we stick to the style guide we use, which says all imports should be modules rather than methods, classes or objects please (typing gets a free pass)? This prevents people importing lsof.astuple and thinking that's where it's defined. See https://google.github.io/styleguide/pyguide.html#22-import

This also applies to datetime which was previously correctly imported.
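
As a minimal sketch of the module-only import style requested here: the `Event` class and `as_row` helper below are hypothetical names used purely for illustration, and only the `dataclasses` and `datetime` standard-library modules are assumed. Each name is qualified at its point of use, so a reader can see where it is defined.

```python
import dataclasses
import datetime


# Names are qualified with their module, so nothing is re-exported
# from this file under a bare name such as "astuple" or "datetime".
@dataclasses.dataclass
class Event:  # hypothetical example class
    name: str
    when: datetime.datetime


def as_row(event: Event) -> tuple:
    """Flatten the dataclass into a plain tuple, in field order."""
    return dataclasses.astuple(event)


row = as_row(Event(name="open", when=datetime.datetime(2024, 10, 6)))
```

The cost is slightly longer lines; the benefit is that `datetime.datetime` can never be confused with the `datetime` module itself.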

Contributor Author

Hm, sorry, but that's the Google style guide. Is it mentioned anywhere that we stick to Google's, and if so, why?

We should stick to Python's own style guide, PEP 8 (the Style Guide for Python Code), under which this is perfectly fine. In fact, it includes an example showing that this is correct:

# Correct:
from subprocess import Popen, PIPE

Also for classes:

When importing a class from a class-containing module, it’s usually okay to spell this:

from myclass import MyClass
from foo.bar.yourclass import YourClass

If this spelling causes local name clashes, then spell them explicitly:

import myclass
import foo.bar.yourclass

and use myclass.MyClass and foo.bar.yourclass.YourClass.

Member

@ikelos ikelos Oct 6, 2024

It's actually from the Volatility style guide from 2014 which is mostly in alignment with PEP8 with some extra requests. I thought that was a bit terse and difficult to read, so I pointed you at Google's style guide which is longer with more discussion, but here's the original text:

  • Importing specific objects or functions is discouraged.
    ** It pollutes the namespace and causes confusion. Previous versions of volatility allowed people to import functions from files they weren't defined in.
  • "from blah import *" is strongly discouraged.
    ** Again, namespace pollution and inappropriate imports.

The entire rest of the code base is formatted that way and it's better to have a style guide than not. I honestly didn't expect this request to be a big sticking point? Can you explain why allowing namespace pollution is a better stylistic option than our current one, or is your disagreement just that the documentation was provided by Google and that code comes out slightly longer?

The style guide needs updating (it still includes formatting that's handled by black now, rather than yapf at the time) and it could do with reformatting since some justifications are at the same level as the bullet points they're justifying.
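
The namespace-pollution concern can be illustrated with a small, self-contained sketch. The module names `demo_core`, `demo_plugin_a`, and `demo_plugin_b` and the `make_module` helper are hypothetical; they are built in memory only so the example runs without any files on disk.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Create an in-memory module from source and register it for import."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


# The function is defined in exactly one place.
core = make_module("demo_core", "def helper():\n    return 'from core'\n")

# Importing the object copies the name into the importing module...
plugin_a = make_module("demo_plugin_a", "from demo_core import helper\n")

# ...while importing the module keeps the name behind its origin.
plugin_b = make_module("demo_plugin_b", "import demo_core\n")

leaks_helper = hasattr(plugin_a, "helper")  # other code could now import it from plugin_a
keeps_clean = not hasattr(plugin_b, "helper")
```

Once `demo_plugin_a.helper` exists, third-party code can import it from there and start depending on a location where it was never defined.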

Contributor Author

@gcmoreira gcmoreira Oct 7, 2024

No worries. I will take care of this.

I'm not saying it's better, but it's unclear to me why this is a concern now. I've never heard it mentioned in any of my previous contributions or by others.

We'll also need to fix this, if we want to make an exception with Typing, in about 50 other imports. Otherwise, there are 244 imports to address.

Also, Volatility2 is a separate project, so we can't expect people to refer to it for guidance on writing code for Volatility3. Volatility3 should have its own dedicated style guide page, or clear instructions of which style guide we use.

I wonder if there's a way to include that as a rule in Black. Any idea? By the way, Black follows Python's PEP8 style guide.

I also learned we have a .style.yapf file, which configures YAPF, Google's Python formatter.

This .style.yapf file doesn't specify any based_on_style, which means it's defaulting to PEP8 as the style guide. However, it seems we can switch it to Google's style guide if preferred. See formatting-style

based_on_style = google

Maybe, if we are going to use Google's style guide we should use Yapf instead of Black.
Otherwise, it might be a good idea to remove that file, since it creates confusion about which formatter the project actually uses.

Member

You can go back through previous pull requests. Wherever I spotted it, I should always have asked for it to be corrected (and provided the rationale). Most people stick with it; I'm disappointed I missed it for 244 imports! :( It's possible I had already quibbled a lot about those PRs and felt sorry for the authors having to fix so many things, but I feel it has saved us from lots of cross-imports between plugins and the core, so I think it's a worthwhile guideline to have.

Yep, fair point, the style guide never got pulled across to vol 3 I don't think and I didn't want to sit down and rewrite it again (I didn't even tidy up the one we had), but we should.

I don't think black has that as a flag (I don't think black has many flags, to be fair). If you find one, it would be great to apply it to our automatic black check, so that it's applied to all PRs evenly when people submit them!

Yeah, that's the style mechanism we used very early on, it was deprecated years ago but I guess I didn't remove the file. I've taken care of that now. Yapf was very bad, it would reweight something, and then because of the reweighting running it again would change it back, meaning it never reached a stable equilibrium. It's the reason we moved to black.

We don't use the Google style guide exclusively, but there are a couple of points in it that make sense, such as only importing modules (in fact, that's the main one I can think of), which we've carried over. As mentioned the main style guide is in general PEP 8.
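
Since a formatter does not enforce this rule, one option is a small standalone check. The following is only a rough sketch, not an existing tool or a Black feature: `find_object_imports`, `_resolves_to_module`, and the `EXEMPT_MODULES` allow-list are hypothetical names. It uses the standard `ast` module to find `from X import Y` statements and `importlib` to decide whether `Y` is a module, which means it actually imports the modules it inspects.

```python
import ast
import importlib
import types

EXEMPT_MODULES = {"typing"}  # hypothetical allow-list ("typing gets a free pass")


def _resolves_to_module(module_name: str, attr_name: str) -> bool:
    """Return True if `from module_name import attr_name` yields a module."""
    try:
        parent = importlib.import_module(module_name)
    except ImportError:
        return True  # cannot be resolved here, so do not flag it
    if isinstance(getattr(parent, attr_name, None), types.ModuleType):
        return True
    try:
        importlib.import_module(f"{module_name}.{attr_name}")
    except ImportError:
        return False
    return True


def find_object_imports(source: str) -> list:
    """List (line, dotted name) for imports of objects rather than modules."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # Relative imports are skipped in this sketch.
        if not isinstance(node, ast.ImportFrom) or node.level or node.module is None:
            continue
        if node.module in EXEMPT_MODULES:
            continue
        for alias in node.names:
            if alias.name == "*" or not _resolves_to_module(node.module, alias.name):
                flagged.append((node.lineno, f"{node.module}.{alias.name}"))
    return flagged


sample = (
    "from dataclasses import dataclass, astuple\n"
    "from os import path\n"
    "from typing import List\n"
    "import logging\n"
)
violations = find_object_imports(sample)
```

A check like this could run alongside the formatter in CI, so the guideline would be applied to every PR evenly rather than depending on a reviewer spotting it.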

from typing import List, Callable, Tuple

from volatility3.framework import renderers, interfaces, constants, exceptions
from volatility3.framework import renderers, interfaces, constants
from volatility3.framework.configuration import requirements
from volatility3.framework.interfaces import plugins
from volatility3.framework.objects import utility
@@ -17,11 +17,94 @@
vollog = logging.getLogger(__name__)


@dataclass
class FDUser:
"""FD user representation, featuring augmented information and formatted fields.
This is the data the plugin will eventually display.
"""

task_tgid: int
task_tid: int
task_comm: str
fd_num: int
full_path: str
device: str = field(default=renderers.NotAvailableValue())
inode_num: int = field(default=renderers.NotAvailableValue())
inode_type: str = field(default=renderers.NotAvailableValue())
file_mode: str = field(default=renderers.NotAvailableValue())
change_time: datetime = field(default=renderers.NotAvailableValue())
modification_time: datetime = field(default=renderers.NotAvailableValue())
access_time: datetime = field(default=renderers.NotAvailableValue())
inode_size: int = field(default=renderers.NotAvailableValue())


@dataclass
class FDInternal:
"""FD internal representation containing only the core objects

Fields:
task: 'task_struct' object
fd_fields: FD fields as obtained from LinuxUtilities.files_descriptors_for_process()
"""

task: interfaces.objects.ObjectInterface
fd_fields: Tuple[int, int, str]

def to_user(self) -> FDUser:
"""Augment the FD information to be presented to the user

Returns:
An FDUser dataclass
"""
# Ensure all types are atomic immutable. Otherwise, astuple() will take a long
# time doing a deepcopy of the Volatility objects.
task_tgid = int(self.task.tgid)
task_tid = int(self.task.pid)
task_comm = utility.array_to_string(self.task.comm)
fd_num, filp, full_path = self.fd_fields
fd_num = int(fd_num)
full_path = str(full_path)
inode = filp.get_inode()
if inode:
superblock_ptr = inode.i_sb
if superblock_ptr and superblock_ptr.is_readable():
device = f"{superblock_ptr.major}:{superblock_ptr.minor}"
else:
device = renderers.NotAvailableValue()

fd_user = FDUser(
task_tgid=task_tgid,
task_tid=task_tid,
task_comm=task_comm,
fd_num=fd_num,
full_path=full_path,
device=device,
inode_num=int(inode.i_ino),
inode_type=inode.get_inode_type() or renderers.UnparsableValue(),
file_mode=inode.get_file_mode(),
change_time=inode.get_change_time(),
modification_time=inode.get_modification_time(),
access_time=inode.get_access_time(),
inode_size=int(inode.i_size),
)
else:
# We use the dataclasses' default values
fd_user = FDUser(
task_tgid=task_tgid,
task_tid=task_tid,
task_comm=task_comm,
fd_num=fd_num,
full_path=full_path,
)

return fd_user


class Lsof(plugins.PluginInterface, timeliner.TimeLinerInterface):
"""Lists open files for each process."""

_required_framework_version = (2, 0, 0)
_version = (1, 2, 0)
_version = (2, 0, 0)

@classmethod
def get_requirements(cls) -> List[interfaces.configuration.RequirementInterface]:
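
The two dataclasses above split the work into a raw record and a display record that is flattened with `astuple()`. The sketch below mirrors that pattern with hypothetical stand-ins (`AbsentValue`, `KernelObject`, `Row`, `RawRow`) rather than Volatility's real classes. It also shows why the diff converts fields to plain `int` and `str` first: `dataclasses.astuple()` deep-copies any field value that is not a basic type or container.

```python
import dataclasses
from typing import Any


class AbsentValue:
    """Hypothetical stand-in for a marker such as renderers.NotAvailableValue."""


class KernelObject:
    """Hypothetical stand-in for a heavyweight framework object."""

    deepcopy_calls = 0

    def __deepcopy__(self, memo):
        # Count how often astuple() asks for a deep copy of this object.
        KernelObject.deepcopy_calls += 1
        return self


@dataclasses.dataclass
class Row:
    pid: int
    tid: int
    path: str
    # Optional column: falls back to the marker when no inode is available.
    inode_num: Any = dataclasses.field(default=AbsentValue())


@dataclasses.dataclass
class RawRow:
    pid: int
    handle: Any


# Field order defines column order, so the tuple lines up with the grid columns.
full = dataclasses.astuple(Row(pid=10, tid=11, path="/tmp/a", inode_num=42))
partial = dataclasses.astuple(Row(pid=10, tid=12, path="/tmp/b"))

# A non-primitive field triggers a deep copy when flattened.
dataclasses.astuple(RawRow(pid=10, handle=KernelObject()))
```

In this sketch the copy is cheap because `__deepcopy__` is overridden; for a real object graph it could be expensive, which is the concern noted in the `to_user()` comment.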
@@ -45,126 +128,86 @@ def get_requirements(cls) -> List[interfaces.configuration.RequirementInterface]
),
]

@classmethod
def get_inode_metadata(cls, filp: interfaces.objects.ObjectInterface):
try:
dentry = filp.get_dentry()
if dentry:
inode_object = dentry.d_inode
if inode_object and inode_object.is_valid():
itype = (
inode_object.get_inode_type() or renderers.NotAvailableValue()
)
return (
inode_object.i_ino,
itype,
inode_object.i_size,
inode_object.get_file_mode(),
inode_object.get_change_time(),
inode_object.get_modification_time(),
inode_object.get_access_time(),
)
except (exceptions.InvalidAddressException, AttributeError) as e:
vollog.warning(f"Can't get inode metadata: {e}")
return None

@classmethod
def list_fds(
cls,
context: interfaces.context.ContextInterface,
symbol_table: str,
vmlinux_module_name: str,
filter_func: Callable[[int], bool] = lambda _: False,
):
) -> FDInternal:
"""Enumerates open file descriptors in tasks

Args:
context: The context to retrieve required elements (layers, symbol tables) from
vmlinux_module_name: The name of the kernel module on which to operate
filter_func: A function which takes a process object and returns True if the process
should be ignored/filtered

Yields:
A FDInternal object
"""
linuxutils_symbol_table = None
for task in pslist.PsList.list_tasks(context, symbol_table, filter_func):
for task in pslist.PsList.list_tasks(
context, vmlinux_module_name, filter_func, include_threads=True
):
if linuxutils_symbol_table is None:
if constants.BANG not in task.vol.type_name:
raise ValueError("Task is not part of a symbol table")
linuxutils_symbol_table = task.vol.type_name.split(constants.BANG)[0]

task_comm = utility.array_to_string(task.comm)
pid = int(task.pid)

fd_generator = linux.LinuxUtilities.files_descriptors_for_process(
context, linuxutils_symbol_table, task
)

for fd_fields in fd_generator:
yield pid, task_comm, task, fd_fields
yield FDInternal(task=task, fd_fields=fd_fields)

@classmethod
def list_fds_and_inodes(
cls,
context: interfaces.context.ContextInterface,
symbol_table: str,
filter_func: Callable[[int], bool] = lambda _: False,
):
for pid, task_comm, task, (fd_num, filp, full_path) in cls.list_fds(
context, symbol_table, filter_func
):
inode_metadata = cls.get_inode_metadata(filp)
if inode_metadata is None:
inode_metadata = tuple(
interfaces.renderers.BaseAbsentValue() for _ in range(7)
)
yield pid, task_comm, task, fd_num, filp, full_path, inode_metadata

def _generator(self, pids, symbol_table):
def _generator(self, pids, vmlinux_module_name):
filter_func = pslist.PsList.create_pid_filter(pids)
fds_generator = self.list_fds_and_inodes(
self.context, symbol_table, filter_func=filter_func
)

for (
pid,
task_comm,
task,
fd_num,
filp,
full_path,
inode_metadata,
) in fds_generator:
inode_num, itype, file_size, imode, ctime, mtime, atime = inode_metadata
fields = (
pid,
task_comm,
fd_num,
full_path,
inode_num,
itype,
imode,
ctime,
mtime,
atime,
file_size,
)
yield (0, fields)
for fd_internal in self.list_fds(
self.context, vmlinux_module_name, filter_func=filter_func
):
fd_user = fd_internal.to_user()
yield (0, astuple(fd_user))

def run(self):
pids = self.config.get("pid", None)
symbol_table = self.config["kernel"]
vmlinux_module_name = self.config["kernel"]

tree_grid_args = [
("PID", int),
("TID", int),
("Process", str),
("FD", int),
("Path", str),
("Device", str),
("Inode", int),
("Type", str),
("Mode", str),
("Changed", datetime.datetime),
("Modified", datetime.datetime),
("Accessed", datetime.datetime),
("Changed", datetime),
("Modified", datetime),
("Accessed", datetime),
("Size", int),
]
return renderers.TreeGrid(tree_grid_args, self._generator(pids, symbol_table))
return renderers.TreeGrid(
tree_grid_args, self._generator(pids, vmlinux_module_name)
)

def generate_timeline(self):
pids = self.config.get("pid", None)
symbol_table = self.config["kernel"]
for row in self._generator(pids, symbol_table):
_depth, row_data = row
description = f'Process {row_data[1]} ({row_data[0]}) Open "{row_data[3]}"'
yield description, timeliner.TimeLinerType.CHANGED, row_data[7]
yield description, timeliner.TimeLinerType.MODIFIED, row_data[8]
yield description, timeliner.TimeLinerType.ACCESSED, row_data[9]
vmlinux_module_name = self.config["kernel"]

filter_func = pslist.PsList.create_pid_filter(pids)
for fd_internal in self.list_fds(
self.context, vmlinux_module_name, filter_func=filter_func
):
fd_user = fd_internal.to_user()

description = (
f"Process {fd_user.task_comm} ({fd_user.task_tgid}/{fd_user.task_tid}) "
f"Open '{fd_user.full_path}'"
)

yield description, timeliner.TimeLinerType.CHANGED, fd_user.change_time
yield description, timeliner.TimeLinerType.MODIFIED, fd_user.modification_time
yield description, timeliner.TimeLinerType.ACCESSED, fd_user.access_time
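
The new PID and TID columns follow the Linux kernel convention used in `to_user()`: `task.tgid` is the process ID that user space sees, while `task.pid` identifies the individual thread. As a rough user-space illustration (not part of the plugin), the sketch below uses the standard `os.getpid()` and `threading.get_native_id()` calls; `collect_ids` is a hypothetical helper name.

```python
import os
import threading


def collect_ids(n_threads: int = 3) -> list:
    """Return one (process ID, native thread ID) pair per worker thread."""
    barrier = threading.Barrier(n_threads)
    lock = threading.Lock()
    results = []

    def worker():
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            results.append(ids)
        # Keep every thread alive until all have recorded their IDs,
        # so the operating system cannot reuse a thread ID.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


ids = collect_ids(3)
process_ids = {pid for pid, _tid in ids}
thread_ids = {tid for _pid, tid in ids}
```

All workers report the same process ID but distinct thread IDs, which is why a thread-aware listing needs both columns to identify a row.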
18 changes: 11 additions & 7 deletions volatility3/framework/plugins/linux/sockstat.py
@@ -22,9 +22,10 @@ class SockHandlers(interfaces.configuration.VersionableInterface):

_required_framework_version = (2, 0, 0)

_version = (1, 0, 0)
_version = (1, 0, 1)

def __init__(self, vmlinux, task):
def __init__(self, vmlinux, task, *args, **kwargs):
super().__init__(*args, **kwargs)
self._vmlinux = vmlinux
self._task = task

@@ -438,7 +439,7 @@ class Sockstat(plugins.PluginInterface):

_required_framework_version = (2, 0, 0)

_version = (1, 0, 0)
_version = (2, 0, 0)

@classmethod
def get_requirements(cls):
@@ -452,7 +453,7 @@ def get_requirements(cls):
name="SockHandlers", component=SockHandlers, version=(1, 0, 0)
),
requirements.PluginRequirement(
name="lsof", plugin=lsof.Lsof, version=(1, 1, 0)
name="lsof", plugin=lsof.Lsof, version=(2, 0, 0)
),
requirements.VersionRequirement(
name="linuxutils", component=linux.LinuxUtilities, version=(2, 0, 0)
@@ -507,8 +508,9 @@ def list_sockets(
dfop_addr = vmlinux.object_from_symbol("sockfs_dentry_operations").vol.offset

fd_generator = lsof.Lsof.list_fds(context, vmlinux.name, filter_func)
for _pid, _task_comm, task, fd_fields in fd_generator:
fd_num, filp, _full_path = fd_fields
for fd_internal in fd_generator:
fd_num, filp, _full_path = fd_internal.fd_fields
task = fd_internal.task

if filp.f_op not in (sfop_addr, dfop_addr):
continue
@@ -617,6 +619,7 @@ def _generator(self, pids: List[int], netns_id_arg: int, symbol_table: str):

fields = (
netns_id,
task.tgid,
task.pid,
fd_num,
format_hints.Hex(sock.vol.offset),
@@ -636,7 +639,8 @@ def run(self):

tree_grid_args = [
("NetNS", int),
("Pid", int),
("PID", int),
("TID", int),
("FD", int),
("Sock Offset", format_hints.Hex),
("Family", str),
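
Both plugins bump their major version here because the shape of their output changed (`list_fds` now yields `FDInternal` objects, and both grids gain a TID column). The sketch below assumes the conventional semantic-versioning compatibility rule, same major version and at least the required minor version; `version_satisfies` is a hypothetical helper and is not taken from the framework's code.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def version_satisfies(required: Version, available: Version) -> bool:
    """Assumed rule: majors must match and the minor must be at least as new."""
    same_major = available[0] == required[0]
    new_enough = available[1] >= required[1]
    return same_major and new_enough


# A caller written against lsof 1.1.0 accepts 1.2.0 but not 2.0.0.
old_ok = version_satisfies((1, 1, 0), (1, 2, 0))
old_broken = version_satisfies((1, 1, 0), (2, 0, 0))
updated_ok = version_satisfies((2, 0, 0), (2, 0, 0))
```

Under that assumption, the `lsof` requirement in sockstat has to move from `(1, 1, 0)` to `(2, 0, 0)` in the same change, as the diff does.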
4 changes: 2 additions & 2 deletions volatility3/framework/symbols/linux/__init__.py
@@ -256,7 +256,7 @@ def files_descriptors_for_process(
task: interfaces.objects.ObjectInterface,
):
# task.files can be null
if not task.files:
if not (task.files and task.files.is_readable()):
return None

fd_table = task.files.get_fds()
@@ -276,7 +276,7 @@ def files_descriptors_for_process(
)

for fd_num, filp in enumerate(fds):
if filp != 0:
if filp and filp.is_readable():
full_path = LinuxUtilities.path_for_file(context, task, filp)

yield fd_num, filp, full_path
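
The two guards changed in this file replace a plain null check with a null check plus a readability check. A minimal sketch of why both are needed follows; `FakePointer` and `usable` are hypothetical stand-ins, and only the `is_readable()` method name is taken from the diff.

```python
class FakePointer:
    """Hypothetical stand-in for a pointer object read from a memory image."""

    def __init__(self, address: int, mapped_addresses: set):
        self.address = address
        self._mapped = mapped_addresses

    def __bool__(self) -> bool:
        # Truthiness only says the pointer is non-null.
        return self.address != 0

    def is_readable(self) -> bool:
        # Readability says the target is actually present in the image.
        return self.address in self._mapped


def usable(pointer: FakePointer) -> bool:
    """Mirror the guard in the diff: non-null and readable."""
    return bool(pointer and pointer.is_readable())


mapped = {0x1000}
null_ptr = FakePointer(0x0, mapped)
dangling_ptr = FakePointer(0x2000, mapped)  # non-null, but its target is not mapped
valid_ptr = FakePointer(0x1000, mapped)
```

The dangling pointer passes a plain truthiness test, so the old `if filp != 0` style of check would have gone on to dereference it.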