Add missing exception handling in paths related to cached files and i… #1516

Open
wants to merge 2 commits into
base: develop
from

atcuno
Contributor

@atcuno atcuno commented Jan 3, 2025

…nodes

The missing exception handling in these paths causes many backtraces across plugins and samples. Add in the missing handlers.

@atcuno
Contributor Author

atcuno commented Jan 3, 2025

@gcmoreira check the spots here where I asked you a question, as I wasn't 100% sure of the best action/return value in those spots.

This also directly relates to my comment here:

#1503 (comment)

Where is_readable isn't enough as structure members can cross page boundaries and/or pointer members of a structure can point to invalid pages. I don't see much use for is_readable unless it's to check a function pointer. Access to structure members needs to be in try/except.

@gcmoreira
Contributor

> Where is_readable isn't enough as structure members can cross page boundaries and/or pointer members of a structure can point to invalid pages. I don't see much use for is_readable unless it's to check a function pointer. Access to structure members needs to be in try/except.

hm it should work. Could you point me to a sample where using is_readable() and/or layer.is_valid() doesn't work or isn't enough? I would like to debug that.

@atcuno
Contributor Author

atcuno commented Jan 4, 2025

> > Where is_readable isn't enough as structure members can cross page boundaries and/or pointer members of a structure can point to invalid pages. I don't see much use for is_readable unless it's to check a function pointer. Access to structure members needs to be in try/except.
>
> hm it should work. Could you point me to a sample where using is_readable() and/or layer.is_valid() doesn't work or isn't enough? I would like to debug that.

Sure, check this backtrace; the fix for it will be in another PR tomorrow (using the fixed get_inode()):

25-01-03 18:27:33 volatility3.cli DEBUG    Traceback (most recent call last):
  File "/home/ub/volatility3/volatility3/cli/__init__.py", line 501, in run
    renderer.render(grid)
  File "/home/ub/volatility3/volatility3/cli/text_renderer.py", line 232, in render
    grid.populate(visitor, outfd)
  File "/home/ub/volatility3/volatility3/framework/renderers/__init__.py", line 240, in populate
    for level, item in self._generator:
  File "/home/ub/volatility3/volatility3/framework/plugins/linux/pagecache.py", line 350, in format_fields_with_headers
    for level, fields in generator:
  File "/home/ub/volatility3/volatility3/framework/plugins/linux/pagecache.py", line 312, in _generator
    for inode_in in inodes_iter:
  File "/home/ub/volatility3/volatility3/framework/plugins/linux/pagecache.py", line 272, in get_inodes
    for file_path, file_dentry in cls._walk_dentry(
  File "/home/ub/volatility3/volatility3/framework/plugins/linux/pagecache.py", line 208, in _walk_dentry
    yield from cls._walk_dentry(seen_dentries, dentry, parent_dir=file_path)
  File "/home/ub/volatility3/volatility3/framework/plugins/linux/pagecache.py", line 208, in _walk_dentry
    yield from cls._walk_dentry(seen_dentries, dentry, parent_dir=file_path)
  File "/home/ub/volatility3/volatility3/framework/plugins/linux/pagecache.py", line 208, in _walk_dentry
    yield from cls._walk_dentry(seen_dentries, dentry, parent_dir=file_path)
  [Previous line repeated 3 more times]
  File "/home/ub/volatility3/volatility3/framework/plugins/linux/pagecache.py", line 189, in _walk_dentry
    inode_ptr = dentry.d_inode
                ^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/objects/__init__.py", line 961, in __getattr__
    member = template(context=self._context, object_info=object_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/objects/templates.py", line 96, in __call__
    return self.vol.object_class(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/objects/__init__.py", line 168, in __new__
    value = cls._unmarshall(context, data_format, object_info)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/objects/__init__.py", line 408, in _unmarshall
    data = context.layers.read(object_info.layer_name, object_info.offset, length)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/interfaces/layers.py", line 635, in read
    return self[layer].read(offset, length, pad)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/layers/linear.py", line 45, in read
    for offset, _, mapped_offset, mapped_length, layer in self.mapping(
  File "/home/ub/volatility3/volatility3/framework/layers/intel.py", line 302, in mapping
    for offset, size, mapped_offset, mapped_size, map_layer in self._mapping(
  File "/home/ub/volatility3/volatility3/framework/layers/intel.py", line 358, in _mapping
    chunk_offset, page_size, layer_name = self._translate(offset)
                                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/layers/intel.py", line 162, in _translate
    entry, position = self._translate_entry(offset)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub/volatility3/volatility3/framework/layers/intel.py", line 210, in _translate_entry
    raise exceptions.PagedInvalidAddressException(
volatility3.framework.exceptions.PagedInvalidAddressException: Page Fault at entry 0x0 in table page directory pointer

This is from running linux.pagecache.Files on auditd_collection.zip in the regression set. As you can see, the code currently (https://github.com/volatilityfoundation/volatility3/blob/develop/volatility3/framework/plugins/linux/pagecache.py#L189) does:

inode_ptr = dentry.d_inode
if not (inode_ptr and inode_ptr.is_readable()):
    continue

and the backtrace triggers on the access to d_inode in the first line above, so the if statement under it is never reached. This is what I have been trying to explain: just accessing either an invalid object (dentry in this case) or a pointer member (d_inode) will immediately cause the invalid address backtrace.
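
A toy sketch of this failure mode, assuming nothing about Volatility's real classes: `ToyLayer`, `ToyDentry`, `InvalidAddress`, and the member offset are all hypothetical stand-ins. The point it illustrates is only that when a member is read from memory at attribute-access time, the exception fires before any check written after the access can run.

```python
class InvalidAddress(Exception):
    """Hypothetical stand-in for an invalid-address exception."""


class ToyLayer:
    """Toy memory layer: only addresses present in `mapped` can be read."""

    def __init__(self, mapped):
        self.mapped = dict(mapped)

    def read(self, offset):
        if offset not in self.mapped:
            raise InvalidAddress(f"page fault at {offset:#x}")
        return self.mapped[offset]


class ToyDentry:
    """Toy struct whose member is read from the layer on attribute access."""

    D_INODE_OFFSET = 0x30  # made-up member offset, for illustration only

    def __init__(self, layer, offset):
        self._layer = layer
        self.offset = offset

    @property
    def d_inode(self):
        # The memory read happens here, at access time.
        return self._layer.read(self.offset + self.D_INODE_OFFSET)


def inode_checked_after(dentry):
    inode_ptr = dentry.d_inode  # raises here if the dentry is unreadable
    if not inode_ptr:           # never reached in that case
        return None
    return inode_ptr


def inode_guarded(dentry):
    try:
        inode_ptr = dentry.d_inode
    except InvalidAddress:
        return None
    return inode_ptr or None


layer = ToyLayer({0x1030: 0xDEAD})
good = ToyDentry(layer, 0x1000)
smeared = ToyDentry(layer, 0x9000)  # nothing is mapped at this address
```

With these toy objects, `inode_guarded(smeared)` returns `None`, while `inode_checked_after(smeared)` propagates `InvalidAddress` out of the first line of the function.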

The pattern of accessing a pointer member and then trying to check its value is the Volatility 2 way of doing things. In Volatility 3, every single access like this needs to be in a try/except. David and I have been making an effort to localize the checks (per access or similar, with debug prints where it makes sense), but there is no way to avoid the try/except (or contextlib.suppress when it's better) in Volatility 3.
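
For reference, a small self-contained sketch of the two forms mentioned here. `InvalidAddress` and `read_name` are hypothetical stand-ins rather than Volatility APIs; `contextlib.suppress` is the standard-library context manager.

```python
import contextlib


class InvalidAddress(Exception):
    """Hypothetical stand-in for an invalid-address exception."""


def read_name(record):
    """Toy accessor: raises if the record's backing memory is 'missing'."""
    if record.get("paged_out"):
        raise InvalidAddress("backing page not present")
    return record["name"]


def names_with_try(records):
    names = []
    for record in records:
        try:
            names.append(read_name(record))
        except InvalidAddress:
            # try/except fits when the failure deserves a fallback or a log line.
            names.append("<unreadable>")
    return names


def names_with_suppress(records):
    names = []
    for record in records:
        # suppress fits when a failed access should simply be skipped.
        with contextlib.suppress(InvalidAddress):
            names.append(read_name(record))
    return names


records = [{"name": "etc"}, {"paged_out": True}, {"name": "var"}]
```

Here `names_with_try(records)` keeps a placeholder for the unreadable record, and `names_with_suppress(records)` silently drops it.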

I want to make sure you understand all this, as I will need a lot of help over the next week or two cleaning up all the is_readable places in the Linux API, since almost every plugin is breaking across dozens of samples in the regression set.

@gcmoreira
Contributor

gcmoreira commented Jan 4, 2025

Ok, I debugged what happened with the auditd_collection.zip sample mentioned above.

In this case, the problem is that the dentry is invalid, not the inode. So, it's clear that the line pointed out above won't catch this type of issue.

    if not (inode_ptr and inode_ptr.is_readable()):

To correctly fix this, instead of making a try/except block every time a dentry member is referenced like this PR suggests, a better approach is to make sure the generated dentries are correct. In this case, the problem is not where we were about to use the try/except but in get_subdirs(). I think that with just this first case, it clearly shows how this approach could have led us to bury a bug rather than addressing it.

For instance, with the following fix, auditd_collection.zip finishes without any issues and without using any try/except.

--- a/volatility3/framework/symbols/linux/extensions/__init__.py
+++ b/volatility3/framework/symbols/linux/extensions/__init__.py
@@ -1133,7 +1133,11 @@ class dentry(objects.StructType):
             raise exceptions.VolatilityException("Unsupported dentry type")
 
         dentry_type_name = self.get_symbol_table_name() + constants.BANG + "dentry"
-        yield from list_head_member.to_list(dentry_type_name, walk_member)
+
+        layer = self._context.layers[self.vol.layer_name]
+        for dentry in list_head_member.to_list(dentry_type_name, walk_member):
+            if layer.is_valid(dentry.vol.offset):
+                yield dentry
 
     def get_inode(self) -> interfaces.objects.ObjectInterface:
         """Returns the inode associated with this dentry"""

We could also use the following line to check the whole dentry object, but it seems unnecessary and might introduce some overhead. The patch above should handle the preliminary check, and the dentry user can then validate the individual members/pointers where needed.

            if layer.is_valid(dentry.vol.offset, dentry.vol.size):
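
To illustrate the difference between the two checks, here is a toy model. `ToyPagedLayer` is hypothetical and only imitates an `is_valid(offset, length)` style check; the 4 KiB page size and the struct size are assumptions made for the example.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


class ToyPagedLayer:
    """Toy layer that only knows which page numbers are present."""

    def __init__(self, present_pages):
        self.present_pages = set(present_pages)

    def is_valid(self, offset, length=1):
        """True only if every page touched by [offset, offset + length) is present."""
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        return all(page in self.present_pages for page in range(first, last + 1))


def filter_by_start(layer, offsets):
    # Cheap check: only the first byte of each object must be readable.
    return [off for off in offsets if layer.is_valid(off)]


def filter_by_extent(layer, offsets, size):
    # Stricter check: the whole object must be resident.
    return [off for off in offsets if layer.is_valid(off, size)]


STRUCT_SIZE = 0xC0                        # made-up struct size
layer = ToyPagedLayer(present_pages={1})  # only 0x1000-0x1FFF is present
candidates = [0x1000, 0x1F80, 0x5000]     # 0x1F80 + 0xC0 spills into page 2
```

In this toy setup the start-only filter keeps `0x1000` and `0x1F80`, while the whole-extent filter keeps only `0x1000`, dropping the object whose tail lies on a missing page.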

Alternatively, we can do a similar check even deeper, in the to_list() methods.

Having said that, I do understand that the try/except workarounds might be acceptable if, for some reason, we're pressed for time, but eventually, we need to revisit and address the underlying issues causing each of these problems.

Also, I'm happy to help with debugging and identifying the root causes of the remaining issues if needed. Just point me to the sample, its ISF and how to reproduce the bug.

@atcuno
Contributor Author

atcuno commented Jan 4, 2025

It is fine to add the is_readable check inside the API as that covers the first page. I have done similar in a few PRs recently to avoid completely broken processes and VMAs from being yielded.

Adding the layer.is_valid(dentry.vol.offset, dentry.vol.size) check would violate Vol3's way of doing things, as it would not allow analysis of partial structures.

Adding the check in the layer doesn't fix the overall problem, though, in that:

  1. Any access to d_inode needs to go through get_inode
  2. get_inode needs to properly access its pointer to avoid backtraces
  3. Any subsequent usage of the inode members needs to be in a try/except for members that go onto the second page

In summary, we can make the APIs return less smeared instances but it won't remove the need for plugins to check that members are in memory before using them.

We also need to move frequently accessed pointer members to accessor methods like get_inode, get_superblock, get_parent (for dentries), etc.
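
A minimal sketch of the accessor idea, using hypothetical toy classes (`ToyPointer`, `ToyDentry`) rather than the real extension classes: the guarded dereference lives in one helper, and callers get `None` instead of a backtrace.

```python
from typing import Dict, Optional


class InvalidAddress(Exception):
    """Hypothetical stand-in for an invalid-address exception."""


class ToyPointer:
    """Toy pointer: dereferencing fails if the target is not in memory."""

    def __init__(self, target: int, memory: Dict[int, str]) -> None:
        self.target = target
        self._memory = memory

    def dereference(self) -> str:
        if self.target not in self._memory:
            raise InvalidAddress(f"cannot read {self.target:#x}")
        return self._memory[self.target]


class ToyDentry:
    """Toy struct exposing pointer members only through accessor methods."""

    def __init__(self, d_inode: ToyPointer, d_parent: ToyPointer) -> None:
        self._d_inode = d_inode
        self._d_parent = d_parent

    @staticmethod
    def _follow(pointer: ToyPointer) -> Optional[str]:
        # The guarded dereference lives in one place instead of in every caller.
        try:
            return pointer.dereference()
        except InvalidAddress:
            return None

    def get_inode(self) -> Optional[str]:
        return self._follow(self._d_inode)

    def get_parent(self) -> Optional[str]:
        return self._follow(self._d_parent)


memory = {0x2000: "inode@0x2000"}
dentry = ToyDentry(
    d_inode=ToyPointer(0x2000, memory),
    d_parent=ToyPointer(0x7000, memory),  # points at an unmapped address
)
```

In this sketch `dentry.get_inode()` returns the toy inode and `dentry.get_parent()` returns `None`. It only covers following the pointer; reads of members on the returned object would still need their own handling, per point 3 above.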

I will be out of town this week with only a few samples with me, so today I will make a bunch of tickets here and you can check the best places to do this while I am gone. I can then re-run the tests when back.

Member

@ikelos ikelos left a comment

I'll let Gus answer the questions you've posed, but from my perspective this seems to wrap each individual line in a try/except, which isn't necessary. Chunk them together by the biggest block that might fail (in the same way) and then do a single except around that. So as big as you can get (and you can do a try/except inside another try/except if you need something specific). So in general, the whole body inside a loop, because there is a small execution time cost of setting up the exception handler and the end result will be identical. Exceptions aren't free, only close to free... 5:)
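
A toy illustration of that chunking advice, with hypothetical names throughout (`InvalidAddress`, `fail`, `walk`, and the dictionary-of-callables entries are made up for the example): every access in the loop body fails the same way and leads to the same outcome, so one handler wraps the whole body.

```python
class InvalidAddress(Exception):
    """Hypothetical stand-in for an invalid-address exception."""


def fail():
    raise InvalidAddress("unreadable member")


def walk(entries):
    rows = []
    for entry in entries:
        # One handler around the whole loop body: each access below fails
        # the same way, and the outcome (skip this entry) is identical.
        try:
            name = entry["name"]()
            inode = entry["inode"]()
            size = entry["size"]()
        except InvalidAddress:
            continue
        rows.append((name, inode, size))
    return rows


entries = [
    {"name": lambda: "a", "inode": lambda: 1, "size": lambda: 10},
    {"name": lambda: "b", "inode": fail, "size": lambda: 20},
    {"name": lambda: "c", "inode": lambda: 3, "size": lambda: 30},
]
```

Calling `walk(entries)` skips the middle entry and yields rows for `a` and `c`; three separate per-line handlers would produce the same rows with more code.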

reversed_path.append(current_dentry.d_name.name_as_str())
try:
    parent = current_dentry.d_parent
except exceptions.InvalidAddressException:

These statements end in the same result, they should probably be in a single try/except block? The traceback would show which line went on, so this just seems excessive?

@@ -1102,10 +1116,24 @@ def d_ancestor(self, ancestor_dentry):
not current_dentry.is_root()
and current_dentry.vol.offset not in dentry_seen
):
if current_dentry.d_parent == ancestor_dentry.vol.offset:
try:

Again, feels like a single try/except could wrap all the statements here because the middle statements don't matter if it's still going to return None on a failure?

@gcmoreira
Contributor

gcmoreira commented Jan 5, 2025

> Any subsequent usage of the inode members needs to be in a try/except for members that go onto the second page

The OS will do its best to ensure allocations remain within the same page. Otherwise, it will cause TLB or cache misses, leading to page faults and slowing down the system. While it is possible for allocations to span multiple pages, fixing this issue in the proposed way doesn't scale and IMO is not good programming practice. If we follow your logic, we would need to wrap every single member access in the framework with try/except, assuming it might be on the next page, which to me doesn't make sense.
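
For concreteness, the page-span arithmetic behind this disagreement can be sketched as follows. The 4 KiB page size and the example offsets are assumptions, and the sketch says nothing about how often real kernel allocations actually straddle a page.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


def pages_spanned(offset, size, page_size=PAGE_SIZE):
    """Number of pages touched by an object of `size` bytes at `offset`."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return last - first + 1


def member_on_later_page(struct_offset, member_offset, page_size=PAGE_SIZE):
    """True if a member starts on a different page than its struct does."""
    return (struct_offset + member_offset) // page_size != struct_offset // page_size
```

For example, a 0xC0-byte object at 0x1F80 touches two pages, and a member at relative offset 0x90 in that object starts on the second one, so a validity check on the object's first byte says nothing about that member.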

Again, everything points to the issue being deeper in the framework. It's better to resolve it at the root once and for all, rather than applying workarounds everywhere.

I think we should try my suggested fix in get_subdirs() and test if it works in all other cases. So far, it solves the issue with the provided sample auditd_collection.zip without needing the extensive changes from this PR. Are there other cases where this fix doesn't work or is not enough, and would need any of the changes you suggested?

@atcuno
Contributor Author

atcuno commented Jan 5, 2025

As I said, I am fine with fixing things in the deeper layers to bubble up less junk, but we really do need try/except over accesses that can span a page and/or when trying to follow pointers. You can rework this PR or just start from scratch, as long as the failing cases are fixed. I am not attached to this particular PR at all, and I made the other tickets separate and assigned them to you since you wrote so much of the cached file / inode code originally.

@ikelos
Member

ikelos commented Jan 7, 2025

Soooo... am I merging this one (in which case the exceptions need combining/reducing) or @gcmoreira are you gonna make a new PR? I'm fine either way, just don't want to commit something we're going to have to change as soon as it's gone in...

@atcuno
Contributor Author

atcuno commented Jan 7, 2025

@ikelos just ignore this one until Gus sorts out the issues. He is very focused on this one plus the other cached file ones this week.

3 participants