linux: module symbols: Fixes and code improvements #1509

gcmoreira · 2025-01-03T08:48:55Z

Fixed an issue causing the generation of an invalid, extra symbol.
Reuse ELF sym API instead of reimplemented it
Updated the function to return the symbol index, enabling the use of additional module tables.
Ensured the ELF symbol object has cached_strtab set, allowing retrieval of critical symbol information like names.
Added typing hints

- Fixed an issue causing the generation of an invalid, extra symbol. - Reuse ELF sym API instead of reimplemented it - Updated the function to return the symbol index, enabling the use of additional module tables. - Ensured the ELF symbol object has `cached_strtab` set, allowing retrieval of critical symbol information like names. - Added typing hints

gcmoreira · 2025-01-03T08:57:30Z

Hey @ikelos, this is part of a larger pull request that I'm breaking into smaller, more manageable changes. While these updates might seem disconnected at the moment, they will align once the full implementation is in place.

The module::get_symbols() method interface has been updated. Since it was only used internally, I've adjusted all relevant references to match the new implementation.

Given the scope of these changes, at least a framework minor version update might be required. Let me know your thoughts!

Abyss-W4tcher · 2025-01-03T12:14:29Z

volatility3/framework/symbols/linux/extensions/elf.py

+        addr = self._cached_strtab + self.st_name
+
+        layer = self._context.layers[self.vol.layer_name]
+        name_bytes = layer.read(addr, max_size, pad=True)


Do you think a try/except should be wrapped around the layer read to prevent from memory smear ?

With pad=True it shouldn't raise any exceptions

Ok, I had concerns about this early part, if self.st_name is an absurd value due to smear ?

volatility3/volatility3/framework/layers/physical.py

Lines 155 to 163 in cedc0b0

def read(self, offset: int, length: int, pad: bool = False) -> bytes:

"""Reads from the file at offset for length."""

if not self.is_valid(offset, length):

invalid_address = offset

if self.minimum_address < offset <= self.maximum_address:

invalid_address = self.maximum_address + 1

raise exceptions.InvalidAddressException(

self.name, invalid_address, "Offset outside of the buffer boundaries"

)

When pad=True, this for because ignore_errors=pad, will return immediately:

volatility3/volatility3/framework/layers/linear.py

Lines 45 to 47 in cedc0b0

for offset, _, mapped_offset, mapped_length, layer in self.mapping(

offset, length, ignore_errors=pad

):

Then this line will pad everything with zeros:

volatility3/volatility3/framework/layers/linear.py

Line 67 in cedc0b0

return recovered_data + b"\x00" * (length - len(recovered_data))

Thanks for clarifying it !

ikelos

A few questions around some of the API changes, some of which seem unnecessary and would require bumps, but otherwise looks ok...

ikelos · 2025-01-05T20:58:17Z

volatility3/framework/symbols/linux/extensions/__init__.py

        """Get symbols of the module

        Yields:
-                A symbol object
+            A tuple containing the ELF symbol index and the corresponding ELF symbol object


This makes assumptions about the ordering of the symbols returned (and it's also technically a change in the API since it'll break anything the currently doesn't expect a tuple) so if you want to make this change it'll need a MAJOR version bump somewhere, but I can't see the use in it? What's the reason for needing to return the index?

The only existing call to this that I saw get updated throws the value of the index away?

but I can't see the use in it?

yes, I tried to explain this here:

Hey @ikelos, this is part of a larger pull request that I'm breaking into smaller, more manageable changes. While these updates might seem disconnected at the moment, they will align once the full implementation is in place.

Yes, symbols in the symbol table are ordered, which is important because it is not the only table in a Linux kernel module. The module also contains a types table, where the indices correspond to those in the symbol table. This means that the type for the symbol at index (n) in the symbol table is located at the same index in the types table.

We could handle this in the caller using enumerate, but that would mean get_symbols() must always preserve the original order. It cannot skip any symbol, whether it's invalid, null, empty, or whatever. If skipped, the symbols could shift in relation to the other tables, breaking the framework.

I believe this approach ensures the symbols stay correctly aligned with their indices.
Should this be considered a major version bump for the whole framework? Unfortunately, object extensions don't have versioning in place.

I feel there's a better way of marrying the two up? The symbols in an ISF can have a type parameter, telling you which type they use? Is that not coming out of the dwarf2json? Enumerating them here seems really fragile and we'll have to remember it if we ever tinker with any of the way the symbol tables work. See here.

Recalculating it each time just opens up the chance something will go wrong later, the mapping is static data, it should be encoded in the ISF itself, surely?

Hm, it's not related to those symbols. These are kernel module symbols, and this information is not part of the ISF. It is gathered directly from memory rather than from d2j. The symbol types are not included in the elf_sym structure and never will be, as this data is embedded within the module, not within the symbol itself

@ikelos If we want to avoid a version major bump, I can make an intermediate function like:
get_symbols_ex() .. which yield index and symbol .. and get_symbols() calls it but ignoring the index. Then, who needs both can still call directly to get_symbols_ex().

Ok, I think I missed where they came from then (there's a lot of special things in the linux extensions that I'm not up to speed on). If you're happy it makes sense, then that's ok. Yes, please put it in a different function and make the old one a shell that just returns the same type and has the same signature and name as the original. Probably wise to include a deprecation warning in that call too, so anyone that is using it knows to shift themselves over...

Couldn't someone just call enumerate on the generator before first use though? If you're changing the function already I guess it makes it less brittle to ensure it's always count from the start, just trying to keep API changes to a minimum. As I say, if really want it I'm ok with it, just want to make sure it doesn't need changing again down the line... 5:)

volatility3/framework/symbols/linux/extensions/__init__.py

ikelos · 2025-01-05T21:19:10Z

volatility3/framework/symbols/linux/extensions/elf.py

@@ -328,22 +329,29 @@ def cached_strtab(self):
    def cached_strtab(self, cached_strtab):
        self._cached_strtab = cached_strtab

-    def get_name(self):
-        addr = self._cached_strtab + self.st_name
+    def get_name(self, max_size=KSYM_NAME_LEN) -> Optional[str]:


Technically this is an additive change to the API. I'm not sure what the first versioned component above this would be though? Can we avoid this being a parameter, but make it a constant instead? I suspect the maximum doesn't change often, and it would keep the API the same but pull the value out to make it more destinctly something people can see and set when needed?

Yes, we can remove it from the parameters if you prefer. Since the ELF API could be used for other purposes than just kernel symbols, and for some of those use cases, a shorter maximum symbol name length could make more sense there, I assumed that a general maximum-maximum length would be fine for all cases but allowing to customize a shorter one if for some reason it's needed. I also made it a parameter with default value to minimize the impact of this change.

The Linux kernel has it's maximum length, but for general/user-space ELF files the maximum size can be different. The kernel used to have a maximum length of 128 bytes until Rust happened. In kernels 6.1, it was increased to 512 bytes in this commit.

Hmmmm, ok. I think still best to make it a constant, and then we can make a new class that overrides that class local constant down the line (and the class local constant can be set from constants if you want)...

So, do you mean something like this (internal or not)?

class elf_sym(objects.StructType): _MAX_NAME_LENGTH = KSYM_NAME_LEN

constants.KSYM_NAME_LEN, but yeah, so there's constants (user overridable in one clear place, class constant so that inheritors could override it, and then the actual use point). Does that make sense or is it just over complicating things? I don't mind so much, so long as it's pulled out somewhere sensible... 5:)

Abyss-W4tcher reviewed Jan 3, 2025

View reviewed changes

ikelos reviewed Jan 5, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

linux: module symbols: Fixes and code improvements #1509

linux: module symbols: Fixes and code improvements #1509

gcmoreira commented Jan 3, 2025

gcmoreira commented Jan 3, 2025

Abyss-W4tcher Jan 3, 2025

gcmoreira Jan 3, 2025

Abyss-W4tcher Jan 3, 2025 •

edited

Loading

gcmoreira Jan 3, 2025

Abyss-W4tcher Jan 3, 2025

ikelos left a comment

ikelos Jan 5, 2025

gcmoreira Jan 6, 2025 •

edited

Loading

ikelos Jan 7, 2025

gcmoreira Jan 7, 2025 •

edited

Loading

gcmoreira Jan 8, 2025

ikelos Jan 9, 2025

ikelos Jan 9, 2025

ikelos Jan 5, 2025

gcmoreira Jan 6, 2025 •

edited

Loading

ikelos Jan 7, 2025

gcmoreira Jan 8, 2025

ikelos Jan 9, 2025 •

edited

Loading

	def read(self, offset: int, length: int, pad: bool = False) -> bytes:
	"""Reads from the file at offset for length."""
	if not self.is_valid(offset, length):
	invalid_address = offset
	if self.minimum_address < offset <= self.maximum_address:
	invalid_address = self.maximum_address + 1
	raise exceptions.InvalidAddressException(
	self.name, invalid_address, "Offset outside of the buffer boundaries"
	)

	for offset, _, mapped_offset, mapped_length, layer in self.mapping(
	offset, length, ignore_errors=pad
	):

linux: module symbols: Fixes and code improvements #1509

Are you sure you want to change the base?

linux: module symbols: Fixes and code improvements #1509

Conversation

gcmoreira commented Jan 3, 2025

gcmoreira commented Jan 3, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Abyss-W4tcher Jan 3, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ikelos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gcmoreira Jan 6, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gcmoreira Jan 7, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gcmoreira Jan 6, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ikelos Jan 9, 2025 • edited Loading

Choose a reason for hiding this comment

Abyss-W4tcher Jan 3, 2025 •

edited

Loading

gcmoreira Jan 6, 2025 •

edited

Loading

gcmoreira Jan 7, 2025 •

edited

Loading

gcmoreira Jan 6, 2025 •

edited

Loading

ikelos Jan 9, 2025 •

edited

Loading