-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linux: module symbols: Fixes and code improvements #1509
base: develop
Are you sure you want to change the base?
linux: module symbols: Fixes and code improvements #1509
Conversation
- Fixed an issue causing the generation of an invalid, extra symbol. - Reuse ELF sym API instead of reimplemented it - Updated the function to return the symbol index, enabling the use of additional module tables. - Ensured the ELF symbol object has `cached_strtab` set, allowing retrieval of critical symbol information like names. - Added typing hints
Hey @ikelos, this is part of a larger pull request that I'm breaking into smaller, more manageable changes. While these updates might seem disconnected at the moment, they will align once the full implementation is in place. The Given the scope of these changes, at least a framework minor version update might be required. Let me know your thoughts! |
addr = self._cached_strtab + self.st_name | ||
|
||
layer = self._context.layers[self.vol.layer_name] | ||
name_bytes = layer.read(addr, max_size, pad=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think a try/except should be wrapped around the layer read to prevent from memory smear ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With pad=True it shouldn't raise any exceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I had concerns about this early part, if self.st_name
is an absurd value due to smear ?
volatility3/volatility3/framework/layers/physical.py
Lines 155 to 163 in cedc0b0
def read(self, offset: int, length: int, pad: bool = False) -> bytes: | |
"""Reads from the file at offset for length.""" | |
if not self.is_valid(offset, length): | |
invalid_address = offset | |
if self.minimum_address < offset <= self.maximum_address: | |
invalid_address = self.maximum_address + 1 | |
raise exceptions.InvalidAddressException( | |
self.name, invalid_address, "Offset outside of the buffer boundaries" | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When pad=True, this for
because ignore_errors=pad
, will return immediately:
volatility3/volatility3/framework/layers/linear.py
Lines 45 to 47 in cedc0b0
for offset, _, mapped_offset, mapped_length, layer in self.mapping( | |
offset, length, ignore_errors=pad | |
): |
Then this line will pad everything with zeros:
return recovered_data + b"\x00" * (length - len(recovered_data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying it !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions around some of the API changes, some of which seem unnecessary and would require bumps, but otherwise looks ok...
"""Get symbols of the module | ||
|
||
Yields: | ||
A symbol object | ||
A tuple containing the ELF symbol index and the corresponding ELF symbol object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes assumptions about the ordering of the symbols returned (and it's also technically a change in the API since it'll break anything the currently doesn't expect a tuple) so if you want to make this change it'll need a MAJOR version bump somewhere, but I can't see the use in it? What's the reason for needing to return the index?
The only existing call to this that I saw get updated throws the value of the index away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I can't see the use in it?
yes, I tried to explain this here:
Hey @ikelos, this is part of a larger pull request that I'm breaking into smaller, more manageable changes. While these updates might seem disconnected at the moment, they will align once the full implementation is in place.
Yes, symbols in the symbol table are ordered, which is important because it is not the only table in a Linux kernel module. The module also contains a types table, where the indices correspond to those in the symbol table. This means that the type for the symbol at index (n) in the symbol table is located at the same index in the types table.
We could handle this in the caller using enumerate, but that would mean get_symbols()
must always preserve the original order. It cannot skip any symbol, whether it's invalid, null, empty, or whatever. If skipped, the symbols could shift in relation to the other tables, breaking the framework.
I believe this approach ensures the symbols stay correctly aligned with their indices.
Should this be considered a major version bump for the whole framework? Unfortunately, object extensions don't have versioning in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel there's a better way of marrying the two up? The symbols in an ISF can have a type parameter, telling you which type they use? Is that not coming out of the dwarf2json? Enumerating them here seems really fragile and we'll have to remember it if we ever tinker with any of the way the symbol tables work. See here.
Recalculating it each time just opens up the chance something will go wrong later, the mapping is static data, it should be encoded in the ISF itself, surely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it's not related to those symbols. These are kernel module symbols, and this information is not part of the ISF. It is gathered directly from memory rather than from d2j. The symbol types are not included in the elf_sym structure and never will be, as this data is embedded within the module, not within the symbol itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ikelos If we want to avoid a version major bump, I can make an intermediate function like:
get_symbols_ex()
.. which yield index and symbol .. and get_symbols() calls it but ignoring the index. Then, who needs both can still call directly to get_symbols_ex()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I missed where they came from then (there's a lot of special things in the linux extensions that I'm not up to speed on). If you're happy it makes sense, then that's ok. Yes, please put it in a different function and make the old one a shell that just returns the same type and has the same signature and name as the original. Probably wise to include a deprecation warning in that call too, so anyone that is using it knows to shift themselves over...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't someone just call enumerate on the generator before first use though? If you're changing the function already I guess it makes it less brittle to ensure it's always count from the start, just trying to keep API changes to a minimum. As I say, if really want it I'm ok with it, just want to make sure it doesn't need changing again down the line... 5:)
@@ -328,22 +329,29 @@ def cached_strtab(self): | |||
def cached_strtab(self, cached_strtab): | |||
self._cached_strtab = cached_strtab | |||
|
|||
def get_name(self): | |||
addr = self._cached_strtab + self.st_name | |||
def get_name(self, max_size=KSYM_NAME_LEN) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this is an additive change to the API. I'm not sure what the first versioned component above this would be though? Can we avoid this being a parameter, but make it a constant instead? I suspect the maximum doesn't change often, and it would keep the API the same but pull the value out to make it more destinctly something people can see and set when needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can remove it from the parameters if you prefer. Since the ELF API could be used for other purposes than just kernel symbols, and for some of those use cases, a shorter maximum symbol name length could make more sense there, I assumed that a general maximum-maximum length would be fine for all cases but allowing to customize a shorter one if for some reason it's needed. I also made it a parameter with default value to minimize the impact of this change.
The Linux kernel has it's maximum length, but for general/user-space ELF files the maximum size can be different. The kernel used to have a maximum length of 128 bytes until Rust happened. In kernels 6.1, it was increased to 512 bytes in this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, ok. I think still best to make it a constant, and then we can make a new class that overrides that class local constant down the line (and the class local constant can be set from constants if you want)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do you mean something like this (internal or not)?
class elf_sym(objects.StructType):
_MAX_NAME_LENGTH = KSYM_NAME_LEN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constants.KSYM_NAME_LEN
, but yeah, so there's constants (user overridable in one clear place, class constant so that inheritors could override it, and then the actual use point). Does that make sense or is it just over complicating things? I don't mind so much, so long as it's pulled out somewhere sensible... 5:)
cached_strtab
set, allowing retrieval of critical symbol information like names.