Generic: Add first attempt at pgdscan plugin #1321
base: develop
I've now updated this to merge output files when pages are 'close' enough together. e.g. before the region from
I've fixed the imports too.
Output example:
(volatility3) eve@xps:~/Documents/volatility3$ python vol.py -r pretty -f linux-sample-1.dmp pgdscan --dump --offset 0x4572000
Volatility 3 Framework 2.11.0
Formatting...0.00 PDB scanning finished
Scanning memory_layer using PageGlobalDirectoryScanner
| PGD offset | size | config
* | 0x4572000 | 835584 | -
(volatility3) eve@xps:~/Documents/volatility3$ file pgd.0x4572000.start.0x*
pgd.0x4572000.start.0x1223000.dmp: data
pgd.0x4572000.start.0x400000.dmp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers at 14768
pgd.0x4572000.start.0x602000.dmp: data
pgd.0x4572000.start.0x7fd341093000.dmp: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers at 47552
pgd.0x4572000.start.0x7fd34129d000.dmp: data
pgd.0x4572000.start.0x7fd3414a8000.dmp: data
pgd.0x4572000.start.0x7fd3414b9000.dmp: data
pgd.0x4572000.start.0x7fd3416be000.dmp: data
pgd.0x4572000.start.0x7fd3418c8000.dmp: data
pgd.0x4572000.start.0x7fd3418f5000.dmp: data
pgd.0x4572000.start.0x7fd34190d000.dmp: data
pgd.0x4572000.start.0x7fd341921000.dmp: data
pgd.0x4572000.start.0x7fd341931000.dmp: data
pgd.0x4572000.start.0x7fd341967000.dmp: data
pgd.0x4572000.start.0x7fd341973000.dmp: zlib compressed data
pgd.0x4572000.start.0x7fd341999000.dmp: data
pgd.0x4572000.start.0x7fd3419b2000.dmp: data
pgd.0x4572000.start.0x7fd3419d6000.dmp: data
pgd.0x4572000.start.0x7fd341a02000.dmp: data
pgd.0x4572000.start.0x7fd341a0f000.dmp: data
pgd.0x4572000.start.0x7fd341c4b000.dmp: data
pgd.0x4572000.start.0x7fd341e56000.dmp: data
pgd.0x4572000.start.0x7fd342060000.dmp: data
pgd.0x4572000.start.0x7fd342075000.dmp: data
pgd.0x4572000.start.0x7fffc716c000.dmp: data
pgd.0x4572000.start.0x7fffc71ff000.dmp: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=528965576148051e8930732ea044bbf35982a785, stripped
Thanks! 🦊
It somewhat feels like this plugin is jumping through hoops to make use of the scanner framework? If it's not useful (because it's agnostic of the layer it's scanning) then we can just implement something similar that runs through all the pages manually? The main benefit of the scanner is page overlaps and that will never be a problem here, so it's almost overkill to try and use it?
Otherwise this seems pretty cool. There's a number of places where you should carefully double check that start + length and end do mean the same thing and there isn't an off by one error. They usually only turn up years down the line, which is why I'm mentioning checking them twice now before it goes in.
It feels like you should be able to use the structures to identify different types of tables if needed, but I don't feel this will be too hard to extend out to other architectures. Just one minor check needs adding and then I think it can go in if you're happy with it?
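For reference, running through the pages manually could look roughly like the sketch below. It is only an illustration: a plain bytes buffer stands in for the physical layer, and iter_pages and PAGE_SIZE are hypothetical names, not part of the plugin or of the framework.

```python
from typing import Iterator, Tuple

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def iter_pages(data: bytes, page_size: int = PAGE_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, page) for every complete, page-aligned page in data."""
    # Ignore a trailing partial page so every yielded chunk is exactly page_size long.
    usable = len(data) - (len(data) % page_size)
    for offset in range(0, usable, page_size):
        yield offset, data[offset : offset + page_size]


if __name__ == "__main__":
    # Hypothetical buffer: two full pages plus half a page of padding.
    sample = bytes(0x2800)
    for offset, page in iter_pages(sample):
        print(hex(offset), len(page))
```

Because each page is read whole and on its own, there is no overlap between chunks to handle, which is the point made above about the scanner being overkill here.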
current_start, current_length = sorted_mappings[0]
current_end = current_start + current_length

for start, length in sorted_mappings[1:]:
This requires that sorted_mappings contains more than one element. Might be worth a check before we call this (presumably you can just return that one if needed).
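A minimal sketch of the merge with that check in place, assuming mappings are (start, length) tuples and treating end as exclusive throughout. The function name merge_close_mappings and the max_gap parameter are hypothetical and not taken from the PR.

```python
from typing import Iterable, List, Tuple


def merge_close_mappings(
    mappings: Iterable[Tuple[int, int]], max_gap: int = 0x1000
) -> List[Tuple[int, int]]:
    """Merge (start, length) regions whose gap is at most max_gap bytes."""
    sorted_mappings = sorted(mappings)
    if not sorted_mappings:
        # Guard: indexing [0] below would fail on an empty list.
        return []

    merged: List[Tuple[int, int]] = []
    current_start, current_length = sorted_mappings[0]
    current_end = current_start + current_length  # exclusive end

    for start, length in sorted_mappings[1:]:
        end = start + length  # exclusive end, same convention as current_end
        if start - current_end <= max_gap:
            # Close enough (or overlapping): extend the current region.
            current_end = max(current_end, end)
        else:
            merged.append((current_start, current_end - current_start))
            current_start, current_end = start, end

    merged.append((current_start, current_end - current_start))
    return merged
```

Keeping a single convention (exclusive end, with length always derived as end - start) is one way to make the start + length versus end comparison easy to audit. Note that a merged region in this sketch spans any small gap between its parts.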
# this is the string used page struct to pack the full page of pointers into ints
self._pack_string = (
    self._intel_class._entry_format[0]
    + self._intel_class._entry_format[1] * self._number_of_pointers_per_page
Also somewhat hacky, but you could presumably just copy the last character _number_of_pointers_per_page - 1 number of times. Still kinda hacky (and still relies on the format being a single letter), but it's likely and allows for both alignment and no alignment value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that's a nice idea.
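A rough sketch of that suggestion using the standard struct module. The entry format "<Q" and the 4 KiB page size are assumptions for illustration, and build_pack_string is a hypothetical helper rather than code from the PR.

```python
import struct


def build_pack_string(entry_format: str, page_size: int = 0x1000) -> str:
    """Repeat the final format character so one call unpacks a whole page."""
    entry_size = struct.calcsize(entry_format)
    pointers_per_page = page_size // entry_size
    # Keep the format as given (with or without a byte-order prefix) and
    # append its last character for the remaining entries.
    return entry_format + entry_format[-1] * (pointers_per_page - 1)


if __name__ == "__main__":
    pack_string = build_pack_string("<Q")  # assumption: 8-byte little-endian entries
    page = struct.pack(pack_string, *range(512))  # a synthetic page of 512 entries
    entries = struct.unpack(pack_string, page)
    print(len(entries), struct.calcsize(pack_string))
```

The same helper handles a format with no byte-order prefix (for example "Q"), which is the case the index-based version in the diff would not cope with.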
):
    return None

# read size from layer strcutre
Typo: structure
),
requirements.BooleanRequirement(
    name="save-configs",
    description="Save configuration JSON file to a file for each recovered PGD",
Keep an eye out for enhancements to the config system that should allow configs to be more reusable across plugins that have different requirements (TranslationLayerRequirement rather than ModuleRequirement, for example).
layer = self.context.layers[self.config["primary"]]

# Try to move down to the highest physical layer
if layer.config.get("memory_layer"):
We don't yet have a suitable way of guaranteeing this is the lower layer (and this may not work if the lower layer has been swapped out, etc), but until we have something better this is ok. Be nice to flag it with a FIXME or a TODO, just so we can find it again in the future...
Yes - it's something that does pop up a fair bit. I couldn't see an issue tracking it. Do you think it's worthwhile making one? (e.g. so that it's "TODO: Re issue XXXX update to a more suitable way of guaranteeing this is the lower layer")
Yeah, we never explicitly made one, but it might be good to see how many other issues might depend on it? Happy for you to spin that up, or shout and I can do it too...
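To make the caveat concrete, here is a small sketch of following the memory_layer config key downwards. It is an illustration only: plain dictionaries stand in for layer objects and their configs, lowest_layer_name is a hypothetical helper, and following the chain more than one step is an assumption that goes beyond what the diff does.

```python
from typing import Dict


def lowest_layer_name(layer_configs: Dict[str, Dict[str, str]], name: str) -> str:
    """Follow 'memory_layer' references until a layer has none."""
    # TODO: replace with a more suitable way of guaranteeing this is the lower layer.
    seen = set()
    while True:
        if name in seen:
            raise ValueError(f"cycle detected at layer {name!r}")
        seen.add(name)
        lower = layer_configs[name].get("memory_layer")
        if not lower:
            return name
        name = lower


if __name__ == "__main__":
    # Hypothetical stack: a translation layer on top of a file-backed layer.
    configs = {
        "primary": {"memory_layer": "memory_layer"},
        "memory_layer": {},
    }
    print(lowest_layer_name(configs, "primary"))
```

As noted above, this kind of walk does not guarantee the result is really the lowest layer in every stack, which is why the TODO is worth keeping next to it.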
# build a new layer for this likely pgd
temp_context = self.context.clone()
temp_layer_name = self.context.layers.free_layer_name("IntelLayer")
# temp_layer_name = "primary" # I would like to use the name primary but not sure how?
I think if you just use a prefix of primary rather than IntelLayer, it should do it as long as that layer doesn't already exist (otherwise it'll come out as primary1).
I will have a play - from memory I think a layer with the 'primary' name already exists (at least in my test samples)
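The naming behaviour described in this exchange can be sketched as follows. This is based only on the description above (prefix if unused, otherwise prefix plus a counter) and is not a copy of the framework's implementation; free_name is a hypothetical stand-in.

```python
from typing import Collection


def free_name(prefix: str, existing: Collection[str]) -> str:
    """Return prefix if unused, otherwise prefix followed by the first free counter."""
    if prefix not in existing:
        return prefix
    count = 1
    while f"{prefix}{count}" in existing:
        count += 1
    return f"{prefix}{count}"


if __name__ == "__main__":
    # If a layer called "primary" already exists, the new one gets a suffix.
    print(free_name("primary", {"primary", "memory_layer"}))
    print(free_name("IntelLayer", {"primary", "memory_layer"}))
```

Under this behaviour, a sample that already has a layer named primary would yield a suffixed name for the temporary layer, which matches the concern raised in the reply.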
# TODO: Fix this. It seems like an ungly hack and must to the wrong way
# to make a new config with a new primary layer?
conf = {}
for key, value in dict(temp_layer.build_configuration()).items():
That's how I would/have done it. Definitely kind of hacky, but I'm working on making the components of a config more reusable (by tagging their requirement type so it can be applied to "best guess" requirements of a similar type).
new_config = {}
config_dict = dict(primary.build_configuration())
for entry in config_dict:
    # Volatility 1.2 support
    new_config["kernel.layer_name." + entry] = config_dict[entry]
    # Volatility <1.2 support
    new_config["primary." + entry] = config_dict[entry]
json_str = json.dumps(new_config, sort_keys=True, indent=2)
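A self-contained version of the same idea, for anyone who wants to try it outside the framework. A plain dictionary stands in for the result of build_configuration(), the keys and values in layer_config are placeholders, and reprefix_config is a hypothetical helper.

```python
import json
from typing import Dict, Iterable


def reprefix_config(
    layer_config: Dict[str, object], prefixes: Iterable[str]
) -> Dict[str, object]:
    """Copy every config entry under each of the given key prefixes."""
    new_config: Dict[str, object] = {}
    for prefix in prefixes:
        for key, value in layer_config.items():
            new_config[f"{prefix}.{key}"] = value
    return new_config


if __name__ == "__main__":
    # Placeholder values, standing in for dict(layer.build_configuration()).
    layer_config = {"memory_layer": "memory_layer", "page_map_offset": 0x4572000}
    new_config = reprefix_config(layer_config, ["kernel.layer_name", "primary"])
    print(json.dumps(new_config, sort_keys=True, indent=2))
```

Writing both prefixes into one file mirrors the snippet above, so the same JSON can be loaded by consumers expecting either key layout.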
I mean if it's how you would have thought to do it that's got to be a compliment! :D I'll reword the TODO so it's worded more professionally and make a note to revisit it when you get time to add those config bits.
Yeah, I made the scanner mostly because it seemed like the right thing to do but I could just run through the layer manually (that's exactly what my scruffy vol shell script that inspired this plugin does). I'll rejig it.
Re other architectures I do think it would be fairly easy to add them - I just don't have any samples to test with (and I've been too lazy so far to make one). I've also got much, much, less experience with them. I think in the last 5 years I've only ever seen Intel32e... 🙈
Hi! 👋
This PR adds a basic pgdscan plugin.
I often find myself in the situation where no ISF is available for my Linux sample. The debugging symbols are not provided and no information such as the system map or kallsyms was collected when the memory capture was performed. I know it's sometimes a similar situation for others.
Without an ISF vol is quite limited in what it can do, and rightly so! You need the information on the complex structures in order to correctly parse the memory.
This plugin is designed to help in this, disappointingly common, ISF-less situation. It will scan through the memory and locate heuristically what are likely to be PGDs for the various processes in the memory. You don't have an ISF so it cannot tell you the pid or comm etc.
The user part of the address space can then be dumped out allowing analysis in other tools (e.g. strings, yara, hex editor, ghidra, etc). While not as powerful as vol with a full ISF it allows you to explore the user address space in a way that would have otherwise been impossible.
Sometimes all you really need to do is find the user process you care about and dig into its private memory - and this plugin should help with that.
It currently only supports Intel32e. I've tried my best to make it generic by reading as much information as possible from the intel layer. I simply don't have a lot of samples with 32bit OSes on to test with.
It would be possible to modify existing plugins such as linux.bash or linux.vmayarascan to accept an offset to a PGD and still provide the same results. Any plugin that focuses on scanning private memory to find results and doesn't rely on the kernel ISF (other than to parse the pslist etc) could be made to work this way.
Here is some example output:
N.B. the size 0 PGD is for the kernel itself and so it's actually an expected result.
This example shows dumping out the memory regions for one of the recovered PGDs and running file on the results. (I think there are probably improvements to be made to ensure that pages that are close together get mapped to a single file. At the moment I just use the output of mapping() directly.)
Here is an example of saving out a config for that same PGD and dropping into volshell with the config. In volshell we can then investigate the private memory as normal.
I'm not happy with how I've messed with build_configuration() in order to produce a config file that can be loaded into volshell. It feels like there must be an easier way...!
I'm messing with the guts of a config and reading private values out of the intel layer, so there are likely to be lots of ways to do this better/smarter.... 🙈
I welcome any pointers or advice!
Thanks again!
🦊