Generic: Add first attempt at pgdscan plugin #1321

Open: wants to merge 3 commits into base: develop

eve-mem
Contributor

@eve-mem eve-mem commented Oct 25, 2024

Hi! 👋

This PR adds a basic pgdscan plugin.

I often find myself in the situation where no ISF is available for my Linux sample. The debugging symbols are not provided, and no information such as the system map or kallsyms was collected when the memory capture was performed. I know it's sometimes a similar situation for others.

Without an ISF, vol is quite limited in what it can do, and rightly so! You need the information on the complex structures in order to correctly parse the memory.

This plugin is designed to help in this, disappointingly common, ISF-less situation. It will scan through the memory and locate heuristically what are likely to be PGDs for the various processes in the memory. You don't have an ISF so it cannot tell you the pid or comm etc.

The user part of the address space can then be dumped out allowing analysis in other tools (e.g. strings, yara, hex editor, ghidra, etc). While not as powerful as vol with a full ISF it allows you to explore the user address space in a way that would have otherwise been impossible.

Sometimes all you really need to do is find the user process you care about and dig into its private memory - and this plugin should help with that.

It currently only supports Intel32e. I've tried my best to make it generic by reading as much information as possible from the intel layer. I simply don't have a lot of samples with 32-bit OSes to test with.

It would be possible to modify existing plugins such as linux.bash or linux.vmayarascan to accept an offset to a PGD and still provide the same results. Any plugin that focuses on scanning private memory to find results and doesn't rely on the kernel ISF (other than to parse the pslist etc) could be made to work this way.

Here is some example output:

(volatility3) eve@xps:~/Documents/volatility3$ python vol.py -r pretty -f linux-sample-1.dmp pgdscan
Volatility 3 Framework 2.11.0
Formatting...0.00               PDB scanning finished                      
  | PGD offset |     size | config
* |  0x1605000 |        0 |      -
* |  0x1ee6000 |  4239360 |      -
* |  0x4407000 |  4268032 |      -
* |  0x450a000 |   544768 |      -
* |  0x4572000 |   835584 |      -
* |  0x4590000 |  2850816 |      -
* | 0x1ac16000 |  2031616 |      -
* | 0x1aca1000 |  4517888 |      -
* | 0x1acf5000 |   200704 |      -
<snip>

N.B. the size 0 PGD is for the kernel itself and so it's actually an expected result.

This example shows dumping out the memory regions for one of the recovered PGDs and running file on the results. (I think there are probably improvements to be made to ensure that pages that are close together get mapped to a single file. At the moment I just use the output of mapping() directly.)

(volatility3) eve@xps:~/Documents/volatility3$ python vol.py -r pretty -f linux-sample-1.dmp pgdscan --dump --offset 0x4572000
Volatility 3 Framework 2.11.0
Formatting...0.00               PDB scanning finished                      
  | PGD offset |   size | config
* |  0x4572000 | 835584 |      -
(volatility3) eve@xps:~/Documents/volatility3$ file pgd.0x4572000.start.0x*
pgd.0x4572000.start.0x1223000.dmp:      data
pgd.0x4572000.start.0x1224000.dmp:      data
pgd.0x4572000.start.0x1225000.dmp:      data
pgd.0x4572000.start.0x1226000.dmp:      data
pgd.0x4572000.start.0x400000.dmp:       ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers at 14768
pgd.0x4572000.start.0x401000.dmp:       data
pgd.0x4572000.start.0x402000.dmp:       data
pgd.0x4572000.start.0x602000.dmp:       data
pgd.0x4572000.start.0x603000.dmp:       data
pgd.0x4572000.start.0x7fd341093000.dmp: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, missing section headers at 47552
pgd.0x4572000.start.0x7fd341094000.dmp: data
pgd.0x4572000.start.0x7fd34109c000.dmp: data
<snip>

Here is an example of saving out a config for that same PGD and dropping into volshell with the config. In volshell we can then investigate the private memory as normal.

(volatility3) eve@xps:~/Documents/volatility3$ python vol.py -f linux-sample-1.dmp pgdscan --save-configs --offset 0x4572000
Volatility 3 Framework 2.11.0
Progress:  100.00               PDB scanning finished                      
PGD offset      size    config
Progress:    0.00               Scanning memory_layer using PageGlobalDirectoryScanner
0x4572000       835584  pgd.0x4572000.json
(volatility3) eve@xps:~/Documents/volatility3$ python volshell.py -c pgd.0x4572000.json
Volshell (Volatility 3 Framework) 2.11.0
Readline imported successfully  PDB scanning finished  

    Call help() to see available functions

    Volshell mode        : Generic
    Current Layer        : primary
    Current Symbol Table : None
    Current Kernel Name  : None

(primary) >>> db(0x400000)
0x400000    7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00    .ELF............
0x400010    02 00 3e 00 01 00 00 00 64 17 40 00 00 00 00 00    ..>.....d.@.....
0x400020    40 00 00 00 00 00 00 00 f0 32 00 00 00 00 00 00    @........2......
0x400030    00 00 00 00 40 00 38 00 09 00 40 00 1c 00 1b 00    ....@.8...@.....
0x400040    06 00 00 00 05 00 00 00 40 00 00 00 00 00 00 00    ........@.......
0x400050    40 00 40 00 00 00 00 00 40 00 40 00 00 00 00 00    @.@.....@.@.....
0x400060    f8 01 00 00 00 00 00 00 f8 01 00 00 00 00 00 00    ................
0x400070    08 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00    ................
(primary) >>> 

I'm not happy with how I've messed with build_configuration() in order to produce a config file that can be loaded into volshell. It feels like there must be an easier way...!

I'm messing with the guts of a config and reading private values out of the intel layer, so there are likely to be lots of ways to do this better/smarter.... 🙈

I welcome any pointers or advice!

Thanks again!
🦊

Contributor Author

eve-mem commented Nov 12, 2024

I've now updated this to merge output files when pages are 'close' enough together. For example, the region starting at 0x400000 was previously saved to three files, because that is what the intel layer mappings return, whereas now it becomes a single file.

I've fixed the imports too.

Output example:

(volatility3) eve@xps:~/Documents/volatility3$ python vol.py -r pretty -f linux-sample-1.dmp pgdscan --dump --offset 0x4572000
Volatility 3 Framework 2.11.0
Formatting...0.00               PDB scanning finished                      
  | PGD offset |   size | config
* |  0x4572000 | 835584 |      -
(volatility3) eve@xps:~/Documents/volatility3$ file pgd.0x4572000.start.0x*
pgd.0x4572000.start.0x1223000.dmp:      data
pgd.0x4572000.start.0x400000.dmp:       ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers at 14768
pgd.0x4572000.start.0x602000.dmp:       data
pgd.0x4572000.start.0x7fd341093000.dmp: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers at 47552
pgd.0x4572000.start.0x7fd34129d000.dmp: data
pgd.0x4572000.start.0x7fd3414a8000.dmp: data
pgd.0x4572000.start.0x7fd3414b9000.dmp: data
pgd.0x4572000.start.0x7fd3416be000.dmp: data
pgd.0x4572000.start.0x7fd3418c8000.dmp: data
pgd.0x4572000.start.0x7fd3418f5000.dmp: data
pgd.0x4572000.start.0x7fd34190d000.dmp: data
pgd.0x4572000.start.0x7fd341921000.dmp: data
pgd.0x4572000.start.0x7fd341931000.dmp: data
pgd.0x4572000.start.0x7fd341967000.dmp: data
pgd.0x4572000.start.0x7fd341973000.dmp: zlib compressed data
pgd.0x4572000.start.0x7fd341999000.dmp: data
pgd.0x4572000.start.0x7fd3419b2000.dmp: data
pgd.0x4572000.start.0x7fd3419d6000.dmp: data
pgd.0x4572000.start.0x7fd341a02000.dmp: data
pgd.0x4572000.start.0x7fd341a0f000.dmp: data
pgd.0x4572000.start.0x7fd341c4b000.dmp: data
pgd.0x4572000.start.0x7fd341e56000.dmp: data
pgd.0x4572000.start.0x7fd342060000.dmp: data
pgd.0x4572000.start.0x7fd342075000.dmp: data
pgd.0x4572000.start.0x7fffc716c000.dmp: data
pgd.0x4572000.start.0x7fffc71ff000.dmp: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=528965576148051e8930732ea044bbf35982a785, stripped

Thanks! 🦊

Member

@ikelos ikelos left a comment

It somewhat feels like this plugin is jumping through hoops to make use of the scanner framework? If it's not useful (because it's agnostic of the layer it's scanning) then we can just implement something similar that runs through all the pages manually? The main benefit of the scanner is page overlaps and that will never be a problem here, so it's almost overkill to try and use it?

Otherwise this seems pretty cool. There's a number of places where you should carefully double check that start + length and end do mean the same thing and there isn't an off by one error. They usually only turn up years down the line, which is why I'm mentioning checking them twice now before it goes in.

It feels like you should be able to use the structures to identify different types of tables if needed, but I don't feel this will be too hard to extend out to other architectures. Just one minor check needs adding and then I think it can go in if you're happy with it?
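
To illustrate the "run through all the pages manually" idea, here is a minimal sketch, not the plugin's code. It assumes 4 KiB pages holding 512 little-endian 64-bit entries; `iter_candidate_pages`, the `read` callable and `has_present_entry` are hypothetical names, and the "present bit" check is only a placeholder for whatever heuristic the plugin really applies.

```python
import struct
from typing import Callable, Iterator, Sequence

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages
ENTRIES_PER_PAGE = PAGE_SIZE // 8  # assumption: 8-byte table entries
# A repeat count in the format string unpacks a whole page in one call.
PAGE_STRUCT = struct.Struct(f"<{ENTRIES_PER_PAGE}Q")


def has_present_entry(entries: Sequence[int]) -> bool:
    """Placeholder heuristic (NOT the plugin's): any entry with bit 0 set."""
    return any(entry & 1 for entry in entries)


def iter_candidate_pages(
    read: Callable[[int, int], bytes],
    size: int,
    looks_like_pgd: Callable[[Sequence[int]], bool],
) -> Iterator[int]:
    """Walk page-aligned offsets in [0, size) and yield those that pass the check.

    `read(offset, length)` is a hypothetical stand-in for reading the
    physical layer; a trailing partial page is skipped.
    """
    for offset in range(0, size - PAGE_SIZE + 1, PAGE_SIZE):
        entries = PAGE_STRUCT.unpack(read(offset, PAGE_SIZE))
        if looks_like_pgd(entries):
            yield offset
```

Because each page is read whole and pages never straddle one another, no overlap handling is needed, which is the reason the scanner framework adds little here.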

current_start, current_length = sorted_mappings[0]
current_end = current_start + current_length

for start, length in sorted_mappings[1:]:

Member

This requires that sorted_mapping contains more than one element. Might be worth a check before we call this (presumably you can just return that one if needed).
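
A guarded version of the merge could look like the sketch below. It is an illustration under assumptions, not the PR's implementation: `merge_mappings` and `max_gap` are hypothetical names, and mappings are taken to be `(start, length)` pairs. In Python the `[1:]` slice of a one-element list is simply empty, so it is the `[0]` index on an empty list that raises; the sketch handles both cases explicitly and treats every end as exclusive (`start + length`) to keep the off-by-one question in one place.

```python
from typing import Iterable, List, Tuple

Mapping = Tuple[int, int]  # (start, length)


def merge_mappings(mappings: Iterable[Mapping], max_gap: int = 0) -> List[Mapping]:
    """Merge (start, length) regions whose gap is at most max_gap bytes.

    Ends are exclusive throughout: a region covers [start, start + length).
    """
    sorted_mappings = sorted(mappings)
    if not sorted_mappings:
        return []  # nothing to merge; avoids IndexError on [0]

    merged: List[Mapping] = []
    current_start, current_length = sorted_mappings[0]
    current_end = current_start + current_length

    for start, length in sorted_mappings[1:]:
        if start - current_end <= max_gap:
            # Adjacent, overlapping, or close enough: extend the current region.
            current_end = max(current_end, start + length)
        else:
            merged.append((current_start, current_end - current_start))
            current_start, current_end = start, start + length

    # Flush the last region (also covers the single-element case).
    merged.append((current_start, current_end - current_start))
    return merged
```

With `max_gap=0`, three adjacent 4 KiB pages starting at 0x400000 collapse into one 0x3000-byte region, while a page at 0x602000 stays separate.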

# this is the string used page struct to pack the full page of pointers into ints
self._pack_string = (
self._intel_class._entry_format[0]
+ self._intel_class._entry_format[1] * self._number_of_pointers_per_page

Member

Also somewhat hacky, but you could presumably just copy the last character _number_of_pointers_per_page - 1 times. Still kinda hacky (and it still relies on the format being a single letter), but that's likely, and it allows for both an alignment value and no alignment value.

Contributor Author

Yup that's a nice idea.
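
The suggestion can be sketched as follows. This is an illustration only: `build_page_format` is a hypothetical helper, and the example entry formats (`"<Q"`, `"I"`) are assumptions about what a layer's entry format might look like, not values read from the intel layer.

```python
import struct


def build_page_format(entry_format: str, entries_per_page: int) -> str:
    """Extend a single-entry struct format to cover a full page of entries.

    Repeats the last character (the type code) entries_per_page - 1 times,
    so it works with or without a leading byte-order/alignment character.
    Assumes the type code is a single letter, as noted in the review.
    """
    return entry_format + entry_format[-1] * (entries_per_page - 1)


# With a byte-order prefix: 512 eight-byte entries fill a 4 KiB page.
assert struct.calcsize(build_page_format("<Q", 512)) == 0x1000
# Without a prefix: 1024 four-byte entries also fill a 4 KiB page.
assert struct.calcsize(build_page_format("I", 1024)) == 0x1000
```

The approach avoids indexing the format at fixed positions 0 and 1, which would break if the format carried no alignment character.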

):
return None

# read size from layer strcutre

Member

Typo: structure

),
requirements.BooleanRequirement(
name="save-configs",
description="Save configuration JSON file to a file for each recovered PGD",

Member

Keep an eye out for enhancements to the config system that should allow configs to be more reusable across plugins that have different requirements (TranslationLayerRequirement rather than ModuleRequirement, for example).

layer = self.context.layers[self.config["primary"]]

# Try to move down to the highest physical layer
if layer.config.get("memory_layer"):

Member

We don't yet have a suitable way of guaranteeing this is the lower layer (and this may not work if the lower layer has been swapped out, etc), but until we have something better this is ok. Be nice to flag it with a FIXME or a TODO, just so we can find it again in the future...

Contributor Author

Yes - it's something that does pop up a fair bit. I couldn't see an issue tracking it. Do you think it's worthwhile making one? (e.g. so that it's "TODO: Re issue XXXX update to a more suitable way of guaranteeing this is the lower layer")

Member

Yeah, we never explicitly made one, but it might be good to see how many other issues might depend on it? Happy for you to spin that up, or shout and I can do it too...

# build a new layer for this likely pgd
temp_context = self.context.clone()
temp_layer_name = self.context.layers.free_layer_name("IntelLayer")
# temp_layer_name = "primary" # I would like to use the name primary but not sure how?

Member

I think if you just use a prefix of primary rather than IntelLayer, it should do it as long as that layer doesn't already exist (otherwise it'll come out as primary1).

Contributor Author

I will have a play - from memory I think a layer with the 'primary' name already exists (at least in my test samples)

# TODO: Fix this. It seems like an ungly hack and must to the wrong way
# to make a new config with a new primary layer?
conf = {}
for key, value in dict(temp_layer.build_configuration()).items():

Member

That's how I would do it (and have done it). Definitely kind of hacky, but I'm working on making the components of a config more reusable (by tagging their requirement type so it can be applied to "best guess" requirements of a similar type).

            new_config = {}
            config_dict = dict(primary.build_configuration())
            for entry in config_dict:
                # Volatility 1.2 support
                new_config["kernel.layer_name." + entry] = config_dict[entry]
                # Volatility <1.2 support
                new_config["primary." + entry] = config_dict[entry]
            json_str = json.dumps(new_config, sort_keys=True, indent=2)

Contributor Author

I mean if it's how you would have thought to do it that's got to be a compliment! :D I'll reword the TODO so it's worded more professionally and make a note to revisit it when you get time to add those config bits.

Contributor Author

eve-mem commented Nov 13, 2024

Yeah, I made the scanner mostly because it seemed like the right thing to do but I could just run through the layer manually (that's exactly what my scruffy vol shell script that inspired this plugin does). I'll rejig it.

Re other architectures I do think it would be fairly easy to add them - I just don't have any samples to test with (and I've been too lazy so far to make one). I've also got much, much, less experience with them. I think in the last 5 years I've only ever seen Intel32e... 🙈
