PoC/RFC: Correlators for placeholder source programs inside Version Tracking #6366

boricj · 2024-03-30T15:57:34Z

Context

I have a somewhat peculiar problem: I have a debugging data file (Psy-Q SYM file) and an executable (PS-EXE) that are related, but not a match for each other. I cannot simply load the debugging information onto the executable, since the functions and symbols in the two files aren't located at the same addresses and aren't all identical.

To work around this, I've created a placeholder executable matching the shape of the SYM file, imported it as a Ghidra program and then loaded all the debugging information on top of it. It has sections, functions, data types and symbols all properly defined, but every single byte of the program is uninitialized. I've also imported the PS-EXE executable without the SYM file as another Ghidra program and run auto analysis on it, as usual.

This means I have two related programs: one has no bytes but lots of metadata, and the other has bytes but no metadata.

Problem

I want to use Version Tracking to port the metadata from the placeholder program to the real executable, but none of the built-in correlators can match anything since the source program is just a scaffolding for the debugging data, with no initialized bytes or references.

I've already taken a crack at it and managed some success in correlating the shapes of the functions between the programs. The idea is that pairs of matching functions between the source and destination programs are likely to not only have the same body size, but their neighbors are also likely to be similarly-sized and in the same order.
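To make the idea concrete, here is a minimal sketch in plain Java, not the attached script and not using any Ghidra API. It assumes each program has been reduced to an array of function body sizes in address order, and it pairs a source function with a destination function only when the sizes of the function and its neighbors match exactly and the match is unique. The class name, method names and the `radius` parameter are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: pair functions by the sizes of a function and its neighbors. */
class FunctionShapeSketch {
    /**
     * Returns true when the window of sizes centered on src[i] equals the
     * window centered on dst[j]. Windows that would run past either end of
     * an array are rejected rather than truncated.
     */
    static boolean sameWindow(long[] src, int i, long[] dst, int j, int radius) {
        if (i - radius < 0 || i + radius >= src.length) return false;
        if (j - radius < 0 || j + radius >= dst.length) return false;
        for (int k = -radius; k <= radius; k++) {
            if (src[i + k] != dst[j + k]) return false;
        }
        return true;
    }

    /**
     * For each source function index, returns the index of the single
     * destination function whose size window matches, or -1 when there is
     * no match or more than one candidate.
     */
    static int[] matchBySizeWindow(long[] src, long[] dst, int radius) {
        int[] result = new int[src.length];
        Arrays.fill(result, -1);
        for (int i = 0; i < src.length; i++) {
            int found = -1;
            int count = 0;
            for (int j = 0; j < dst.length; j++) {
                if (sameWindow(src, i, dst, j, radius)) {
                    found = j;
                    count++;
                }
            }
            if (count == 1) result[i] = found; // keep unambiguous matches only
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up function sizes in address order, for illustration only.
        long[] source = {32, 64, 128, 16, 48};
        long[] destination = {8, 32, 64, 128, 16, 48, 24};
        System.out.println(Arrays.toString(matchBySizeWindow(source, destination, 1)));
    }
}
```

Requiring an exact, unique window keeps false positives down at the cost of skipping functions near the edges or in runs of identically sized functions. A real correlator would likely need to tolerate some mismatches.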

I've attached a script that demonstrates the concept by registering manual matches. It's quite dodgy*, but it does generate a number of correct function pair matches with my artifacts. I can also provide the source placeholder and destination programs, but I don't know if it's OK to share them here since these are technically bits from a 26-year-old copyrighted PlayStation demo game.

Questions

Is the Ghidra team interested in these kinds of correlators?
Should Ghidra have the ability to create placeholder programs as an alternative to importation? I've created my placeholder program by hand-writing an ELF file. Alternatively, I think it might be possible to jury-rig one with the raw file importer and creating all the memory blocks by hand, but that's not very user-friendly.
Is this use case worth making improvements to upstream Ghidra? I can see it happening with external debug files or with linker map files, but I don't know how common that situation is. For example, collapsing uninitialized bytes inside defined functions in the listing view would be helpful.

I'd be potentially interested in making contributions in these areas, but I don't want to maintain a fork of Ghidra or an extension over this, which is why I've scripted my way out of these problems so far.

*The debugging data doesn't cover the entire source program, the auto analyzers on the destination program aren't perfect, and the programs are MIPS I, so every function size is a multiple of 4. Also, the scoring and confidence formulas are essentially made up and not based on sound maths.
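As an illustration of what a simple score could look like given these limitations, here is a sketch in plain Java. It is an assumption for discussion, not the formula used in the attached script: it scores a candidate pair by the fraction of positions in the surrounding window whose sizes agree, so that a few missing or mis-sized neighbors lower the score instead of rejecting the pair outright. The class and method names are hypothetical.

```java
/** Hypothetical sketch: score a candidate pair by how many neighboring sizes agree. */
class WindowScoreSketch {
    /**
     * Returns the fraction of positions in the window around src[i] and
     * dst[j] whose sizes are equal. Positions that fall outside either
     * array count as mismatches.
     */
    static double windowScore(long[] src, int i, long[] dst, int j, int radius) {
        int agree = 0;
        int total = 2 * radius + 1;
        for (int k = -radius; k <= radius; k++) {
            int a = i + k;
            int b = j + k;
            boolean inRange = a >= 0 && a < src.length && b >= 0 && b < dst.length;
            if (inRange && src[a] == dst[b]) agree++;
        }
        return (double) agree / total;
    }

    public static void main(String[] args) {
        // Made-up sizes: the middle function differs between the two programs.
        long[] source = {32, 64, 128, 16, 48};
        long[] destination = {32, 64, 100, 16, 48};
        System.out.println(windowScore(source, 2, destination, 2, 2));
    }
}
```

Because every size is a multiple of 4 on MIPS I, a single size carries little information, so a wider window is likely needed before a score like this becomes discriminating.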

MultipleFunctionShapeCorrelator.java.zip

boricj · 2024-04-15T19:24:57Z

If anyone wants to know more about this, I wrote two blog articles that go through this process on real artifacts. Hopefully that illustrates my use case in a tangible manner.

The first one is about the creation of the placeholder source program. Besides generating the placeholder ELF itself, most of the work was writing a script that parses and imports the debugging data from the SYM file. The end result is a program that has no data but a lot of metadata.

The second one is about version tracking from the placeholder source program. This is where the built-in Ghidra correlators fail to match anything and I had to improvise a bunch of scripts to create matches. It highlights some of the issues vanilla Ghidra has when dealing with this specific scenario.
