Replies: 1 comment
If anyone wants to know more about this, I wrote two blog articles that go through this process on real artifacts. Hopefully that illustrates my use case in a tangible manner. The first one is about the creation of the placeholder source program itself. Besides generating the placeholder ELF source program, most of the work was creating a script that parses and imports the debugging data from the SYM file. The end result is a program that has no data but a lot of metadata. The second one is about version tracking from the placeholder source program. This is where the built-in Ghidra correlators fail to match anything and I had to improvise a bunch of scripts to create matches. It highlights some of the issues vanilla Ghidra has when dealing with this specific scenario.
Context
I have a somewhat peculiar problem: I have a debugging data file (Psy-Q SYM file) and an executable (PS-EXE) that are related, but not a match for each other. I cannot just load the debugging information onto the executable, since the functions and symbols in the two files aren't located at the same addresses and aren't all identical.
To work around this, I've created a placeholder executable matching the shape of the SYM file, imported it as a Ghidra program and then loaded all the debugging information on top of it: it has sections, functions, data types and symbols all properly defined, but every single byte of the program is uninitialized. I've also imported the PS-EXE executable without the SYM file as another Ghidra program and ran auto analysis on it, as usual.
This means I have two related programs: one has no bytes but lots of metadata, and the other has bytes but no metadata.
Problem
I want to use Version Tracking to port the metadata from the placeholder program to the real executable, but none of the built-in correlators can match anything, since the source program is just scaffolding for the debugging data, with no initialized bytes or references.
I've already taken a crack at it and managed some success in correlating the shapes of the functions between the programs. The idea is that pairs of matching functions between the source and destination programs are likely to not only have the same body size, but their neighbors are also likely to be similarly-sized and in the same order.
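To make the idea concrete, here is a minimal sketch in plain Java, independent of the Ghidra API. It is not the attached script: the class and method names (`ShapeMatchSketch`, `shapeScore`, `match`), the window size and the score formula are all assumptions made up for illustration. Each program is reduced to an array of function body sizes in address order, and a source function is paired with a destination function only when their sizes are equal and their neighbors' sizes line up better than for any other candidate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pair functions by the sizes of their neighbors. */
class ShapeMatchSketch {

    /**
     * Fraction of offsets in [-window, +window] at which the source function
     * around index i and the destination function around index j have equal
     * body sizes. Offsets that fall outside either array count as mismatches.
     */
    static double shapeScore(int[] src, int i, int[] dst, int j, int window) {
        int agree = 0;
        for (int k = -window; k <= window; k++) {
            int si = i + k;
            int dj = j + k;
            boolean inRange = si >= 0 && si < src.length && dj >= 0 && dj < dst.length;
            if (inRange && src[si] == dst[dj]) {
                agree++;
            }
        }
        return agree / (double) (2 * window + 1);
    }

    /**
     * Returns {srcIndex, dstIndex} pairs. A pair is kept only if the best
     * same-sized candidate is unique and its score is at least minScore.
     */
    static List<int[]> match(int[] src, int[] dst, int window, double minScore) {
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < src.length; i++) {
            int best = -1;
            double bestScore = 0.0;
            boolean tie = false;
            for (int j = 0; j < dst.length; j++) {
                if (src[i] != dst[j]) {
                    continue; // only same-sized functions are candidates
                }
                double score = shapeScore(src, i, dst, j, window);
                if (score > bestScore) {
                    best = j;
                    bestScore = score;
                    tie = false;
                } else if (score == bestScore) {
                    tie = true; // ambiguous: two candidates look equally good
                }
            }
            if (best >= 0 && !tie && bestScore >= minScore) {
                matches.add(new int[] {i, best});
            }
        }
        return matches;
    }
}
```

Calling `ShapeMatchSketch.match(srcSizes, dstSizes, 1, 0.6)` compares each function together with one neighbor on each side. Rejecting ties is a deliberate choice in this sketch: when many functions share the same size, a wrong match seems more harmful than a missing one, and the ambiguous cases can be revisited with a wider window.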
I've attached a script that demonstrates the concept by registering manual matches. It's quite dodgy*, but it does generate a number of correct function pair matches with my artifacts. I can also provide the source placeholder and destination programs, but I don't know if it's OK to share them here since these are technically bits from a 26-year-old copyrighted PlayStation demo game.
Questions
I'd be potentially interested in making contributions in these areas, but I don't want to maintain a fork of Ghidra or an extension over this, which is why I've scripted my way out of these problems so far.
*The debugging data doesn't cover the entire source program, the auto analyzers on the destination program aren't perfect, and the programs are MIPS I, so every function size is a multiple of 4. Also, the scoring and confidence formulas are essentially made up and not based on sound maths.
MultipleFunctionShapeCorrelator.java.zip