-
I've started looking into parsing namespaces and templated symbols from demangling and other sources. The goal is to do better than our current naive approach and to drive other analyses that I plan to investigate, such as templates and ICF. That put it in my mind to look into parsing C++ and other things at the same time. This is only one of several things I'm investigating, and for now I'm just surveying the landscape. We currently use ANTLR and JavaCC in some parts of Ghidra. While considering which grammars, parsers, and approaches to use, I found this decent article: https://tomassetti.me/parsing-in-java/. However, I'll most likely investigate further down the ANTLR path, and someone pointed me to https://github.com/antlr/grammars-v4/tree/master/cpp as a C++ option. Again, I'm just investigating and nothing is set in concrete at this time. I know this does not address any immediate needs or provide workarounds.
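For illustration only, here is a minimal Python sketch of what going beyond a naive split on `::` could look like for demangled names. The function name `split_qualified_name` and the bracket-depth approach are assumptions made for this example, not Ghidra's implementation. It deliberately ignores operator names such as `operator<` or `operator->`, which a real parser would have to handle.

```python
from typing import List


def split_qualified_name(name: str) -> List[str]:
    """Split a demangled C++ name on '::' separators that are not nested
    inside template arguments, parameter lists, or array brackets.

    Hypothetical helper: assumes brackets are balanced and that the name
    contains no operator names such as operator< or operator->.
    """
    parts = []
    depth = 0   # current nesting depth of <>, (), []
    start = 0   # index where the current component begins
    i = 0
    while i < len(name):
        ch = name[i]
        if ch in "<([":
            depth += 1
        elif ch in ">)]":
            depth -= 1
        elif depth == 0 and name.startswith("::", i):
            # Top-level scope separator: close the current component.
            parts.append(name[start:i])
            i += 2
            start = i
            continue
        i += 1
    parts.append(name[start:])
    return parts
```

For example, `split_qualified_name("std::map<std::string, ns::T>::iterator")` keeps the `::` inside the template argument list intact, whereas a plain `name.split("::")` would cut through it.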
-
Thanks. I’ve read that a couple of times, and have used its examples to code up target test files. I also had to create methods to go with the class structures so that they would compile. Then, as I was probing into class information within PDB, I extended these jangray examples, trying to get the PDB class information to reveal more of its meaning.

I created some potential Ghidra class structure layouts using PDB. This is not yet turned on, because I wasn’t happy with going directly from PDB records to class structure layouts and because it might not be forward compatible with what we finalize. I feel that we need to, instead, go from PDB records to how we want to store class information (what I was calling syntactic class), and then be able to create layouts from that. Each tool chain / version and data organization could then have its own layout derived from common syntactic information. Anyway, from the extended jangray examples, I created a PDB class layout test, CppCompositeTypeTest.

One more note about the PDB class information: in writing the extended examples, I was able to find ways of writing C++ source examples with different class layouts but identical PDB class records. I had to resort to non-PDB information, namely the in-memory vbtables, to create the class layout. I’m not happy that the PDB didn’t have enough information to distinguish between them, though I might be missing something. There is a “hidden” VS compiler option that reports the true class layouts, which I’ve used in studying and working on the investigations above. I don’t have it in front of me because I’m not logged into work, but it might be d1reportAllClassLayout. It is quite informative.
-
Perhaps some kind of hybrid approach could work. If there were a way to extract just the #define macros without attempting to parse the C++, the rest of the C++ could be handled by a C++ compiler.
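As a rough sketch of the macro-extraction half of that idea, the Python below pulls `#define` directives out of a source string line by line, without parsing any of the surrounding C++. The names `extract_defines` and `_DEFINE` are made up for this example. It assumes directives are not hidden inside block comments or string literals, it does not evaluate `#if`/`#undef`, and it leaves any trailing comment in the macro body.

```python
import re

# Matches: optional indent, '#', 'define', a macro name, an optional
# parameter list that directly follows the name, then the body.
_DEFINE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)(\([^)]*\))?\s*(.*)$")


def extract_defines(source: str) -> dict:
    """Return a mapping from macro name (plus parameter list, if any) to body.

    Hypothetical helper: a line-based scan, not a real preprocessor.
    """
    # Splice backslash-continued lines so each directive sits on one line.
    joined = re.sub(r"\\\r?\n", "", source)
    macros = {}
    for line in joined.splitlines():
        match = _DEFINE.match(line)
        if match:
            name, params, body = match.groups()
            macros[name + (params or "")] = body.strip()
    return macros
```

Because a function-like macro requires the `(` to follow the name with no whitespace, `#define FOO (1+2)` is treated as an object-like macro whose body is `(1+2)`, which matches how the preprocessor reads it.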
-
I wonder whether using something like libClang to handle the whole lot might be a possibility, not just for C++ but for C as well. If not a separate binary that does the parsing (like how decompilation is handled, iirc), then a set of JNI bindings like https://github.com/okutane/libclang-java could do, I guess?
-
It looks like someone made an extension that does exactly this: https://github.com/Adubbz/GhidraClangTypes
-
@ghizard if I may, what's the reasoning behind using ANTLR? The Eclipse CDT (which you seem to depend on already) includes a C++ parser library, written completely in Java, as I found here some time ago: #6213 (comment). EDIT: I have also found a 2012 paper discussing this approach. If you're interested, let me know.
-
I'm partial towards using clang as well, since several other RE tools now leverage it for the same purpose (ida-clang, Binary Ninja), and other RE-adjacent tasks use it too (e.g. fuzz harness creation, compiling with clang's sanitizers, generating generic type libraries, etc.). It would be convenient to standardize on clang rather than needing to adapt to a different tool chain.
-
Are there any plans to upgrade the C parser to accommodate C++? Or does anyone have any workarounds?