-
This is how it currently handles Java. IIRC, there is still code in the native decompiler code base to handle it explicitly, so I think the same would be required for other languages as well.
-
Thanks for sharing what you know. I looked at the code in the suggested links and also looked at "Come Get Your Free NSA Reverse Engineering Tool". What I can figure out from this is that the impedance mismatch between Python bytecode and what is currently in Ghidra is so great that it would be foolhardy for anyone to undertake this except as a full-time dedicated activity. And even then, I wouldn't expect much. It also doesn't really address the questions I have...

Is there interest in decompiling Python bytecode?

I'll take the lack of response I've gotten here and elsewhere as a "no, not really". My own first-hand experience was that back in 2020, three people from Microsoft contacted me about some malware written in Python 2.7 that they needed to decompile. The good news there is that for Python 2.7, decompilers are readily available. For current Python, 3.9 and 3.10, not yet. But the Microsoft security team was more of a one-of-a-kind occurrence than something periodic. A not uncommon attitude is that if it worked out magically somehow the last time, it will work out the next time as well. And if it doesn't, that is the time to start thinking about adjusting interest. Other than that thing from 2020 and hearing vague reports that some malware is written in Python (which might be on the uptick), things have been quiet so far as I know.

Is there interest more broadly in decompiling dynamic high-level languages in general?

Again, I would assume no, since if there were, that interest would appear in Python.

Is there any research going on in decompilers?

I am not finding anything that isn't a continuation of work a decade old or more. And within that time, I have come to realize that decompilation of dynamic high-level languages hasn't been studied to any extent as a class in and of itself.
I have also come to realize that decompilers for this class are a bit different from the general-purpose kind of decompiler that appears in Ghidra. There are many decompilers out there for such specific high-level languages, but the ones I've looked at use a seat-of-the-pants approach. My own experience leads me to believe that much could be done if there were interest in reducing the ad-hoc-ness of this kind of decompiler. But again, right now this stuff is a bit labor intensive, which means you shouldn't expect a lot of activity to appear without some motivation. I've written about the difference between general-purpose decompilers and high-level dynamic language decompilers here.

Lastly, with respect to research on decompilers: some university assistant professors and their grad students contacted me a few months back about a paper they hoped to present based on the decompilers I've worked on. But like the Microsoft security team incident, this is more of a one-of-a-kind activity than something that happens from time to time.
-
Not sure if it is of interest, but I came across this when looking for a way to decompile JavaScript (V8 bytecode), so I figured I'd link that approach: https://swarm.ptsecurity.com/how-we-bypassed-bytenode-and-decompiled-node-js-bytecode-in-ghidra/
-
How does Ghidra handle decompiling high-level languages that compile to their own internal representation?
There is a class of languages that work like Python where:
Possibly WebGL, Java, Lua and others fall into this class. GNU Emacs and Ruby I think most definitely do.
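As a small illustration of what "their own internal representation" means in the Python case, here is a minimal sketch using the standard-library `dis` module. The function `add` and the helper `opcode_names` are made up for this example, and the opcode names it prints differ between Python versions, so no output is shown.

```python
import dis


def add(a, b):
    # Hypothetical example function; any Python function would do.
    return a + b


def opcode_names(func):
    """Return the names of the bytecode instructions CPython compiled func into."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    # The exact names vary by interpreter version.
    print(opcode_names(add))
```

These instructions target CPython's stack-based virtual machine rather than a hardware instruction set.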
I've been interested in the Python case for a while, and the method I've been using seems to be applicable outside of Python, and even to what I will call "general-purpose" decompilers. Although I think there are ideas here that are very amenable to machine learning, all of this is a bit labor intensive. And that's why I haven't strayed far outside of Python.
But I wondered if there was any research done here. I know from the reverse engineering side there has been a lot of interest in Python decompilation.
I also understand that there has been (still is?) malware written in Python. Is that it, or are there other programming languages in use for which decompilers are needed?