From d5df1c16bb36831747078fa6a30847075744d69f Mon Sep 17 00:00:00 2001 From: d-millar <33498836+d-millar@users.noreply.github.com> Date: Wed, 8 Jan 2025 13:16:34 -0500 Subject: [PATCH] GP-326: never say die GP-326: recompiling to htmnl GP-326: recompiling to htmnl GP-326: last? GP-326: getting there GP-326: roll along GP-326: rolling along GP-326: test fix GP-326: miscellaneous post-review fixes GP-326: complicated stuff GP-326: more simple stuff GP-326: navhead fix GP-326: better docs GP-326: html for md GP-326: html for md GP-326: tutorial edits GP-326: tutorial edits GP-326: re-arranging docs GP-326: from review GP-326: adding a debugger GP-326: docs GP-326: using TestResources - tests pass GP-326: working tests GP-326: most cmd/meth tests working GP-326: cmd tests pass GP-326: passes thru putmem GP-326: one test running GP-326: better startup logic GP-326: first pass tests GP-326: misc cleanup GP-326: cleaner startup GP-326: cleanup GP-326: fixes for crash dump GP-326: util cleanup GP-326: objects cont. 
GP-326: first pass at objects GP-326: some cleanup GP-326: regions GP-326: sections GP-326: modules GP-326: alt launchers GP-326: symbols GP-326: memory GP-326: stack frame - regs + locals GP-326: frames GP-326: threads GP-326: better start sequence GP-326: working launcher GP-326: util.version GP-326: arch --- .../Debug/Debugger-agent-drgn/Module.manifest | 0 Ghidra/Debug/Debugger-agent-drgn/README.md | 1 + Ghidra/Debug/Debugger-agent-drgn/build.gradle | 20 + .../certification.manifest | 11 + .../data/debugger-launchers/core-drgn.sh | 32 + .../data/debugger-launchers/kernel-drgn.sh | 31 + .../data/debugger-launchers/local-drgn.sh | 34 + .../data/support/local-drgn.py | 57 + .../Debugger-agent-drgn/src/main/py/LICENSE | 11 + .../src/main/py/MANIFEST.in | 1 + .../Debugger-agent-drgn/src/main/py/README.md | 3 + .../src/main/py/pyproject.toml | 25 + .../src/main/py/src/ghidradrgn/__init__.py | 16 + .../src/main/py/src/ghidradrgn/arch.py | 209 +++ .../src/main/py/src/ghidradrgn/commands.py | 1411 +++++++++++++++++ .../src/main/py/src/ghidradrgn/hooks.py | 249 +++ .../src/main/py/src/ghidradrgn/methods.py | 388 +++++ .../src/main/py/src/ghidradrgn/schema.xml | 183 +++ .../src/main/py/src/ghidradrgn/util.py | 115 ++ .../TraceRmiLauncherServicePlugin.html | 68 + .../drgn/rmi/AbstractDrgnTraceRmiTest.java | 379 +++++ .../java/agent/drgn/rmi/DrgnCommandsTest.java | 909 +++++++++++ .../java/agent/drgn/rmi/DrgnMethodsTest.java | 286 ++++ .../Debugger/B5-AddingDebuggers.html | 148 ++ .../Debugger/B5-AddingDebuggers.md | 224 +++ GhidraDocs/GhidraClass/Debugger/Makefile | 1 + GhidraDocs/GhidraClass/Debugger/navhead.htm | 3 +- GhidraDocs/certification.manifest | 2 + 28 files changed, 4816 insertions(+), 1 deletion(-) create mode 100644 Ghidra/Debug/Debugger-agent-drgn/Module.manifest create mode 100644 Ghidra/Debug/Debugger-agent-drgn/README.md create mode 100644 Ghidra/Debug/Debugger-agent-drgn/build.gradle create mode 100644 
Ghidra/Debug/Debugger-agent-drgn/certification.manifest create mode 100755 Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/core-drgn.sh create mode 100755 Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/kernel-drgn.sh create mode 100755 Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/local-drgn.sh create mode 100644 Ghidra/Debug/Debugger-agent-drgn/data/support/local-drgn.py create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/LICENSE create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/MANIFEST.in create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/README.md create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/pyproject.toml create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/__init__.py create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/arch.py create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/commands.py create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/hooks.py create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/methods.py create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/schema.xml create mode 100644 Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/util.py create mode 100644 Ghidra/Test/DebuggerIntegrationTest/src/test.slow/java/agent/drgn/rmi/AbstractDrgnTraceRmiTest.java create mode 100644 Ghidra/Test/DebuggerIntegrationTest/src/test.slow/java/agent/drgn/rmi/DrgnCommandsTest.java create mode 100644 Ghidra/Test/DebuggerIntegrationTest/src/test.slow/java/agent/drgn/rmi/DrgnMethodsTest.java create mode 100644 GhidraDocs/GhidraClass/Debugger/B5-AddingDebuggers.html create mode 100644 GhidraDocs/GhidraClass/Debugger/B5-AddingDebuggers.md diff --git a/Ghidra/Debug/Debugger-agent-drgn/Module.manifest b/Ghidra/Debug/Debugger-agent-drgn/Module.manifest new file mode 100644 index 00000000000..e69de29bb2d diff --git 
a/Ghidra/Debug/Debugger-agent-drgn/README.md b/Ghidra/Debug/Debugger-agent-drgn/README.md new file mode 100644 index 00000000000..65c052c5db9 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/README.md @@ -0,0 +1 @@ +# Debugger-agent-drgn diff --git a/Ghidra/Debug/Debugger-agent-drgn/build.gradle b/Ghidra/Debug/Debugger-agent-drgn/build.gradle new file mode 100644 index 00000000000..443df89fbf3 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/build.gradle @@ -0,0 +1,20 @@ +/* ### + * IP: GHIDRA + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +apply from: "$rootProject.projectDir/gradle/distributableGhidraModule.gradle" +apply from: "$rootProject.projectDir/gradle/hasPythonPackage.gradle" + +apply plugin: 'eclipse' +eclipse.project.name = 'Debug Debugger-agent-drgn' diff --git a/Ghidra/Debug/Debugger-agent-drgn/certification.manifest b/Ghidra/Debug/Debugger-agent-drgn/certification.manifest new file mode 100644 index 00000000000..d342dc45652 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/certification.manifest @@ -0,0 +1,11 @@ +##VERSION: 2.0 +##MODULE IP: Apache License 2.0 +##MODULE IP: Apache License 2.0 with LLVM Exceptions +Module.manifest||GHIDRA||||END| +README.md||GHIDRA||||END| +build.gradle||GHIDRA||||END| +src/main/py/LICENSE||GHIDRA||||END| +src/main/py/MANIFEST.in||GHIDRA||||END| +src/main/py/README.md||GHIDRA||||END| +src/main/py/pyproject.toml||GHIDRA||||END| +src/main/py/src/ghidradrgn/schema.xml||GHIDRA||||END| diff --git a/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/core-drgn.sh b/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/core-drgn.sh new file mode 100755 index 00000000000..e1bb7a2db0f --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/core-drgn.sh @@ -0,0 +1,32 @@ +#!/usr/bin/env bash +## ### +# IP: GHIDRA +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +## +#@title drgn-core +#@desc
+#@desc+#@desc This will attach to an existing core dump using drgn. +#@desc For setup instructions, press F1. +#@desc
+#@desc +#@menu-group drgn +#@icon icon.debugger +#@help TraceRmiLauncherServicePlugin#drgn-core +#@env OPT_TARGET_IMG:file!="" "Core dump" "The target core dump" + +export OPT_TARGET_KIND="coredump" +drgn -c "$OPT_TARGET_IMG" ../support/local-drgn.py + diff --git a/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/kernel-drgn.sh b/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/kernel-drgn.sh new file mode 100755 index 00000000000..bf751266d47 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/kernel-drgn.sh @@ -0,0 +1,31 @@ +#!/usr/bin/env bash +## ### +# IP: GHIDRA +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +## +#@title drgn-kernel +#@desc +#@desc+#@desc This will attach to the local machine's kernel using drgn. +#@desc For setup instructions, press F1. +#@desc
+#@desc +#@menu-group drgn +#@icon icon.debugger +#@help TraceRmiLauncherServicePlugin#drgn-kernel + +export OPT_TARGET_KIND="kernel" +sudo -E drgn ../support/local-drgn.py + diff --git a/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/local-drgn.sh b/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/local-drgn.sh new file mode 100755 index 00000000000..edf9d0f94ef --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/data/debugger-launchers/local-drgn.sh @@ -0,0 +1,34 @@ +#!/usr/bin/env bash +## ### +# IP: GHIDRA +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +## +#@title drgn +#@desc +#@desc+#@desc This will attach to a target running on the local machine using drgn. +#@desc For setup instructions, press F1. +#@desc
+#@desc +#@menu-group drgn +#@icon icon.debugger +#@help TraceRmiLauncherServicePlugin#drgn +#@env OPT_TARGET_PID:int=44068 "PID" "The target's process id" + +export OPT_TARGET_KIND="user" +# sudo -E drgn -p "$OPT_TARGET_PID" ../support/local-drgn.py +# or 'echo 0 > /proc/sys/kernel/yama/ptrace_scope' +drgn -p "$OPT_TARGET_PID" ../support/local-drgn.py + diff --git a/Ghidra/Debug/Debugger-agent-drgn/data/support/local-drgn.py b/Ghidra/Debug/Debugger-agent-drgn/data/support/local-drgn.py new file mode 100644 index 00000000000..2d5e97afa8f --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/data/support/local-drgn.py @@ -0,0 +1,57 @@ +## ### +# IP: GHIDRA +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+## + +# From drgn: +# EASY-INSTALL-ENTRY-SCRIPT: 'drgn==0.0.24','console_scripts','drgn' +import os +import re +import sys + +import drgn.cli + +home = os.getenv('GHIDRA_HOME') + +if os.path.isdir(f'{home}/ghidra/.git'): + sys.path.append( + f'{home}/ghidra/Ghidra/Debug/Debugger-agent-drgn/build/pypkg/src') + sys.path.append( + f'{home}/ghidra/Ghidra/Debug/Debugger-rmi-trace/build/pypkg/src') +elif os.path.isdir(f'{home}/.git'): + sys.path.append( + f'{home}/Ghidra/Debug/Debugger-agent-drgn/build/pypkg/src') + sys.path.append( + f'{home}/Ghidra/Debug/Debugger-rmi-trace/build/pypkg/src') +else: + sys.path.append( + f'{home}/Ghidra/Debug/Debugger-agent-drgn/pypkg/src') + sys.path.append(f'{home}/Ghidra/Debug/Debugger-rmi-trace/pypkg/src') + + +def main(): + from ghidradrgn import commands as cmd + cmd.ghidra_trace_connect(address=os.getenv('GHIDRA_TRACE_RMI_ADDR')) + cmd.ghidra_trace_create(start_trace=True) + cmd.ghidra_trace_txstart() + cmd.ghidra_trace_put_all() + cmd.ghidra_trace_txcommit() + cmd.ghidra_trace_activate() + drgn.cli.run_interactive(cmd.prog) + + +if __name__ == '__main__': + main() + + diff --git a/Ghidra/Debug/Debugger-agent-drgn/src/main/py/LICENSE b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/LICENSE new file mode 100644 index 00000000000..c026b6b79a3 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/LICENSE @@ -0,0 +1,11 @@ +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
diff --git a/Ghidra/Debug/Debugger-agent-drgn/src/main/py/MANIFEST.in b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/MANIFEST.in new file mode 100644 index 00000000000..0fc1562e1d4 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/MANIFEST.in @@ -0,0 +1 @@ +include src/ghidradrgn/schema.xml \ No newline at end of file diff --git a/Ghidra/Debug/Debugger-agent-drgn/src/main/py/README.md b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/README.md new file mode 100644 index 00000000000..ba7656544f7 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/README.md @@ -0,0 +1,3 @@ +# Ghidra Trace RMI for drgn + +Package for connecting drgn to Ghidra via Trace RMI. diff --git a/Ghidra/Debug/Debugger-agent-drgn/src/main/py/pyproject.toml b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/pyproject.toml new file mode 100644 index 00000000000..516a2ffc324 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/pyproject.toml @@ -0,0 +1,25 @@ +[build-system] +requires = ["setuptools"] +build-backend = "setuptools.build_meta" + +[project] +name = "ghidradrgn" +version = "11.3" +authors = [ + { name="Ghidra Development Team" }, +] +description = "Ghidra's Plugin for drgn" +readme = "README.md" +requires-python = ">=3.7" +classifiers = [ + "Programming Language :: Python :: 3", + "License :: OSI Approved :: Apache Software License", + "Operating System :: OS Independent", +] +dependencies = [ + "ghidratrace==11.3", +] + +[project.urls] +"Homepage" = "https://github.com/NationalSecurityAgency/ghidra" +"Bug Tracker" = "https://github.com/NationalSecurityAgency/ghidra/issues" diff --git a/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/__init__.py b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/__init__.py new file mode 100644 index 00000000000..7e7e1e10534 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/__init__.py @@ -0,0 +1,16 @@ +## ### +# IP: GHIDRA +# +# Licensed under the Apache License, 
Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +## +from . import util, commands diff --git a/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/arch.py b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/arch.py new file mode 100644 index 00000000000..bb4e9e6ee08 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/arch.py @@ -0,0 +1,209 @@ +## ### +# IP: GHIDRA +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +## +from ghidratrace.client import Address, RegVal +import drgn + +from . 
import util + + +# NOTE: This map is derived from the ldefs using a script +language_map = { + 'AARCH64': ['AARCH64:BE:64:v8A', 'AARCH64:LE:64:AppleSilicon', 'AARCH64:LE:64:v8A'], + 'ARM': ['ARM:BE:32:v8', 'ARM:BE:32:v8T', 'ARM:LE:32:v8', 'ARM:LE:32:v8T'], + 'PPC64': ['PowerPC:BE:64:4xx', 'PowerPC:LE:64:4xx'], + 'S390': [], + 'S390X': [], + 'I386': ['x86:LE:32:default'], + 'X86_64': ['x86:LE:64:default'], + 'UNKNOWN': ['DATA:LE:64:default', 'DATA:LE:64:default'], +} + +data64_compiler_map = { + None: 'pointer64', +} + +default_compiler_map = { + 'Language.C': 'default', +} + +x86_compiler_map = { + 'Language.C': 'gcc', +} + +compiler_map = { + 'DATA:BE:64:': data64_compiler_map, + 'DATA:LE:64:': data64_compiler_map, + 'x86:LE:32:': x86_compiler_map, + 'x86:LE:64:': x86_compiler_map, + 'AARCH64:LE:64:': default_compiler_map, + 'ARM:BE:32:': default_compiler_map, + 'ARM:LE:32:': default_compiler_map, + 'PowerPC:BE:64:': default_compiler_map, + 'PowerPC:LE:64:': default_compiler_map, +} + + +def get_arch(): + platform = drgn.host_platform + return platform.arch.name + + +def get_endian(): + parm = util.get_convenience_variable('endian') + if parm != 'auto': + return parm + platform = drgn.host_platform + order = platform.flags.IS_LITTLE_ENDIAN + if order.value > 0: + return 'little' + else: + return 'big' + + +def get_size(): + parm = util.get_convenience_variable('size') + if parm != 'auto': + return parm + platform = drgn.host_platform + order = platform.flags.IS_64_BIT + if order.value > 0: + return '64' + else: + return '32' + + +def get_osabi(): + return "Language.C" + + +def compute_ghidra_language(): + # First, check if the parameter is set + lang = util.get_convenience_variable('ghidra-language') + if lang != 'auto': + return lang + + # Get the list of possible languages for the arch. We'll need to sift + # through them by endian and probably prefer default/simpler variants. The + # heuristic for "simpler" will be 'default' then shortest variant id. 
+ arch = get_arch() + endian = get_endian() + sz = get_size() + lebe = ':BE:' if endian == 'big' else ':LE:' + if not arch in language_map: + return 'DATA' + lebe + sz + ':default' + langs = language_map[arch] + matched_endian = sorted( + (l for l in langs if lebe in l), + key=lambda l: 0 if l.endswith(':default') else len(l) + ) + if len(matched_endian) > 0: + return matched_endian[0] + # NOTE: I'm disinclined to fall back to a language match with wrong endian. + return 'DATA' + lebe + sz + ':default' + + +def compute_ghidra_compiler(lang): + # First, check if the parameter is set + comp = util.get_convenience_variable('ghidra-compiler') + if comp != 'auto': + return comp + + # Check if the selected lang has specific compiler recommendations + matched_lang = sorted( + (l for l in compiler_map if l in lang), +# key=lambda l: compiler_map[l] + ) + if len(matched_lang) == 0: + print(f"{lang} not found in compiler map - using default compiler") + return 'default' + + comp_map = compiler_map[matched_lang[0]] + if comp_map == data64_compiler_map: + print(f"Using the DATA64 compiler map") + osabi = get_osabi() + if osabi in comp_map: + return comp_map[osabi] + if lang.startswith("x86:"): + print(f"{osabi} not found in compiler map - using gcc") + return 'gcc' + if None in comp_map: + return comp_map[None] + print(f"{osabi} not found in compiler map - using default compiler") + return 'default' + + +def compute_ghidra_lcsp(): + lang = compute_ghidra_language() + comp = compute_ghidra_compiler(lang) + return lang, comp + + +class DefaultMemoryMapper(object): + + def __init__(self, defaultSpace): + self.defaultSpace = defaultSpace + + def map(self, proc: drgn.Program, offset: int): + space = self.defaultSpace + return self.defaultSpace, Address(space, offset) + + def map_back(self, proc: drgn.Program, address: Address) -> int: + if address.space == self.defaultSpace: + return address.offset + raise ValueError( + f"Address {address} is not in process {proc}") + + 
+DEFAULT_MEMORY_MAPPER = DefaultMemoryMapper('ram') + +memory_mappers = {} + + +def compute_memory_mapper(lang): + if not lang in memory_mappers: + return DEFAULT_MEMORY_MAPPER + return memory_mappers[lang] + + +class DefaultRegisterMapper(object): + + def __init__(self, byte_order): + if not byte_order in ['big', 'little']: + raise ValueError("Invalid byte_order: {}".format(byte_order)) + self.byte_order = byte_order + self.union_winners = {} + + def map_name(self, proc, name): + return name + + def map_value(self, proc, name, value): + return RegVal(self.map_name(proc, name), value) + + def map_name_back(self, proc, name): + return name + + def map_value_back(self, proc, name, value): + return RegVal(self.map_name_back(proc, name), value) + + +DEFAULT_BE_REGISTER_MAPPER = DefaultRegisterMapper('big') +DEFAULT_LE_REGISTER_MAPPER = DefaultRegisterMapper('little') + +def compute_register_mapper(lang): + if ':BE:' in lang: + return DEFAULT_BE_REGISTER_MAPPER + else: + return DEFAULT_LE_REGISTER_MAPPER diff --git a/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/commands.py b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/commands.py new file mode 100644 index 00000000000..8f308ce4523 --- /dev/null +++ b/Ghidra/Debug/Debugger-agent-drgn/src/main/py/src/ghidradrgn/commands.py @@ -0,0 +1,1411 @@ +## ### +# IP: GHIDRA +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+## +import code +from contextlib import contextmanager +import inspect +import os.path +import re +import socket +import sys +import time + +import drgn +import drgn.cli +from ghidratrace import sch +from ghidratrace.client import Client, Address, AddressRange, TraceObject + +from . import util, arch, methods, hooks + + +PAGE_SIZE = 4096 + +AVAILABLES_PATH = 'Available' +AVAILABLE_KEY_PATTERN = '[{pid}]' +AVAILABLE_PATTERN = AVAILABLES_PATH + AVAILABLE_KEY_PATTERN +PROCESSES_PATH = 'Processes' +PROCESS_KEY_PATTERN = '[{procnum}]' +PROCESS_PATTERN = PROCESSES_PATH + PROCESS_KEY_PATTERN +ENV_PATTERN = PROCESS_PATTERN + '.Environment' +THREADS_PATTERN = PROCESS_PATTERN + '.Threads' +THREAD_KEY_PATTERN = '[{tnum}]' +THREAD_PATTERN = THREADS_PATTERN + THREAD_KEY_PATTERN +STACK_PATTERN = THREAD_PATTERN + '.Stack' +FRAME_KEY_PATTERN = '[{level}]' +FRAME_PATTERN = STACK_PATTERN + FRAME_KEY_PATTERN +REGS_PATTERN = FRAME_PATTERN + '.Registers' +LOCALS_PATTERN = FRAME_PATTERN + '.Locals' +MEMORY_PATTERN = PROCESS_PATTERN + '.Memory' +REGION_KEY_PATTERN = '[{start:08x}]' +REGION_PATTERN = MEMORY_PATTERN + REGION_KEY_PATTERN +MODULES_PATTERN = PROCESS_PATTERN + '.Modules' +MODULE_KEY_PATTERN = '[{modpath}]' +MODULE_PATTERN = MODULES_PATTERN + MODULE_KEY_PATTERN +SECTIONS_PATTERN = MODULE_PATTERN + '.Sections' +SECTION_KEY_PATTERN = '[{secname}]' +SECTION_PATTERN = SECTIONS_PATTERN + SECTION_KEY_PATTERN +SYMBOLS_PATTERN = PROCESS_PATTERN + '.Symbols' +SYMBOL_KEY_PATTERN = '[{sid}]' +SYMBOL_PATTERN = SYMBOLS_PATTERN + SYMBOL_KEY_PATTERN + +PROGRAMS = {} + +class ErrorWithCode(Exception): + + def __init__(self, code): + self.code = code + + def __str__(self) -> str: + return repr(self.code) + + +class State(object): + + def __init__(self): + self.reset_client() + + def require_client(self): + if self.client is None: + raise RuntimeError("Not connected") + return self.client + + def require_no_client(self): + if self.client != None: + raise RuntimeError("Already connected") + + 
def reset_client(self): + self.client = None + self.reset_trace() + + def require_trace(self): + if self.trace is None: + raise RuntimeError("No trace active") + return self.trace + + def require_no_trace(self): + if self.trace != None: + raise RuntimeError("Trace already started") + + def reset_trace(self): + self.trace = None + util.set_convenience_variable('_ghidra_tracing', "false") + self.reset_tx() + + def require_tx(self): + if self.tx is None: + raise RuntimeError("No transaction") + return self.tx + + def require_no_tx(self): + if self.tx != None: + raise RuntimeError("Transaction already started") + + def reset_tx(self): + self.tx = None + + +STATE = State() + + +def ghidra_trace_connect(address=None): + """ + Connect Python to Ghidra for tracing + + Address must be of the form 'host:port' + """ + + STATE.require_no_client() + if address is None: + raise RuntimeError( + "'ghidra_trace_connect': missing required argument 'address'") + + parts = address.split(':') + if len(parts) != 2: + raise RuntimeError("address must be in the form 'host:port'") + host, port = parts + try: + c = socket.socket() + c.connect((host, int(port))) + # TODO: Can we get version info from the DLL? + STATE.client = Client(c, "drgn", methods.REGISTRY) + print(f"Connected to {STATE.client.description} at {address}") + except ValueError: + raise RuntimeError("port must be numeric") + + +def ghidra_trace_listen(address='0.0.0.0:0'): + """ + Listen for Ghidra to connect for tracing + + Takes an optional address for the host and port on which to listen. Either + the form 'host:port' or just 'port'. If omitted, it will bind to an + ephemeral port on all interfaces. If only the port is given, it will bind to + that port on all interfaces. This command will block until the connection is + established. 
+ """ + + STATE.require_no_client() + parts = address.split(':') + if len(parts) == 1: + host, port = '0.0.0.0', parts[0] + elif len(parts) == 2: + host, port = parts + else: + raise RuntimeError("address must be 'port' or 'host:port'") + + try: + s = socket.socket() + s.bind((host, int(port))) + host, port = s.getsockname() + s.listen(1) + print("Listening at {}:{}...".format(host, port)) + c, (chost, cport) = s.accept() + s.close() + print("Connection from {}:{}".format(chost, cport)) + STATE.client = Client(c, "drgn", methods.REGISTRY) + except ValueError: + raise RuntimeError("port must be numeric") + + +def ghidra_trace_disconnect(): + """Disconnect Python from Ghidra for tracing""" + + STATE.require_client().close() + STATE.reset_client() + + +def start_trace(name): + language, compiler = arch.compute_ghidra_lcsp() + if name is None: + name = 'drgn/noname' + STATE.trace = STATE.client.create_trace(name, language, compiler) + # TODO: Is adding an attribute like this recommended in Python? 
+ STATE.trace.memory_mapper = arch.compute_memory_mapper(language) + STATE.trace.register_mapper = arch.compute_register_mapper(language) + + parent = os.path.dirname(inspect.getfile(inspect.currentframe())) + schema_fn = os.path.join(parent, 'schema.xml') + with open(schema_fn, 'r') as schema_file: + schema_xml = schema_file.read() + with STATE.trace.open_tx("Create Root Object"): + root = STATE.trace.create_root_object(schema_xml, 'DrgnRoot') + root.set_value('_display', 'drgn version ' + util.DRGN_VERSION.full) + util.set_convenience_variable('_ghidra_tracing', "true") + + +def ghidra_trace_start(name=None): + """Start a Trace in Ghidra""" + + STATE.require_client() + STATE.require_no_trace() + start_trace(name) + + +def ghidra_trace_stop(): + """Stop the Trace in Ghidra""" + + STATE.require_trace().close() + STATE.reset_trace() + + +def ghidra_trace_restart(name=None): + """Restart or start the Trace in Ghidra""" + + STATE.require_client() + if STATE.trace != None: + STATE.trace.close() + STATE.reset_trace() + start_trace(name) + + + +def ghidra_trace_create(start_trace=True): + """ + Create a session. 
+ """ + + global prog + prog = drgn.Program() + kind = os.getenv('OPT_TARGET_KIND') + if kind == "kernel": + prog.set_kernel() + elif kind == "coredump": + img = os.getenv('OPT_TARGET_IMG') + prog.set_core_dump(img) + if '/' in img: + img = img[img.rindex('/')+1:] + else: + pid = int(os.getenv('OPT_TARGET_PID')) + prog.set_pid(pid) + util.selected_pid = pid + + default_symbols = {"default": True, "main": True} + try: + prog.load_debug_info(None, **default_symbols) + except drgn.MissingDebugInfoError as e: + print(e) + + if kind == "kernel": + img = prog.main_module().name + util.selected_tid = next(prog.threads()).tid + elif kind == "coredump": + util.selected_tid = prog.crashed_thread().tid + else: + img = prog.main_module().name + util.selected_tid = prog.main_thread().tid + + if start_trace: + ghidra_trace_start(img) + + PROGRAMS[util.selected_pid] = prog + + +def ghidra_trace_info(): + """Get info about the Ghidra connection""" + + if STATE.client is None: + print("Not connected to Ghidra") + return + host, port = STATE.client.s.getpeername() + print(f"Connected to {STATE.client.description} at {host}:{port}") + if STATE.trace is None: + print("No trace") + return + print("Trace active") + + +def ghidra_trace_info_lcsp(): + """ + Get the selected Ghidra language-compiler-spec pair. + """ + + language, compiler = arch.compute_ghidra_lcsp() + print("Selected Ghidra language: {}".format(language)) + print("Selected Ghidra compiler: {}".format(compiler)) + + +def ghidra_trace_txstart(description="tx"): + """ + Start a transaction on the trace + """ + + STATE.require_no_tx() + STATE.tx = STATE.require_trace().start_tx(description, undoable=False) + + +def ghidra_trace_txcommit(): + """ + Commit the current transaction + """ + + STATE.require_tx().commit() + STATE.reset_tx() + + +def ghidra_trace_txabort(): + """ + Abort the current transaction + + Use only in emergencies. 
+ """ + + tx = STATE.require_tx() + print("Aborting trace transaction!") + tx.abort() + STATE.reset_tx() + + +@contextmanager +def open_tracked_tx(description): + with STATE.require_trace().open_tx(description) as tx: + STATE.tx = tx + yield tx + STATE.reset_tx() + + +def ghidra_trace_save(): + """ + Save the current trace + """ + + STATE.require_trace().save() + + +def ghidra_trace_new_snap(description=None): + """ + Create a new snapshot + + Subsequent modifications to machine state will affect the new snapshot. + """ + + description = str(description) + STATE.require_tx() + return {'snap': STATE.require_trace().snapshot(description)} + + +def ghidra_trace_set_snap(snap=None): + """ + Go to a snapshot + + Subsequent modifications to machine state will affect the given snapshot. + """ + + STATE.require_trace().set_snap(int(snap)) + + +def quantize_pages(start, end): + return (start // PAGE_SIZE * PAGE_SIZE, (end + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE) + + + +def put_bytes(start, end, pages, display_result): + trace = STATE.require_trace() + if pages: + start, end = quantize_pages(start, end) + nproc = util.selected_process() + if end - start <= 0: + return {'count': 0} + try: + buf = prog.read(start, end - start) + except Exception as e: + return {'count': 0} + + count = 0 + if buf != None: + base, addr = trace.memory_mapper.map(nproc, start) + if base != addr.space: + trace.create_overlay_space(base, addr.space) + count = trace.put_bytes(addr, buf) + if display_result: + print("Wrote {} bytes".format(count)) + return {'count': count} + + +def eval_address(address): + try: + nproc = util.selected_process() + trace = STATE.require_trace() + base, addr = trace.memory_mapper.map(nproc, address) + if base != addr.space: + trace.create_overlay_space(base, addr.space) + return addr + except Exception: + raise RuntimeError("Cannot convert '{}' to address".format(address)) + + +def eval_range(address, length): + start = address + try: + end = start + length + except 
Exception as e: + raise RuntimeError("Cannot convert '{}' to length".format(length)) + return start, end + + +def putmem(address, length, pages=True, display_result=True): + start, end = eval_range(address, length) + return put_bytes(start, end, pages, display_result) + + +def ghidra_trace_putmem(address, length, pages=True): + """ + Record the given block of memory into the Ghidra trace. + """ + + STATE.require_tx() + return putmem(address, length, pages, True) + + +def putmem_state(address, length, state, pages=True): + STATE.trace.validate_state(state) + start, end = eval_range(address, length) + if pages: + start, end = quantize_pages(start, end) + nproc = util.selected_process() + base, addr = STATE.trace.memory_mapper.map(nproc, start) + if base != addr.space and state != 'unknown': + STATE.trace.create_overlay_space(base, addr.space) + STATE.trace.set_memory_state(addr.extend(end - start), state) + + +def ghidra_trace_putmem_state(address, length, state, pages=True): + """ + Set the state of the given range of memory in the Ghidra trace. + """ + + STATE.require_tx() + return putmem_state(address, length, state, pages) + + +def ghidra_trace_delmem(address, length): + """ + Delete the given range of memory from the Ghidra trace. + + Why would you do this? Keep in mind putmem quantizes to full pages by + default, usually to take advantage of spatial locality. This command does + not quantize. You must do that yourself, if necessary. + """ + + STATE.require_tx() + start, end = eval_range(address, length) + nproc = util.selected_process() + base, addr = STATE.trace.memory_mapper.map(nproc, start) + # Do not create the space. We're deleting stuff. 
+ STATE.trace.delete_bytes(addr.extend(end - start)) + + +def putreg(): + nproc = util.selected_process() + if nproc < 0: + return + nthrd = util.selected_thread() + if nthrd < 0: + return + nframe = util.selected_frame() + if nframe < 0: + return + space = REGS_PATTERN.format(procnum=nproc, tnum=nthrd, level=nframe) + STATE.trace.create_overlay_space('register', space) + robj = STATE.trace.create_object(space) + robj.insert() + mapper = STATE.trace.register_mapper + + thread = prog.thread(nthrd) + try: + frames = thread.stack_trace() + except Exception as e: + print(e) + return + + regs = frames[nframe].registers() + endian = arch.get_endian() + sz = int(int(arch.get_size())/8) + values = [] + for key in regs.keys(): + try: + value = regs[key] + except Exception: + value = 0 + try: + rv = value.to_bytes(sz, endian) + values.append(mapper.map_value(nproc, key, rv)) + robj.set_value(key, hex(value)) + except Exception: + pass + return {'missing': STATE.trace.put_registers(space, values)} + + +def ghidra_trace_putreg(): + """ + Record the given register group for the current frame into the Ghidra trace. + + If no group is specified, 'all' is assumed. + """ + + STATE.require_tx() + putreg() + + +def ghidra_trace_delreg(): + """ + Delete the given register group for the curent frame from the Ghidra trace. + + Why would you do this? If no group is specified, 'all' is assumed. 
+ """ + + STATE.require_tx() + nproc = util.selected_process() + nthrd = util.selected_thread() + if nthrd < 0: + return + nframe = util.selected_frame() + if nframe < 0: + return + space = REGS_PATTERN.format(procnum=nproc, tnum=nthrd, level=nframe) + + thread = prog.thread(nthrd) + try: + frames = thread.stack_trace() + except Exception as e: + print(e) + return + + regs = frames[nframe].registers() + names = [] + for key in regs.keys(): + names.append(key) + STATE.trace.delete_registers(space, names) + + +def put_object(lpath, key, value): + nproc = util.selected_process() + lobj = STATE.trace.create_object(lpath+"."+key) + lobj.insert() + if hasattr(value, "type_"): + vtype = value.type_ + vkind = vtype.kind + lobj.set_value('_display', '{} [{}]'.format(key, vtype.type_name())) + lobj.set_value('Kind', str(vkind)) + lobj.set_value('Type', str(vtype)) + else: + lobj.set_value('_display', '{} [{}:{}]'.format(key, type(value), str(value))) + lobj.set_value('Value', str(value)) + return + + if hasattr(value, "absent_"): + if value.absent_: + lobj.set_value('Value', 'The following launchers uses Meta's drgn engine to explore various targets:
+ +This launcher attaches to a running process via the Linux "/proc/pid" interface.
+ +You must have Meta's drgn installed on the local system. The default behavior + assumes you do NOT need root access to attach to a running process, i.e. it assumes you + have run the command:
+ ++echo 0 > /proc/sys/kernel/yama/ptrace_scope ++
using root privileges at some point. Alternatively, you can prepend "sudo -E" + to the drgn invocation line in "local-drgn.sh". Note: drgn does not currently + support stack unwinding or register access for user-mode access to running processes. +
+ +This launcher loads a Linux core dump.
+ +You must have Meta's drgn installed on the local system. No other setup is required. + Note: Core dumps may or may not include memory, so the Dynamic Listing may or may not be populated. +
+ +This launcher attaches to a Linux kernel via the "/proc/kcore" interface.
+ +You must have Meta's drgn installed on the local system. No other setup is required. + Note: requires root access - you will be prompted for a password in the Terminal. +
+ +We currently provide one launcher for Trace RMI API exploration and development:
diff --git a/Ghidra/Test/DebuggerIntegrationTest/src/test.slow/java/agent/drgn/rmi/AbstractDrgnTraceRmiTest.java b/Ghidra/Test/DebuggerIntegrationTest/src/test.slow/java/agent/drgn/rmi/AbstractDrgnTraceRmiTest.java new file mode 100644 index 00000000000..befdaee92d7 --- /dev/null +++ b/Ghidra/Test/DebuggerIntegrationTest/src/test.slow/java/agent/drgn/rmi/AbstractDrgnTraceRmiTest.java @@ -0,0 +1,379 @@ +/* ### + * IP: GHIDRA + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package agent.drgn.rmi; + +import static org.junit.Assert.*; +import static org.junit.Assume.*; + +import java.io.FileWriter; +import java.io.IOException; +import java.net.*; +import java.nio.file.*; +import java.util.Map; +import java.util.Objects; +import java.util.concurrent.*; +import java.util.concurrent.atomic.AtomicReference; +import java.util.function.*; + +import org.apache.commons.lang3.exception.ExceptionUtils; +import org.junit.Before; + +import generic.jar.ResourceFile; +import ghidra.app.plugin.core.debug.gui.AbstractGhidraHeadedDebuggerTest; +import ghidra.app.plugin.core.debug.service.tracermi.TraceRmiPlugin; +import ghidra.app.plugin.core.debug.utils.ManagedDomainObject; +import ghidra.app.services.TraceRmiService; +import ghidra.debug.api.tracermi.*; +import ghidra.framework.*; +import ghidra.framework.main.ApplicationLevelOnlyPlugin; +import ghidra.framework.model.DomainFile; +import ghidra.framework.plugintool.Plugin; +import 
ghidra.framework.plugintool.PluginsConfiguration; +import ghidra.framework.plugintool.util.*; +import ghidra.pty.testutil.DummyProc; +import ghidra.util.Msg; +import junit.framework.AssertionFailedError; + +public abstract class AbstractDrgnTraceRmiTest extends AbstractGhidraHeadedDebuggerTest { + + protected static String CORE = "core.12137"; + protected static String MDO = "/New Traces/" + CORE; + public static String PREAMBLE = """ + import os + import drgn + import drgn.cli + os.environ['OPT_TARGET_KIND'] = 'coredump' + os.environ['OPT_TARGET_IMG'] = '$CORE' + from ghidradrgn.commands import * + """; + + // Connecting should be the first thing the script does, so use a tight timeout. + protected static final int CONNECT_TIMEOUT_MS = 3000; + protected static final int TIMEOUT_SECONDS = 30000; + protected static final int QUIT_TIMEOUT_MS = 1000; + + protected static boolean didSetupPython = false; + + protected TraceRmiService traceRmi; + private Path pythonPath; + private Path outFile; + private Path errFile; + + @Before + public void assertOS() { + assumeTrue(OperatingSystem.CURRENT_OPERATING_SYSTEM == OperatingSystem.LINUX); + } + + //@BeforeClass + public static void setupPython() throws Throwable { + if (didSetupPython) { + // Only do this once when running the full suite. + return; + } + String gradle = DummyProc.which("gradle"); + new ProcessBuilder(gradle, "Debugger-agent-drgn:assemblePyPackage") + .directory(TestApplicationUtils.getInstallationDirectory()) + .inheritIO() + .start() + .waitFor(); + didSetupPython = true; + } + + protected void setPythonPath(ProcessBuilder pb) throws IOException { + String sep = + OperatingSystem.CURRENT_OPERATING_SYSTEM == OperatingSystem.LINUX ? 
";" : ":"; + String rmiPyPkg = Application.getModuleSubDirectory("Debugger-rmi-trace", + "build/pypkg/src").getAbsolutePath(); + String drgnPyPkg = Application.getModuleSubDirectory("Debugger-agent-drgn", + "build/pypkg/src").getAbsolutePath(); + String add = rmiPyPkg + sep + drgnPyPkg; + pb.environment().compute("PYTHONPATH", (k, v) -> v == null ? add : (v + sep + add)); + } + + @Before + public void setupTraceRmi() throws Throwable { + traceRmi = addPlugin(tool, TraceRmiPlugin.class); + + try { + pythonPath = Paths.get(DummyProc.which("drgn")); + } + catch (RuntimeException e) { + Msg.error(this, e); + } + outFile = Files.createTempFile("drgnout", null); + errFile = Files.createTempFile("drgnerr", null); + } + + protected void addAllDebuggerPlugins() throws PluginException { + PluginsConfiguration plugConf = new PluginsConfiguration() { + @Override + protected boolean accepts(Class extends Plugin> pluginClass) { + return !ApplicationLevelOnlyPlugin.class.isAssignableFrom(pluginClass); + } + }; + + for (PluginDescription pd : plugConf + .getPluginDescriptions(PluginPackage.getPluginPackage("Debugger"))) { + addPlugin(tool, pd.getPluginClass()); + } + } + + protected static String addrToStringForPython(InetAddress address) { + if (address.isAnyLocalAddress()) { + return "127.0.0.1"; // Can't connect to 0.0.0.0 as such. Choose localhost. 
+ } + return address.getHostAddress(); + } + + protected static String sockToStringForPython(SocketAddress address) { + if (address instanceof InetSocketAddress tcp) { + return addrToStringForPython(tcp.getAddress()) + ":" + tcp.getPort(); + } + throw new AssertionError("Unhandled address type " + address); + } + + protected record PythonResult(boolean timedOut, int exitCode, String stdout, String stderr) { + protected String handle() { + if (stderr.contains("RuntimeError") || stderr.contains(" Error") || (0 != exitCode && 1 != exitCode && 143 != exitCode)) { + throw new PythonError(exitCode, stdout, stderr); + } + System.out.println("--stdout--"); + System.out.println(stdout); + System.out.println("--stderr--"); + System.out.println(stderr); + return stdout; + } + } + + protected record ExecInDrgn(Process python, CompletableFutureThis module walks you through an example of how to add a debugger agent to Ghidra. It has no exercises and is certainly not the only way to implement an agent, but hopefully contains some useful pointers and highlights some pit-falls that you might encounter. The example traces the implementation of an actual agent — the agent for Meta’s drgn debugger, which provides a scriptable, albeit read-only, interface to the running Linux kernel, as well as user-mode and core-dump targets.
+To support debugging on various platforms, the Ghidra debugger has agents, i.e. clients capable of receiving information from a native debugger and passing it to the Ghidra GUI. They include the dbgeng agent that supports Windows debuggers, the gdb agent for gdb on a variety of platforms, the lldb agent for macOS and Linux, and the jpda agent for Java. All but the last are written in Python 3, and all communicate with the GUI via a protobuf-based protocol described in Debugger-rmi-trace.
+At the highest level, each agent has four elements (ok, a somewhat arbitrary division, but…):
+debugger-launchers
– A set of launchers, often a mixture of .bat
,.sh
, and sometimes .py
scriptsschema.xml
– An object-model schema. (While expressed in XML, this is not an “XML schema”.)src/ghidradrgn
– Python files for architecture, commands, hooks, methods, and common utility functionsbuild.gradle
– Build logicLarge portions of each are identical or similar across agents, so, as a general strategy, copying an existing agent and renaming all agent-specific variables, methods, etc. is not the worst plan of action. Typically, this leads to large chunks of detritus that need to be edited out late in the development process.
+local-drgn.sh
The initial objective is to create a shell that sets up the environment variables for parameters we’ll need and invokes the target. For this project, I originally started duplicating the lldb agent and then switched to the dbgeng agent. Why? The hardest part of writing an agent is getting the initial launch pattern correct. drgn is itself written in Python. While gdb and lldb support Python as scripting languages, their cores are not Python-based. For these debuggers, the launcher runs the native debugger and instructs it to load our plugin, which is the agent. The dbgeng agent inverts this pattern, i.e. the agent is a Python application that uses the Pybag package to access the native kd interface over COM. drgn follows this pattern.
+That said, a quick look at the launchers in the dbgeng project (under debugger-launchers
) shows .bat
files, each of which calls a .py
file in data/support
. As drgn is a Linux-only debugger, we need to convert the .bat
examples to .sh
. Luckily, the conversion is pretty simple: most line annotations use #
in place of ::
and environment variables are referenced using $VAR
in place of %VAR%
.
The syntax of the .sh
is typical of any *nix shell. In addition to the shell script, a launcher includes a metadata header to populate its menu and options dialog. Annotations include:
#!
line for the shell invocation#@title
line for the launcher name#@desc
-annotated HTML description, as displayed in the launch dialog#@menu-group
for organizing launchers#@icon
for an icon#@help
the help file and anchor#@arg
variables, usually only one to name the executable image#@args
specifies the remainder of the arguments, passed to a user-mode target if applicable#@env
variables referenced by the Python codeWhile the drgn launcher does not use @arg
or @args
, there are plentiful examples in the gdb project. The #@env
lines are composed of the variable name (usually in caps), its type, default value, a label for the dialog if the user needs to be queried, and a description. The syntax looks like:
#@env
Name :
Type [ !
] =
DefaultValue Label Descriptionwhere !
, if present, indicates the option is required.
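To make the shape of an `#@env` line concrete, the following is a minimal Python sketch that splits one into its parts. The function name `parse_env_line`, the sample lines, and the assumption that the default, label, and description are whitespace-separated tokens with double quotes around multi-word values are all illustrative; this is not Ghidra's own launcher parser.

```python
import re
import shlex

# Hypothetical pattern for: #@env Name:Type[!]=DefaultValue Label Description
ENV_RE = re.compile(
    r'^#@env\s+(?P<name>\w+)\s*:\s*(?P<type>\w+)\s*(?P<required>!)?'
    r'\s*=\s*(?P<rest>.*)$')


def parse_env_line(line):
    """Split an #@env annotation into name, type, required flag, and text fields."""
    match = ENV_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an #@env line: {line!r}")
    # shlex honors double quotes, so labels and descriptions may contain spaces.
    fields = shlex.split(match.group('rest'))
    fields += [''] * (3 - len(fields))  # pad any missing trailing fields
    default, label, description = fields[:3]
    return {
        'name': match.group('name'),
        'type': match.group('type'),
        'required': match.group('required') is not None,
        'default': default,
        'label': label,
        'description': description,
    }


# Sample annotation invented for illustration.
info = parse_env_line('#@env OPT_TARGET_PID:int=0 "Process id" "The target process id"')
print(info['name'], info['type'], info['required'], info['default'])
```

The `!` marker is captured as a boolean, so a caller could refuse to launch when a required option is left empty.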
For drgn, invoking the drgn
command directly saves us a lot of the work involved in getting the environment correct. We pass it our Python launcher local-drgn.py
instead of allowing it to call run_interactive
, which does not return. Instead, we create an instance of prog
based on the parameters, complete the Ghidra-specific initialization, and call run_interactive(prog)
ourselves.
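The target selection described here can be sketched as a small function driven by the launcher's environment variables. The variable names and the three `set_*` calls mirror the agent's startup code in this patch; the helper name `configure_program` and the `RecordingProgram` stand-in are hypothetical and exist only so the sketch runs without drgn or a real target.

```python
def configure_program(prog, env):
    """Point a drgn-style program object at the target named in env.

    `prog` only needs set_kernel(), set_core_dump(path), and set_pid(pid).
    Returns a (kind, detail) tuple describing what was selected.
    """
    kind = env.get('OPT_TARGET_KIND')
    if kind == 'kernel':
        prog.set_kernel()
        return ('kernel', None)
    if kind == 'coredump':
        img = env['OPT_TARGET_IMG']
        prog.set_core_dump(img)
        return ('coredump', img)
    # Anything else is treated as a user-mode attach by pid.
    pid = int(env['OPT_TARGET_PID'])
    prog.set_pid(pid)
    return ('pid', pid)


class RecordingProgram:
    """Hypothetical test double that records calls instead of touching a target."""

    def __init__(self):
        self.calls = []

    def set_kernel(self):
        self.calls.append(('set_kernel',))

    def set_core_dump(self, path):
        self.calls.append(('set_core_dump', path))

    def set_pid(self, pid):
        self.calls.append(('set_pid', pid))


prog = RecordingProgram()
print(configure_program(prog, {'OPT_TARGET_KIND': 'coredump',
                               'OPT_TARGET_IMG': '/tmp/core.1'}))
```

In an actual launcher, the program object would come from drgn and the environment from `os.environ`, with the Ghidra-specific initialization and `run_interactive(prog)` following this step.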
The Python script needs to do the setup work for Ghidra and for drgn. A good start is to try to implement a script that calls the methods for connect
, create
, and start
, with create
doing as little as possible initially. This should allow you to work the kinks out of arch.py
and util.py
.
For this particular target, there are some interesting wrinkles surrounding the use of sudo
(required for most targets) which complicate where wheels are installed (i.e. it is pretty easy to accidentally mix user-local and system site-packages
). Additionally, the -E
parameter is required to ensure that the environment variables we defined get passed to the root environment. In the cases where we use sudo
, the first message printed in the interactive shell will be the request for the user’s password.
The schema, specified in schema.xml
, provides a basic structure for Ghidra’s Model View and allows Ghidra to identify and locate various interfaces that are used to populate the GUI. For example, the Memory interface identifies the container for items with the interface MemoryRegion, which provide information used to fill the Memory View. Among the important interfaces are Process, Thread, Frame, Register, MemoryRegion, Module, and Section. These interfaces are “built into” Ghidra so that it can identify which objects provide specific information and commands.
For the purposes of getting started, it’s easiest to clone the dbgeng schema and modify it as needed. Again, this will require substantial cleanup later on, but, as schema errors are frequently subtle and hard to identify, revisiting is probably the better approach. MANIFEST.in
should be modified to reflect the schema’s path.
Similarly, build.gradle
can essentially be cloned from dbgeng, with the appropriate change to eclipse.project.name
. For the most part, you need only apply the distributableGhidraModule.gradle
and hasPythonPackage.gradle
scripts. If further customization is needed, consult other examples in the Ghidra project and Gradle’s documentation.
Not perhaps directly a build logic item, but pyproject.toml
should be modified to reflect the agent’s version number (by convention, Ghidra’s version number).
At this point, we can start actually implementing the drgn agent. arch.py
is usually a good starting point, as much of the initial logic depends on it. For arch.py
, the hard bit is knowing what maps to what. The language_map
converts the debugger’s self-reported architecture to Ghidra’s language set. Ghidra’s languages are mapped to a set of language-to-compiler maps, which are then used to map the debugger’s self-reported language to Ghidra’s compiler. Certain combinations are not allowed because Ghidra has no concept of that language-compiler combination. For example, x86 languages never map to default
. Hence, the need for an x86_compiler_map
, which defaults to something else (in this case, gcc
).
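The two-stage lookup can be illustrated with a stripped-down sketch. The table entries, the architecture keys, and the function name `compute_lcsp` are assumptions chosen for illustration; the real tables in `arch.py` are larger and keyed on whatever drgn actually reports.

```python
# Stage 1: debugger-reported architecture -> Ghidra language id (sample entries).
language_map = {
    'x86_64': 'x86:LE:64:default',
    'i386': 'x86:LE:32:default',
    'aarch64': 'AARCH64:LE:64:v8A',
}

# Stage 2: per-language compiler maps. x86 languages get their own map because
# they must never resolve to the 'default' compiler spec.
default_compiler_map = {'default': 'default'}
x86_compiler_map = {'windows': 'windows', 'default': 'gcc'}
compiler_map = {
    'x86:LE:32:default': x86_compiler_map,
    'x86:LE:64:default': x86_compiler_map,
}


def compute_lcsp(arch_name, os_name='linux'):
    """Return a (language, compiler) pair for the reported architecture."""
    language = language_map.get(arch_name)
    if language is None:
        raise ValueError(f"no Ghidra language known for {arch_name!r}")
    comp_map = compiler_map.get(language, default_compiler_map)
    compiler = comp_map.get(os_name, comp_map['default'])
    return language, compiler


print(compute_lcsp('x86_64'))
print(compute_lcsp('aarch64'))
```

Keeping the x86 fallback inside its own map means an unknown platform still yields a compiler spec that Ghidra accepts for that language.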
After arch.py
, a first pass at util.py
is probably warranted. In particular, the version info is used early in the startup process. A lot of this code is not relevant to our current project, but at a minimum we want to implement (or fake out) methods such as selected_process
, selected_thread
, and selected_frame
. In this example, there probably won’t be more than one session or one process. Ultimately, we’ll have to decide whether we even want Session in the schema. For now, we’re defaulting session and process to 0, and thread to 1, as 0 is invalid for debugging the kernel. (Later, it becomes obvious that the attached pid and prog.main_thread().tid
make sense for user-mode debugging, and prog.crashed_thread().tid
makes sense for crash dump debugging.)
With arch.py
and util.py
good to a first approximation, we would normally start implementing put
methods in commands.py
for various objects in the Model View, starting at the root of the tree and descending through the children. Again, Session and Process are rather poorly-defined, so we skip them (leaving one each) and tackle Threads. Typically, for each iterator in the debugger API, two commands get implemented — one internal method that does the actual work, e.g. put_threads()
and one invokable method that wraps this method in a (potentially batched) transaction, e.g. ghidra_trace_put_threads()
. The internal methods are meant to be called by other Python code, with the caller assumed to be responsible for setting up the transaction. The ghidra_trace
-prefixed methods are meant to be part of the custom CLI command set which the user can invoke and therefore should set up the transaction. The internal method typically creates the path to the container using patterns for the container, individual keys, and the combination, e.g. THREADS_PATTERN
, THREAD_KEY_PATTERN
, and THREAD_PATTERN
. Patterns are built up from other patterns, going back to the root. A trace object corresponding to the debugger object is created from the path and inserted into the trace database.
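The pairing of an internal worker with a transaction-wrapping command, together with the way patterns compose, can be sketched as follows. The path strings (`Processes`, `.Threads`) and the `ToyTrace` class are assumptions for illustration; only the `procnum` and `tnum` placeholder names are taken from the agent code in this patch, and the real commands operate on the shared trace state rather than on a trace argument.

```python
from contextlib import contextmanager

# Patterns are built from other patterns, going back to the root.
PROCESSES_PATH = 'Processes'
PROCESS_KEY_PATTERN = '[{procnum}]'
PROCESS_PATTERN = PROCESSES_PATH + PROCESS_KEY_PATTERN
THREADS_PATTERN = PROCESS_PATTERN + '.Threads'
THREAD_KEY_PATTERN = '[{tnum}]'
THREAD_PATTERN = THREADS_PATTERN + THREAD_KEY_PATTERN


class ToyTrace:
    """Hypothetical stand-in for the trace: records paths and transactions."""

    def __init__(self):
        self.paths = []
        self.tx_log = []

    @contextmanager
    def open_tx(self, description):
        self.tx_log.append(('start', description))
        yield
        self.tx_log.append(('commit', description))


def put_threads(trace, procnum, tids):
    """Internal worker: the caller is responsible for the transaction."""
    for tid in tids:
        trace.paths.append(THREAD_PATTERN.format(procnum=procnum, tnum=tid))


def ghidra_trace_put_threads(trace, procnum, tids):
    """User-facing command: wraps the worker in its own transaction."""
    with trace.open_tx('ghidra_trace_put_threads'):
        put_threads(trace, procnum, tids)


trace = ToyTrace()
ghidra_trace_put_threads(trace, 0, [1, 2])
print(trace.paths)
```

Because `put_threads` never opens a transaction itself, other commands can call it inside a larger batched transaction without nesting.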
Once this code has been tested, attributes of the object can be added to the base object using set_value
. Attributes that are not primitives can be added using the pattern create-populate-insert, i.e. we call create_object
with extensions to the path, populate the object’s children, and call insert
with the created object. In many cases (particularly when populating an object’s children is expensive), you may want to defer the populate step, effectively creating a placeholder that can be populated on-demand. The downside of this approach, of course, is that refresh methods must be added to populate those nodes.
As an aside, it’s probably worth noting the function of create_object
and insert
. Objects in the trace are maintained in a directory tree, with links (and backlinks) allowed, whose visible manifestation is the Model View. As such, operations on the tree follow the normal procedure for operations on a graph. create_object
creates a node but not any edges, not even the implied (“canonical”) edge from parent to child. insert
creates the canonical edge. Until that edge exists, the object is not considered to be “alive”, so the lifespan of the edge effectively encodes the object’s life. Following the create-populate-insert pattern minimizes the number of events that need to be processed.
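A toy model may help show why the ordering matters. In the sketch below, only changes to an object that is already alive generate events; that event rule, and the `ToyTrace` and `ToyObject` classes, are simplifying assumptions for illustration and do not describe how Ghidra counts events internally. The method names `create_object`, `set_value`, and `insert` follow the calls used in the text.

```python
class ToyObject:
    """Hypothetical trace node: alive only once insert() has been called."""

    def __init__(self, trace, path):
        self.trace = trace
        self.path = path
        self.values = {}
        self.alive = False

    def set_value(self, key, value):
        self.values[key] = value
        if self.alive:
            # Changes to a live object are reported one by one.
            self.trace.events.append(('value', self.path, key))

    def insert(self):
        # Creating the canonical edge makes the object visible in one step.
        self.alive = True
        self.trace.events.append(('insert', self.path))


class ToyTrace:
    def __init__(self):
        self.events = []

    def create_object(self, path):
        return ToyObject(self, path)


# create-populate-insert: a single event for the fully populated object.
good = ToyTrace()
obj = good.create_object('Processes[0].Threads[1]')
obj.set_value('_display', 'Thread 1')
obj.set_value('Name', 'main')
obj.insert()

# create-insert-populate: one event per step.
noisy = ToyTrace()
obj = noisy.create_object('Processes[0].Threads[1]')
obj.insert()
obj.set_value('_display', 'Thread 1')
obj.set_value('Name', 'main')

print(len(good.events), len(noisy.events))
```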
Having completed a single command, we can proceed in one of two directions — we can continue implementing commands for other objects in the tree, or we can implement matching refresh methods in methods.py
for the completed object. methods.py
also requires patterns which are used to match a path to a trace object, usually via find_x_by_pattern
methods. The refresh
methods may or may not rely on the find_by
methods depending on whether the matching command needs parameters. For example, we may want to assume the selected_thread
matches the current object in the view, in which case it can be used to locate that node, or we may want to force the method to match on the node if the trace object can be easily matched to the debugger object, or we may want to use the node to set selected_thread
.
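Matching a path back to the debugger's identifiers is essentially a regular-expression match. The sketch below assumes a path layout of `Processes[n].Threads[m]` and a helper named `find_thread_by_path`; both are hypothetical, and the agent's real `find_x_by_pattern` helpers work on trace objects rather than bare strings.

```python
import re

# Hypothetical path layout; named groups recover the keys from the path.
THREAD_RE = re.compile(
    r'Processes\[(?P<procnum>\d+)\]\.Threads\[(?P<tnum>\d+)\]')


def find_thread_by_path(path):
    """Return (procnum, tnum) for a thread path, or None if it does not match."""
    match = THREAD_RE.fullmatch(path)
    if match is None:
        return None
    return int(match.group('procnum')), int(match.group('tnum'))


print(find_thread_by_path('Processes[0].Threads[12137]'))
print(find_thread_by_path('Processes[0].Modules[libc]'))
```

A refresh method could use the recovered thread number either to look up the debugger's thread or to update the selected thread before invoking the matching command.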
The concept of focus in the debugger is fairly complicated and a frequent source of confusion. In general, we use selected to represent the GUI’s current focus, typically the node in the Model or associated views which the user has selected. In some sense, it represents the process, thread, or frame the user is interested in. It also may differ from the highlighted node, chosen by a single-click (versus a double-click which sets the selection). By contrast, the native debugger has its own idea of focus, which we usually describe as current. (This concept is itself complicated by distinctions between the event object, e.g. which thread the debugger broke on, and the current object, e.g. which thread is being inspected.) Current values are pushed “up” to Ghidra’s GUI from the native debugger; selected values are pushed “down” to the native debugger from Ghidra. To the extent possible, it makes sense to synchronize these values. In other words, in most cases, a new selection should force a change in the set of current objects, and an event signaling a change in the current object should alter the GUI’s set of selected objects. (Of course, care needs to be taken not to make this a round-trip cycle.)
+refresh
methods (and others) are often annotated in several ways. The @REGISTRY.method
annotation makes the method available to the GUI. It specifies the action
to be taken and the display
that appears in the GUI pop-up menu. Actions may be purely descriptive or may correspond to built-in actions taken by the GUI, e.g. refresh
and many of the control methods, such as step_into
. Parameters for the methods may be annotated with sch.Schema
(conventionally on the first parameter) to indicate the nodes to which the method applies, and with ParamDesc
to describe the parameter’s type and label for pop-up dialogs. After retrieving necessary parameters, refresh
methods invoke methods from commands.py
wrapped in a transaction.
For drgn, we implemented put
/refresh
methods for threads, frames, registers (putreg
), and local variables, then modules and sections, memory and regions, the environment, and finally processes. We also implemented putmem
using drgn’s read
API. Symbols was another possibility, but, for the moment, populating symbols seemed too expensive. Instead, retrieve_symbols
was added to allow per-pattern symbols to be added. Unfortunately, the drgn API doesn’t support wildcards, so eventually some other strategy will be necessary.
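By default, `putmem` widens the requested range to whole pages before reading. The sketch below reproduces the `quantize_pages` arithmetic from `commands.py` in this patch; the `PAGE_SIZE` value of 4096 and the helper name `read_request` are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed value; the agent defines its own constant


def quantize_pages(start, end):
    """Round start down and end up to the nearest page boundary."""
    return (start // PAGE_SIZE * PAGE_SIZE,
            (end + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE)


def read_request(address, length, pages=True):
    """Return the (start, size) that would actually be read from the target."""
    start, end = address, address + length
    if pages:
        start, end = quantize_pages(start, end)
    return start, end - start


# One byte at 0x1234 becomes a read of the whole enclosing page.
print([hex(v) for v in read_request(0x1234, 1)])
# A request that straddles a boundary covers both pages.
print([hex(v) for v in read_request(0x1FFE, 4)])
```

Reading whole pages takes advantage of spatial locality, at the cost of transferring more bytes than were requested.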
The remaining set of Python functions, hooks.py
, comprises callbacks for various events sent by the native debugger. The current drgn code has no event system. A set of skeletal methods has been left in place as (a) we can use the single-step button as a stand-in for “update state”, and (b) some discussion exists in the drgn user forums regarding eventually implementing more control functionality. For anyone implementing hooks.py
, the challenging logic resides in the event loop, particularly if there is a need to move back-and-forth between the debugger and a repl. Also, distinctions need to be made between control commands, which wait for events, and commands which rely on a callback but complete immediately. As a rule-of-thumb, we push to Ghidra, i.e. Ghidra issues requests asynchronously and the agent must update the trace database.
At this point, revisiting and editing the schema may be called for. For example, for drgn, it’s not obvious that there can ever be more than one session, so it may be cleaner to embed Processes at the root. This, in turn, requires editing the commands.py
and methods.py
patterns. Similarly, as breakpoints are not supported, the breakpoint-related entries may safely be deleted.
In general, the schema can be structured however you like, but there are several details worth mentioning. Interfaces generally need to be respected for various functions in the GUI to work. Process, thread, frame, module, section, and memory elements can be named arbitrarily, but their interfaces must be named correctly. Additionally, the logic for finding objects in the tree is quite complicated. If elements need be traversed as part of the default search process, their containers must be tagged canonical
. If attributes need to be traversed, their parents should have the interface Aggregate
.
Each entry may have elements
of the same type ordered by keys, and attributes
of arbitrary type. The element
entry describes the schema for all elements; the schema for attributes may be given explicitly using named attribute
entries or defaulted using the unnamed attribute
entry, typically <attribute schema="VOID">
or <attribute schema="ANY">
. The schema for any element in the Model View is visible using the hover, which helps substantially when trying to identify schema traversal errors.
Schema entries may be marked hidden=yes
with the obvious result. Additionally, certain attribute names and schema have special properties. For example, _display
defines the visible ID for an entry in the Model tree, and ADDRESS
and RANGE
mark attributes which are navigable.
The hardest part of writing unit tests is almost always getting the first test to run, and the easiest unit tests, as with the Python files, are those for commands.py
. For drgn, as before, we’re using dbgeng as the pattern, but several elements had to be changed. Because the launchers execute a script, we need to amend the runThrowError
logic (and, more specifically, the execInPython
logic) in AbstractDrgnTraceRmiTest
with a ProcessBuilder
call that takes a script, rather than writing the script to stdin. While there, we can also trim out the unnecessary helper logic around items like breakpoints, watchpoints, etc. from all of the test classes.
JUnits for methods.py
follow a similar pattern, but, again, getting the first one to run is often the most difficult. For drgn, we’ve had to override the timeouts in waitForPass
and waitForCondition
. After starting with hardcoded paths for the test target, we also had to add logic to re-write the PREAMBLE
on-the-fly in execInDrgn
. Obviously, with no real hooks.py
logic, there’s no need for DrgnHooksTest
.
Of note, we’ve used the gdb gcore
command to create a core dump for the tests. Both user- and kernel-mode require privileges to run the debugger, and, for testing, that’s not ideal.
The principal piece of documentation for all new debuggers is a description of the launchers. Right now, the TraceRmiLauncherServicePlugin.html
file in Debug/Debugger-rmi-trace
contains all of this information. Detail to note: the #@help
locations in the launchers themselves ought to match the HTML tags in the file, as should the launcher names.
Once everything else is done, it may be worth considering additional functionality specific to the debugger. This can be made available in either commands.py
or methods.py
. For drgn, we’ve added attach
methods that allow the user to attach to additional programs.