-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #12 from turtQb/cfg-diffing
Add tutorial on graph diffing
- Loading branch information
Showing
12 changed files
with
272 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,208 @@ | ||
Diffing Graphs only | ||
=================== | ||
|
||
QBinDiff is able to perform graph matching, whether this graph is a Control Flow Graph (CFG), | ||
a Call Graph (CG), or something completely unrelated. | ||
We demonstrated this in our `introductory blog post <https://blog.quarkslab.com/qbindiff-a-modular-diffing-toolkit.html>`_ | ||
with the protein-protein interaction (PPI) networks of different species in bioinformatics. | ||
|
||
In this tutorial, we will focus our attention on the CFG of EVM smart contracts. | ||
|
||
Motivation | ||
---------- | ||
|
||
The Ethereum Virtual Machine (EVM) is a RISC stack-based architecture. | ||
It is used by Ethereum and other compatible chains to execute smart contracts, | ||
the core programs of decentralized applications on these platforms. | ||
|
||
In addition to the lack of native support from tools such as QBinDiff, BinExport, and Quokka, | ||
the EVM bytecode lacks an explicit structure to identify functions: | ||
all the control flow is performed with only conditional and unconditional jumps (``JUMPI``, ``JUMP``), | ||
that read the destination address on the stack. | ||
This design leads to particular patterns in the bytecode, for example: | ||
|
||
* the start of the bytecode contains a dispatcher, acting like a giant switch statement, | ||
that is responsible for jumping into the called external/public function; | ||
* to call internal functions and other forms of duplicated code, | ||
the caller pushes a return address onto the stack before executing a jump instruction. | ||
|
||
However, several tools let you recover the CFG of the smart contract | ||
(some even detect functions with varying degrees of success). | ||
From this information alone, this tutorial will guide you with diffing an example smart contract. | ||
|
||
How to diff | ||
----------- | ||
|
||
In this tutorial, we will diff two versions of an example smart contract. | ||
One allows its user's balance to go negative but not the other. | ||
|
||
+---------------------------------------+----------------------------------+ | ||
| ``vulnerable.sol`` | ``fixed.sol`` | | ||
+=======================================+==================================+ | ||
|.. literalinclude:: res/vulnerable.sol |.. literalinclude:: res/fixed.sol | | ||
+---------------------------------------+----------------------------------+ | ||
|.. literalinclude:: res/vulnerable.evm |.. literalinclude:: res/fixed.evm | | ||
+---------------------------------------+----------------------------------+ | ||
|
||
To this end, we will use QBinDiff's ``DiGraphDiffer`` to compare their CFG. | ||
|
||
Generate the graphs | ||
^^^^^^^^^^^^^^^^^^^ | ||
|
||
The first step is to recover the CFG from the bytecode of the smart contracts. | ||
While not trivial, this has been solved by several tools already. | ||
We recommend `EtherSolve <https://github.com/SeUniVr/EtherSolve/>`_ (java) | ||
or `vandal <https://github.com/usyd-blockchain/vandal/>`_ (python). | ||
|
||
.. code-block:: sh | ||
java -jar EtherSolve.jar -r -j vulnerable.evm | ||
The command above generates a file named ``Analysis_<datetime>.json``. | ||
In this file, the CFG can be found under the ``runtimeCfg`` field. | ||
Note that edges are stored under ``runtimeCfg.successors`` (and sometimes have duplicate entries). | ||
|
||
QBinDiff's differ expects ``networkx.DiGraph`` as inputs, so we will need to adapt the data a little bit: | ||
|
||
.. code-block:: python | ||
def process_ethersolve(analysis: dict[str, Any]) -> networkx.DiGraph: | ||
# Filter the desired attributes and set the node ID as the basic block's offset | ||
nodes = [ | ||
{ | ||
"id": n["offset"], | ||
"length": n["length"], | ||
"type": n["type"], | ||
"stack_balance": n["stackBalance"], | ||
"bytecode": n["bytecodeHex"], | ||
"opcodes": n["parsedOpcodes"], | ||
} for n in analysis["runtimeCfg"]["nodes"] | ||
] | ||
# Generate an edge for each successor of each node | ||
links = [ | ||
[{"source": e["from"], "target": t} for t in e["to"]] | ||
for e in analysis["runtimeCfg"]["successors"] | ||
] | ||
# Flatten the list | ||
links = [item for sublist in links for item in sublist] | ||
# Create the networkx.DiGraph | ||
nx_node_link = {"directed": True, "multigraph": False, "nodes": nodes, "links": links} | ||
return networkx.node_link_graph(nx_node_link) | ||
(optional) Write heuristics | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
When matching nodes, we can help the differ by setting an initial similarity score between each pair of nodes. | ||
These scores are gathered in the similarity matrix, initialized to all 1, | ||
meaning every node is initially believed to be similar to all nodes. | ||
|
||
If you have access to some heuristic for similarity between nodes, | ||
you can add a prepass that will be executed before matching nodes to alter the similarity matrix. | ||
|
||
For example, we have access to the stack balance of each basic block. | ||
This value indicates how many words the basic block pushes or pops from the stack. | ||
Intuitively, similar blocks should have the same stack balance. | ||
|
||
You can find more information `here <../qbindiff/doc/source/api/differ.html#qbindiff.Differ.register_prepass>`_ | ||
on how to create a prepass. | ||
|
||
In this example, we first arrange nodes by stack balance in each graph, | ||
then reduce the similarity of nodes that do not share the same stack balance. | ||
|
||
Note that the nodes' IDs are their offset, and do not correspond to the row or column in the similarity matrix. | ||
The correspondance is given by the ``primary_n2i`` and ``secondary_n2i`` mappings. | ||
|
||
.. code-block:: python | ||
def prepass_stack_balance( | ||
sim_matrix: SimMatrix, | ||
primary: qbindiff.GenericGraph, | ||
secondary: qbindiff.GenericGraph, | ||
primary_n2i: dict[int, int], | ||
secondary_n2i: dict[int, int], | ||
**kwargs, | ||
) -> None: | ||
# Arrange nodes indices by stack balance | ||
## Primary node indices by stack balance | ||
primary_index: dict[int, list[int]] = {} | ||
## Secondary node indices by stack balance | ||
secondary_index: dict[int, list[int]] = {} | ||
## Populate primary_index and secondary_index | ||
for graph, n2i, index in ( | ||
(primary, primary_n2i, primary_index), | ||
(secondary, secondary_n2i, secondary_index), | ||
): | ||
for node_id in graph.nodes(): | ||
node = graph.nodes[node_id] | ||
balance = node["stack_balance"] | ||
if balance not in index: | ||
index[balance] = [] | ||
index[balance].append(n2i[node_id]) | ||
# Reduce the similarity of nodes that do not share the same stack balance by 60% | ||
for primary_balance, primary_indices in primary_index.items(): | ||
for secondary_balance, secondary_indices in secondary_index.items(): | ||
if primary_balance == secondary_balance: | ||
continue | ||
for i in primary_indices: | ||
sim_matrix[i, secondary_indices] *= 0.4 | ||
Perform the match | ||
^^^^^^^^^^^^^^^^^ | ||
|
||
Once you have the CFG in a ``networkx.DiGraph`` object, | ||
and have optionally written some prepasses, performing the mapping is simple: | ||
|
||
.. code-block:: python | ||
differ = qbindiff.DiGraphDiffer( | ||
primary_cfg, | ||
secondary_cfg, | ||
sparsity_ratio=0, | ||
tradeoff=0.5, | ||
epsilon=0.1, | ||
) | ||
differ.register_prepass(prepass_stack_balance) # optional | ||
mapping = differ.compute_matching() | ||
You can experiment with the tradeoff and epsilon values, | ||
depending on the nature of the diffing performed. | ||
As general guidelines: | ||
|
||
* ``tradeoff`` gives more weight to the topology when close to 0, | ||
and more weight to the similarity when close to 1. | ||
It should be set strictly between 0 and 1. | ||
The better your heuristics, the higher its value. | ||
* ``epsilon`` controls the convergence speed. | ||
It should not be set to 0, and be as close to 1 as you can afford to wait. | ||
For this simple example, a conservative low value is not an issue. | ||
* you should adjust how much your prepasses affect the similarity matrix, | ||
depending on the quality of your heuristics. | ||
|
||
For this example, we performed an exhaustive search of these parameters, | ||
when compared to a ground truth matching. | ||
In these maps, ``epsilon`` and ``tradeoff`` correspond to the above parameters, | ||
while ``stack balance weight`` controls how much the stack balance prepass impacts the similarity matrix. | ||
|
||
+-------------------------------------------------+--------------------------------------------------+-------------------------------------+ | ||
| .. image:: res/stack-balance-weight-epsilon.png | .. image:: res/stack-balance-weight-tradeoff.png | .. image:: res/tradeoff-epsilon.png | | ||
+-------------------------------------------------+--------------------------------------------------+-------------------------------------+ | ||
|
||
Process the result | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
Now that you have a mapping between nodes of the primary and secondary graphs, | ||
you can process it however you like, for example to compute similarity score. | ||
|
||
Here we show a visualization of the resulting diff, revealing interesting aspects of the modification: | ||
|
||
+------------------------------------+-----------------------------+------------------------------------+ | ||
| From signed to unsigned operations | CFG rewiring | Dispatcher update | | ||
+====================================+=============================+====================================+ | ||
| .. image:: res/diff-signedness.png | .. image:: res/diff-cfg.png | .. image:: res/diff-dispatcher.png | | ||
+------------------------------------+-----------------------------+------------------------------------+ |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
608060405234801561001057600080fd5b50600436106100365760003560 | ||
e01c806327e235e31461003b578063412664ae1461006d575b600080fd5b | ||
61005b6100493660046100f3565b60006020819052908152604090205481 | ||
565b60405190815260200160405180910390f35b61008061007b36600461 | ||
0115565b610082565b005b33600090815260208190526040812080548392 | ||
906100a1908490610155565b90915550506001600160a01b038216600090 | ||
815260208190526040812080548392906100ce90849061016e565b909155 | ||
50505050565b80356001600160a01b03811681146100ee57600080fd5b91 | ||
9050565b60006020828403121561010557600080fd5b61010e826100d756 | ||
5b9392505050565b6000806040838503121561012857600080fd5b610131 | ||
836100d7565b946020939093013593505050565b634e487b7160e01b6000 | ||
52601160045260246000fd5b818103818111156101685761016861013f56 | ||
5b92915050565b808201808211156101685761016861013f56fea2646970 | ||
66735822122024ea090b39258ab97fbe93f3646eb22d1d2cc2aedc4b74e1 | ||
c3861562ad940d0364736f6c63430008190033 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
// SPDX-License-Identifier: MIT | ||
|
||
pragma solidity 0.8.25; | ||
|
||
contract SimpleToken { | ||
mapping(address => uint) public balances; | ||
|
||
constructor() { | ||
balances[msg.sender] += 1000e18; | ||
} | ||
|
||
function sendToken(address _recipient, uint _amount) public { | ||
balances[msg.sender] -= _amount; | ||
balances[_recipient] += _amount; | ||
} | ||
} |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
608060405234801561001057600080fd5b50600436106100365760003560 | ||
e01c80630f7428b91461003b57806327e235e314610050575b600080fd5b | ||
61004e6100493660046100f3565b610082565b005b61007061005e366004 | ||
61011d565b60006020819052908152604090205481565b60405190815260 | ||
200160405180910390f35b33600090815260208190526040812080548392 | ||
906100a1908490610155565b90915550506001600160a01b038216600090 | ||
815260208190526040812080548392906100ce90849061017c565b909155 | ||
50505050565b80356001600160a01b03811681146100ee57600080fd5b91 | ||
9050565b6000806040838503121561010657600080fd5b61010f836100d7 | ||
565b946020939093013593505050565b60006020828403121561012f5760 | ||
0080fd5b610138826100d7565b9392505050565b634e487b7160e01b6000 | ||
52601160045260246000fd5b818103600083128015838313168383128216 | ||
17156101755761017561013f565b5092915050565b808201828112600083 | ||
128015821682158216171561019c5761019c61013f565b50509291505056 | ||
fea264697066735822122078995ea6ebd8bd5d1333cde4da1d4ade3eee62 | ||
af048e42232dbc3ed6cd682aba64736f6c63430008190033 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
// SPDX-License-Identifier: MIT | ||
|
||
pragma solidity 0.8.25; | ||
|
||
contract SimpleToken { | ||
mapping(address => int) public balances; | ||
|
||
constructor() { | ||
balances[msg.sender] += 1000e18; | ||
} | ||
|
||
function sendToken(address _recipient, int _amount) public { | ||
balances[msg.sender] -= _amount; | ||
balances[_recipient] += _amount; | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters