Merge pull request #83 from trailofbits/readme-variant
More usability and doc improvements
Boyan-MILANOV authored Dec 23, 2023
2 parents 08f98f2 + ba83176 commit 16aa3bd
Showing 12 changed files with 208 additions and 74 deletions.
6 changes: 0 additions & 6 deletions .github/workflows/lint.yml
@@ -13,17 +13,11 @@ jobs:
      language: "python"
      python-version: "3.8"

-  lint-markdown:
-    uses: trailofbits/.github/.github/workflows/[email protected]
-    with:
-      language: "markdown"
-
  all-lints-pass:
    if: always()

    needs:
      - lint-python
-      - lint-markdown

    runs-on: ubuntu-latest

214 changes: 176 additions & 38 deletions README.md
@@ -1,40 +1,150 @@
# Fickling

![Fickling image](./fickling_image.png)

Fickling is a decompiler, static analyzer, and bytecode rewriter for Python
[pickle](https://docs.python.org/3/library/pickle.html) object serializations.
You can use fickling to detect, analyze, reverse engineer, or even create
malicious pickle or pickle-based files, including PyTorch files.

Fickling can be used both as a **Python library** and a **CLI**.

* [Installation](#installation)
* [Malicious file detection](#malicious-file-detection)
* [Advanced usage](#advanced-usage)
    * [Trace pickle execution](#trace-pickle-execution)
    * [Pickle code injection](#pickle-code-injection)
    * [Pickle decompilation](#pickle-decompilation)
    * [PyTorch polyglots](#pytorch-polyglots)
* [About pickle](#about-pickle)
* [Contact](#contact)

## Installation

Fickling has been tested on Python 3.8 through Python 3.11 and has very few dependencies.
Both the library and command line utility can be installed through pip:

```bash
python -m pip install fickling
```

## Malicious file detection

Fickling can be seamlessly integrated into your codebase to detect and halt the loading of
malicious files at runtime. Under the hood, it hooks the `pickle` module and adds safety checks,
so that loading a pickle file raises an `UnsafeFileError` exception if malicious content is
detected in the file.

Below are the different ways you can use fickling to enforce safety checks on pickle files.

#### Option 1 (recommended): check safety of all pickle files loaded

```python
import pickle

import fickling

# Enforce safety checks every time pickle.load() is used
fickling.always_check_safety()

# Attempting to load an unsafe file now raises an exception
with open("file.pkl", "rb") as f:
    try:
        pickle.load(f)
    except fickling.UnsafeFileError:
        print("Unsafe file!")
```

#### Option 2: use a context manager

```python
import pickle

import fickling

with fickling.check_safety():
    # All pickle files loaded within the context manager are checked for safety
    try:
        with open("file.pkl", "rb") as f:
            pickle.load(f)
    except fickling.UnsafeFileError:
        print("Unsafe file!")

# Files loaded outside of the context manager are NOT checked
with open("file.pkl", "rb") as f:
    pickle.load(f)
```

#### Option 3: check and load a single file

```python
import fickling

# Use fickling.load() in place of pickle.load() to check safety and load a single pickle file
try:
    fickling.load("file.pkl")
except fickling.UnsafeFileError:
    print("Unsafe file!")
```

#### Option 4: only check pickle file safety without loading

```python
import fickling

# Perform a safety check on a pickle file without loading it
if not fickling.is_likely_safe("file.pkl"):
    print("Unsafe file!")
```

#### Accessing the safety analysis results

You can access the details of fickling's safety analysis from within the raised exception:

```python
>>> try:
...     fickling.load("unsafe.pkl")
... except fickling.UnsafeFileError as e:
...     print(e.info)
...
{
    "severity": "OVERTLY_MALICIOUS",
    "analysis": "Call to `eval(b'[5, 6, 7, 8]')` is almost certainly evidence of a malicious pickle file. Variable `_var0` is assigned value `eval(b'[5, 6, 7, 8]')` but unused afterward; this is suspicious and indicative of a malicious pickle file",
    "detailed_results": {
        "AnalysisResult": {
            "OvertlyBadEval": "eval(b'[5, 6, 7, 8]')",
            "UnusedVariables": [
                "_var0",
                "eval(b'[5, 6, 7, 8]')"
            ]
        }
    }
}
```

If you are using a language other than Python, you can still use fickling's CLI to
check the safety of pickle files:

```console
fickling --check-safety -p pickled.data
```

## Advanced usage

### Trace pickle execution

Fickling's CLI allows you to safely trace the execution of the Pickle virtual machine without
exercising any malicious code:

```console
fickling --trace file.pkl
```

### Pickle code injection

Fickling allows you to inject arbitrary code into a pickle file that will run every time the file is loaded:

```console
fickling --inject "print('Malicious')" file.pkl
```
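
The same injection can be done programmatically through the library. Below is a minimal sketch;
the `insert_python_exec()` method name is our assumption, so check fickling's injection examples
for the exact API (`Pickled.load()` and `dumps()` appear elsewhere in this diff):

```python
import pickle

from fickling.fickle import Pickled

# Decompile an existing pickle stream (insert_python_exec() is assumed here)
with open("file.pkl", "rb") as f:
    fickled = Pickled.load(f)

# Splice in a payload that will execute on unpickling
fickled.insert_python_exec("print('Malicious')")

# Serialize the modified pickle back out
with open("backdoored.pkl", "wb") as f:
    f.write(fickled.dumps())
```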

### Pickle decompilation

Fickling can be used to decompile a pickle file for further analysis:

```python
>>> import ast, pickle
>>> from fickling.fickle import Pickled
>>> fickled_object = Pickled.load(pickle.dumps([1, 2, 3, 4]))
>>> print(ast.dump(fickled_object.ast, indent=4))
Module(
    body=[
        Assign(
            targets=[
                Name(id='result', ctx=Store())],
            value=List(
                elts=[
                    Constant(value=1),
                    Constant(value=2),
                    Constant(value=3),
                    Constant(value=4)],
                ctx=Load()))],
    type_ignores=[])
```
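
Since the decompilation is a standard `ast.Module`, it can be turned back into readable source
with the standard library's `ast.unparse()` (Python 3.9+), continuing the session above:

```python
>>> print(ast.unparse(fickled_object.ast))
result = [1, 2, 3, 4]
```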

### PyTorch polyglots

We currently support inspecting, identifying, and creating file polyglots between the
following PyTorch file formats:

* **PyTorch v0.1.1**: Tar file with sys_info, pickle, storages, and tensors
* **PyTorch v0.1.10**: Stacked pickle files
* **TorchScript v1.0**: ZIP file with model.json and constants.pkl (a JSON file and a pickle file)
* **TorchScript v1.1**: ZIP file with model.json and attribute.pkl (a JSON file and a pickle file)
* **TorchScript v1.3**: ZIP file with data.pkl and constants.pkl (2 pickle files)
* **TorchScript v1.4**: ZIP file with data.pkl, constants.pkl, and version (2 pickle files and a folder)
* **PyTorch v1.3**: ZIP file containing data.pkl (1 pickle file)
* **PyTorch model archive format**: ZIP file that includes Python code files and pickle files

```python
>>> import torch
>>> import torchvision.models as models
>>> from fickling.pytorch import PyTorchModelWrapper
>>> model = models.mobilenet_v2()
>>> torch.save(model, "mobilenet.pth")
>>> fickled_model = PyTorchModelWrapper("mobilenet.pth")
>>> print(fickled_model.formats)
Your file is most likely of this format: PyTorch v1.3
['PyTorch v1.3']
```
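
Polyglots can also be identified and created programmatically. The sketch below assumes a
`fickling.polyglot` module with `identify_pytorch_file_format()` and `create_polyglot()`
helpers; these names are our guess at the API, so consult the examples linked below for the
exact functions:

```python
# Module and function names below are assumptions based on the feature description
from fickling import polyglot

# Identify which PyTorch format(s) a file conforms to
print(polyglot.identify_pytorch_file_format("mobilenet.pth"))

# Attempt to build a single file that parses as two formats at once
polyglot.create_polyglot("mobilenet.pth", "other_model.pt")
```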

Check out [our examples](https://github.com/trailofbits/fickling/tree/master/example)
to learn more about using fickling!

## About pickle

Pickled Python objects are in fact bytecode that is interpreted by a stack-based
virtual machine built into Python called the "Pickle Machine". Fickling can take
pickled data streams and decompile them into human-readable Python code that,
when executed, will deserialize to the original serialized object. This is made
possible by Fickling’s custom implementation of the PM. Fickling is safe to run
on potentially malicious files because its PM symbolically executes code rather
than overtly executing it.
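
For illustration, Python's standard `pickletools` module can disassemble a pickle stream into
the opcodes that the PM executes:

```python
import pickle
import pickletools

# Show the Pickle Machine bytecode behind an ordinary pickle
pickletools.dis(pickle.dumps([1, 2, 3]))
```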

The authors do not prescribe any meaning to the “F” in Fickling; it could stand
for “fickle,” … or something else. Divining its meaning is a personal journey
in discretion and is left as an exercise to the reader.

Learn more about fickling in our
[blog post](https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/)
and [DEF CON AI Village 2021 talk](https://www.youtube.com/watch?v=bZ0m_H_dEJI).

## Contact

If you'd like to file a bug report or feature request, please use our
[issues](https://github.com/trailofbits/fickling/issues) page.
Feel free to contact us or reach out in
[Empire Hacking](https://slack.empirehacking.nyc/) for help using or extending fickling.

## License

7 changes: 4 additions & 3 deletions example/hook_functions.py
@@ -3,10 +3,11 @@

import numpy

-import fickling.hook as hook
+import fickling

# Set up global fickling hook
-hook.run_hook()
+fickling.always_check_safety()
+# Equivalent to fickling.hook.run_hook()

# Fickling can check a pickle file for safety prior to running it
test_list = [1, 2, 3]
@@ -41,5 +42,5 @@ def __reduce__(self):

# This hook works when pickle.load is called under the hood in Python as well
# Note that this does not always work for torch.load()
-# This should raise "SafetyError"
+# This should raise "UnsafeFileError"
numpy.load("unsafe.pkl", allow_pickle=True)
2 changes: 1 addition & 1 deletion example/pytorch_poc.py
@@ -19,7 +19,7 @@
# Define model
class TheModelClass(nn.Module):
    def __init__(self):
-        super(TheModelClass, self).__init__()
+        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 13, 5)
2 changes: 2 additions & 0 deletions fickling/__init__.py
@@ -1,6 +1,8 @@
# fmt: off
from .loader import load #noqa
from .context import check_safety #noqa
+from .hook import always_check_safety #noqa
+from .analysis import is_likely_safe # noqa
# fmt: on

# The above lines enable `fickling.load()` and `with fickling.check_safety()`
5 changes: 5 additions & 0 deletions fickling/analysis.py
@@ -322,3 +322,8 @@ def check_safety(
        with open(json_output_path, "a") as json_file:
            json.dump(severity_data, json_file, indent=4)
    return results
+
+
+def is_likely_safe(filepath: str):
+    with open(filepath, "rb") as f:
+        return check_safety(Pickled.load(f)).severity == Severity.LIKELY_SAFE
8 changes: 8 additions & 0 deletions fickling/exception.py
@@ -0,0 +1,8 @@
+class UnsafeFileError(Exception):
+    def __init__(self, filepath, info):
+        super().__init__()
+        self.filepath = filepath
+        self.info = info
+
+    def __str__(self):
+        return f"Safety results for {self.filepath} : {str(self.info)}"
16 changes: 0 additions & 16 deletions fickling/fickle.py
@@ -3,7 +3,6 @@
import re
import struct
import sys
-import warnings
from abc import ABC, abstractmethod
from collections.abc import MutableSequence, Sequence
from enum import Enum
@@ -701,21 +700,6 @@ def has_non_setstate_call(self) -> bool:
        object.__setstate__"""
        return bool(self.properties.non_setstate_calls)

-    def check_safety(self):
-        from fickling.analysis import check_safety  # noqa
-
-        safety_results = check_safety(self)
-        return safety_results
-
-    def is_likely_safe(self):
-        warnings.warn(
-            "The attribute .is_likely_safe will be deprecated."
-            "Use the attribute .check_safety instead.",
-            DeprecationWarning,
-            stacklevel=2,
-        )
-        return self.check_safety(self)
-
    def unsafe_imports(self) -> Iterator[Union[ast.Import, ast.ImportFrom]]:
        for node in self.properties.imports:
            if node.module in (
9 changes: 8 additions & 1 deletion fickling/hook.py
@@ -4,5 +4,12 @@


def run_hook():
-    # This is the global function hook
+    """Replace pickle.load() by fickling's load()"""
    pickle.load = loader.load
+
+
+def always_check_safety():
+    """
+    Alias for run_hook()
+    """
+    run_hook()
9 changes: 2 additions & 7 deletions fickling/loader.py
@@ -1,15 +1,10 @@
import pickle

from fickling.analysis import Severity, check_safety
+from fickling.exception import UnsafeFileError
from fickling.fickle import Pickled


-class SafetyError(Exception):
-    """Exception raised when a file is deemed unsafe by fickling."""
-
-    pass
-
-
def load(
    file,
    max_acceptable_severity=Severity.LIKELY_SAFE,
@@ -27,4 +22,4 @@ def load(
        # loaded after the analysis.
        return pickle.loads(pickled_data.dumps(), *args, **kwargs)
    else:
-        raise SafetyError(f"File is unsafe: {result.severity.name}")
+        raise UnsafeFileError(file, result.to_dict())
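
As the `load()` signature above shows, callers can tune the `max_acceptable_severity` threshold.
A small sketch using only names visible in this diff:

```python
from fickling import load
from fickling.analysis import Severity

# Raises UnsafeFileError if the analysis reports anything worse than LIKELY_SAFE
obj = load("file.pkl", max_acceptable_severity=Severity.LIKELY_SAFE)
```
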
Binary file added fickling_image.png
