Release 2.0
bryce13950 authored May 30, 2024
2 parents f4287b6 + 43b67a4 commit 9321ca0
Showing 70 changed files with 3,250 additions and 22,029 deletions.
2 changes: 2 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -2,6 +2,8 @@
When opening your PR, please make sure to only request a merge to `main` when you have found a bug in the currently released version of TransformerLens. All other PRs should go to `dev` in order to keep the docs in sync with the currently released version.
Please also make sure the branch you are attempting to merge from is not named `main`, or `dev`. Branches with these names from a different remote cause conflicting name issues when we periodically attempt to bring your PR up to date with the current stable TransformerLens source.
If your PR primarily affects docs, make sure its branch name contains the string "docs". Building docs is disabled by default to save CI time, but the job is configured to run whenever a branch with the word "docs" in its name is being merged.
-->
# Description

7 changes: 4 additions & 3 deletions .github/workflows/checks.yml
@@ -4,6 +4,7 @@ on:
push:
branches:
- main
- dev*
paths:
- "**" # Include all files by default
- "!.devcontainer/**"
@@ -15,6 +16,7 @@ on:
pull_request:
branches:
- main
- dev*
paths:
- "**"
- "!.devcontainer/**"
@@ -125,8 +127,7 @@ jobs:
- "Exploratory_Analysis_Demo"
# - "Grokking_Demo"
# - "Head_Detector_Demo"
# - "Hooked_SAE_Transformer_Demo"
# - "Interactive_Neuroscope"
- "Interactive_Neuroscope"
# - "LLaMA"
# - "LLaMA2_GPU_Quantized"
- "Main_Demo"
@@ -165,7 +166,7 @@ jobs:
# When running on merge to main, it builds the docs and then another job deploys them
name: 'Build Docs'
runs-on: ubuntu-latest
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev')
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev') || contains(github.head_ref, 'docs')
needs: code-checks
steps:
- uses: actions/checkout@v4
12 changes: 9 additions & 3 deletions README.md
@@ -15,6 +15,15 @@ A Library for Mechanistic Interpretability of Generative Language Models.
[![Read the Docs
Here](https://img.shields.io/badge/-Read%20the%20Docs%20Here-blue?style=for-the-badge&logo=Read-the-Docs&logoColor=white&link=https://TransformerLensOrg.github.io/TransformerLens/)](https://TransformerLensOrg.github.io/TransformerLens/)

| :exclamation: HookedSAETransformer Removed |
|-----------------------------------------------|

HookedSAETransformer has been removed from TransformerLens 2.0; its functionality is being moved to
[SAELens](http://github.com/jbloomAus/SAELens). Please see the accompanying
[announcement](https://transformerlensorg.github.io/TransformerLens/content/news/release-2.0.html)
for details on what's new in this release and on the future of TransformerLens.

This is a library for doing [mechanistic
interpretability](https://distill.pub/2020/circuits/zoom-in/) of GPT-2 Style language models. The
goal of mechanistic interpretability is to take a trained model and reverse engineer the algorithms
@@ -24,8 +33,6 @@ TransformerLens lets you load in 50+ different open source language models, and
activations of the model to you. You can cache any internal activation in the model, and add in
functions to edit, remove or replace these activations as the model runs.
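
For example, here is a minimal sketch of intervening on an activation with a hook during a forward pass (the specific hook point chosen here is just illustrative of the naming scheme):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Zero-ablate the output of the first attention layer as the model runs
def zero_attn_out(activation, hook):
    return activation * 0.0

ablated_logits = model.run_with_hooks(
    "Hello World",
    fwd_hooks=[("blocks.0.hook_attn_out", zero_attn_out)],
)
```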

The library also now supports mechanistic interpretability with SAEs (sparse autoencoders)! With [HookedSAETransformer](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Hooked_SAE_Transformer_Demo.ipynb), you can splice in SAEs during inference and cache + intervene on SAE activations. We recommend [SAELens](https://github.com/jbloomAus/SAELens) (built on top of TransformerLens) for training SAEs.

## Quick Start

### Install
@@ -51,7 +58,6 @@ logits, activations = model.run_with_cache("Hello World")
* [Introduction to the Library and Mech
Interp](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp)
* [Demo of Main TransformerLens Features](https://neelnanda.io/transformer-lens-demo)
* [Demo of HookedSAETransformer Features](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Hooked_SAE_Transformer_Demo.ipynb)

## Gallery

251 changes: 251 additions & 0 deletions demos/Config_Overhaul.ipynb
@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview\n",
"\n",
"The current way configuration is designed in TransformerLens has a lot of limitations. It does not\n",
"allow for outside people to pass through configurations that are not officially supported, and it\n",
"is very bug prone with something as simple as typo potentially giving you a massive headache. There\n",
"are also a number of hidden rules that are not clearly documented, which can go hidden until\n",
"different pieces of TransformerLens are activated. Allowing to pass in an optional object of configuration\n",
"with no further changes does solve a couple of these problems, but it does not solve the bigger\n",
"issues. It also introduces new problems with users potentially passing in architectures that are not\n",
"supported without having a clear way to inform the user what isn't supported.\n",
"\n",
"My proposal for how all of these problems can be resolved is to fundamentally revamp the\n",
"configuration to allow for something that I like to call configuration composition. From a technical\n",
"perspective, this involves creating a centralized class that describes all supported configurations\n",
"by TransformerLens. This class would then be used to construct specific configurations for all models\n",
"that are currently supported, and it would then allow anyone to easily see in a single place all\n",
"configuration features supported by TransformerLens while also being able to read the code to\n",
"understand how they can create their own configurations for the purpose of either submitting new\n",
"models into TransformerLens, or configuring an unofficially supported model by TransformerLens,\n",
"when TransformerLens already happens to support all of the architectural pieces separately.\n",
"\n",
"This could simple be an overhaul of the existing HookedTransformerConfig. Everything I am\n",
"describing here could be made compatible with that class to give it a more usable interface that is\n",
"then directly interacted with by the end user. At the moment, that class is not really built to be\n",
"interacted with, and is instead used as a wrapper around spreading configured anonymous objects.\n",
"Overhauling this class to do what I am about to describe is a viable path, but keeping it as it is,\n",
"and making a new class as something meant to be used by the end user would be a way to maintain\n",
"compatibility, avoid refactors, and keep model configuration only focused on putting together\n",
"configuration for models, as opposed to configuring full settings needed by HookedTransformer, which\n",
"includes checking the available environment.\n",
"\n",
"A very unscientific basic example of how this would look in code by the end user can be seen\n",
"immediately below. I will delve into details of each piece in this document."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"config = ModelConfig(\n",
" d_model=4096,\n",
" d_head=8192 // 64,\n",
" n_heads=64,\n",
" act_fn=\"silu\"\n",
" # Other universally required properties across all models go here in the constructor\n",
")\n",
"# Enabling specific features not universal among all models\n",
"config.enabled_gated_mlp()\n",
"# Customizing optional attributes\n",
"config.set_positional_embedding_type(\"alibi\")\n",
"\n",
"# and so on, until the full configuration is set\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The constructor\n",
"\n",
"The first piece of this I want to talk about is what will be injected into the constructor. It\n",
"should basically take everything absolutely required by all models. This keeps the code easy for\n",
"someone to understand, without adding too much clutter. All fields should be required, and if there\n",
"is ever an idea that a field should be in the constructor as an option, then that is probably an\n",
"indication that there is a good case to add a function to configure that variable in a different\n",
"point in the class. An example of what this would look like can be seen below..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make it easy for someone to see what activation functions are supported, this would be moved from\n",
"# HookedTransformerConfig\n",
"ActivationFunction = \"silu\" | \"gelu\"\n",
"\n",
"class ModelConfig:\n",
" def __init__(\n",
" self,\n",
" d_model: int,\n",
" eps: int,\n",
" act_fn: ActivationFunction,\n",
" remaining_required_attributes,\n",
" ):\n",
" self.d_model = d_model\n",
" self.eps = eps\n",
" self.act_fn = act_fn\n",
" # Set defaults for any remaining supported attributes that are not required here \n",
" self.gated_mlp = False\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Boolean Variables\n",
"\n",
"Within TransformerLens config, anything that is a boolean variable is essentially a feature flag.\n",
"This means that all features at the time of construction would have default values, most likely set\n",
"to false. They then get toggled on with an `enable_feature` function call on the config object.\n",
"Having these functions will make very clear for someone less familiar with TransformerLens what\n",
"features are available. It also allows us to decorate these calls, which is very important. There\n",
"are some instances where if a boolean is true, a different one cannot be true, but this requirement\n",
"is not clear anywhere without analyzing code. Decorating these functions allows us to make sure\n",
"these sort of bugs are not possible. I will use `gated_mlp` as an example here, but it is not\n",
"meant to be a real implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def enabled_gated_mlp(self: ModelConfig) -> ModelConfig:\n",
" self.gated_mlp = True\n",
" # Configure any side effects caused by enabling of a feature\n",
" self.another_feature = False\n",
" # Returning self allows someone to chain together config calls\n",
" return self\n",
"\n",
"ModelConfig.enabled_gated_mlp = enabled_gated_mlp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Options\n",
"\n",
"Any other options would similarly have their own functions to configure. This allows for similar\n",
"decoration as with feature flags, and it also in a way documents the architectural capabilities of\n",
"TransformerLens in a single place. If there are groups of options that are also always required\n",
"together, this then gives us a way to require all of those options as opposed to having them all be\n",
"configured at the root level. This also allows us to make changes to other attributes that may be\n",
"affected as a side affect of having some values set, which again makes it both harder for people to\n",
"introduce bugs, and also creates code that documents itself. Another off the cuff example of\n",
"something like this can be seen below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def set_rotary_dim(self: ModelConfig, rotary_dim: int) -> ModelConfig:\n",
" self.rotary_dim = rotary_dim\n",
" # Additional settings that seem to be present whenever rotary_dim is set\n",
" self.positional_embedding_type = \"rotary\"\n",
" self.rotary_adjacent_pairs = False\n",
" return self\n",
"\n",
"ModelConfig.set_rotary_dim = set_rotary_dim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Config Final Thoughts\n",
"\n",
"The best way to describe this idea is configuration composition. The reason being is that the user is\n",
"essentially composing a model configuration by setting the base, and then combining various options\n",
"from predefined functions. Doing it like this has a lot of advantages. One of those advantages being\n",
"that there would need to be a lot less memorization on how architectures should be combined. e.g.\n",
"maybe it's not that hard to remember that `rotary_adjacent_pairs` should be False when `rotary_dim`\n",
"is set, but these sorts of combinations accumulate. Having it interfaced out gives everyone a\n",
"place to look to see how parts of configuration work in isolation without the need to memorize a\n",
"large amount of rules.\n",
"\n",
"This would also allow us to more easily mock out fake configurations and enable specific features in\n",
"order to test that functionality in isolation. This also should make it easier for someone to at a\n",
"glance understand all model compatibilities with TransformerLens, since there would be a single file\n",
"where they would all be listed out and documented. It will also allow for people to see\n",
"compatibility limitations at a glance.\n",
"\n",
"As for compatibility, this change would be 100% compatible with the existing structure. The objects\n",
"I am suggesting are abstractions of the existing configuration dictionaries for the purpose of\n",
"communication and ease of use. This means that they can be passed around just like the current\n",
"anonymous dictionaries.\n",
"\n",
"## Further Changes\n",
"\n",
"With this, there are a number of changes that I would like to make to the actual\n",
"`loading_from_pretrained` file in order to revise it to be ready for the possibility of rapidly\n",
"supporting new models. The biggest change in this respect would be to break out what is now a\n",
"configuration dictionary for every model into having its own module where one of these configuration\n",
"objects would be constructed. That object would then be exposed, so that it can be imported into\n",
"`loading_from_pretrained`. We would then create a dictionary where the official name of the\n",
"model would have the configuration object as its value, thus completely eliminating that big giant\n",
"if else statement, and replacing it with a simple return from the dictionary. The configurations\n",
"themselves would then live in a directory structure like so...\n",
"\n",
"config/ <- where the ModelConfig file lives\n",
"config/meta-llama/ <- directory for all models from the group\n",
"config/meta-llama/Llama-2-13b.py <- name matching hugging face to make it really easy to find the\n",
" configuration\n",
"\n",
"## Impact on Testing\n",
"\n",
"This change, would allow us to directly interact with these configuration objects to allow us to\n",
"more easily assert that configurations are set properly, and to also allow us to more easily access\n",
"these configurations in tests for the purposes of writing better unit tests. \n",
"\n",
"## Summary\n",
"\n",
"This change should solve a lot of problems. It may be a big change at first from what currently\n",
"exists, but in time I think most people will find it more elegant, and easier to understand. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}