Deploying to gh-pages from @ 3caad41 🚀
1 parent 3d7d739 · commit fec10be
Showing 47 changed files with 2,125 additions and 2,126 deletions.
Binary files modified (contents not shown), including:
BIN +0 Bytes (100%) .doctrees/generated/demos/Exploratory_Analysis_Demo.doctree
1,592 changes: 796 additions & 796 deletions
.doctrees/nbsphinx/generated/demos/Exploratory_Analysis_Demo.ipynb
Large diffs are not rendered by default.
@@ -6,10 +6,8 @@ Please cite this library as:
 ```BibTeX
 @misc{nanda2022transformerlens,
     title = {TransformerLens},
-    author = {Neel Nanda},
+    author = {Neel Nanda and Joseph Bloom},
     year = {2022},
     howpublished = {\url{https://github.com/neelnanda-io/TransformerLens}},
 }
 ```
 
 Also, if you're actually using this for your research, I'd love to chat! Reach out at [email protected]
This file was deleted.
@@ -1,6 +1,39 @@
 # Gallery
 
+Research done involving TransformerLens:
+
+- [Progress Measures for Grokking via Mechanistic Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
+- [Finding Neurons in a Haystack: Case Studies with Sparse Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
+- [Towards Automated Circuit Discovery for Mechanistic Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
+- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello) by Neel Nanda
+- [A circuit for Python docstrings in a 4-layer attention-only transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only) by Stefan Heimersheim and Jett Janiak
+- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai, Lawrence Chan, Neel Nanda
+- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models](https://openreview.net/forum?id=ZB6bK6MTYq) (ICLR Workshop RTML, 2023) by Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
+- [Eliciting Latent Predictions from Transformers with the Tuned Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
+
 User contributed examples of the library being used in action:
 
-* [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane
-* [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses transformer lens to view intermediate activations, perform attribution and ablations. A write up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
+- [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane
+- [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses TransformerLens to view intermediate activations and perform attribution and ablations. A write-up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
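As context for the induction heads replication above: the core behaviour an induction head learns can be sketched as a toy prediction rule in plain Python (illustrative only, not TransformerLens code): look back for the previous occurrence of the current token and predict whatever followed it.

```python
def induction_predict(tokens):
    """Toy induction rule: find the most recent earlier occurrence
    of the final token and predict the token that followed it."""
    cur = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == cur:
            return tokens[i + 1]
    return None  # no earlier occurrence: the rule makes no prediction

seq = [5, 7, 2, 9, 5, 7, 2]
# The final 2 previously appeared at index 2, followed by 9.
assert induction_predict(seq) == 9
```

Real induction heads implement a soft, learned version of this match-and-copy rule inside attention layers, which is what the notebook's phase-change replication studies.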
@@ -0,0 +1,37 @@
+# Getting Started in Mechanistic Interpretability
+
+Mechanistic interpretability is a very young and small field, and there are a _lot_ of open problems. This means there's both a lot of low-hanging fruit and a low bar for entry - if you would like to help, please try working on one! The standard answer to "why has no one done this yet" is just that there aren't enough people! Key resources:
+
+- [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started)
+- [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from Callum McDougall. A comprehensive practical introduction to mech interp, written in TransformerLens - full of snippets to copy, and they come with exercises and solutions! Notable tutorials:
+    - [Coding GPT-2 from scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with an accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial), [2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers
+    - [Introduction to Mech Interp and TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An introduction to TransformerLens and mech interp via studying induction heads. Covers the foundational concepts of the library
+    - [Indirect Object Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification): A replication of Interpretability in the Wild that covers standard mech interp techniques such as [direct logit attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ) and [activation patching and path patching](https://www.lesswrong.com/posts/xh85KbTFhbCz7taD4/how-to-think-about-activation-patching)
+- [Mech Interp Paper Reading List](https://neelnanda.io/paper-list)
+- [200 Concrete Open Problems in Mechanistic Interpretability](https://neelnanda.io/concrete-open-problems)
+- [A Comprehensive Mechanistic Interpretability Explainer](https://neelnanda.io/glossary): To look up all the jargon and unfamiliar terms you're going to come across!
+- [Neel Nanda's YouTube channel](https://www.youtube.com/channel/UCBMJ0D-omcRay8dh4QT0doQ): A range of mech interp video content, including [paper walkthroughs](https://www.youtube.com/watch?v=KV5gbOmHbjU&list=PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&index=1) and [walkthroughs of doing research](https://www.youtube.com/watch?v=yo4QvDn-vsU&list=PL7m7hLIqA0hr4dVOgjNwP2zjQGVHKeB7T)
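The core idea behind activation patching, linked in the tutorials above, can be sketched on a hypothetical two-layer linear network (plain NumPy, not TransformerLens itself): cache a hidden activation from a clean run, splice it into a corrupted run, and check how much of the clean output it restores.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: x -> h = W1 @ x -> y = W2 @ h.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def forward(x, patch_h=None):
    """Run the network, optionally overwriting the hidden
    activation with a cached one (the 'patch')."""
    h = W1 @ x
    if patch_h is not None:
        h = patch_h  # activation patching: swap in the cached activation
    return W2 @ h, h

clean = np.array([1.0, 0.0, 0.0])
corrupt = np.array([0.0, 1.0, 0.0])

clean_out, clean_h = forward(clean)
corrupt_out, _ = forward(corrupt)

# Patch the clean hidden activation into the corrupted run.
patched_out, _ = forward(corrupt, patch_h=clean_h)

# Patching the whole hidden layer fully restores the clean output.
assert np.allclose(patched_out, clean_out)
```

In a real transformer the patch targets one component (a head, layer, or position) at a time, so the amount of restored behaviour localises which components carry the information.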