Commit
Sync docs with readme (#428)
Sync them to contain the same information, and add a note to remind us to keep them in sync in the future.

Note this is a simple approach - we'll look at something more complex once the docs are in a better place.
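As a sketch of the "simple approach" described above, a sync check could compare a marked shared section of the README against the docs page. Everything here (paths, marker comments, function names) is hypothetical, not the repository's actual tooling:

```python
# Hypothetical sync check: a sketch of the "keep them in sync" idea, not the
# repository's actual tooling. Paths and marker strings are invented.
from pathlib import Path


def shared_section(text: str, start: str, end: str) -> str:
    """Return the block of text between two marker comments."""
    return text.split(start, 1)[1].split(end, 1)[0].strip()


def docs_in_sync(readme: Path, doc_page: Path,
                 start: str = "<!-- sync-start -->",
                 end: str = "<!-- sync-end -->") -> bool:
    """True when both files carry an identical marked section."""
    return (shared_section(readme.read_text(), start, end)
            == shared_section(doc_page.read_text(), start, end))
```

Run in CI, a check like this fails fast whenever one copy is edited without the other.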
alan-cooney authored Oct 20, 2023
1 parent ea3989d commit 3caad41
Showing 5 changed files with 82 additions and 47 deletions.
4 changes: 1 addition & 3 deletions docs/source/content/citation.md
@@ -6,10 +6,8 @@ Please cite this library as:
 ```BibTeX
 @misc{nanda2022transformerlens,
 title = {TransformerLens},
-author = {Neel Nanda},
+author = {Neel Nanda and Joseph Bloom},
 year = {2022},
 howpublished = {\url{https://github.com/neelnanda-io/TransformerLens}},
 }
 ```

Also, if you're actually using this for your research, I'd love to chat! Reach out at [email protected]
28 changes: 0 additions & 28 deletions docs/source/content/development.md

This file was deleted.

37 changes: 35 additions & 2 deletions docs/source/content/gallery.md
@@ -1,6 +1,39 @@
 # Gallery
 
+Research done involving TransformerLens:
+
+- [Progress Measures for Grokking via Mechanistic
+  Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence
+  Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
+- [Finding Neurons in a Haystack: Case Studies with Sparse
+  Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine
+  Harvey, Dmitrii Troitskii, Dimitris Bertsimas
+- [Towards Automated Circuit Discovery for Mechanistic
+  Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker,
+  Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
+- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello)
+  by Neel Nanda
+- [A circuit for Python docstrings in a 4-layer attention-only
+  transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only)
+  by Stefan Heimersheim and Jett Janiak
+- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai,
+  Lawrence Chan, Neel Nanda
+- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language
+  Models](https://openreview.net/forum?id=ZB6bK6MTYq) (2023, ICLR Workshop RTML) by Alex Foote, Neel
+  Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
+- [Eliciting Latent Predictions from Transformers with the Tuned
+  Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi,
+  Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
+
 User contributed examples of the library being used in action:
 
-* [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane
-* [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses transformer lens to view intermediate activations, perform attribution and ablations. A write up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
+- [Induction Heads Phase Change
+  Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb):
+  A partial replication of [In-Context Learning and Induction
+  Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
+  from Connor Kissane
+- [Decision Transformer
+  Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of
+  scripts for training decision transformers which uses transformer lens to view intermediate
+  activations, perform attribution and ablations. A write up of the initial work can be found
+  [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
37 changes: 37 additions & 0 deletions docs/source/content/getting_started_mech_interp.md
@@ -0,0 +1,37 @@
+# Getting Started in Mechanistic Interpretability
+
+Mechanistic interpretability is a very young and small field, and there are a _lot_ of open
+problems. This means there's both a lot of low-hanging fruit, and that the bar for entry is low - if
+you would like to help, please try working on one! The standard answer to "why has no one done this
+yet" is just that there aren't enough people! Key resources:
+
+- [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started)
+- [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from
+  Callum McDougall. A comprehensive practical introduction to mech interp, written in
+  TransformerLens - full of snippets to copy and they come with exercises and solutions! Notable
+  tutorials:
+  - [Coding GPT-2 from
+    scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with
+    accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial)
+    [2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers
+  - [Introduction to Mech Interp and
+    TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An
+    introduction to TransformerLens and mech interp via studying induction heads. Covers the
+    foundational concepts of the library
+  - [Indirect Object
+    Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification):
+    a replication of interpretability in the wild, that covers standard techniques in mech interp
+    such as [direct logit
+    attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ),
+    [activation patching and path
+    patching](https://www.lesswrong.com/posts/xh85KbTFhbCz7taD4/how-to-think-about-activation-patching)
+- [Mech Interp Paper Reading List](https://neelnanda.io/paper-list)
+- [200 Concrete Open Problems in Mechanistic
+  Interpretability](https://neelnanda.io/concrete-open-problems)
+- [A Comprehensive Mechanistic Interpretability Explainer](https://neelnanda.io/glossary): To look
+  up all the jargon and unfamiliar terms you're going to come across!
+- [Neel Nanda's Youtube channel](https://www.youtube.com/channel/UCBMJ0D-omcRay8dh4QT0doQ): A range
+  of mech interp video content, including [paper
+  walkthroughs](https://www.youtube.com/watch?v=KV5gbOmHbjU&list=PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&index=1),
+  and [walkthroughs of doing
+  research](https://www.youtube.com/watch?v=yo4QvDn-vsU&list=PL7m7hLIqA0hr4dVOgjNwP2zjQGVHKeB7T)
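The tutorials listed above cover techniques such as direct logit attribution. As a taste of the idea, here is a toy, model-free numpy sketch; all shapes and names are invented for illustration, and with a real model the per-component activations would come from a cached forward pass rather than random numbers:

```python
# Toy sketch of direct logit attribution. Shapes and names are invented for
# illustration only; this is not code from any library.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_vocab, n_components = 8, 10, 3

# Pretend the final residual stream is the sum of per-component outputs
# (embedding, attention heads, MLP layers, ...).
components = rng.normal(size=(n_components, d_model))
residual = components.sum(axis=0)
W_U = rng.normal(size=(d_model, d_vocab))  # unembedding matrix

target_token = 4
# Unembedding is linear, so the target logit decomposes exactly into one
# additive contribution per component.
contributions = components @ W_U[:, target_token]
total_logit = residual @ W_U[:, target_token]
assert np.isclose(contributions.sum(), total_logit)
```

The key point is the linearity of the unembedding: each component's contribution to a logit can be read off independently, which is what makes the attribution exact rather than approximate.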
23 changes: 9 additions & 14 deletions docs/source/index.md
@@ -23,37 +23,32 @@ The core features were heavily inspired by the interface to [Anthropic's excelle
 :caption: Introduction
 content/getting_started
+content/getting_started_mech_interp
+content/gallery
 ```
 
 ```{toctree}
 :hidden:
-:caption: Resources
-content/tutorials
-content/citation
-content/contributing
-generated/demos/Exploratory_Analysis_Demo
-```
-
-```{toctree}
-:hidden:
-:caption: Code
+:caption: Documentation
 generated/code/modules
+generated/model_properties_table.md
 ```
 
 ```{toctree}
 :hidden:
-:caption: Models
+:caption: Resources
-generated/model_properties_table.md
+content/tutorials
+content/citation
+content/contributing
+generated/demos/Exploratory_Analysis_Demo
 ```
 
 ```{toctree}
 :hidden:
 :caption: Development
-content/development
 content/contributing
 Github <https://github.com/neelnanda-io/TransformerLens>
 ```
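For readers unfamiliar with the directives edited above: in a MyST/Sphinx setup, each `{toctree}` block groups pages into one sidebar section, `:caption:` names the section, and `:hidden:` keeps the list out of the page body. A generic example (the caption and page names here are illustrative, not the repository's):

```{toctree}
:hidden:
:caption: Example Section
page_one
subdir/page_two
```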
