Commit
Sync docs with readme (#428)
Sync them to contain the same information, and add a note to remind us to keep them in sync in the future.

Note this is a simple approach - we'll look at something more complex once the docs are in a better place.
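As a sketch of the "simple approach" described above, a sync check could compare a marked shared section of the README against the docs page. Everything here (paths, marker comments, function names) is hypothetical, not the repository's actual tooling:

```python
# Hypothetical sync check: a sketch of the "keep them in sync" idea, not the
# repository's actual tooling. Paths and marker strings are invented.
from pathlib import Path


def shared_section(text: str, start: str, end: str) -> str:
    """Return the block of text between two marker comments."""
    return text.split(start, 1)[1].split(end, 1)[0].strip()


def docs_in_sync(readme: Path, doc_page: Path,
                 start: str = "<!-- sync-start -->",
                 end: str = "<!-- sync-end -->") -> bool:
    """True when both files carry an identical marked section."""
    return (shared_section(readme.read_text(), start, end)
            == shared_section(doc_page.read_text(), start, end))
```

Run in CI, a check like this fails fast whenever one copy is edited without the other.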
alan-cooney authored Oct 20, 2023
1 parent ea3989d commit 3caad41
Showing 5 changed files with 82 additions and 47 deletions.
4 changes: 1 addition & 3 deletions docs/source/content/citation.md
@@ -6,10 +6,8 @@ Please cite this library as:
 ```BibTeX
 @misc{nanda2022transformerlens,
 title = {TransformerLens},
-author = {Neel Nanda},
+author = {Neel Nanda and Joseph Bloom},
 year = {2022},
 howpublished = {\url{https://github.com/neelnanda-io/TransformerLens}},
 }
 ```

Also, if you're actually using this for your research, I'd love to chat! Reach out at [email protected]
28 changes: 0 additions & 28 deletions docs/source/content/development.md

This file was deleted.

37 changes: 35 additions & 2 deletions docs/source/content/gallery.md
@@ -1,6 +1,39 @@
 # Gallery
 
+Research done involving TransformerLens:
+
+- [Progress Measures for Grokking via Mechanistic
+  Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence
+  Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
+- [Finding Neurons in a Haystack: Case Studies with Sparse
+  Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine
+  Harvey, Dmitrii Troitskii, Dimitris Bertsimas
+- [Towards Automated Circuit Discovery for Mechanistic
+  Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker,
+  Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
+- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello)
+  by Neel Nanda
+- [A circuit for Python docstrings in a 4-layer attention-only
+  transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only)
+  by Stefan Heimersheim and Jett Janiak
+- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai,
+  Lawrence Chan, Neel Nanda
+- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language
+  Models](https://openreview.net/forum?id=ZB6bK6MTYq) (2023, ICLR Workshop RTML) by Alex Foote, Neel
+  Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
+- [Eliciting Latent Predictions from Transformers with the Tuned
+  Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi,
+  Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
+
 User contributed examples of the library being used in action:
 
-* [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane
-* [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses transformer lens to view intermediate activations, perform attribution and ablations. A write up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
+- [Induction Heads Phase Change
+  Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb):
+  A partial replication of [In-Context Learning and Induction
+  Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
+  from Connor Kissane
+- [Decision Transformer
+  Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of
+  scripts for training decision transformers which uses transformer lens to view intermediate
+  activations, perform attribution and ablations. A write up of the initial work can be found
+  [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
37 changes: 37 additions & 0 deletions docs/source/content/getting_started_mech_interp.md
@@ -0,0 +1,37 @@
+# Getting Started in Mechanistic Interpretability
+
+Mechanistic interpretability is a very young and small field, and there are a _lot_ of open
+problems. This means there's both a lot of low-hanging fruit, and that the bar for entry is low - if
+you would like to help, please try working on one! The standard answer to "why has no one done this
+yet" is just that there aren't enough people! Key resources:
+
+- [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started)
+- [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from
+  Callum McDougall. A comprehensive practical introduction to mech interp, written in
+  TransformerLens - full of snippets to copy and they come with exercises and solutions! Notable
+  tutorials:
+  - [Coding GPT-2 from
+    scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with
+    accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial)
+    [2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers
+  - [Introduction to Mech Interp and
+    TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An
+    introduction to TransformerLens and mech interp via studying induction heads. Covers the
+    foundational concepts of the library
+  - [Indirect Object
+    Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification):
+    a replication of interpretability in the wild, that covers standard techniques in mech interp
+    such as [direct logit
+    attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ),
+    [activation patching and path
+    patching](https://www.lesswrong.com/posts/xh85KbTFhbCz7taD4/how-to-think-about-activation-patching)
+- [Mech Interp Paper Reading List](https://neelnanda.io/paper-list)
+- [200 Concrete Open Problems in Mechanistic
+  Interpretability](https://neelnanda.io/concrete-open-problems)
+- [A Comprehensive Mechanistic Interpretability Explainer](https://neelnanda.io/glossary): To look
+  up all the jargon and unfamiliar terms you're going to come across!
+- [Neel Nanda's Youtube channel](https://www.youtube.com/channel/UCBMJ0D-omcRay8dh4QT0doQ): A range
+  of mech interp video content, including [paper
+  walkthroughs](https://www.youtube.com/watch?v=KV5gbOmHbjU&list=PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&index=1),
+  and [walkthroughs of doing
+  research](https://www.youtube.com/watch?v=yo4QvDn-vsU&list=PL7m7hLIqA0hr4dVOgjNwP2zjQGVHKeB7T)
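The tutorials listed above cover techniques such as direct logit attribution. As a taste of the idea, here is a toy, model-free numpy sketch; all shapes and names are invented for illustration, and with a real model the per-component activations would come from a cached forward pass rather than random numbers:

```python
# Toy sketch of direct logit attribution. Shapes and names are invented for
# illustration only; this is not code from any library.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_vocab, n_components = 8, 10, 3

# Pretend the final residual stream is the sum of per-component outputs
# (embedding, attention heads, MLP layers, ...).
components = rng.normal(size=(n_components, d_model))
residual = components.sum(axis=0)
W_U = rng.normal(size=(d_model, d_vocab))  # unembedding matrix

target_token = 4
# Unembedding is linear, so the target logit decomposes exactly into one
# additive contribution per component.
contributions = components @ W_U[:, target_token]
total_logit = residual @ W_U[:, target_token]
assert np.isclose(contributions.sum(), total_logit)
```

The key point is the linearity of the unembedding: each component's contribution to a logit can be read off independently, which is what makes the attribution exact rather than approximate.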
23 changes: 9 additions & 14 deletions docs/source/index.md
@@ -23,37 +23,32 @@ The core features were heavily inspired by the interface to [Anthropic's excelle
 :caption: Introduction
 content/getting_started
+content/getting_started_mech_interp
+content/gallery
 ```
 
 ```{toctree}
 :hidden:
-:caption: Resources
-content/tutorials
-content/citation
-content/contributing
-generated/demos/Exploratory_Analysis_Demo
-```
-
-```{toctree}
-:hidden:
-:caption: Code
+:caption: Documentation
 generated/code/modules
+generated/model_properties_table.md
 ```
 
 ```{toctree}
 :hidden:
-:caption: Models
+:caption: Resources
-generated/model_properties_table.md
+content/tutorials
+content/citation
+content/contributing
+generated/demos/Exploratory_Analysis_Demo
 ```
 
 ```{toctree}
 :hidden:
 :caption: Development
-content/development
 content/contributing
 Github <https://github.com/neelnanda-io/TransformerLens>
 ```
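For readers unfamiliar with the directives edited above: in a MyST/Sphinx setup, each `{toctree}` block groups pages into one sidebar section, `:caption:` names the section, and `:hidden:` keeps the list out of the page body. A generic example (the caption and page names here are illustrative, not the repository's):

```{toctree}
:hidden:
:caption: Example Section
page_one
subdir/page_two
```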
