Sync readme with docs #428

Merged 2 commits on Oct 20, 2023
4 changes: 1 addition & 3 deletions docs/source/content/citation.md
@@ -6,10 +6,8 @@ Please cite this library as:
```BibTeX
@misc{nanda2022transformerlens,
title = {TransformerLens},
- author = {Neel Nanda},
+ author = {Neel Nanda and Joseph Bloom},
year = {2022},
howpublished = {\url{https://github.com/neelnanda-io/TransformerLens}},
}
```

Also, if you're actually using this for your research, I'd love to chat! Reach out at [email protected]
28 changes: 0 additions & 28 deletions docs/source/content/development.md

This file was deleted.

37 changes: 35 additions & 2 deletions docs/source/content/gallery.md
@@ -1,6 +1,39 @@
# Gallery

Research done involving TransformerLens:

- [Progress Measures for Grokking via Mechanistic
Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence
Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
- [Finding Neurons in a Haystack: Case Studies with Sparse
Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine
Harvey, Dmitrii Troitskii, Dimitris Bertsimas
- [Towards Automated Circuit Discovery for Mechanistic
Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker,
Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello)
by Neel Nanda
- [A circuit for Python docstrings in a 4-layer attention-only
transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only)
by Stefan Heimersheim and Jett Janiak
- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai,
Lawrence Chan, Neel Nanda
- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language
Models](https://openreview.net/forum?id=ZB6bK6MTYq) (2023, ICLR Workshop RTML) by Alex Foote, Neel
Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
- [Eliciting Latent Predictions from Transformers with the Tuned
Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi,
Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

User-contributed examples of the library in action:

- [Induction Heads Phase Change
  Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb):
  A partial replication of [In-Context Learning and Induction
  Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
  by Connor Kissane
- [Decision Transformer
  Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of
  scripts for training decision transformers that uses TransformerLens to view intermediate
  activations and to perform attribution and ablations. A write-up of the initial work can be found
  [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
37 changes: 37 additions & 0 deletions docs/source/content/getting_started_mech_interp.md
@@ -0,0 +1,37 @@
# Getting Started in Mechanistic Interpretability

Mechanistic interpretability is a very young and small field, and there are a _lot_ of open
problems. This means there's a lot of low-hanging fruit and the bar for entry is low - if you would
like to help, please try working on one! The standard answer to "why has no one done this yet" is
simply that there aren't enough people. Key resources:

- [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started)
- [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from
Callum McDougall. A comprehensive practical introduction to mech interp, written in
TransformerLens - full of snippets to copy and they come with exercises and solutions! Notable
tutorials:
- [Coding GPT-2 from
scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with
accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial)
[2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers
- [Introduction to Mech Interp and
TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An
introduction to TransformerLens and mech interp via studying induction heads. Covers the
foundational concepts of the library
- [Indirect Object
  Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification):
  a replication of Interpretability in the Wild that covers standard mech interp techniques
  such as [direct logit
  attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ) and
  [activation patching and path
  patching](https://www.lesswrong.com/posts/xh85KbTFhbCz7taD4/how-to-think-about-activation-patching)
- [Mech Interp Paper Reading List](https://neelnanda.io/paper-list)
- [200 Concrete Open Problems in Mechanistic
Interpretability](https://neelnanda.io/concrete-open-problems)
- [A Comprehensive Mechanistic Interpretability Explainer](https://neelnanda.io/glossary): To look
up all the jargon and unfamiliar terms you're going to come across!
- [Neel Nanda's Youtube channel](https://www.youtube.com/channel/UCBMJ0D-omcRay8dh4QT0doQ): A range
of mech interp video content, including [paper
walkthroughs](https://www.youtube.com/watch?v=KV5gbOmHbjU&list=PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&index=1),
and [walkthroughs of doing
research](https://www.youtube.com/watch?v=yo4QvDn-vsU&list=PL7m7hLIqA0hr4dVOgjNwP2zjQGVHKeB7T)
23 changes: 9 additions & 14 deletions docs/source/index.md
@@ -23,37 +23,32 @@ The core features were heavily inspired by the interface to [Anthropic's excelle
:caption: Introduction

content/getting_started
content/getting_started_mech_interp
content/gallery
```

```{toctree}
:hidden:
:caption: Resources

content/tutorials
content/citation
content/contributing
generated/demos/Exploratory_Analysis_Demo
```

```{toctree}
:hidden:
:caption: Code
:caption: Documentation

generated/code/modules
generated/model_properties_table.md
```

```{toctree}
:hidden:
:caption: Models
:caption: Resources

generated/model_properties_table.md
content/tutorials
content/citation
content/contributing
generated/demos/Exploratory_Analysis_Demo
```

```{toctree}
:hidden:
:caption: Development

content/development
content/contributing
Github <https://github.com/neelnanda-io/TransformerLens>
```