Deploying to gh-pages from @ 3caad41 🚀
1 parent 3d7d739 · commit fec10be
Showing 47 changed files with 2,125 additions and 2,126 deletions.
Binary files modified (contents not shown), including:
BIN +0 Bytes (100%) .doctrees/generated/demos/Exploratory_Analysis_Demo.doctree
1,592 changes: 796 additions & 796 deletions
.doctrees/nbsphinx/generated/demos/Exploratory_Analysis_Demo.ipynb
Large diffs are not rendered by default.
@@ -6,10 +6,8 @@ Please cite this library as:
 ```BibTeX
 @misc{nanda2022transformerlens,
     title = {TransformerLens},
-    author = {Neel Nanda},
+    author = {Neel Nanda and Joseph Bloom},
     year = {2022},
     howpublished = {\url{https://github.com/neelnanda-io/TransformerLens}},
 }
 ```
 
 Also, if you're actually using this for your research, I'd love to chat! Reach out at [email protected]
This file was deleted.
@@ -1,6 +1,39 @@
 # Gallery
 
+Research done involving TransformerLens:
+
+- [Progress Measures for Grokking via Mechanistic Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
+- [Finding Neurons in a Haystack: Case Studies with Sparse Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
+- [Towards Automated Circuit Discovery for Mechanistic Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
+- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello) by Neel Nanda
+- [A circuit for Python docstrings in a 4-layer attention-only transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only) by Stefan Heimersheim and Jett Janiak
+- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai, Lawrence Chan, Neel Nanda
+- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models](https://openreview.net/forum?id=ZB6bK6MTYq) (ICLR Workshop RTML, 2023) by Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
+- [Eliciting Latent Predictions from Transformers with the Tuned Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
+
 User contributed examples of the library being used in action:
 
-* [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane
-* [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses transformer lens to view intermediate activations, perform attribution and ablations. A write up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
+- [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane
+- [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses TransformerLens to view intermediate activations and perform attribution and ablations. A write-up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
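As context for the induction heads replication above: the core behaviour an induction head learns can be sketched as a toy prediction rule in plain Python (illustrative only, not TransformerLens code): look back for the previous occurrence of the current token and predict whatever followed it.

```python
def induction_predict(tokens):
    """Toy induction rule: find the most recent earlier occurrence
    of the final token and predict the token that followed it."""
    cur = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == cur:
            return tokens[i + 1]
    return None  # no earlier occurrence: the rule makes no prediction

seq = [5, 7, 2, 9, 5, 7, 2]
# The final 2 previously appeared at index 2, followed by 9.
assert induction_predict(seq) == 9
```

Real induction heads implement a soft, learned version of this match-and-copy rule inside attention layers, which is what the notebook's phase-change replication studies.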
@@ -0,0 +1,37 @@
+# Getting Started in Mechanistic Interpretability
+
+Mechanistic interpretability is a very young and small field, and there are a _lot_ of open problems. This means there's both a lot of low-hanging fruit and a low bar for entry - if you would like to help, please try working on one! The standard answer to "why has no one done this yet" is just that there aren't enough people! Key resources:
+
+- [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started)
+- [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from Callum McDougall. A comprehensive practical introduction to mech interp, written in TransformerLens - full of snippets to copy, and they come with exercises and solutions! Notable tutorials:
+    - [Coding GPT-2 from scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with an accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial), [2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers
+    - [Introduction to Mech Interp and TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An introduction to TransformerLens and mech interp via studying induction heads. Covers the foundational concepts of the library
+    - [Indirect Object Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification): A replication of Interpretability in the Wild that covers standard mech interp techniques such as [direct logit attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ) and [activation patching and path patching](https://www.lesswrong.com/posts/xh85KbTFhbCz7taD4/how-to-think-about-activation-patching)
+- [Mech Interp Paper Reading List](https://neelnanda.io/paper-list)
+- [200 Concrete Open Problems in Mechanistic Interpretability](https://neelnanda.io/concrete-open-problems)
+- [A Comprehensive Mechanistic Interpretability Explainer](https://neelnanda.io/glossary): To look up all the jargon and unfamiliar terms you're going to come across!
+- [Neel Nanda's YouTube channel](https://www.youtube.com/channel/UCBMJ0D-omcRay8dh4QT0doQ): A range of mech interp video content, including [paper walkthroughs](https://www.youtube.com/watch?v=KV5gbOmHbjU&list=PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&index=1) and [walkthroughs of doing research](https://www.youtube.com/watch?v=yo4QvDn-vsU&list=PL7m7hLIqA0hr4dVOgjNwP2zjQGVHKeB7T)
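The core idea behind activation patching, linked in the tutorials above, can be sketched on a hypothetical two-layer linear network (plain NumPy, not TransformerLens itself): cache a hidden activation from a clean run, splice it into a corrupted run, and check how much of the clean output it restores.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: x -> h = W1 @ x -> y = W2 @ h.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def forward(x, patch_h=None):
    """Run the network, optionally overwriting the hidden
    activation with a cached one (the 'patch')."""
    h = W1 @ x
    if patch_h is not None:
        h = patch_h  # activation patching: swap in the cached activation
    return W2 @ h, h

clean = np.array([1.0, 0.0, 0.0])
corrupt = np.array([0.0, 1.0, 0.0])

clean_out, clean_h = forward(clean)
corrupt_out, _ = forward(corrupt)

# Patch the clean hidden activation into the corrupted run.
patched_out, _ = forward(corrupt, patch_h=clean_h)

# Patching the whole hidden layer fully restores the clean output.
assert np.allclose(patched_out, clean_out)
```

In a real transformer the patch targets one component (a head, layer, or position) at a time, so the amount of restored behaviour localises which components carry the information.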