diff --git a/docs/source/content/citation.md b/docs/source/content/citation.md index e38b700a6..111f5b61d 100644 --- a/docs/source/content/citation.md +++ b/docs/source/content/citation.md @@ -6,10 +6,8 @@ Please cite this library as: ```BibTeX @misc{nanda2022transformerlens, title = {TransformerLens}, - author = {Neel Nanda}, + author = {Neel Nanda and Joseph Bloom}, year = {2022}, howpublished = {\url{https://github.com/neelnanda-io/TransformerLens}}, } ``` - -Also, if you're actually using this for your research, I'd love to chat! Reach out at neelnanda27@gmail.com diff --git a/docs/source/content/development.md b/docs/source/content/development.md deleted file mode 100644 index 086bfe825..000000000 --- a/docs/source/content/development.md +++ /dev/null @@ -1,28 +0,0 @@ -# Local Development - -## DevContainer - -For a one-click setup of your development environment, this project includes a [DevContainer](https://containers.dev/). It can be used locally with [VS Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or with [GitHub Codespaces](https://github.com/features/codespaces). - -## Manual Setup - -This project uses [Poetry](https://python-poetry.org/docs/#installation) for package management. Install as follows (this will also setup your virtual environment): - -```bash -poetry config virtualenvs.in-project true -poetry install --with dev -``` - -Optionally, if you want Jupyter Lab you can run `poetry run pip install jupyterlab` (to install in the same virtual environment), and then run with `poetry run jupyter lab`. - -Then the library can be imported as `import transformer_lens`. - -## Testing - -If adding a feature, please add unit tests for it to the tests folder, and check that it hasn't broken anything major using the existing tests (install pytest and run it in the root TransformerLens/ directory). 
- -To run tests, you can use the following command: - -```shell -poetry run pytest -v transformer_lens/tests -``` diff --git a/docs/source/content/gallery.md b/docs/source/content/gallery.md index d3f30262e..93f46aa45 100644 --- a/docs/source/content/gallery.md +++ b/docs/source/content/gallery.md @@ -1,6 +1,39 @@ # Gallery +Research done involving TransformerLens: + +- [Progress Measures for Grokking via Mechanistic + Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence + Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt +- [Finding Neurons in a Haystack: Case Studies with Sparse + Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine + Harvey, Dmitrii Troitskii, Dimitris Bertsimas +- [Towards Automated Circuit Discovery for Mechanistic + Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker, + Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso +- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello) + by Neel Nanda +- [A circuit for Python docstrings in a 4-layer attention-only + transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only) + by Stefan Heimersheim and Jett Janiak +- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai, + Lawrence Chan, Neel Nanda +- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language + Models](https://openreview.net/forum?id=ZB6bK6MTYq) (2023, ICLR Workshop RTML) by Alex Foote, Neel + Nanda, Esben Kran, Ioannis Konstas, Fazl Barez +- [Eliciting Latent Predictions from Transformers with the Tuned + Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, + Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt + User contributed examples of the library being 
used in action: -* [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane -* [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses transformer lens to view intermediate activations, perform attribution and ablations. A write up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability). +- [Induction Heads Phase Change + Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): + A partial replication of [In-Context Learning and Induction + Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) + from Connor Kissane +- [Decision Transformer + Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of + scripts for training decision transformers which uses transformer lens to view intermediate + activations, perform attribution and ablations. A write up of the initial work can be found + [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability). \ No newline at end of file diff --git a/docs/source/content/getting_started_mech_interp.md b/docs/source/content/getting_started_mech_interp.md new file mode 100644 index 000000000..8e8de4d45 --- /dev/null +++ b/docs/source/content/getting_started_mech_interp.md @@ -0,0 +1,37 @@ +# Getting Started in Mechanistic Interpretability + +Mechanistic interpretability is a very young and small field, and there are a _lot_ of open +problems. 
This means there's both a lot of low-hanging fruit, and that the bar for entry is low - if +you would like to help, please try working on one! The standard answer to "why has no one done this +yet" is just that there aren't enough people! Key resources: + +- [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started) +- [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from + Callum McDougall. A comprehensive practical introduction to mech interp, written in + TransformerLens - full of snippets to copy and they come with exercises and solutions! Notable + tutorials: + - [Coding GPT-2 from + scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with + accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial) + [2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers + - [Introduction to Mech Interp and + TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An + introduction to TransformerLens and mech interp via studying induction heads. 
Covers the + foundational concepts of the library + - [Indirect Object + Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification): + a replication of interpretability in the wild, that covers standard techniques in mech interp + such as [direct logit + attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ), + [activation patching and path + patching](https://www.lesswrong.com/posts/xh85KbTFhbCz7taD4/how-to-think-about-activation-patching) +- [Mech Interp Paper Reading List](https://neelnanda.io/paper-list) +- [200 Concrete Open Problems in Mechanistic + Interpretability](https://neelnanda.io/concrete-open-problems) +- [A Comprehensive Mechanistic Interpretability Explainer](https://neelnanda.io/glossary): To look + up all the jargon and unfamiliar terms you're going to come across! +- [Neel Nanda's Youtube channel](https://www.youtube.com/channel/UCBMJ0D-omcRay8dh4QT0doQ): A range + of mech interp video content, including [paper + walkthroughs](https://www.youtube.com/watch?v=KV5gbOmHbjU&list=PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&index=1), + and [walkthroughs of doing + research](https://www.youtube.com/watch?v=yo4QvDn-vsU&list=PL7m7hLIqA0hr4dVOgjNwP2zjQGVHKeB7T) \ No newline at end of file diff --git a/docs/source/index.md b/docs/source/index.md index 4007f6c77..f478501e9 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -23,37 +23,32 @@ The core features were heavily inspired by the interface to [Anthropic's excelle :caption: Introduction content/getting_started +content/getting_started_mech_interp content/gallery ``` ```{toctree} :hidden: -:caption: Resources - -content/tutorials -content/citation -content/contributing -generated/demos/Exploratory_Analysis_Demo -``` - -```{toctree} -:hidden: -:caption: Code +:caption: Documentation generated/code/modules +generated/model_properties_table.md ``` ```{toctree} :hidden: -:caption: Models +:caption: Resources 
-generated/model_properties_table.md +content/tutorials +content/citation +content/contributing +generated/demos/Exploratory_Analysis_Demo ``` ```{toctree} :hidden: :caption: Development -content/development +content/contributing Github ```