From 6631cf84fe124267171428949f30334540e35411 Mon Sep 17 00:00:00 2001
From: Alan Cooney <41682961+alan-cooney@users.noreply.github.com>
Date: Fri, 20 Oct 2023 20:47:52 +0800
Subject: [PATCH] Sync docs with readme

---
 docs/source/content/citation.md                  |  4 +-
 docs/source/content/contributing.md              |  7 +++-
 docs/source/content/development.md               | 28 --------------
 docs/source/content/gallery.md                   | 37 ++++++++++++++++++-
 .../content/getting_started_mech_interp.md       | 37 +++++++++++++++++++
 docs/source/index.md                             | 20 ++++------
 6 files changed, 85 insertions(+), 48 deletions(-)
 delete mode 100644 docs/source/content/development.md
 create mode 100644 docs/source/content/getting_started_mech_interp.md

diff --git a/docs/source/content/citation.md b/docs/source/content/citation.md
index e38b700a6..111f5b61d 100644
--- a/docs/source/content/citation.md
+++ b/docs/source/content/citation.md
@@ -6,10 +6,8 @@ Please cite this library as:

 ```BibTeX
 @misc{nanda2022transformerlens,
     title = {TransformerLens},
-    author = {Neel Nanda},
+    author = {Neel Nanda and Joseph Bloom},
     year = {2022},
     howpublished = {\url{https://github.com/neelnanda-io/TransformerLens}},
 }
 ```
-
-Also, if you're actually using this for your research, I'd love to chat! Reach out at neelnanda27@gmail.com
diff --git a/docs/source/content/contributing.md b/docs/source/content/contributing.md
index b7f1c1fb9..0f5803f36 100644
--- a/docs/source/content/contributing.md
+++ b/docs/source/content/contributing.md
@@ -21,11 +21,14 @@ poetry install --with dev,docs,jupyter

 ## Testing

-If adding a feature, please add unit tests for it.
+If adding a feature, please add unit tests for it. If you need a model, please use one of the
+models cached by GitHub Actions (so that the tests run quickly in CI). These are `gpt2`,
+`attn-only-1l`, `attn-only-2l`, `attn-only-3l`, `attn-only-4l`, and `tiny-stories-1M`. 
Note that `gpt2` is
+quite slow (as we only have CPU actions), so the smaller models like `attn-only-1l` and
+`tiny-stories-1M` are preferred where possible.

 ### Running the tests

-- All tests via `make test`
 - Unit tests only via `make unit-test`
 - Acceptance tests only via `make acceptance-test`
 - Docstring tests only via `make docstring-test`
diff --git a/docs/source/content/development.md b/docs/source/content/development.md
deleted file mode 100644
index 086bfe825..000000000
--- a/docs/source/content/development.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# Local Development
-
-## DevContainer
-
-For a one-click setup of your development environment, this project includes a [DevContainer](https://containers.dev/). It can be used locally with [VS Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) or with [GitHub Codespaces](https://github.com/features/codespaces).
-
-## Manual Setup
-
-This project uses [Poetry](https://python-poetry.org/docs/#installation) for package management. Install as follows (this will also setup your virtual environment):
-
-```bash
-poetry config virtualenvs.in-project true
-poetry install --with dev
-```
-
-Optionally, if you want Jupyter Lab you can run `poetry run pip install jupyterlab` (to install in the same virtual environment), and then run with `poetry run jupyter lab`.
-
-Then the library can be imported as `import transformer_lens`.
-
-## Testing
-
-If adding a feature, please add unit tests for it to the tests folder, and check that it hasn't broken anything major using the existing tests (install pytest and run it in the root TransformerLens/ directory). 
-
-To run tests, you can use the following command:
-
-```shell
-poetry run pytest -v transformer_lens/tests
-```
diff --git a/docs/source/content/gallery.md b/docs/source/content/gallery.md
index d3f30262e..93f46aa45 100644
--- a/docs/source/content/gallery.md
+++ b/docs/source/content/gallery.md
@@ -1,6 +1,39 @@
 # Gallery

+Research done involving TransformerLens:
+
+- [Progress Measures for Grokking via Mechanistic
+  Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence
+  Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
+- [Finding Neurons in a Haystack: Case Studies with Sparse
+  Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine
+  Harvey, Dmitrii Troitskii, Dimitris Bertsimas
+- [Towards Automated Circuit Discovery for Mechanistic
+  Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker,
+  Aengus Lynch, Stefan Heimersheim, AdriĆ  Garriga-Alonso
+- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello)
+  by Neel Nanda
+- [A circuit for Python docstrings in a 4-layer attention-only
+  transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only)
+  by Stefan Heimersheim and Jett Janiak
+- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai,
+  Lawrence Chan, Neel Nanda
+- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language
+  Models](https://openreview.net/forum?id=ZB6bK6MTYq) (ICLR RTML Workshop, 2023) by Alex Foote, Neel
+  Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
+- [Eliciting Latent Predictions from Transformers with the Tuned
+  Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi,
+  Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
+
 User contributed examples of the library being 
used in action:

-* [Induction Heads Phase Change Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb): A partial replication of [In-Context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) from Connor Kissane
-* [Decision Transformer Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of scripts for training decision transformers which uses transformer lens to view intermediate activations, perform attribution and ablations. A write up of the initial work can be found [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
+- [Induction Heads Phase Change
+  Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb):
+  A partial replication of [In-Context Learning and Induction
+  Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
+  by Connor Kissane
+- [Decision Transformer
+  Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of
+  scripts for training decision transformers that use TransformerLens to view intermediate
+  activations and perform attribution and ablations. A write-up of the initial work can be found
+  [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).
\ No newline at end of file
diff --git a/docs/source/content/getting_started_mech_interp.md b/docs/source/content/getting_started_mech_interp.md
new file mode 100644
index 000000000..8e8de4d45
--- /dev/null
+++ b/docs/source/content/getting_started_mech_interp.md
@@ -0,0 +1,37 @@
+# Getting Started in Mechanistic Interpretability
+
+Mechanistic interpretability is a very young and small field, and there are a _lot_ of open
+problems. 
This means there's a lot of low-hanging fruit, and the bar for entry is low - if
+you would like to help, please try working on one! The standard answer to "why has no one done this
+yet" is just that there aren't enough people! Key resources:
+
+- [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started)
+- [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from
+  Callum McDougall. A comprehensive practical introduction to mech interp, written in
+  TransformerLens - full of snippets to copy, complete with exercises and solutions! Notable
+  tutorials:
+  - [Coding GPT-2 from
+    scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with an
+    accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial)
+    [2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers
+  - [Introduction to Mech Interp and
+    TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An
+    introduction to TransformerLens and mech interp via studying induction heads. 
Covers the
+    foundational concepts of the library.
+  - [Indirect Object
+    Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification):
+    a replication of Interpretability in the Wild, which covers standard mech interp techniques
+    such as [direct logit
+    attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ) and
+    [activation patching and path
+    patching](https://www.lesswrong.com/posts/xh85KbTFhbCz7taD4/how-to-think-about-activation-patching)
+- [Mech Interp Paper Reading List](https://neelnanda.io/paper-list)
+- [200 Concrete Open Problems in Mechanistic
+  Interpretability](https://neelnanda.io/concrete-open-problems)
+- [A Comprehensive Mechanistic Interpretability Explainer](https://neelnanda.io/glossary): To look
+  up all the jargon and unfamiliar terms you're going to come across!
+- [Neel Nanda's YouTube channel](https://www.youtube.com/channel/UCBMJ0D-omcRay8dh4QT0doQ): A range
+  of mech interp video content, including [paper
+  walkthroughs](https://www.youtube.com/watch?v=KV5gbOmHbjU&list=PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&index=1)
+  and [walkthroughs of doing
+  research](https://www.youtube.com/watch?v=yo4QvDn-vsU&list=PL7m7hLIqA0hr4dVOgjNwP2zjQGVHKeB7T)
\ No newline at end of file
diff --git a/docs/source/index.md b/docs/source/index.md
index 1430490e0..4059d2cfa 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -28,31 +28,25 @@ content/gallery

 ```{toctree}
 :hidden:
-:caption: Resources
-
-content/tutorials
-content/citation
-content/contributing
-```
-
-```{toctree}
-:hidden:
-:caption: Code
+:caption: Documentation

 generated/code/modules
+generated/model_properties_table.md
 ```

 ```{toctree}
 :hidden:
-:caption: Models
+:caption: Resources

-generated/model_properties_table.md
+content/getting_started_mech_interp
+content/tutorials
+content/citation
 ```

 ```{toctree}
 :hidden:
 :caption: Development

-content/development
+content/contributing
 Github
 ```
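The getting-started page added in this patch points readers at direct logit attribution as a first technique. As a minimal sketch of the idea - plain Python with made-up numbers, not the TransformerLens API - the point is that the logits are a linear readout of the final residual stream, and the residual stream is a sum of per-component outputs, so each component's contribution to a token's logit can be read off independently and the contributions sum exactly to the logit:

```python
# Toy sketch of direct logit attribution (hypothetical 4-dim residual stream;
# component names and numbers are illustrative, not from any real model).

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Each component writes a vector into the residual stream; the final
# residual stream is their sum.
components = {
    "embed":  [0.2, -0.1, 0.0, 0.3],
    "head_0": [0.5, 0.4, -0.2, 0.1],
    "mlp_0":  [-0.1, 0.3, 0.6, -0.2],
}

# Unembedding direction for one vocabulary token (one column of W_U).
w_u_token = [1.0, 0.5, -0.5, 2.0]

# Because the readout is linear, the token's logit decomposes exactly into
# per-component contributions: dot(sum_i out_i, w) == sum_i dot(out_i, w).
contributions = {name: dot(out, w_u_token) for name, out in components.items()}

residual_stream = [sum(dims) for dims in zip(*components.values())]
total_logit = dot(residual_stream, w_u_token)

assert abs(total_logit - sum(contributions.values())) < 1e-9
```

In TransformerLens itself this decomposition would be done over cached activations with the model's real unembedding matrix (and attention to layer norm); the toy version above only illustrates why linearity makes the attribution exact.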