Releases: AlignmentResearch/tuned-lens
v0.2.0
Breaking changes
- The `from_model_and_pretrained` interface has been updated to remove the `slice` option; slicing has been moved to its own `slice_sequence` method. A hedged migration sketch follows below.
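For downstream users, the migration might look roughly like the sketch below. This is a sketch under assumptions: the exact `slice_sequence` signature and the example inputs are illustrative rather than verified against the released API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from tuned_lens import TunedLens
from tuned_lens.plotting import PredictionTrajectory

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# v0.2.0: no slice-related option is passed at construction time any more.
lens = TunedLens.from_model_and_pretrained(model)

input_ids = tokenizer.encode("The quick brown fox jumps over the lazy dog")
traj = PredictionTrajectory.from_lens_and_model(
    lens, model, tokenizer=tokenizer, input_ids=input_ids
)

# Slicing now happens through a dedicated method; the assumption here is
# that `slice_sequence` takes a standard Python slice over token positions
# and returns a new, smaller trajectory.
traj = traj.slice_sequence(slice(0, 4))
```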
New features
- Integration with TransformerLens (#103)
  - This is probably the biggest new feature. We now support directly producing a `PredictionTrajectory` from a lens and an `ActivationCache`.
  - This means that you can visualize the effects of interventions made with the fantastic TransformerLens library using the full set of tools that come with the tuned-lens project; a hedged sketch follows after this list.
  - There is a tutorial discussing this integration here.
- Rank visualization (#105)
  - As in the original logit lens blog post, we now support easily visualizing the rank of the target token in the prediction distribution; a sketch follows below.
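A hedged sketch of the TransformerLens integration from #103: the classmethod name `from_lens_and_cache` and its keyword arguments are assumptions based on the notes above, so treat the linked tutorial as authoritative.

```python
from transformers import AutoModelForCausalLM
from transformer_lens import HookedTransformer
from tuned_lens import TunedLens
from tuned_lens.plotting import PredictionTrajectory

# The lens is loaded against the ordinary Hugging Face model...
hf_model = AutoModelForCausalLM.from_pretrained("gpt2")
lens = TunedLens.from_model_and_pretrained(hf_model)

# ...while the (possibly intervened-on) forward pass runs in TransformerLens.
tl_model = HookedTransformer.from_pretrained("gpt2")
tokens = tl_model.to_tokens("The quick brown fox jumps over the lazy dog")
logits, cache = tl_model.run_with_cache(tokens)

# New in v0.2.0 (#103): build a PredictionTrajectory directly from the cache.
traj = PredictionTrajectory.from_lens_and_cache(
    lens=lens, cache=cache, input_ids=tokens, model_logits=logits
)
traj.entropy().figure(title="Tuned lens over a TransformerLens cache")
```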
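Likewise, a sketch of the rank visualization from #105, assuming the new statistic is exposed as a `rank()` method on `PredictionTrajectory`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from tuned_lens import TunedLens
from tuned_lens.plotting import PredictionTrajectory

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
lens = TunedLens.from_model_and_pretrained(model)

input_ids = tokenizer.encode("The capital of France is Paris")
# Score each position against the token that actually comes next.
targets = input_ids[1:] + [tokenizer.eos_token_id]

traj = PredictionTrajectory.from_lens_and_model(
    lens, model, tokenizer=tokenizer, input_ids=input_ids, targets=targets
)
# Assumed API: rank() returns a plottable statistic giving the position of
# the target token in each layer's sorted prediction distribution.
traj.rank().figure(title="Target-token rank by layer")
```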
Full Changelog: v0.1.1...v0.2.0
v0.1.1
Most of the changes in this release focused on improving the training and evaluation code. If you are mainly using pretrained lenses, this should not affect you too much.
Changes
- The evaluation sub-command now produces `json` files, evaluates for a given number of tokens rather than a number of steps, and has an improved command line interface (#92).
- Training now supports checkpointing, allowing lenses to be saved during training and training to be resumed if it is interrupted (#95).
- Training can now be done in 8-bit, though this does not currently combine with FSDP (#88, #94).
Bug Fixes
- Slow tokenizers now work correctly when installed with the `[slow_tokenizers]` optional dependency (#91).
- Lens hashes that were broken by a previous change have been removed and should no longer produce warnings (#99, https://huggingface.co/spaces/AlignmentResearch/tuned-lens/discussions/39).
v0.1.0
This release primarily focused on removing technical debt, refactoring the repository, and raising the engineering standards in the codebase. While there are some new features, particularly in the plotting code, most of the work focused on making the codebase maintainable and easy to continue building on.
Changes
- A large amount of code was removed in this update (#80). Some of this code is relevant to replicating a few of the experiments in the archived version of the arXiv paper. For those planning to replicate the prompt injection experiments, the abnormality detection code can still be found in version `0.0.5` of the codebase.
- The `TunedLens` class itself has been substantially simplified by extracting the unembed operation into its own class, namely the `Unembed` class (#55).
  - The largest breaking change for downstream users is the new interface for loading pretrained lenses. See the documentation here; a short sketch also follows this list.
- The plotting code was completely refactored to make it more versatile and easier to build on (#63). There is a tutorial for these new features in the docs.
- The training code was completely rewritten to make it modular, making use of shared `ingredients`, and the downstream loop was removed. For reference on how to use the new training interface, see the tutorial here.
- The `model_surgery` module no longer uses heuristics to locate certain model components (#69).
- The data processing code was also streamlined (#78).
- In addition, the `Decoder` class has been simplified and renamed to the `Unembed` class (#71, #81, #55).
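As referenced above, a minimal sketch of the new loading interface. The `Unembed` import path and constructor signature are assumptions; `from_model_and_pretrained` is the documented entry point.

```python
from transformers import AutoModelForCausalLM
from tuned_lens import TunedLens
from tuned_lens.nn.unembed import Unembed  # import path assumed

# 0.1.0 interface: a lens is loaded relative to a concrete model instance,
# so it can share that model's unembedding weights.
model = AutoModelForCausalLM.from_pretrained("gpt2")
lens = TunedLens.from_model_and_pretrained(model)

# The unembed operation extracted in #55 can also be constructed on its own.
unembed = Unembed(model)
```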
Contributors
While the majority of this update was written by @levmckinney, a huge thank you to @norabelrose and @alexmlong for their contributions, and to @AdamGleave, @rhaps0dy, and @taufeeque9 for providing code reviews.
Full Changelog: v0.0.5...v0.1.0
v0.0.5
This release will likely be the final release before 0.1.x, as some major refactors are about to be merged. It mostly consists of removing a lot of dead code and allowing you to specify a revision for tokenizers in the training scripts and for lenses in `TunedLens.load` (a hedged sketch follows).
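A sketch of the revision pinning mentioned above, assuming the pre-0.1.0 `TunedLens.load` takes a model name and that the new argument is called `revision`:

```python
from tuned_lens import TunedLens

# Pre-0.1.0 interface: load hosted lens weights by model name, optionally
# pinning a specific revision of those weights (exact signature assumed).
lens = TunedLens.load("gpt2", revision="main")
```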
v0.0.3
First release!