## Backend updates
- Transformers: bump to 4.43 (adds Llama 3.1 support).
- ExLlamaV2: bump to 0.1.8 (adds Llama 3.1 support).
- AutoAWQ: bump to 0.2.6 (adds Llama 3.1 support).
Remove AutoAWQ as a standalone loader. I found that hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 works better when loaded directly through Transformers, and that's what the README recommends. AutoAWQ is still used in the background.
## UI updates
Make text between quote characters colored in chat and chat-instruct modes.
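The quote coloring can be sketched with a small regex pass over the rendered message. This is a hypothetical illustration, not the project's actual implementation: `colorize_quotes`, the `quoted-text` CSS class, and the exact pattern are assumptions.

```python
import re

# Match a span between straight double quotes or curly double quotes.
# (Hypothetical pattern; the real one may handle more quote styles.)
QUOTE_PATTERN = re.compile(r'("[^"\n]+"|“[^”\n]+”)')

def colorize_quotes(html: str) -> str:
    """Wrap quoted segments in a span so CSS can color them in chat modes."""
    return QUOTE_PATTERN.sub(r'<span class="quoted-text">\1</span>', html)
```

The coloring itself would then be a one-line CSS rule targeting `.quoted-text`.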
Prevent LaTeX from being rendered for inline "$", as that caused problems for phrases like "apples cost $1, oranges cost $2".
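One way to guard against this, sketched here as an assumption rather than the project's actual fix, is to escape a `$` that is immediately followed by a digit before the text reaches the LaTeX renderer, so it is treated as currency instead of opening an inline math span:

```python
import re

# A lone "$" directly before a digit is almost certainly currency, not math.
# (Hypothetical heuristic; the real fix may use different criteria.)
CURRENCY_DOLLAR = re.compile(r'\$(?=\d)')

def escape_currency_dollars(text: str) -> str:
    """Escape currency-style dollar signs so LaTeX does not render them."""
    return CURRENCY_DOLLAR.sub(r'\\$', text)
```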
Make the markdown cache infinite and clear it when switching to another chat. This cache exists because the markdown conversion is CPU-intensive. With an unbounded cache, every message in a full 128k context stays cached, keeping the UI responsive in long conversations.
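The cache behavior can be sketched with `functools.lru_cache`. This is a minimal illustration under assumed names: `convert_to_html` stands in for the real markdown conversion, and `switch_chat` for whatever hook runs when the user changes chats.

```python
import functools

# maxsize=None makes the cache unbounded, so every message in the
# current chat's context stays cached after its first conversion.
@functools.lru_cache(maxsize=None)
def convert_to_html(message: str) -> str:
    # Placeholder for the real (CPU-intensive) markdown renderer.
    return f"<p>{message}</p>"

def switch_chat() -> None:
    # Clear the cache on chat switch so entries from old chats
    # do not accumulate without bound across conversations.
    convert_to_html.cache_clear()
```

Clearing on chat switch is what keeps an unbounded cache safe: memory use is bounded by the size of a single conversation rather than the whole session.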
## Bug fixes
Fix a race condition that prevented the default character from loading correctly on startup.