diff --git a/README.md b/README.md
index 5bcc50dd..2695c6b1 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,54 @@
 # Low-level Guidance (llguidance)
 
-This controller implements a context-free grammar parser with Earley's algorithm
-on top of a lexer which uses [derivatives of regular expressions](https://github.com/microsoft/derivre).
-
-It's to be used by next-generation [Guidance](https://github.com/guidance-ai/guidance) grammars.
-See how it works in [plan.md](./plan.md).
+This library implements constrained decoding (also called constrained sampling or
+structured outputs) for Large Language Models (LLMs).
+It can enforce an arbitrary context-free grammar on the output of an LLM
+and is fast: on the order of 1 ms of CPU time per token
+(for a 100k-token tokenizer), with negligible startup costs.
+
+The following grammar formats are supported:
+- `llguidance` - [internal (JSON-based) format](./parser/src/api.rs)
+- regular expressions (following the Rust regex crate [syntax](https://docs.rs/regex/latest/regex/#syntax))
+- a large subset of JSON schemas
+- context-free grammars in (a [subset](./parser/src/lark/README.md) of) [Lark](https://github.com/lark-parser/lark) format
+
+The internal format is the most powerful and can be generated by the following libraries:
+- [Guidance](https://github.com/guidance-ai/guidance) (Python)
+- [guidance.ts](https://github.com/mmoskal/guidance-ts) (TypeScript)
+- hopefully more to come!
-This is now available in `main` branch of Guidance.
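+As an illustration of the Lark format (a hypothetical example written for this
+README, not taken from this repository), a small grammar could look like this:
+
+```lark
+start: "Hello, " NAME "!"
+NAME: /[A-Z][a-z]+/
+```
+
+Such a grammar would constrain the model to emit a greeting followed by a
+capitalized name; see the linked Lark subset documentation for what is supported.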
-Guidance PR: https://github.com/guidance-ai/guidance/pull/951
+The library can be used from:
+- [Rust](./parser/README.md), [sample](./sample_parser/src/sample_parser.rs)
+- [C and C++](./parser/llguidance.h), [sample](./c_sample/c_sample.cpp)
+- [Python](./python/llguidance/_lib.pyi)
+
+The library is currently integrated in:
+- [Guidance](https://github.com/guidance-ai/guidance) - library for interacting with LLMs;
+  uses either llama.cpp or HF Transformers
+- [LLGTRT](https://github.com/guidance-ai/llgtrt) - OpenAI-compatible REST server using NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)
+
+Integration is ongoing in:
+- onnxruntime-genai - [draft PR](https://github.com/microsoft/onnxruntime-genai/pull/1038)
+- mistral.rs - [preliminary PR](https://github.com/EricLBuehler/mistral.rs/pull/899)
+- llama.cpp - [branch](https://github.com/mmoskal/llama.cpp/tree/llg);
+  note that llama.cpp is fully integrated in Guidance above
+  via Python bindings
+
+Given a context-free grammar, a tokenizer, and a prefix of tokens,
+llguidance computes a token mask (the set of tokens from the tokenizer)
+that, when appended to the current prefix, can lead to a valid string in
+the language of the grammar.
+Computing a mask takes on the order of 1 ms of single-core CPU time
+for a tokenizer with 100k tokens.
+While this depends on the exact grammar, it holds, e.g., for grammars resulting from JSON schemas.
+There is also no significant startup cost.
+
+The library implements a context-free grammar parser with Earley's algorithm
+on top of a lexer which uses [derivatives of regular expressions](https://github.com/microsoft/derivre).
 
 Grammars are normally [JSON-serialized](./parser/src/api.rs).
 
 The following libraries produce llguidance grammars:
 
-- [guidance](https://github.com/guidance-ai/guidance) (Python)
-- [guidance.ts](https://github.com/mmoskal/guidance-ts) (TypeScript)
-- hopefully more to come!
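+Conceptually, the mask computation described above can be sketched in pseudocode
+(`can_extend` is a hypothetical helper, not part of the actual API; the real
+implementation uses the Earley parser and lexer rather than a naive per-token loop):
+
+```
+mask = empty set
+for token in vocabulary:
+    # keep the token if some continuation of prefix + [token]
+    # is a valid string in the language of the grammar
+    if can_extend(grammar, prefix + [token]):
+        add token to mask
+```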
-
 ## Building
 
 - [install rust](https://www.rust-lang.org/tools/install); 1.75 or later