lark examples

guidance-ai · Nov 29, 2024 · commit 8634606 · 1 parent 9ffbb7b
Showing 1 changed file with 53 additions and 0 deletions.
diff --git a/parser/src/lark/README.md b/parser/src/lark/README.md
@@ -29,7 +29,60 @@ Following are currently not supported:
 - templates
 - imports (other than built-in `%import common`)
 - regexes use Rust `regex` crate [syntax](https://docs.rs/regex/latest/regex/#syntax), not Python's `re` (though they are similar)
+- certain string syntax, see [issue](https://github.com/microsoft/llguidance/issues/54)
 
 Following features of llguidance are currently not exposed in Lark syntax:
 
 - per-lexeme contextual and lazy flags
+
+## Examples
+
+### Llama JSON tool calling
+
+Here, we restrict the output to either a normal text response
+or a tool call to Brave Search or Wolfram Alpha.
+
+```lark
+start: normal_text | brave | wolfram
+normal_text: /(.|\n)*/
+brave: <|python_tag|> "brave_search.call(query=" JSON_STRING ")" <|eom_id|>
+wolfram: <|python_tag|> "wolfram_alpha.call(query=" JSON_STRING ")" <|eom_id|>
+JSON_STRING_CHAR: /(\\([\"\\\/bfnrt]|u[a-fA-F0-9]{4})|[^\"\\\x00-\x1F\x7F])/
+JSON_STRING: "\"" JSON_STRING_CHAR* "\""
+```
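A minimal Python sketch, for illustration only (it is not llguidance code), that checks a finished output against the `brave` and `wolfram` rules and decodes the query. It uses the standard `re` module in place of the Rust `regex` crate, spells the special tokens `<|python_tag|>` and `<|eom_id|>` as plain text, and assumes that `normal_text` can never contain those tokens. `classify` is a hypothetical helper name.

```python
import json
import re

# JSON_STRING_CHAR from the grammar above, with groups made non-capturing.
JSON_STRING_CHAR = r'(?:\\(?:[\"\\\/bfnrt]|u[a-fA-F0-9]{4})|[^\"\\\x00-\x1F\x7F])'
# JSON_STRING with JSON_STRING_CHAR inlined into a single regex.
JSON_STRING = r'"(?:' + JSON_STRING_CHAR + r')*"'

# Assumption: special tokens are written out as text here. In the grammar they
# are special tokens, which we assume normal_text can never contain.
PYTHON_TAG = "<|python_tag|>"
EOM_ID = "<|eom_id|>"

# Mirrors the brave and wolfram rules: no whitespace between the pieces.
TOOL_CALL = re.compile(
    re.escape(PYTHON_TAG)
    + r"(brave_search|wolfram_alpha)\.call\(query=("
    + JSON_STRING
    + r")\)"
    + re.escape(EOM_ID)
)


def classify(output):
    """Return (tool, query) for a tool call, ("normal_text", output) for plain
    text, or None if the grammar would not allow the output."""
    m = TOOL_CALL.fullmatch(output)
    if m:
        # The query is a valid JSON string literal, so json.loads decodes it.
        return m.group(1), json.loads(m.group(2))
    if PYTHON_TAG in output or EOM_ID in output:
        # Special tokens outside a well-formed tool call: rejected.
        return None
    return "normal_text", output


print(classify('<|python_tag|>brave_search.call(query="weather in Paris")<|eom_id|>'))
# ('brave_search', 'weather in Paris')
```

Because `JSON_STRING` excludes raw control characters and unescaped quotes, a query such as `'weather'` (single quotes) or one containing a literal newline falls outside the tool-call branches.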
+
+Note that, just as in Lark, uppercase identifiers define grammar lexemes
+(often also called tokens); they cannot be recursive, because
+they are compiled to regular expressions.
+This has performance implications; in particular, you should **avoid short lexemes**.
+If the grammar used `json_string` instead of `JSON_STRING`,
+then each `json_string` would consist of the lexeme `"`, followed
+by any number of single-character lexemes, followed by another `"` lexeme.
+Such a grammar would be very slow to run.
+With uppercase `JSON_STRING`, the whole string is a single lexeme.
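To see why this matters, the Python sketch below counts the lexemes each form of the rule would produce for the same string literal. The helpers `lexemes_uppercase` and `lexemes_lowercase` are hypothetical and only model the description above; llguidance's lexer is not involved, and Python's `re` stands in for the Rust `regex` crate.

```python
import re

# JSON_STRING_CHAR from the grammar above, with groups made non-capturing.
JSON_STRING_CHAR = r'(?:\\(?:[\"\\\/bfnrt]|u[a-fA-F0-9]{4})|[^\"\\\x00-\x1F\x7F])'
# Uppercase JSON_STRING: the reference to JSON_STRING_CHAR is inlined, so the
# whole string literal is matched by one regular expression.
JSON_STRING = re.compile(r'"(?:' + JSON_STRING_CHAR + r')*"')


def lexemes_uppercase(literal: str) -> list[str]:
    # JSON_STRING is a single lexeme covering the whole literal.
    if not JSON_STRING.fullmatch(literal):
        raise ValueError("not a JSON string literal")
    return [literal]


def lexemes_lowercase(literal: str) -> list[str]:
    # Hypothetical lowercase json_string rule: an opening '"' lexeme, one
    # JSON_STRING_CHAR lexeme per (possibly escaped) character, a closing '"'.
    if not JSON_STRING.fullmatch(literal):
        raise ValueError("not a JSON string literal")
    body = [m.group(0) for m in re.finditer(JSON_STRING_CHAR, literal[1:-1])]
    return ['"'] + body + ['"']


query = '"weather in Paris\\n"'
print(len(lexemes_uppercase(query)), len(lexemes_lowercase(query)))  # prints "1 19"
```

For a query of 100 unescaped characters, the lowercase rule yields 102 lexemes, while the uppercase rule yields one.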
+
+Depending on how the model was trained, you may also want to replace
+the JSON string with a Python string in this grammar.
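As an illustration of the difference, the sketch below (Python, standard library only) tests a few literals against the `JSON_STRING` lexeme and against `PY_STRING`. `PY_STRING` is a hypothetical, deliberately simplified pattern for Python string literals (no `r`/`b`/`f` prefixes, no triple quotes). It is not a drop-in lexeme for llguidance.

```python
import ast
import re

# JSON_STRING from the grammar above (groups made non-capturing).
JSON_STRING_CHAR = r'(?:\\(?:[\"\\\/bfnrt]|u[a-fA-F0-9]{4})|[^\"\\\x00-\x1F\x7F])'
JSON_STRING = re.compile(r'"(?:' + JSON_STRING_CHAR + r')*"')

# Hypothetical, simplified Python string literal: single or double quotes,
# any backslash escape, no raw newlines.
PY_STRING = re.compile(r"'(?:\\.|[^'\\\n])*'" + r"|" + r'"(?:\\.|[^"\\\n])*"')

# A model trained on Python-style tool calls may emit single-quoted queries.
for literal in ['"weather in Paris"', "'weather in Paris'", "'it\\'s sunny'"]:
    ok_json = JSON_STRING.fullmatch(literal) is not None
    ok_py = PY_STRING.fullmatch(literal) is not None
    # ast.literal_eval decodes a Python literal without executing code.
    value = ast.literal_eval(literal) if ok_py else None
    print(literal, ok_json, ok_py, repr(value))
```

`JSON_STRING` rejects the single-quoted forms, so under the JSON grammar a model that prefers Python-style quoting would be steered towards double quotes.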
+
+You can also use Lark-like syntax to combine JSON schemas with regular output.
+In that case, you pass the JSON schemas as additional grammars, with
+the Lark grammar being the top-level one.
+
+```lark
+start: normal_text | fun_call
+// @fun0, @fun1 refer to other sub-grammars, see below
+fun_call: <|python_tag|> ( @fun0 | @fun1 ) <|eom_id|>
+normal_text: /(.|\n)*/
+```
+
+```json
+{
+  "grammars": [
+    {
+      "lark_grammar": "...the lark above...",
+    },
+    {"name": "fun0", "json_schema": { ... }},
+    {"name": "fun1", "json_schema": { ... }}
+  ]
+}
+```
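Writing the Lark text into a JSON string by hand means escaping its newlines and the backslash in `/(.|\n)*/`. Here is a minimal Python sketch that builds the same structure with the standard `json` module, using exactly the field names shown above. The two schemas are hypothetical placeholders, and how the request is then passed to llguidance is not shown here.

```python
import json

# Top-level Lark grammar from above; @fun0 and @fun1 refer by name to the
# JSON-schema grammars listed after it.
LARK = r"""
start: normal_text | fun_call
fun_call: <|python_tag|> ( @fun0 | @fun1 ) <|eom_id|>
normal_text: /(.|\n)*/
"""

# Hypothetical schemas for two tools; substitute your own.
FUN0_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}
FUN1_SCHEMA = {
    "type": "object",
    "properties": {"expression": {"type": "string"}},
    "required": ["expression"],
}

request = {
    "grammars": [
        {"lark_grammar": LARK},  # the top-level grammar comes first
        {"name": "fun0", "json_schema": FUN0_SCHEMA},
        {"name": "fun1", "json_schema": FUN1_SCHEMA},
    ]
}

# json.dumps escapes the newlines and the backslash in the regex for us.
print(json.dumps(request, indent=2))
```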