Commit 355a02f

Merge branch 'main' into isaac/backtestingtutorial

baskaryan authored Dec 19, 2024
2 parents 041c29d + f9ec6b7

Showing 33 changed files with 2,579 additions and 788 deletions.
3 changes: 2 additions & 1 deletion .prettierignore
```diff
@@ -1,4 +1,5 @@
 node_modules
 build
 .docusaurus
-docs/api
+docs/api
+docs/evaluation
```

8 changes: 4 additions & 4 deletions docs/administration/pricing.mdx
```diff
@@ -208,13 +208,13 @@ If you’ve consumed the monthly allotment of free traces in your account, you c

 Every user will have a unique personal account on the Developer plan. <b>We cannot upgrade a Developer account to the Plus or Enterprise plans.</b> If you’re interested in working as a team, create a separate LangSmith Organization on the Plus plan. This plan can be upgraded to the Enterprise plan at a later date.

-### How will billing work?
+### How does billing work?

 <b>Seats</b>
 <br />
-Seats are billed monthly on the first of the month in the future will be
-pro-rated if additional seats are purchased in the middle of the month. Seats
-removed mid-month will not be credited.
+Seats are billed monthly on the first of the month. Additional seats purchased
+mid-month are pro-rated and billed within one day of the purchase. Seats removed
+mid-month will not be credited.
 <br />
 <br />
 <b>Traces</b>
```

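As an aside, the corrected pro-ration rule is easy to sanity-check with a little arithmetic. The sketch below is illustrative only and not part of the commit; the $39/month seat price and dates are hypothetical figures.

```python
from calendar import monthrange
from datetime import date

def prorated_seat_charge(purchase_date: date, monthly_price: float) -> float:
    """Charge for a seat added mid-month, pro-rated by days remaining.

    Hypothetical illustration of the billing rule described above;
    the actual billing logic is LangSmith's and is not in this commit.
    """
    days_in_month = monthrange(purchase_date.year, purchase_date.month)[1]
    days_remaining = days_in_month - purchase_date.day + 1  # count the purchase day
    return round(monthly_price * days_remaining / days_in_month, 2)

# A seat added on Dec 16 at a hypothetical $39/month:
# 16 of 31 days remain, so the pro-rated charge is 39 * 16 / 31 ≈ $20.13.
print(prorated_seat_charge(date(2024, 12, 16), 39.0))
```
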
409 changes: 160 additions & 249 deletions docs/evaluation/concepts/index.mdx

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
Binary file added docs/evaluation/concepts/static/offline.png
Binary file added docs/evaluation/concepts/static/online.png
3 changes: 3 additions & 0 deletions docs/evaluation/how_to_guides/annotation_queues.mdx
```diff
@@ -73,6 +73,9 @@ To assign runs to an annotation queue, either:

 3. [Set up an automation rule](../../../observability/how_to_guides/monitoring/rules) that automatically assigns runs which pass a certain filter and sampling condition to an annotation queue.

+4. Select one or multiple experiments from the dataset page and click **Annotate**. From the resulting popup, you may either create a new queue or add the runs to an existing one:
+   ![](./static/annotate_experiment.png)
+
 :::tip

 It is often a very good idea to assign runs that have a certain user feedback score (e.g., thumbs up, thumbs down) from the application to an annotation queue. This way, you can identify and address issues that are causing user dissatisfaction.
```

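The tip in this hunk, routing runs with a given user feedback score into a queue, can also be done programmatically. A minimal sketch, assuming the `langsmith` Python SDK's `create_annotation_queue` and `add_runs_to_annotation_queue` helpers; the project name and the feedback filter expression are hypothetical placeholders:

```python
from langsmith import Client

client = Client()

# Create a queue dedicated to runs that users rated poorly.
queue = client.create_annotation_queue(name="thumbs-down review")

# "my-project" and the filter expression below are hypothetical placeholders.
runs = client.list_runs(
    project_name="my-project",
    filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 0))',
)
client.add_runs_to_annotation_queue(queue.id, run_ids=[run.id for run in runs])
```
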
4 changes: 2 additions & 2 deletions docs/evaluation/how_to_guides/custom_evaluator.mdx
```diff
@@ -71,9 +71,9 @@ Custom evaluators are expected to return one of the following types:

 Python and JS/TS

-- `dict`: dicts of the form `{"score" | "value": ..., "name": ...}` allow you to customize the metric type ("score" for numerical and "value" for categorical) and metric name. This is useful if, for example, you want to log an integer as a categorical metric.
+- `dict`: dicts of the form `{"score" | "value": ..., "key": ...}` allow you to customize the metric type ("score" for numerical and "value" for categorical) and metric name. This is useful if, for example, you want to log an integer as a categorical metric.

-Currently Python only
+Python only

 - `int | float | bool`: this is interpreted as a continuous metric that can be averaged, sorted, etc. The function name is used as the name of the metric.
 - `str`: this is interpreted as a categorical metric. The function name is used as the name of the metric.
```

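To make the corrected `key` field concrete, here is a minimal sketch of a custom evaluator using the dict form this hunk documents; the `formality` evaluator and its hard-coded value are illustrative only:

```python
from langsmith.schemas import Example, Run

def formality(run: Run, example: Example) -> dict:
    # "value" (rather than "score") marks the metric as categorical, so the
    # integer 2 is logged as a category label instead of being averaged;
    # "key" sets the metric name shown in the UI.
    return {"key": "formality_level", "value": 2}
```
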
2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/evaluate_pairwise.mdx
```diff
@@ -124,7 +124,7 @@ In the Python example below, we are pulling [this structured prompt](https://smi

     return scores

 evaluate(
-    ["experiment-1", "experiment-2"],  # Replace with the names/IDs of your experiments
+    ("experiment-1", "experiment-2"),  # Replace with the names/IDs of your experiments
     evaluators=[ranked_preference],
     randomize_order=True,
     max_concurrency=4,
```

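For context, the corrected tuple fits into a pairwise setup along these lines. A sketch only: the `ranked_preference` stub and its `(inputs, outputs)` signature follow the shape suggested by the surrounding snippet rather than a verified API, and the experiment names are placeholders.

```python
from langsmith import evaluate

def ranked_preference(inputs: dict, outputs: list[dict]) -> list:
    # Hypothetical stub: always prefer the first experiment's output.
    # A real judge would compare the two candidate outputs, e.g. with an LLM.
    scores = [1, 0]
    return scores

evaluate(
    ("experiment-1", "experiment-2"),  # a 2-tuple: pairwise compares exactly two experiments
    evaluators=[ranked_preference],
    randomize_order=True,  # shuffle output order to reduce position bias
    max_concurrency=4,
)
```

The list-to-tuple change is not cosmetic: a 2-tuple encodes the fixed two-experiment arity that pairwise comparison requires, which a variable-length list does not.
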
Binary file modified docs/evaluation/how_to_guides/static/view_experiment.gif
