add off-the-shelf and subset
agola11 committed May 2, 2024
1 parent eb60d66 commit acf7191
Showing 5 changed files with 183 additions and 209 deletions.
@@ -325,3 +325,19 @@ const examples = await client.listExamples({exampleIds: exampleIds});`),
]}
groupId="client-language"
/>

### List examples by metadata

You can also filter examples by metadata. The example below queries for examples with a specific metadata key-value pair.

<CodeTabs
tabs={[
PythonBlock(
`examples = client.list_examples(dataset_name=dataset_name, metadata={"desired_key": "desired_value"})`
),
TypeScriptBlock(
`const examples = await client.listExamples({datasetName: datasetName, metadata: {desiredKey: "desiredValue"}});`
),
]}
groupId="client-language"
/>
@@ -14,7 +14,6 @@ import {
Before diving into this content, it might be helpful to read the following:

- [Conceptual guide on evaluation](../../concepts/evaluation)
- [How-to guide on managing datasets](../datasets/manage_datasets_in_application)
- [How-to guide on managing datasets programmatically](../datasets/manage_datasets_programmatically)

@@ -276,8 +275,10 @@ To view a more advanced example that traverses the `root_run` object, please ref
## Evaluate on a particular version of a dataset

:::tip Recommended Reading

Before diving into this content, it might be helpful to read the [guide on versioning datasets](../datasets/version_datasets).
Additionally, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).

:::

Because `evaluate` accepts an iterable of examples, you can use it to evaluate on a particular version of a dataset.
@@ -296,6 +297,27 @@ results = evaluate(

## Evaluate on a subset of a dataset

:::tip Recommended Reading

Before diving into this content, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).

:::

You can use the `list_examples` method to fetch a subset of examples from a dataset to evaluate on. Refer to the guide above to learn more about the different ways to fetch examples.

One common workflow is to fetch examples that have a certain metadata key-value pair.

```python
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=client.list_examples(dataset_name=dataset_name, metadata={"desired_key": "desired_value"}),
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
)
```

## Use a summary evaluator

Some metrics can only be defined on the entire experiment level as opposed to the individual runs of the experiment.
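For instance, precision over the whole experiment cannot be computed run by run. Below is a minimal sketch of such a metric, assuming the convention that a summary evaluator receives the full lists of runs and examples and returns a dict with a `"key"` and a `"score"`; the `"label"` output field and the `"Toxic"` class mirror the toxic-queries example above and are assumptions here.

```python
from typing import List


def precision_summary(runs: List, examples: List) -> dict:
    # Compute precision for the "Toxic" class over the entire experiment:
    # of all runs that predicted "Toxic", how many were actually toxic?
    true_positives = 0
    predicted_positives = 0
    for run, example in zip(runs, examples):
        if run.outputs["label"] == "Toxic":
            predicted_positives += 1
            if example.outputs["label"] == "Toxic":
                true_positives += 1
    score = true_positives / predicted_positives if predicted_positives else 0.0
    return {"key": "precision", "score": score}
```

A function of this shape would be passed to `evaluate` via its `summary_evaluators` argument rather than `evaluators`, so it runs once over the whole experiment instead of once per run.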
