add off-the-shelf and subset
agola11 committed May 2, 2024
1 parent eb60d66 commit acf7191
Showing 5 changed files with 183 additions and 209 deletions.
@@ -325,3 +325,19 @@ const examples = await client.listExamples({exampleIds: exampleIds});`),
]}
groupId="client-language"
/>

### List examples by metadata

You can also filter examples by metadata. The example below queries for examples with a specific metadata key-value pair.

<CodeTabs
tabs={[
PythonBlock(
`examples = client.list_examples(dataset_name=dataset_name, metadata={"desired_key": "desired_value"})`
),
TypeScriptBlock(
`const examples = await client.listExamples({datasetName: datasetName, metadata: {desiredKey: "desiredValue"}});`
),
]}
groupId="client-language"
/>
@@ -14,7 +14,6 @@ import {
Before diving into this content, it might be helpful to read the following:

- [Conceptual guide on evaluation](../../concepts/evaluation)
- [How-to guide on managing datasets](../datasets/manage_datasets_in_application)
- [How-to guide on managing datasets programmatically](../datasets/manage_datasets_programmatically)

@@ -276,8 +275,10 @@ To view a more advanced example that traverses the `root_run` object, please ref
## Evaluate on a particular version of a dataset

:::tip Recommended Reading

Before diving into this content, it might be helpful to read the [guide on versioning datasets](../datasets/version_datasets).
Additionally, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).

:::

Because `evaluate` accepts an iterable of examples, you can use it to evaluate on a particular version of a dataset.
@@ -296,6 +297,27 @@ results = evaluate(

## Evaluate on a subset of a dataset

:::tip Recommended Reading

Before diving into this content, it might be helpful to read the [guide on fetching examples](../datasets/manage_datasets_programmatically#fetch-examples).

:::

You can use the `list_examples` method to fetch a subset of examples from a dataset to evaluate on. Refer to the guide above to learn more about the different ways to fetch examples.

One common workflow is to fetch examples that have a certain metadata key-value pair.

```python
from langsmith.evaluation import evaluate

results = evaluate(
    lambda inputs: label_text(inputs["text"]),
    data=client.list_examples(dataset_name=dataset_name, metadata={"desired_key": "desired_value"}),
    evaluators=[correct_label],
    experiment_prefix="Toxic Queries",
)
```

## Use a summary evaluator

Some metrics can only be defined on the entire experiment level as opposed to the individual runs of the experiment.
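For instance, precision over the whole experiment cannot be computed run by run. Below is a minimal sketch of such a metric, assuming the convention that a summary evaluator receives the full lists of runs and examples and returns a dict with a `"key"` and a `"score"`; the `"label"` output field and the `"Toxic"` class mirror the toxic-queries example above and are assumptions here.

```python
from typing import List


def precision_summary(runs: List, examples: List) -> dict:
    # Compute precision for the "Toxic" class over the entire experiment:
    # of all runs that predicted "Toxic", how many were actually toxic?
    true_positives = 0
    predicted_positives = 0
    for run, example in zip(runs, examples):
        if run.outputs["label"] == "Toxic":
            predicted_positives += 1
            if example.outputs["label"] == "Toxic":
                true_positives += 1
    score = true_positives / predicted_positives if predicted_positives else 0.0
    return {"key": "precision", "score": score}
```

A function of this shape would be passed to `evaluate` via its `summary_evaluators` argument rather than `evaluators`, so it runs once over the whole experiment instead of once per run.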
