datasets and evals guides #200

Merged · 28 commits · May 3, 2024
4 changes: 4 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,4 @@
{
"editor.trimAutoWhitespace": false,
"files.trimTrailingWhitespaceInRegexAndStrings": false
}
2 changes: 1 addition & 1 deletion docs/evaluation/faq/custom-evaluators.mdx
@@ -126,7 +126,7 @@ With function calling, it has become easier than ever to generate feedback metrics
Below is an example (in this case using OpenAI's tool calling functionality) to evaluate RAG app faithfulness.

````python
iimport json
import json
from typing import List

import openai
1 change: 1 addition & 0 deletions package.json
@@ -30,6 +30,7 @@
"@docusaurus/theme-mermaid": "2.4.3",
"@emotion/react": "^11.11.0",
"@emotion/styled": "^11.11.0",
"dedent": "^1.5.3",
"@mdx-js/react": "^1.6.22",
"@mui/icons-material": "^5.11.16",
"@mui/joy": "^5.0.0-alpha.81",
18 changes: 18 additions & 0 deletions src/components/InstructionsWithCode.js
@@ -5,6 +5,7 @@ import CodeBlock from "@theme/CodeBlock";
import { marked } from "marked";
import DOMPurify from "isomorphic-dompurify";
import prettier from "prettier";
import dedent from "dedent";
import parserTypeScript from "prettier/parser-typescript";

export function LangChainPyBlock(content) {
@@ -99,6 +100,7 @@ export function CodeTabs({ tabs, groupId }) {
<TabItem key={key} value={tab.value} label={tab.label}>
{tab.caption && (
<div
// eslint-disable-next-line react/no-danger
dangerouslySetInnerHTML={{
__html: DOMPurify.sanitize(marked.parse(tab.caption)),
}}
@@ -116,3 +118,19 @@
</Tabs>
);
}

export const typescript = (strings, ...values) => {
let result = "";
strings.forEach((string, i) => {
result += string + String(values[i] ?? "");
});
return TypeScriptBlock(dedent(result));
};

export const python = (strings, ...values) => {
let result = "";
strings.forEach((string, i) => {
result += string + String(values[i] ?? "");
});
return PythonBlock(dedent(result));
};
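
The new `python` and `typescript` tagged templates let guide pages define code tabs inline instead of assembling block objects by hand. A rough sketch of how an MDX guide might use them follows; the import path, `groupId` value, and snippet contents are illustrative assumptions, not taken from this diff:

```mdx
import { CodeTabs, python, typescript } from "@site/src/components/InstructionsWithCode";

<CodeTabs
  groupId="client-language"
  tabs={[
    // Each tagged template dedents its contents and wraps them in a
    // Python / TypeScript tab definition consumed by CodeTabs.
    python`
      from langsmith import Client

      client = Client()
    `,
    typescript`
      import { Client } from "langsmith";

      const client = new Client();
    `,
  ]}
/>
```

Because the tags run the interpolated string through `dedent`, snippets can be indented to match the surrounding JSX without that indentation leaking into the rendered code block.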
@@ -0,0 +1,68 @@
---
sidebar_position: 1
---

# Manage datasets in the application

:::tip Recommended Reading
Before diving into this content, it might be helpful to read the following:

- [Concepts guide on evaluation and datasets](../../concepts/evaluation#datasets-and-examples)

:::

The easiest way to interact with datasets is directly in the LangSmith app. Here, you can create and edit datasets and examples.

## Create a new dataset and add examples manually

To get started, you can create a new dataset by heading to the "Datasets & Testing" section of the application and clicking on "+ New Dataset".

![](../static/new_dataset.png)

Then, enter the relevant dataset details, including a name, an optional description, and the dataset type. See the [concepts guide](../../concepts/evaluation#datasets-and-examples) for more information on dataset types. For maximum flexibility, the key-value dataset type is recommended.

![](../static/enter_dataset_details.png)

You can then add examples to the dataset by clicking on "Add Example". Here, you can enter the input and output as JSON objects.

![](../static/add_manual_example.png)
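
For instance, for a question-answering app using the key-value dataset type, a single example's input and output might look like the following (both values are purely illustrative):

Input:

```json
{
  "question": "What is the capital of France?"
}
```

Output:

```json
{
  "answer": "Paris"
}
```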

## Add inputs and outputs from traces to datasets

We typically construct datasets over time by collecting representative examples from debugging or other runs. To do this, we first filter the traces to find the ones we want to add to the dataset. Then we add the inputs and outputs from these traces to the dataset.

You can do this from any 'run' details page by clicking the 'Add to Dataset' button in the top right-hand corner.

:::tip
An extremely powerful technique for building datasets is to drill down into the most interesting traces, such as traces that received poor user feedback, and add them to a dataset.
For tips on how to filter traces, see the [filtering traces] guide.
:::

![Add to Dataset](../static/add_to_dataset.png)

From there, select the dataset to add the example to and update the ground truth output values if necessary.

![Modify example](../static/modify_example.png)

## Upload a CSV file to create a dataset

The easiest way to create a dataset from your own data is by clicking the 'upload a CSV dataset' button on the home page or in the top right-hand corner of the 'Datasets & Testing' page.

![Upload CSV](../static/create_dataset_csv.png)

Select a name and description for the dataset, and then confirm that the inferred input and output columns are correct.

![Confirm Columns](../static/select_columns.png)
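
For reference, a minimal CSV with one input column and one output column might look like the sketch below; the column names are just an example, and you can correct the inferred input/output mapping before the dataset is created:

```csv
question,answer
What is the capital of France?,Paris
Who wrote The Hobbit?,J. R. R. Tolkien
```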

## Export a dataset

You can export your LangSmith dataset to CSV or OpenAI evals format directly from the web application.

To do so, select a dataset, click on "Examples", and then click the "Export Dataset" button at the top of the examples table.

![Export Dataset Button](../static/export-dataset-button.png)

This will open a modal where you can select the format you want to export to.

![Export Dataset Modal](../static/export-dataset-modal.png)