Add how-to guide for regression testing #201

Merged · 2 commits · May 1, 2024
2 changes: 1 addition & 1 deletion docs/evaluation/faq/experiments-app.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Run Experiments in Browser (no code)
-sidebar_position: 7
+sidebar_position: 8
---

# How to run experiments in the prompt playground (no code)
2 changes: 1 addition & 1 deletion docs/evaluation/faq/manage-datasets.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Manage Datasets
-sidebar_position: 4
+sidebar_position: 5
---

import {
40 changes: 40 additions & 0 deletions docs/evaluation/faq/regression-testing.mdx
@@ -0,0 +1,40 @@
---
sidebar_label: Regression Testing
sidebar_position: 3
---

# Regression Testing

When evaluating LLM applications, it is important to track how your system performs over time. In this guide, we will show you how to use LangSmith's comparison view to track regressions in your application and to drill down into the specific runs that improved or regressed between experiments.

## Overview

In the LangSmith comparison view, runs that _regressed_ on your specified feedback key relative to your baseline experiment are highlighted in red, while runs that _improved_
are highlighted in green. At the top of each column, you can see how many runs in that experiment did better and how many did worse than the baseline.

![Regressions](../static/regression_view.png)

## Baseline Experiment

To track regressions, you need a baseline experiment to compare against. The first experiment in your comparison is assigned as the baseline automatically, but you can
change it from the dropdown at the top of the page.

![Baseline](../static/select_baseline.png)

## Select Feedback Key

You will also want to select the feedback key on which you would like to focus. This can be selected via another dropdown at the top of the page. Again, one is assigned by
default, but you can adjust it as needed.

![Feedback](../static/select_feedback.png)
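
The feedback keys that appear in this dropdown are produced by the evaluators attached to your experiments. As a minimal sketch (assuming the `langsmith` Python SDK; the `correctness` evaluator below is illustrative, not part of this guide), a custom evaluator returns a named score, and that name becomes a selectable feedback key:

```python
from langsmith.schemas import Example, Run

def correctness(run: Run, example: Example) -> dict:
    # "key" is the feedback key you would select in the comparison view;
    # "score" is the value compared against the baseline experiment.
    predicted = (run.outputs or {}).get("output")
    expected = (example.outputs or {}).get("output")
    return {"key": "correctness", "score": int(predicted == expected)}
```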

## Filter to Regressions or Improvements

Click the regressions or improvements button at the top of each column to filter to the runs that regressed or improved in that specific experiment.

![Regressions Filter](../static/filter_to_regressions.png)

## Try it out

To get started with regression testing, try [running a no-code experiment in our prompt playground](experiments-app), or check out the [Evaluation Quick Start Guide](/evaluation/quickstart) to run experiments with the SDK.
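
For the SDK route, a rough sketch of producing two comparable experiments might look like the following (this assumes the `langsmith` Python SDK and an existing dataset named `my-dataset` with a `question` input field; the target functions and dataset name are hypothetical stand-ins, not part of this guide):

```python
from langsmith.evaluation import evaluate
from langsmith.schemas import Example, Run

def correctness(run: Run, example: Example) -> dict:
    # Illustrative evaluator; its "key" supplies the feedback key used
    # by the comparison view.
    predicted = (run.outputs or {}).get("output")
    expected = (example.outputs or {}).get("output")
    return {"key": "correctness", "score": int(predicted == expected)}

def baseline_app(inputs: dict) -> dict:
    # Stand-in for the current version of your application.
    return {"output": inputs["question"].strip()}

def candidate_app(inputs: dict) -> dict:
    # Stand-in for the changed version you want to check for regressions.
    return {"output": inputs["question"].strip().lower()}

# Each call creates one experiment against the same dataset. Open any two
# experiments in the comparison view and pick one as the baseline.
evaluate(baseline_app, data="my-dataset", evaluators=[correctness], experiment_prefix="baseline")
evaluate(candidate_app, data="my-dataset", evaluators=[correctness], experiment_prefix="candidate")
```

Once both runs finish, the experiments appear on the dataset's Experiments tab, where you can open them together in the comparison view described above.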
2 changes: 1 addition & 1 deletion docs/evaluation/faq/synthetic-data.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Synthetic Data for Evaluation
-sidebar_position: 8
+sidebar_position: 9
---

# Synthetic Data for Evaluation
2 changes: 1 addition & 1 deletion docs/evaluation/faq/unit-testing.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Unit Test
-sidebar_position: 3
+sidebar_position: 4
---

# Unit Tests
2 changes: 1 addition & 1 deletion docs/evaluation/faq/version-datasets.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Version Datasets
-sidebar_position: 5
+sidebar_position: 6
---

# How to version datasets
Binary file added docs/evaluation/static/filter_to_regressions.png
Binary file added docs/evaluation/static/regression_view.png
Binary file added docs/evaluation/static/select_baseline.png
Binary file added docs/evaluation/static/select_feedback.png