Add public evals #74

ankrgyl · 2024-07-06T23:46:06Z

Add support for an eval suite based on CoQA, evaluate the evaluators themselves, and switch the default model to gpt-4o.

github-actions · 2024-07-06T23:47:23Z

Braintrust eval report

Autoevals (evals-1720650947)

Score	Average	Improvements	Regressions
NumericDiff	74.5% (+0pp)	-	-
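The NumericDiff row reports how closely scores agree on a 0 to 100% scale. As a rough illustration of what "evaluating the evaluators" with such a metric could look like, here is a minimal sketch. It assumes NumericDiff is a relative-difference similarity, `1 - |expected - output| / (|expected| + |output|)`, and that an evaluator is judged by averaging that similarity between its scores and reference labels. Both the formula and the helper names `numeric_diff` and `eval_the_evaluator` are assumptions for illustration, not the autoevals implementation.

```python
from statistics import mean


def numeric_diff(output: float, expected: float) -> float:
    """Assumed relative-difference similarity in [0, 1]; 1.0 means identical."""
    # Both zero: treat as a perfect match to avoid dividing by zero.
    if output == 0 and expected == 0:
        return 1.0
    # By the triangle inequality |e - o| <= |e| + |o|, so the result stays in [0, 1].
    return 1 - abs(expected - output) / (abs(expected) + abs(output))


def eval_the_evaluator(evaluator_scores, reference_scores):
    """Hypothetical helper: mean agreement between an evaluator and reference labels."""
    if len(evaluator_scores) != len(reference_scores):
        raise ValueError("score lists must have the same length")
    return mean(numeric_diff(o, e) for o, e in zip(evaluator_scores, reference_scores))


# An evaluator that is exact on two examples and off by half on one.
print(eval_the_evaluator([1.0, 0.5, 0.0], [1.0, 1.0, 0.0]))
```

With this scheme, an evaluator that matches every reference label scores 1.0, so a value like 74.5% would indicate partial agreement across the dataset.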

ankrgyl · 2024-07-07T00:39:25Z

Evals show that gpt-4o beats gpt-3.5. I don't really trust the RAGAS metrics...

github-actions · 2024-07-16T17:41:39Z

Braintrust eval report

Autoevals (main-1721151701)

Score	Average	Improvements	Regressions
NumericDiff	75.3%	-	-
Duration	3.93s	-	-
Estimated_cost	$0	-	-

ankrgyl added 5 commits July 6, 2024 15:55

WIP

ff92ae6

Snapshot

7127cfb

Switch to gpt-4o

5549488

Add action

970fd1f

Add build step

8b9ff77

ankrgyl added 6 commits July 6, 2024 16:48

Name the dir "datasets"

8295fa3

Fix js?

11186d9

Consolidate RAGAS

b5ab460

Add more

27ece8f

Add closed qa

10cf636

Use gpt-3.5-turbo

08cb203

ankrgyl added 8 commits July 6, 2024 17:45

Fix a couple things

f6b6266

Fix test

8f66b19

Remove score checks. We use evals now!

d6d9a1c

Remove

ede8d6c

Merge branch 'main' into evals

ba83a89

Remove flaky metadata check

612a653

Merge branch 'main' into evals

648d6a9

Bump

d7410bf

ankrgyl requested review from manugoyal, aphinx and tara-nagar July 16, 2024 06:44

ankrgyl mentioned this pull request Jul 16, 2024

Add cookbook for chat assistant evals braintrustdata/braintrust-cookbook#21

Merged

tara-nagar approved these changes Jul 16, 2024


ankrgyl merged commit 0b7fcb8 into main Jul 16, 2024
8 checks passed
