Commit

Add public evals (#74)
Add support for an eval suite based on COQA, eval the evaluators, and
switch to gpt-4o as the default.
ankrgyl authored Jul 16, 2024
1 parent 6b19fb2 commit 0b7fcb8
Showing 26 changed files with 19,840 additions and 6,352 deletions.
48 changes: 48 additions & 0 deletions .github/workflows/eval.yaml
@@ -0,0 +1,48 @@
name: Run pnpm evals

on:
  push:
    # Uncomment to run only when files in the 'evals' directory change
    # - paths:
    #     - "evals/**"

permissions:
  pull-requests: write
  contents: read

jobs:
  eval:
    name: Run evals
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        id: checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node.js
        id: setup-node
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - uses: pnpm/action-setup@v3
        with:
          version: 8

      - name: Install Dependencies
        id: install
        run: pnpm install

      - name: Build packages
        id: build
        run: pnpm build

      - name: Run Evals
        uses: braintrustdata/eval-action@v1
        with:
          api_key: ${{ secrets.BRAINTRUST_API_KEY }}
          runtime: node
          root: evals
10 changes: 9 additions & 1 deletion .github/workflows/js.yaml
@@ -18,11 +18,19 @@ jobs:

     steps:
       - uses: actions/checkout@v3
+      - name: Cache node_modules
+        uses: actions/cache@v4
+        with:
+          path: |
+            node_modules
+            !node_modules/.cache/turbo
+          key: ${{ matrix.runner }}-${{ matrix.node_version }}-node-${{ env.nodeModulesCacheHash }}
+          restore-keys: |
+            ${{ matrix.runner }}-${{ matrix.node_version }}-node-
       - name: Use Node.js ${{ matrix.node-version }}
         uses: actions/setup-node@v3
         with:
           node-version: ${{ matrix.node-version }}
-          cache: "npm"
       - uses: pnpm/action-setup@v2
         with:
           version: 8
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -21,7 +21,7 @@ repos:
       - id: codespell
         exclude: >
           (?x)^(
-              .*\.(json|prisma)
+              .*\.(json|prisma|yaml)
           )$
         args: [-L rouge]

23 changes: 23 additions & 0 deletions evals/.eslintrc.js
@@ -0,0 +1,23 @@
const path = require("path");

module.exports = {
  extends: ["plugin:@typescript-eslint/recommended", "prettier"],
  plugins: ["@typescript-eslint"],
  rules: {
    "@typescript-eslint/no-unused-vars": [
      "error",
      {
        vars: "all",
        args: "none",
        ignoreRestSiblings: false,
        argsIgnorePattern: "^_",
        varsIgnorePattern: "^_",
      },
    ],
    "prefer-const": "error",
    "@typescript-eslint/no-explicit-any": "off",
    "@typescript-eslint/ban-types": "off",
    "@typescript-eslint/ban-ts-comment": "off",
    "@typescript-eslint/no-var-requires": "off",
  },
};
3 changes: 3 additions & 0 deletions evals/.prettierrc
@@ -0,0 +1,3 @@
{
  "singleQuote": false
}
3,258 changes: 3,258 additions & 0 deletions evals/datasets/coqa-closed-qa.json

Large diffs are not rendered by default.
