Describe the feature or improvement you're requesting
An eval set is useful for running a group of evals at the same time. Currently, an eval set is just a collection of independent evals, and the `oaievalset` command is simply a wrapper that runs multiple `oaieval` commands concurrently.
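(For example, a command like `oaievalset gpt-3.5-turbo test` runs every eval in the `test` set concurrently, which is equivalent to launching `oaieval gpt-3.5-turbo <eval>` once per eval in the set; the model and set names here are just illustrative.)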
I think it would be useful to analyze the data from an eval set as a whole, especially when all evals in the set share the same metric. In that case, we are effectively running the same experiment with similar questions, split into separate evals only because the data is classified differently. For example, suppose we want to evaluate an LLM's performance at detecting spam in different languages: we want the accuracy for each language, as well as the overall detection accuracy across all spam. It would be great if an eval set could generate this kind of overall report automatically, as in the sketch below.
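To make the request concrete, here is a minimal sketch of the kind of aggregation an eval set could perform. It assumes each eval run writes a JSONL log whose final record carries an accuracy and a sample count; the field names (`final_report`, `accuracy`, `counts`), the eval names, and the log paths are all illustrative assumptions, not the framework's actual schema.

```python
import json
from pathlib import Path


def load_final_report(log_path: Path) -> dict:
    """Return the final report record from one eval's JSONL log.

    Assumes (hypothetically) the log contains a line with a
    "final_report" key written at the end of the run.
    """
    with log_path.open() as f:
        for line in f:
            record = json.loads(line)
            if "final_report" in record:
                return record["final_report"]
    raise ValueError(f"no final report found in {log_path}")


def aggregate_accuracy(log_paths: dict[str, Path]) -> None:
    """Print per-eval accuracy plus a sample-weighted overall accuracy.

    `log_paths` maps an eval name (e.g. a language) to its log file.
    The "accuracy" and "counts" fields are assumed names for this
    sketch, not guaranteed by the framework.
    """
    total_correct = 0.0
    total_samples = 0
    for name, path in log_paths.items():
        report = load_final_report(path)
        accuracy = report["accuracy"]
        n = report["counts"]
        total_correct += accuracy * n
        total_samples += n
        print(f"{name}: accuracy={accuracy:.3f} (n={n})")
    print(f"overall: accuracy={total_correct / total_samples:.3f} "
          f"(n={total_samples})")


if __name__ == "__main__":
    # Hypothetical per-language spam-detection evals from one eval set.
    aggregate_accuracy({
        "spam-detection-en": Path("logs/spam_en.jsonl"),
        "spam-detection-fr": Path("logs/spam_fr.jsonl"),
    })
```

Weighting each eval's accuracy by its sample count keeps the overall number meaningful when the per-language datasets differ in size; a plain average of the per-eval accuracies would overweight small evals.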
Additional context
This feature request is an idea for Evals, the framework itself, not for adding new evals.