
Overall Report for Eval Set As a Whole #939

Open
chinggg opened this issue May 8, 2023 · 0 comments
chinggg commented May 8, 2023

Describe the feature or improvement you're requesting

An eval set is useful for running a group of evals at the same time. Currently, an eval set is just a collection of independent evals, and the oaievalset command is simply a wrapper that runs multiple oaieval commands concurrently.
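A minimal sketch of what oaievalset effectively does today: fan out one oaieval process per eval. The eval names, model, and record paths here are illustrative, not taken from any real eval set definition.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

EVALS = ["spam-en", "spam-fr", "spam-zh"]  # hypothetical eval names

def run_eval(name: str) -> int:
    # Each eval runs as an independent oaieval process and writes its own record file.
    cmd = ["oaieval", "gpt-3.5-turbo", name, "--record_path", f"/tmp/{name}.jsonl"]
    return subprocess.run(cmd).returncode

with ThreadPoolExecutor() as pool:
    exit_codes = list(pool.map(run_eval, EVALS))
```

Because each process records to its own file, nothing in this flow ever sees the results side by side, which is what motivates the request below.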

It would be useful to analyze the data from an eval set as a whole, especially when all evals in the set share the same metric. In that case, the evals form a single experiment asking similar questions, split into separate evals only because the data is partitioned into different classes. For example, to evaluate an LLM's performance on detecting spam in different languages, we want the accuracy per language as well as the overall detection accuracy across all spam samples. It would be great if the eval set could generate this kind of overall report automatically.
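A hedged sketch of the requested overall report, built on top of the per-eval record files written above. It assumes each record file contains a JSON line with a "final_report" object; the "correct" and "total" field names are assumptions for illustration, not the actual evals record schema.

```python
import json

def load_final_report(path: str) -> dict:
    # Take the last JSON line carrying a "final_report" field (assumed layout).
    report = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if "final_report" in record:
                report = record["final_report"]
    return report

paths = {name: f"/tmp/{name}.jsonl" for name in ["spam-en", "spam-fr", "spam-zh"]}
total_correct = total_samples = 0
for name, path in paths.items():
    report = load_final_report(path)
    correct, n = report["correct"], report["total"]  # hypothetical field names
    print(f"{name}: accuracy = {correct / n:.3f}")
    total_correct += correct
    total_samples += n

# The overall accuracy is sample-weighted, not a plain average of
# per-eval accuracies, so evals with more samples count proportionally more.
print(f"overall: accuracy = {total_correct / total_samples:.3f}")
```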

Additional context

This feature request is an idea for Evals, the framework itself, not for adding new evals.
