
Keep up with OpenAI evals #4

Closed
tju01 opened this issue May 20, 2023 · 1 comment
Labels
enhancement (New feature or request), existing-benchmark (For changing an existing benchmark on the leaderboard)

Comments


tju01 commented May 20, 2023

Right now, I'm using a fork of OpenAI evals over at https://github.com/tju01/evals. This is for multiple reasons:

  1. OpenAI evals doesn't log model outputs and other information to disk; it only reports the final results at the end. Looking at the actual outputs can be very useful, so I added code to record model outputs, prompts, etc. to disk for later use on the website (see the sketch after this list).
  2. OpenAI evals isn't really designed to evaluate other models such as OpenAssistant. It was mostly open-sourced so that people can contribute evals for evaluating OpenAI's GPT models, so some small changes were needed to evaluate OpenAssistant.
  3. Other modifications that I would like to make, such as limiting the number of evaluated samples per task to 20, require changes to the OpenAI evals codebase and can't easily be done from the outside.

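As a rough illustration of point 1, the logging added in the fork boils down to something like the following sketch. This is a hypothetical helper, not the actual code in https://github.com/tju01/evals; the `record_sample` name, the recorded fields, and the output path are all assumptions.

```python
import json
import pathlib

# Hypothetical location for the raw per-sample logs (not necessarily the path used in the fork).
LOG_DIR = pathlib.Path("reports/raw")

def record_sample(task_name: str, sample_id: str, prompt: str, output: str) -> None:
    """Append one prompt/output pair to LOG_DIR/<task_name>.jsonl for later inspection."""
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    entry = {"sample_id": sample_id, "prompt": prompt, "output": output}
    with open(LOG_DIR / f"{task_name}.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Calling something like this once per evaluated sample is enough for the website to later show the exact prompts and outputs for each task.
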
However, OpenAI evals has been moving very quickly lately, and they seem to have added more people to merge new tasks.

This codebase and https://github.com/tju01/evals should continuously keep up with upstream OpenAI evals. Right now there is a merge conflict, even though I tried to keep the changes minimal. One also needs to figure out which new evals were added and whether they would be useful for OpenAssistant.
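
For reference, keeping the fork in sync with upstream would look roughly like the following (assuming the upstream remote isn't configured yet and that both default branches are `main`):

```sh
# Track the upstream OpenAI evals repository and pull in its latest changes.
git remote add upstream https://github.com/openai/evals.git
git fetch upstream
# Merge upstream into the fork; conflicts in the patched files have to be
# resolved by hand before pushing.
git merge upstream/main
git push origin main
```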

tju01 added the existing-benchmark and enhancement labels on Jul 7, 2023

tju01 commented Jul 22, 2023

I'm closing this issue since I removed the OpenAI Evals benchmark. See https://github.com/tju01/ilm-eval/pull/51#issuecomment-1646594067 for the reasoning; in short, it's a weak benchmark compared to the other ones that I have now.

tju01 closed this as not planned on Jul 22, 2023