Understanding Test Internal Functionality #85

Open
MagicPupu opened this issue May 24, 2024 · 2 comments

Comments

@MagicPupu

Hello IroncladDev Team,

I would like to use llmarena to test some LLMs and generate performance reports.

However, I am unsure about how the results are calculated. Is the success percentage for each test dataset quantitative or qualitative? Specifically, is it the percentage of correct answers within the dataset, or a measure of the precision of the model's answers?

Thanks in advance,
Antoine

@MagicPupu MagicPupu changed the title from "Tests internal functionment" to "Understanding Test Internal Functionality" on May 24, 2024
@IroncladDev
Owner

IroncladDev commented May 24, 2024

Take a look at the Contributor Page; data has to be entered manually and backed up by a source.

@MagicPupu
Author

Thank you for your response.

I wanted to know how the result data is generated and whether you perform any manipulation of it, or whether it comes directly from the benchmarks. I now understand that I need to look directly at the sources of the benchmarks.

Thank you again, and I will continue my research.

Best regards,
Antoine
