Hello IroncladDev Team,
I would like to use llmarena to test some LLMs and generate performance reports.
However, I am unsure how the results are calculated. Is the success percentage for each test dataset quantitative or qualitative? Specifically, is it the percentage of correct answers in the dataset, or a measure of how precise the answers are?
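To make the two readings concrete, here is a minimal sketch of both, assuming each test item has already been given a score between 0.0 and 1.0. It is not llmarena's code. The function names, the 0-to-1 score scale and the `threshold` parameter are hypothetical. Which metric a given benchmark actually reports is defined by that benchmark.

```python
def accuracy_pct(scores: list[float], threshold: float = 1.0) -> float:
    """Reading 1 (hypothetical): share of items counted as correct.

    An item counts as correct when its score reaches `threshold`;
    partial credit below the threshold is ignored.
    """
    if not scores:
        return 0.0
    correct = sum(1 for s in scores if s >= threshold)
    return 100.0 * correct / len(scores)


def mean_score_pct(scores: list[float]) -> float:
    """Reading 2 (hypothetical): average graded score, as a percentage.

    Partial credit counts, so an answer scored 0.5 adds half a point.
    """
    if not scores:
        return 0.0
    return 100.0 * sum(scores) / len(scores)


# Four items: two fully right, one half right, one wrong.
scores = [1.0, 0.5, 0.0, 1.0]
print(accuracy_pct(scores))    # 50.0
print(mean_score_pct(scores))  # 62.5
```

The same answers produce 50% under the first reading and 62.5% under the second. This is why a reported percentage cannot be interpreted without knowing which rule the benchmark uses.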
Thanks in advance,
Antoine
MagicPupu changed the title from "Tests internal functionment" to "Understanding Test Internal Functionality" on May 24, 2024.
I wanted to know how the result data is generated, and whether you process it in any way or it comes directly from the benchmarks. I now understand that I need to look at the benchmarks' own sources.