Understanding Test Internal Functionality #85

MagicPupu · 2024-05-24T07:17:31Z

Hello IroncladDev Team,

I would like to use llmarena to test some LLMs and generate performance reports.

However, I am unsure about how the results are calculated. Is the percentage of success for each test dataset quantitative or qualitative? Specifically, is it the percentage of correct answers within the dataset, or the percentage of the precision of its answers?

Thanks in advance,
Antoine

IroncladDev · 2024-05-24T11:35:43Z

Take a look at the Contributor Page. Data has to be entered manually and backed up by a source.

MagicPupu · 2024-05-24T13:43:12Z

Thank you for your response.

I wanted to know how the result data is generated and whether you perform any manipulation of this data, or if it comes directly from the benchmarks. Now I understand that I need to look directly into the sources of the benchmarks.

Thank you again, and I will continue my research.

Best regards,
Antoine

MagicPupu changed the title from ~~Tests internal functionment~~ to Understanding Test Internal Functionality on May 24, 2024
