
Factuality Evaluator failing #28

Closed
ishaan-jaff opened this issue Nov 9, 2023 · 7 comments

Comments

@ishaan-jaff

Tried this code snippet:

from autoevals.llm import *
import openai

openai.api_key = "sk-"
 
# Create a new LLM-based evaluator
evaluator = Factuality()
 
# Evaluate an example LLM completion
input = "Which country has the highest population?"
output = "People's Republic of China"
expected = "China"
 
result = evaluator(output, expected, input=input)
print(result)
 
# The evaluator returns a score from [0,1] and includes the raw outputs from the evaluator
print(f"Factuality score: {result.score}")
print(f"Factuality metadata: {result.metadata['rationale']}")

I see this error:
Score(name='Factuality', score=0, metadata={}, error=KeyError('usage'))
Factuality score: 0

Traceback (most recent call last):
  File "/Users/ishaanjaffer/Github/litellm/litellm/tests/test_autoeval.py", line 19, in <module>
    print(f"Factuality metadata: {result.metadata['rationale']}")
KeyError: 'rationale'
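
For what it's worth, guarding the metadata lookup avoids the second KeyError, since metadata comes back empty when the evaluator itself errors; a minimal sketch:

# result.error already carries KeyError('usage'), and metadata is {} in that case,
# so guard the lookup instead of indexing 'rationale' directly
if result.error is not None:
    print(f"Evaluator failed: {result.error}")
else:
    print(f"Factuality metadata: {result.metadata.get('rationale')}")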

Any suggestions on how I can debug this?

@ankrgyl (Contributor) commented Nov 10, 2023

Hmm, this error seems to imply that the response from OpenAI did not include the "usage" key. We have some logic in autoevals that tries to extract usage metrics and log them; that's probably what's failing for some reason.
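
Roughly, I'd guess the failing pattern looks something like this (illustrative only, not the actual autoevals internals):

# Hypothetical sketch: indexing "usage" directly raises KeyError when the
# OpenAI response omits that field
usage = response["usage"]  # KeyError('usage') if the field is missing

# a defensive lookup tolerates the missing field instead
total_tokens = response.get("usage", {}).get("total_tokens")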

We're in the middle of reworking this in #27, and I suspect that will resolve the issue, especially if you're not using Braintrust.

In the meantime, could you share the versions of autoevals, openai, and Python you're using?

@ishaan-jaff (Author)

autoevals 0.0.30
openai 0.28.1

python 3.10

@ishaan-jaff (Author)

It would also be useful if you printed the stack trace on errors.
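
Something along these lines, just as a sketch of what I mean:

import traceback

try:
    raise KeyError("usage")  # stand-in for whatever fails inside the evaluator
except Exception as e:
    print(traceback.format_exc())  # surface the full stack trace for debugging
    error = e                      # while still attaching the exception to the Score as today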

@ankrgyl (Contributor) commented Nov 10, 2023

Interesting, do you mind trying the patch from #27, or re-testing after we land that change? I wasn't able to repro the error, but I suspect the response you're getting from OpenAI (perhaps related to your key?) is missing the usage field.

@ishaan-jaff (Author)

Yes, I can re-test once you've landed #27. Should I leave this issue open till then?

@ankrgyl (Contributor) commented Nov 10, 2023

Just published 0.0.31. Please leave it open!
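
Assuming you installed from PyPI, a plain upgrade should pick it up:

pip install --upgrade autoevals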

@ishaan-jaff (Author)

works now
