This could be added to the API models by interpreting response headers, or as an option passed to Reranker that limits the number of requests per minute.
For instance, Jina allows 60 RPM when not on premium, while Cohere allows 10 RPM on a trial key and 1000 RPM on a production key.
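One standard way to interpret response headers is the HTTP Retry-After header, which carries either a delay in seconds or an HTTP-date. This is a minimal sketch, assuming the provider actually sends Retry-After on rate-limited responses (not confirmed here for Jina or Cohere). The helper name retry_after_seconds is hypothetical and not part of any reranker library's API:

```python
import email.utils
from datetime import datetime, timezone
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str], now: Optional[datetime] = None) -> Optional[float]:
    """Hypothetical helper: seconds to wait according to a Retry-After header, or None."""
    # HTTP header names are case-insensitive, so match the key without regard to case.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return None
    value = value.strip()
    # Form 1: a delay in whole seconds, e.g. "Retry-After: 30".
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP-date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        retry_at = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ones ValueError
        return None
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (retry_at - now).total_seconds())
```

If the header is missing or unparseable, the caller would fall back to a configured limit such as max_requests_per_minute.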
I'd envision an optional max_requests_per_minute argument when loading an API reranker, along with retries_on_failure: int and max_time_between_retries parameters, which would set the maximum number of retries and the longest wait between them. Setting either (or both) would result in:
On hitting the max RPM: time.sleep(60 - time_spent_on_the_most_recent_max_requests_per_minute + 5), i.e. wait out the rest of the 60-second window plus a 5-second buffer
On failure: automatically retry up to retries_on_failure times, starting with a 1 s backoff and increasing it up to max_time_between_retries
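The RPM rule above can be sketched as a sliding-window limiter. Only max_requests_per_minute comes from the proposal; the class name RequestsPerMinuteLimiter and the buffer_seconds, clock and sleep parameters are hypothetical, and the clock/sleep hooks exist only so the behaviour can be tested without real waiting:

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestsPerMinuteLimiter:
    """Hypothetical sliding-window limiter: at most max_requests_per_minute calls per 60 s."""

    def __init__(
        self,
        max_requests_per_minute: int,
        buffer_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_requests_per_minute < 1:
            raise ValueError("max_requests_per_minute must be >= 1")
        self.max_requests_per_minute = max_requests_per_minute
        self.buffer_seconds = buffer_seconds
        self._clock = clock
        self._sleep = sleep
        self._timestamps: Deque[float] = deque()  # start times of requests in the window

    def _evict(self, now: float) -> None:
        # Forget requests that started 60 s or more ago.
        while self._timestamps and now - self._timestamps[0] >= 60.0:
            self._timestamps.popleft()

    def acquire(self) -> None:
        """Block until one more request fits in the window, then record it."""
        now = self._clock()
        self._evict(now)
        if len(self._timestamps) >= self.max_requests_per_minute:
            # Time spent on the most recent max_requests_per_minute requests.
            elapsed = now - self._timestamps[0]
            self._sleep(60.0 - elapsed + self.buffer_seconds)
            now = self._clock()
            self._evict(now)
        self._timestamps.append(now)
```

Calling limiter.acquire() before each API request keeps the client under the limit. This sketch is not thread-safe; a reranker shared across threads would need a lock around acquire().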
These would be optional and most likely default to:
max_requests_per_minute: whatever we can find for a production API key for a given provider
retries_on_failure: 3
max_time_between_retries: 15 (seconds)
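A sketch of the retry behaviour with these defaults, assuming the backoff doubles from 1 s (the thread only says it starts at 1 s and increases to max_time_between_retries, so doubling is an assumption). call_with_retries, initial_backoff and retry_on are hypothetical names; retries_on_failure and max_time_between_retries come from the proposal:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries_on_failure: int = 3,
    max_time_between_retries: float = 15.0,
    initial_backoff: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fn(), retrying with capped exponential backoff."""
    backoff = initial_backoff
    for attempt in range(retries_on_failure + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries_on_failure:
                raise  # out of retries: surface the last error
            # Wait, but never longer than max_time_between_retries.
            sleep(min(backoff, max_time_between_retries))
            backoff *= 2
    raise AssertionError("unreachable when retries_on_failure >= 0")
```

With the defaults the waits are 1, 2 and 4 s, so the 15 s cap only kicks in once retries_on_failure is raised (6 retries would wait 1, 2, 4, 8, 15, 15 s). In practice retry_on should be narrowed to the client's rate-limit or transient network errors rather than every Exception.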
Is this something you'd be interested in contributing a PR for? Otherwise I'll add it to the to-do list as a low-priority item!
Sorry, I've moved on from this particular need, but I could imagine forking/working on it in the future if it becomes a blocker. Better to categorize it as a low-priority item on your end 👍