
langchain-google-vertexai: retryable errors are not retried #7493

Open
siviter-t opened this issue Jan 9, 2025 · 4 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

siviter-t commented Jan 9, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { ChatVertexAI } from "@langchain/google-vertexai";
import { faker } from "@faker-js/faker";

const llm = new ChatVertexAI({
  model: "gemini-1.5-flash-002",
  temperature: 0,
  maxRetries: 2,
});

// Attempt to exceed the suggested 32k tokens/min context limit
// 100 tokens is ~60-80 English words for Gemini
// https://ai.google.dev/gemini-api/docs/tokens?lang=node
// https://github.com/langchain-ai/langchain/issues/22241
const manyTokens = faker.lorem.words({ min: 60, max: 80 }).repeat(1280); // More than 32k, fewer than 128k tokens

const aiMsg = await llm.invoke([
  [
    "system",
    "You are a helpful assistant that summarises the user's content",
  ],
  ["human", manyTokens],
]);

Error Message and Stack Trace (if applicable)

{
  "code": 429,
  "errors": [
    {
      "message": "Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details.",
      "domain": "global",
      "reason": "rateLimitExceeded"
    }
  ]
}

Description

We've recently encountered resource-exhaustion errors with the Vertex AI integration in production, specifically with Gemini 1.5 Flash. This is essentially langchain-ai/langchain/issues/22241, where others have reported tokens-per-minute limits.

I would have expected these errors to be retried automatically by LangChain's p-retry implementation. Debugging into the source, I see that the defaultFailedAttemptHandler is overridden by a failedAttemptHandler in langchain-google-common. Although that handler checks for errors that should not be retried, it ultimately rethrows all errors, so nothing is ever retried.
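For context on why rethrowing disables retries: p-retry stops as soon as its onFailedAttempt callback throws. Below is a minimal model of that contract (a sketch of the semantics, not LangChain's or p-retry's actual source):

```typescript
// Minimal model of p-retry's contract: if onFailedAttempt throws, the
// whole operation fails immediately; if it returns, another attempt runs.
async function retryLike<T>(
  fn: () => Promise<T>,
  onFailedAttempt: (error: unknown) => void,
  retries: number
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      onFailedAttempt(error); // a throw here aborts all remaining retries
      if (attempt > retries) throw error;
    }
  }
}

// A handler that rethrows everything, as the current failedAttemptHandler does:
const rethrowAll = (error: unknown) => { throw error; };

let calls = 0;
try {
  await retryLike(async () => { calls++; throw new Error("429"); }, rethrowAll, 2);
} catch {
  console.log(`attempts made: ${calls}`); // 1 attempt, despite retries = 2
}
```

This matches the behaviour described above: with a handler that rethrows unconditionally, the configured maxRetries never comes into play.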

System Info

Mac
node v22.11.0
pnpm 9.13.2
[email protected]


dosubot bot commented Jan 9, 2025

Hey @siviter-t! I'm here to help you with any bugs, questions, or contributions you have. Let's get this issue sorted out together.

The issue you're encountering with retryable errors not being retried in the langchain-google-vertexai integration is related to how the failedAttemptHandler in langchain-google-common is implemented. This handler checks for specific error conditions and throws errors that should not be retried. However, it appears that it ultimately throws all errors, which might be why your rate limit exceeded error (HTTP 429) is not being retried.

The failedAttemptHandler specifically checks for certain HTTP status codes and error conditions to determine if an error should not be retried. These include status codes like 400, 401, 402, 403, 404, 405, 406, 407, and 409, as well as specific error messages and codes such as "Cancel", "AbortError", and "ECONNABORTED" [1][2].

To address this, you might consider customizing the retry logic to handle the 429 error specifically. This could involve modifying the failedAttemptHandler to allow retries for the 429 status code or implementing a custom retry strategy that handles this case.

If you have control over the code, you could adjust the failedAttemptHandler to not throw the 429 error, allowing the p-retry mechanism to retry the request. Alternatively, you could implement a custom retry logic that specifically handles the 429 error by catching it and retrying the request manually.
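As an illustration of the second suggestion, here is a hypothetical wrapper (retryOn429 is not a LangChain API) that retries only 429 responses with exponential backoff:

```typescript
// Hypothetical helper (not part of LangChain): retry a call when the
// thrown error carries HTTP status 429, with exponential backoff.
async function retryOn429<T>(
  fn: () => Promise<T>,
  maxRetries = 6,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const status = error?.response?.status ?? error?.code;
      if (status !== 429 || attempt >= maxRetries) throw error;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Usage (sketch): const aiMsg = await retryOn429(() => llm.invoke(messages));
```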




@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 9, 2025
@afirstenberg
Contributor

Thanks for flagging this!

@jacoblee93 - Please assign this to me if you don't get to it yourself or if @siviter-t doesn't get to it first. {:
I also want to check whether the retries use exponential backoff (they didn't the last time I looked at this).
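For reference, p-retry delegates its timing to the retry package; assuming that package's documented defaults (factor 2, minTimeout 1000 ms, no randomization), the delay schedule can be sketched as:

```typescript
// Backoff schedule per the retry package's documented defaults
// (which p-retry wraps): delay(n) = min(maxTimeout, minTimeout * factor ** (n - 1))
function backoffDelayMs(
  attempt: number, // 1-based retry number
  factor = 2,
  minTimeout = 1000,
  maxTimeout = Infinity
): number {
  return Math.min(maxTimeout, minTimeout * factor ** (attempt - 1));
}

console.log([1, 2, 3, 4].map((n) => backoffDelayMs(n)));
// → [ 1000, 2000, 4000, 8000 ]
```

Whether LangChain passes non-default backoff options through to p-retry is exactly what would need checking here.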

@jacoblee93
Collaborator

Thank you @afirstenberg!

@siviter-t
Author

siviter-t commented Jan 10, 2025

In my local tests in our repo, overriding this handler and simply dropping the final throw does the trick for rate limits, and I observed that it uses p-retry's default exponential-backoff config, though that looks finicky to test in Jest. I'm unsure whether this is a sufficient fix and whether other classes of error ought to be ignored. Happy to file a PR for this when I next find a chance.

// STATUS_NO_RETRY: the non-retryable status codes, mirroring the list
// checked in langchain-google-common's failedAttemptHandler.
const vertexWorkaroundRetryHandler = (error: any) => {
  const status = error?.response?.status ?? 0;

  if (status === 0) {
    // What is this? No HTTP status on the error; surface it for investigation.
    console.error("failedAttemptHandler", error);
  }

  // What errors shouldn't be retried? Rethrow only those.
  if (STATUS_NO_RETRY.includes(+status)) {
    throw error;
  }

  // `log` is our application logger; anything reaching here gets retried.
  log.trace({ src: "vertexWorkaroundRetryHandler", error }, "Received retryable error");
};

const model = new ChatVertexAI({
  model: "gemini-1.5-flash-002",
  temperature: 0,
  onFailedAttempt: vertexWorkaroundRetryHandler,
  maxRetries: 6,
});
