
Scaling to production -> OSError: [Errno 24] Too many open files socket.accept() out of system resource #714

Closed
lukasugar opened this issue Jul 24, 2024 · 1 comment

@lukasugar

Problem

When my LangServe app receives ~1000 concurrent requests, it fails with:

OSError: [Errno 24] Too many open files
socket.accept() out of system resource

Mitigation/quickfix

I checked the VM's file-descriptor limits: the soft ulimit was only 1024, while the hard limit is 524288. I've raised the soft limit to 100000, which should mitigate the issue for now.
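For reference, the same limit can also be inspected and raised from inside the Python process at startup, so the fix travels with the app. A minimal sketch using the standard-library resource module (Linux/macOS only; the hard limit still caps it):

# python
import resource

# Inspect the current file-descriptor limits for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft}, hard={hard}")

# Raise the soft limit; an unprivileged process can go up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(100_000, hard), hard))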

Better way of doing it?

I'm curious whether there's a better way to handle this. Even with the increased open-file limit, is there something I can do in my app to make it more resilient?

What my code looks like

I define chains like this; it's straightforward:

# python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# REVIEW_SYSTEM_PROMPT is a string constant defined elsewhere in the app.
review_text_chain = ChatPromptTemplate.from_messages(
    [
        ("system", REVIEW_SYSTEM_PROMPT),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o")

and pass them to the router:

# python
from fastapi import APIRouter
from langserve import add_routes

router = APIRouter()

add_routes(router, review_text_chain, path="/api/v1/review_text")
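The router is then mounted on the FastAPI app in the usual way (simplified here; the module layout is an assumption on my part):

# python
from fastapi import FastAPI

app = FastAPI()
app.include_router(router)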

I'm calling this service from a separate NestJS application, like this:

// typescript
private async callAiReview(document: string): Promise<any> {
    const analysisResponse = await fetch('https://www.path_to_my_endpoint/api/v1/review_text/invoke', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      // LangServe's /invoke endpoint expects {"input": ...} matching the
      // chain's input schema ({text} here).
      body: JSON.stringify({ input: { text: document } }),
    });

    return await analysisResponse.json();
  }

There are a bunch of documents, and I call the callAiReview method for each one.

Are there things in the app that could be improved? Maybe async?

Should I be using async in LangServe, and if so, how?
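One idea I had is to cap concurrency at the server, so excess connections get rejected cleanly instead of exhausting file descriptors in socket.accept(). A minimal sketch, assuming the app object lives in app.py and is served with uvicorn (the numbers are guesses):

# python
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app:app",              # import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,              # spread load across processes
        limit_concurrency=512,  # reject excess connections with a 503 instead of crashing
    )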

I'm aware that I could use batch instead of invoke, but other than that, are there improvements to be made?
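For example, instead of one /invoke call per document, all the documents could go to LangServe's /batch endpoint in a single request. A rough sketch of what I mean, assuming the service runs at localhost:8000 (the URL and response shape are my assumptions):

# python
import httpx

def review_documents(documents: list[str]) -> list[dict]:
    # /batch takes {"inputs": [...]} where each item matches the chain's input schema.
    response = httpx.post(
        "http://localhost:8000/api/v1/review_text/batch",
        json={"inputs": [{"text": doc} for doc in documents]},
        timeout=120.0,  # batched LLM calls can take a while
    )
    response.raise_for_status()
    return response.json()["output"]  # one result per input, in order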

More generally, how do I make LangServe hold up under production load?

@eyurtsev (Collaborator) commented Aug 2, 2024

Duplicate of #717.

The issue likely stemmed either from a misconfigured LangSmith tracing client or from being rate limited by the LangSmith client.
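If the tracing client is suspected, one quick check is to disable LangSmith tracing and see whether the descriptor exhaustion goes away. A minimal sketch (assumes the v2 tracing environment variable is what controls it):

# python
import os

# Disable LangSmith tracing before any chains are constructed.
os.environ["LANGCHAIN_TRACING_V2"] = "false"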

eyurtsev closed this as completed on Aug 2, 2024