Memory leak in LangServe #717
Comments
I don't see anything obvious. What does |
It's pretty basic as well:
|
@eyurtsev any ideas on how to debug this? Is ChatOpenAI closing its connections after calls? |
I'll read over the ChatOpenAI implementation on Monday. You could try deploying ChatOpenAI as the sole runnable and verifying that you can recreate the problem. If so, that would help isolate the issue so we can rule out user code. |
Would you mind including output of python -m langchain_core.sys_info |
I'll try deploying chat open ai as the sole runnable and recreating the problem first thing tomorrow.
Additionally, these are the dependencies in the poetry file:
And this is the |
@eyurtsev I've run 4k requests to ChatOpenAI and I can see the memory leak.
Here's the RAM usage. The app uses ~200MB when started. The usage jumps to ~400MB, and stays there even after the requests are completed. The red line is the point in time when all the requests are completed. |
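One way to narrow down where retained memory comes from is to compare two `tracemalloc` snapshots, one taken before the load and one after it has finished. This is a minimal sketch using only the standard library; the `top_growth` helper and the `retained` list are hypothetical stand-ins for a real load test. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native extensions will not show up here.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the `limit` source lines whose allocated size changed the most."""
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for "serve a batch of requests". In a real app, take the second
# snapshot after the load test has finished and memory has settled.
retained = [bytes(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
for stat in growth:
    # Each entry shows file, line number, and the size/count difference.
    print(stat)
```

In a server, the two snapshots could be taken from a debug-only endpoint so that the comparison covers a known number of requests.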
Here's the chat open AI implementation. It's creating The client has default limits of:
So there should be a connection pool there.
|
@lukasugar while we're debugging, you can roll out a quick workaround using: https://www.uvicorn.org/settings/#resource-limits |
Possibly, see if you can confirm the env configuration you have. I don't see anything suspicious in the chat model code right now as it looks like it uses a connection pool by default and is only initialized once.
I wonder if we're seeing something from instantiation of pydantic models. LangChain relies on the pydantic v1 namespace, and we instantiate models both to create the prompts and when we output the messages from the chat model. The other possible source of issues is langserve itself, as it does some stuff with request objects and it creates pydantic models. |
To answer your questions @eyurtsev :
In python code, I'm using
|
What environment information do you need? The Dockerfile is the same as in the LangServe documentation:
Environment variables that I'm specifying:
|
@lukasugar cool that's complete. I was looking for any env information that could possibly change the instantiation of the httpx client used in ChatOpenAI (e.g., But that's not the case, and I don't think that's where the issue is from. Are you able to isolate further and determine whether just deploying the prompt re-produces the issue?
|
@eyurtsev I've run the experiment with returning only the prompt. I'm using the exact code as in your example.
|
OK great this rules out anything specific to chat models. There's one more potential source which is the langsmith-sdk If it's also not that, I'll need a bit of time to dig in since it's either pydantic or glue code in langserve. If it's pydantic, you'll need to force restart the workers as a work-around and hope that it gets resolved when we upgrade to pydantic 2 (tentatively next month). |
I'll disable tracing and check if it changes anything |
I'm a bit confused... I disabled It looks like:
So, disabling I don't see a great solution:
And there still are some memory leaks... |
@eyurtsev do you think the |
@lukasugar thanks!
I don't know since we still need to isolate exactly where it is. It could be that there's some easy to fix bug in core or langsmith or langserve that's not related to pydantic.
Yes of course! |
@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits --limit-max-requests |
@lukasugar I haven't been able to reproduce any issues as long as the langsmith tracer is either disabled or else configured properly (i.e., not rate limited). Could you configure a logger and check whether you're getting warnings from the langsmith client about getting rate limited? If you hammer at the server hard enough while being rate limited by langsmith, you could definitely see memory consumption increase, as the tracer will hold on to the data in memory temporarily and do a few more retries to avoid losing tracing data.

import logging
import os

import psutil
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langserve import add_routes

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

app = FastAPI()


def get_memory_usage():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss / 1024 / 1024  # Convert bytes to MB


prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're an assistant by the name of Bob." * 100),
        ("human", "{input}"),
    ]
)


@app.get("/memory-usage")
def memory_usage():
    memory = get_memory_usage()
    return {"memory_usage": memory}


add_routes(app, prompt, path="/prompt")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=7999)

Here's a curl to issue a request:

random_string=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 16)
curl -X 'POST' \
  'http://localhost:7999/prompt/invoke' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
  \"input\": {
    \"text\": \"$random_string\"
  },
  \"config\": {},
  \"kwargs\": {}
}"

And you can monitor the memory usage this way:

watch -n 1 curl -s localhost:7999/memory-usage

My environment:
System Information
Package Information
|
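To check for rate-limit warnings without reading through the whole log, one option is to attach a small handler that collects warnings from the tracing client's logger. This is a sketch under assumptions: the `WarningCounter` class is hypothetical, and the logger namespace `langsmith` is assumed (Python libraries commonly log under their module name, but verify this against your installed version). The warning below is simulated so the snippet runs on its own.

```python
import logging


class WarningCounter(logging.Handler):
    """Collect WARNING-and-above records so they can be inspected later."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)


# Assumption: the langsmith client logs under the "langsmith" namespace.
counter = WarningCounter()
langsmith_logger = logging.getLogger("langsmith")
langsmith_logger.addHandler(counter)

# Simulated warning; a real one would be emitted by the client itself and
# would propagate from a child logger up to the handler attached above.
logging.getLogger("langsmith.client").warning("rate limited (simulated)")

print(len(counter.records))
```

If the count keeps growing during a load test, that would point toward the tracer retrying and buffering data rather than a leak in the application code.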
I can confirm that I was getting rate limited by
I'll check the memory consumption the way you suggested, probably tomorrow. |
@lukasugar OK for me to close the issue for now? |
@eyurtsev sorry, I'm overwhelmed with work the last few days... When I test the memory consumption the way you suggested, I'll re-open the ticket if the issue persists. |
I've tried setting Here's the code:
Nothing happens after the server gets 10 (or even 50) requests. I've tried simplified code, where it's expected that server will terminate after the limit is reached, it still doesn't work:
I can make as many requests as I want, and the service is still running: Any idea why that's happening? |
I can't verify that workers are restarted after |
LangServe takes dependency on
I've tried updating to the latest |
Can you try collecting garbage by calling the garbage collector explicitly (gc.collect()) after handling requests to free up memory? |
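For reference, here is a minimal standard-library sketch of what an explicit `gc.collect()` call does. The `Node` class and `make_cycle` helper are hypothetical and exist only to create reference cycles. An explicit collection only reclaims objects stuck in cycles; objects that are still referenced (for example, by a buffer or cache) are not freed, and freed memory is not necessarily returned to the operating system, so RSS may stay flat even when the collector finds garbage.

```python
import gc


class Node:
    """Object that can participate in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a and b now keep each other alive


gc.disable()  # keep automatic collection out of the way for the demo
for _ in range(100):
    make_cycle()

# gc.collect() returns the number of unreachable objects it found.
unreachable = gc.collect()
gc.enable()
print(unreachable)
```

If calling `gc.collect()` after a batch of requests reports a large number of unreachable objects, cyclic garbage is at least part of the picture; if it reports close to zero while memory stays high, the memory is still referenced somewhere.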
Hello everyone, |
@michael81045 can you provide more context so we can see how our projects overlap, and precisely identify the issue? What's your system info, what environment are you using? |
Same issue on our side :O |
@eyurtsev this seems to be an issue that a lot of folks are facing... Any new ideas for the fix? 🙏 |
Hi, apologies, I was on vacation and then working on the 0.3 release for langchain. I'll check what's constraining uvicorn (probably sse-starlette) and unpin. @michael81045 , @pedrojrv , @lukasugar I still haven't seen a confirmation of what's actually causing the memory leak. Based on what I diagnosed above, it was happening because of user misconfiguration of langsmith (i.e., enabling the tracer, not sampling traces, etc.). For folks seeing problems, can you confirm that it's not from a misconfiguration of langsmith? |
langserve does not pin uvicorn directly, and based on sub-deps I don't see any uvicorn version pinning (e.g., from sse-starlette). sse-starlette==1.8.2 I suggest using pipdeptree to determine what's pinning the |
Is this thread still active? I'm seeing the memory leak as well on the latest version of LangServe. |
@williamcodes can you please share your config/installed packages/repro steps, it could help in finding the cause? 🙏 |
I'm hosting a langserve app. The app is quite simple, but there seems to be a memory leak. Any ideas on why this is happening?
I'm seeing this error:
It seems like some clients are not closing connections. I'm using only ChatOpenAI in this app.
With every new request, RAM increases and doesn't go down:
The code is straightforward, I'm following examples from the docs.
Chain definition in public_review.py:
Chain is imported in routers.py:
Any ideas what could be causing the leak? This is literally the entire code.