Memory leak in LangServe #717

Open
lukasugar opened this issue Jul 26, 2024 · 41 comments

@lukasugar

lukasugar commented Jul 26, 2024

I'm hosting a langserve app. The app is quite simple, but there seems to be a memory leak. Any ideas on why this is happening?

I'm seeing this error:

OSError: [Errno 24] Too many open files
socket.accept() out of system resource

seems like some clients are not closing connections. I'm using only ChatOpenAI in this app.

With every new request, RAM increases and doesn't go down:

image

The code is straightforward, I'm following examples from the docs.
Chain definition in public_review.py:

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from app.enrichment.types import IssueList
from app.prompts.public_review_analysis_prompt import (
    PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
)

public_review_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o", temperature=0.03, model_kwargs={"seed": 13})


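public_review_chain = (
    # NOTE: the original post shows a dangling leading "|" here; the first
    # step of the chain was apparently omitted when the code was pasted.
    public_review_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)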

Chain is imported in routers.py:

# Chain added to router and router is then added to the app
from fastapi import APIRouter
from langserve import add_routes

from app.enrichment.aggregator import aggregator_review_chain, aggregator_text_chain
from app.enrichment.public_review import public_review_chain, public_review_text_chain
from app.enrichment.types import (
    InputFragment,
    InputFragmentList,
    IssueList,
)

router = APIRouter()

add_routes(
    router,
    public_review_chain.with_types(input_type=InputFragmentList, output_type=IssueList),
    path="/api/v1/public_review",
)
add_routes(router, public_review_text_chain, path="/api/v1/public_review/text")

Any ideas what could be causing the leak? This is literally the entire code.
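
Since the error above is Errno 24 rather than an out-of-memory kill, it may help to track open file descriptors separately from RAM. The following is a minimal sketch, assuming Linux or macOS; the helper names open_fd_count and fd_headroom are hypothetical and not part of LangServe:

```python
import os
import resource


def open_fd_count() -> int:
    """Return the number of file descriptors currently open in this process."""
    # /proc/self/fd exists on Linux; /dev/fd is the macOS/BSD equivalent.
    # The count includes the descriptor used to read the directory itself,
    # so compare readings over time rather than relying on the absolute value.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


def fd_headroom() -> int:
    """Return how many more descriptors fit under the soft RLIMIT_NOFILE limit."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft_limit - open_fd_count()
```

If open_fd_count() keeps climbing between requests while traffic is idle, sockets are probably being left open somewhere; if it stays flat while RSS grows, the problem is more likely objects held in memory.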

@eyurtsev
Collaborator

I don't see anything obvious. What does from app.enrichment.aggregator import aggregator_review_chain, aggregator_text_chain look like?

@eyurtsev eyurtsev self-assigned this Jul 26, 2024
@lukasugar
Author

It's pretty basic as well:

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

from app.enrichment.types import IssueList
from app.prompts.issue_aggregator_prompt import ISSUE_AGGREGATOR_SYSTEM_PROMPT

aggregator_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            ISSUE_AGGREGATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})


def _serialize_input(x: IssueList) -> str:
    """Helper function to serialize the input"""

    if isinstance(x, dict):
        _ifl = IssueList(issues=x["issues"])
        return _ifl.json()
    return x.json()


aggregator_review_chain = (
    {"text": RunnableLambda(_serialize_input)}
    | aggregator_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)

@lukasugar
Author

@eyurtsev any ideas on how to debug this?

Is ChatOpenAI closing its connections after calls?

@eyurtsev
Collaborator

I'll read over the ChatOpenAI implementation on Monday.

You could try deploying ChatOpenAI as the sole runnable and verifying that you can recreate the problem. If so, that would help isolate the issue so we can rule out user code.

@eyurtsev
Collaborator

Would you mind including output of

python -m langchain_core.sys_info

@lukasugar
Author

I'll try deploying ChatOpenAI as the sole runnable and recreating the problem first thing tomorrow.
In the meantime, here's the output of the command you asked for:

# python -m langchain_core.sys_info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP Mon Oct 9 16:21:24 UTC 2023
> Python Version:  3.11.9 (main, Jul 23 2024, 07:22:56) [GCC 12.2.0]

Package Information
-------------------
> langchain_core: 0.2.11
> langchain: 0.2.6
> langsmith: 0.1.83
> langchain_cli: 0.0.25
> langchain_openai: 0.1.14
> langchain_text_splitters: 0.2.2
> langgraph: 0.1.5
> langserve: 0.2.2
# 

Additionally, these are the dependencies in the poetry file:

[tool.poetry.dependencies]
python = ">3.11, <3.12"
uvicorn = "^0.23.2"
langserve = "^0.2.2"
python-decouple = "^3.8"
mypy = "^1.10.0"
poetry-dotenv-plugin = "^0.2.0"
python-dotenv = "^1.0.1"
langchain-openai = "^0.1.14"
langchain-core = "^0.2.11"
langgraph = "^0.1.5"
langchain = "^0.2.6"
pydantic = "<2"
aiosqlite = "^0.20.0"


[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"

And this is the poetry.lock file: poetry.lock

@lukasugar
Author

@eyurtsev I've run 4k requests to ChatOpenAI and I can see the memory leak.
Code:

# Chain definition
simple_chat_openai = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o-mini", temperature=0.03, model_kwargs={"seed": 13})

# server.py
add_routes(app, simple_chat_openai, path="/chat_openai")

Here's the RAM usage. The app uses ~200MB when started. The usage jumps to ~400MB, and stays there even after the requests are completed. The red line is the point in time when all the requests are completed.
image

@lukasugar
Author

I've continued running the endpoint and the memory continued leaking until the service broke:
image

@eyurtsev
Collaborator

eyurtsev commented Jul 29, 2024

Here's the ChatOpenAI implementation. It creates an httpx.AsyncClient.

https://github.com/langchain-ai/langchain/blob/b3a23ddf9378a2616e35077b6d82d8fd1ef60cbc/libs/partners/openai/langchain_openai/chat_models/base.py#L451-L451

The client has default limits of:

DEFAULT_LIMITS = Limits(max_connections=100, max_keepalive_connections=20)

So there should be a connection pool there.


@lukasugar

  1. which endpoint are you hitting on the server? (ainvoke? astream?)
  2. Do you have any additional model configuration? e.g., proxy set up? (I'm wondering if there's any configuration coming from env variables)

@eyurtsev
Collaborator

@lukasugar while we're debugging, you can roll out a quick workaround using: https://www.uvicorn.org/settings/#resource-limits

@lukasugar
Author

Side note: I tried an Anthropic chain and got the same issue:

# chain definition
simple_chat_anthropic = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatAnthropic(model="claude-3-haiku-20240307")

# server.py
from app.enrichment.public_review import simple_chat_anthropic

add_routes(app, simple_chat_anthropic, path="/chat_anthropic")

The memory is constantly growing (ignore the orange line)
image

So this could be:

  1. An issue with some base chat langchain class?
  2. An issue with the way prompt templates are created in the code?

@eyurtsev
Collaborator

eyurtsev commented Jul 29, 2024

An issue with some base chat langchain class?

Possibly, see if you can confirm the env configuration you have. I don't see anything suspicious in the chat model code right now as it looks like it uses a connection pool by default and is only initialized once.


An issue with the way prompt templates are created in the code?

I wonder if we're seeing something from the instantiation of pydantic models. LangChain relies on the pydantic v1 namespace, and we instantiate models both to create the prompts and when we output the messages from the chat model.


The other possible source of issues is langserve itself as it does some stuff w/ request objects and it creates pydantic models

@lukasugar
Author

To answer your questions @eyurtsev :

  1. I'm invoking the server through:
image
I'm calling the LangServe app from a JavaScript app and from Python notebooks. In JS, I'm using fetch:
      const aiResponse = await fetch("www.my/endpoint/chat_openai/invoke", {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(requestData),
      });

In Python code, I'm using requests:

import os

import requests

# base_url is assumed to be defined elsewhere (not shown in the original snippet)
def post_invoke_langserve(path: str, payload: str):
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    _url = os.path.join(base_url, path)
    response = requests.post(_url, headers=headers, data=payload)

    return response

  2. We don't have any additional model configs. Some models have specified seed and temperature, that's all:
ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})

@lukasugar
Author

What environment information do you need?

The Dockerfile is the same as in the LangServe documentation:

FROM python:3.11-slim

RUN pip install poetry==1.6.1

RUN poetry config virtualenvs.create false

WORKDIR /code

COPY ./pyproject.toml ./README.md ./poetry.lock* ./

COPY ./package[s] ./packages

RUN poetry install  --no-interaction --no-ansi --no-root

COPY ./app ./app

RUN poetry install --no-interaction --no-ansi

ARG PORT

EXPOSE ${PORT:-8080}

CMD exec uvicorn app.server:app --host 0.0.0.0 --port ${PORT:-8080}

Environment variables that I'm specifying:

LANGCHAIN_API_KEY=some_value
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_PROJECT=langserve-staging
LANGCHAIN_TRACING_V2=true
OPENAI_API_KEY=some_value
ANTHROPIC_API_KEY=some_value

@eyurtsev
Collaborator

@lukasugar cool that's complete.

I was looking for any env information that could change the instantiation of the httpx client used in ChatOpenAI (e.g., OPENAI_PROXY), to see if by any chance it gets rid of the max limit on the number of connections in the connection pool.

https://github.com/langchain-ai/langchain/blob/b3a23ddf9378a2616e35077b6d82d8fd1ef60cbc/libs/partners/openai/langchain_openai/chat_models/base.py#L410-L410

But that's not the case, and I don't think that's where the issue is from.


Are you able to isolate further and determine whether just deploying the prompt reproduces the issue?

# Chain definition
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with user using pirate language.",
        ),
        ("user", "{text}"),
    ]
)

# server.py
add_routes(app, prompt, path="/prompt")

@lukasugar
Author

I've tried running ChatOpenAI as the only runnable in the chain - and the memory is still leaking.

Here's the code:

from langchain_openai import ChatOpenAI
from langserve import add_routes

add_routes(app, ChatOpenAI(model="gpt-4o-mini"), path="/chat_openai_plain")

The chain now contains only ChatOpenAI and the memory is leaking (orange line). After a few thousand requests, memory went from 200MB to 500MB.

image

@lukasugar
Author

@eyurtsev I've run the experiment with returning only the prompt.
The memory is leaking:
image

I'm using the exact code as in your example.

@eyurtsev
Collaborator

OK great this rules out anything specific to chat models.

There's one more potential source: the langsmith-sdk. Try setting LANGCHAIN_TRACING_V2=false. The SDK also makes network connections, so it could explain the OSError.

If it's also not that, I'll need a bit of time to dig in since it's either pydantic or glue code in langserve. If it's pydantic, you'll need to force restart the workers as a work-around and hope that it gets resolved when we upgrade to pydantic 2 (tentatively next month).

@lukasugar
Author

I'll disable tracing and check if it changes anything

@lukasugar
Author

I'm a bit confused... I disabled langsmith tracing (removed the environment variables).
It seems that there still is a memory leak, but it's less deterministic, it doesn't happen with all calls.

Memory:
image

Requests:
image

It looks like:

  • some requests are causing memory to increase. Later calls are not.
  • memory never drops

So, disabling langsmith tracing helps, but it's not the only reason for memory leaks.

I don't see a great solution:

  • swapping langsmith with some other logging service would help, but I like LangSmith
  • disabling logging - no way, we need the logs

And there still are some memory leaks...

@lukasugar
Author

@eyurtsev do you think the pydantic 2 update will fix the memory leaks? Could you please find someone from the LangSmith team to look into the issue as well?
Thanks!

@eyurtsev
Collaborator

@lukasugar thanks!

do you think the pydantic 2 update will fix the memory leaks?

I don't know since we still need to isolate exactly where it is. It could be that there's some easy to fix bug in core or langsmith or langserve that's not related to pydantic.

Could you please find someone from the LangSmith team look into the issue as well?

Yes of course!

@eyurtsev
Collaborator

@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits

--limit-max-requests

@eyurtsev
Collaborator

eyurtsev commented Jul 30, 2024

@lukasugar I haven't been able to reproduce any issues as long as langsmith tracer is either disabled or else configured properly (i.e. not rate limited).

Could you configure a logger and check if you're getting warnings from the langsmith client about getting rate limited?

If you hammer the server hard enough while being rate limited by langsmith, you could definitely see memory consumption increase, as the tracer will hold on to the data in memory temporarily and do a few more retries to avoid losing tracing data.

import logging
import os

import psutil
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate

from langserve import add_routes

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

logger = logging.getLogger(__name__)

app = FastAPI()


def get_memory_usage():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss / 1024 / 1024  # Convert bytes to MB


prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're an assistant by the name of Bob." * 100),
        ("human", "{input}"),
    ]
)


@app.get("/memory-usage")
def memory_usage():
    memory = get_memory_usage()
    return {"memory_usage": memory}


add_routes(app, prompt, path="/prompt")


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=7999)

Here's a curl to issue a request:

random_string=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 16)
curl -X 'POST' \
  'http://localhost:7999/prompt/invoke' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
  \"input\": {
    \"input\": \"$random_string\"
  },
  \"config\": {},
  \"kwargs\": {}
}"

And you can monitor the memory usage this way:

watch -n 1 curl -s localhost:7999/memory-usage

My environment:

System Information

OS: Linux
OS Version: #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2
Python Version: 3.11.4 (main, Sep 25 2023, 10:06:23) [GCC 11.4.0]

Package Information

langchain_core: 0.2.11
langchain: 0.2.6
langchain_community: 0.0.36
langsmith: 0.1.83
langchain_anthropic: 0.1.11
langchain_cli: 0.0.25
langchain_openai: 0.1.14
langchain_text_splitters: 0.2.2
langgraph: 0.1.5
langserve: 0.2.2
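
If RSS still grows with tracing disabled, comparing tracemalloc snapshots taken before and after a batch of requests can point to the lines that allocated the surviving memory. This is a minimal sketch using only the standard library; top_growth is a hypothetical helper name, and where the snapshots get taken (for example in a debug-only endpoint) is left to the reader:

```python
import tracemalloc


def top_growth(
    before: tracemalloc.Snapshot,
    after: tracemalloc.Snapshot,
    limit: int = 5,
) -> list[tracemalloc.StatisticDiff]:
    """Return up to `limit` source lines whose allocated size grew the most."""
    # compare_to() sorts by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    first = tracemalloc.take_snapshot()
    retained = [bytes(1024) for _ in range(1000)]  # stand-in for leaked objects
    second = tracemalloc.take_snapshot()
    for stat in top_growth(first, second):
        print(stat)
```

tracemalloc only sees allocations made through Python's allocator after start() was called, so memory held by C extensions will not show up here.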

@lukasugar
Author

lukasugar commented Jul 30, 2024

I can confirm that I was getting rate limited by langsmith:

Failed to batch ingest runs: LangSmithRateLimitError('Rate limit exceeded for https://api.smith.langchain.com/runs/batch.
 HTTPError(\'429 Client Error: Too Many Requests for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"Usage limit monthly_traces of 50000 exceeded"}\')')

I'll check the memory consumption the way you suggested, probably tomorrow.
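
For anyone checking the same thing, here is a minimal sketch of counting such warnings with the standard logging module. It assumes the LangSmith SDK logs under the "langsmith" logger namespace (an assumption based on the usual logging.getLogger(__name__) convention); RateLimitCounter is a hypothetical name:

```python
import logging


class RateLimitCounter(logging.Handler):
    """Count WARNING-or-higher log records that look like rate-limit failures."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.hits = 0

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if "429" in message or "rate limit" in message.lower():
            self.hits += 1


counter = RateLimitCounter()
# Assumption: records from the SDK propagate up to the "langsmith" logger.
logging.getLogger("langsmith").addHandler(counter)
```

A counter.hits value that rises under load suggests traces are being retried and held in memory, which matches the behaviour described earlier in the thread.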

@eyurtsev
Collaborator

eyurtsev commented Aug 2, 2024

@lukasugar OK for me to close the issue for now?

@lukasugar
Author

@eyurtsev sorry, I'm overwhelmed with work the last few days... When I test the memory consumption the way you suggested, I'll re-open the ticket if the issue persists.

@lukasugar
Author

@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits

--limit-max-requests

I've tried setting limit_max_requests to make the server restart after the max number of requests has been reached.

Here's the code:

if __name__ == "__main__":
    import uvicorn

    while True:
        uvicorn.run(
            app, host="0.0.0.0", port=8000, limit_max_requests=10
        )

        print("Restarting server")

Nothing happens after the server gets 10 (or even 50) requests.

I've tried simplified code, where it's expected that server will terminate after the limit is reached, it still doesn't work:

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(
        app, host="0.0.0.0", port=8000, limit_max_requests=10
    )

I can make as many requests as I want, and the service is still running:
image

Any idea why that's happening?

@lukasugar
Author

I can't verify that workers are restarted after limit_max_requests. Do you know how I could verify that?
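
One way to check this is to run the server as a child process and log its PID on every start: the uvicorn docs describe --limit-max-requests as terminating the process after the given number of requests, so something outside the process has to start it again, and a new PID confirms a real restart. Calling uvicorn.run() again inside the same Python process would not release memory that process already holds. A minimal sketch, assuming the app lives at app.server:app as in the Dockerfile earlier in this thread; run_with_restarts is a hypothetical helper:

```python
import subprocess
import sys


def run_with_restarts(command: list[str], max_restarts: int) -> int:
    """Run `command`, restarting it each time it exits.

    Returns the number of times the command was started.
    """
    starts = 0
    while starts <= max_restarts:
        starts += 1
        process = subprocess.Popen(command)
        # A different pid on each start shows the old process really exited.
        print(f"start #{starts}: pid={process.pid}")
        process.wait()
    return starts


if __name__ == "__main__":
    run_with_restarts(
        [
            sys.executable, "-m", "uvicorn", "app.server:app",
            "--host", "0.0.0.0", "--port", "8000",
            "--limit-max-requests", "1000",
        ],
        max_restarts=1_000_000,
    )
```

In production a process manager or the container orchestrator's restart policy would normally play the role of this loop.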

@lukasugar
Author

LangServe takes a dependency on uvicorn (>=0.23.2,<0.24.0). That's a year-old version... I've tried updating to the latest, uvicorn 0.30, but I encountered an issue:

poetry add uvicorn@^0.30

...

Because no versions of langchain-cli match >0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28
 and langchain-cli (0.0.15) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.16) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.17) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.18) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.19) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.20) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.21) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.22) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.23) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.24) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.26) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.27) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.28) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.25) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15) requires uvicorn (>=0.23.2,<0.24.0).
So, because narrative-langserve depends on both uvicorn (^0.30) and langchain-cli (>=0.0.15), version solving failed.

I've tried updating to the latest uvicorn version, hoping it solves the issue. Is there any reason why langchain-cli takes a dependency on an old version?

@Omega-Centauri-21

Can you try collecting garbage by calling the garbage collector explicitly (gc.collect()) after handling requests to free up memory?
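
A minimal sketch of that check, using only the standard library. Note that gc.collect() reclaims only unreachable reference cycles; objects that are still referenced (for example data buffered for a retry) are not freed, and RSS may not drop even when objects are freed. The names collect_and_report and Node are hypothetical:

```python
import gc


def collect_and_report() -> int:
    """Force a full collection and return the number of unreachable objects found."""
    return gc.collect()


class Node:
    """Tiny object used to build a reference cycle for demonstration."""

    def __init__(self) -> None:
        self.ref = None


if __name__ == "__main__":
    a, b = Node(), Node()
    a.ref, b.ref = b, a  # a and b now keep each other alive
    del a, b             # unreachable, but only the cycle collector can free them
    print("unreachable objects found:", collect_and_report())
```

If the reported number stays near zero after each batch of requests while memory keeps growing, reference cycles are probably not the cause.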

@siddicky

Hi, just wanted to confirm whether public_review_chain from the original post works as intended. I see you're using JsonOutputParser(pydantic_object=IssueList); however, in your implementation you're not using .with_structured_output() or .bind_tools() to enforce this.

If the goal is to get JSON output, you should specify the json_mode in .bind_tools or .with_structured_output.

@michael81045

Hello everyone,
Any suggestions or solutions? I'm having the same problem...
After running it 1500 times, my memory usage has stayed at its peak.

@lukasugar
Author

@michael81045 can you provide more context so we can see how our projects overlap, and precisely identify the issue?

What's your system info, what environment are you using?
Are you using LangSmith logging?

@pedrojrv

pedrojrv commented Sep 4, 2024

Same issue on our side :O

@lukasugar
Author

@eyurtsev this seems to be an issue that a lot of folks are facing... Any new ideas for the fix? 🙏

@lukasugar lukasugar reopened this Sep 4, 2024
@eyurtsev
Collaborator

eyurtsev commented Sep 13, 2024

Hi, apologies, I was on vacation and then working on the 0.3 release for langchain. I'll check what's constraining uvicorn (probably sse-starlette) and unpin.

@michael81045, @pedrojrv, @lukasugar I still haven't seen a confirmation of what's actually causing the memory leak. Based on what I diagnosed above, it was happening because of user misconfiguration of langsmith (i.e., enabling the tracer, not sampling traces, etc.). For folks seeing problems, can you confirm that it's not from a misconfiguration of langsmith?

@eyurtsev eyurtsev reopened this Sep 13, 2024
@eyurtsev
Collaborator

eyurtsev commented Sep 13, 2024

langserve does not pin uvicorn directly, and based on sub-deps I don't see any uvicorn version pinning (e.g., from sse-starlette).

sse-starlette==1.8.2
├── anyio [required: Any, installed: 4.4.0]
│   ├── idna [required: >=2.8, installed: 3.8]
│   └── sniffio [required: >=1.1, installed: 1.3.1]
├── fastapi [required: Any, installed: 0.114.1]
│   ├── pydantic [required: >=1.7.4,<3.0.0,!=2.1.0,!=2.0.1,!=2.0.0,!=1.8.1,!=1.8, installed: 2.9.1]
│   │   ├── annotated-types [required: >=0.6.0, installed: 0.7.0]
│   │   ├── pydantic_core [required: ==2.23.3, installed: 2.23.3]
│   │   │   └── typing_extensions [required: >=4.6.0,!=4.7.0, installed: 4.12.2]
│   │   └── typing_extensions [required: >=4.6.1, installed: 4.12.2]
│   ├── starlette [required: >=0.37.2,<0.39.0, installed: 0.38.5]
│   │   └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│   │       ├── idna [required: >=2.8, installed: 3.8]
│   │       └── sniffio [required: >=1.1, installed: 1.3.1]
│   └── typing_extensions [required: >=4.8.0, installed: 4.12.2]
├── starlette [required: Any, installed: 0.38.5]
│   └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│       ├── idna [required: >=2.8, installed: 3.8]
│       └── sniffio [required: >=1.1, installed: 1.3.1]
└── uvicorn [required: Any, installed: 0.23.2]
    ├── click [required: >=7.0, installed: 8.1.7]
    └── h11 [required: >=0.8, installed: 0.14.0]


I suggest using pipdeptree to determine what's pinning the uvicorn version
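
If pipdeptree is not available in the environment, a rough equivalent can be sketched with the standard library's importlib.metadata, which lists the requirement strings each installed distribution declares. The function names requirement_name and who_requires are hypothetical, and the name parsing is a simplification rather than a full requirement parser:

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    name = match.group(1) if match else ""
    # Normalize the way package indexes do: runs of -, _ and . become a single -.
    return re.sub(r"[-_.]+", "-", name).lower()


def who_requires(package: str) -> dict[str, str]:
    """Map each installed distribution that depends on `package` to its constraint."""
    target = requirement_name(package)
    found: dict[str, str] = {}
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if requirement_name(requirement) == target:
                found[str(dist.metadata["Name"])] = requirement
    return found


if __name__ == "__main__":
    for name, constraint in sorted(who_requires("uvicorn").items()):
        print(f"{name}: {constraint}")
```

Any entry whose constraint carries an upper bound such as <0.24.0 is a candidate for what is holding uvicorn back.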

@williamcodes

Is this thread still active? I'm seeing the memory leak as well on the latest version of LangServe.

@lukasugar
Author

@williamcodes can you please share your config/installed packages/repro steps, it could help in finding the cause? 🙏
