
Database Size Growth Issue: Workflow Execution Logs and Retention Policy #12991

Open · 5 tasks done
neeleshsoftinator opened this issue Jan 23, 2025 · 1 comment
Labels: 🌊 feat:workflow Workflow related stuff.

@neeleshsoftinator
Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

We are experiencing significant database growth in a deployment of Dify running on Docker Compose with a PostgreSQL backend. Upon analysis, we identified that some tables are disproportionately large and contributing to the growth.

Here are the top observations:

The workflow_node_executions table is the largest, with a data size of 997 MB and a total size of 1.25 GB.
Other tables such as workflows and document_segments also show significant size.

[Screenshot: per-table size breakdown from PostgreSQL]
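
For reference, these numbers came from a standard catalog query along the lines of the minimal sketch below; the connection parameters are placeholders and psycopg2 is assumed to be installed.

```python
# Minimal sketch: list the largest tables with their data and total sizes,
# using standard PostgreSQL catalog functions. Connection parameters are
# placeholders for the Dify docker-compose database.
import psycopg2

QUERY = """
SELECT relname                                        AS table_name,
       pg_size_pretty(pg_table_size(relid))           AS data_size,
       pg_size_pretty(pg_total_relation_size(relid))  AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
"""

with psycopg2.connect(
    host="localhost", port=5432, dbname="dify", user="postgres", password="postgres"
) as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for table_name, data_size, total_size in cur.fetchall():
            print(f"{table_name:35} data={data_size:>10} total={total_size:>10}")
```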

✔️ Expected Behavior

No response

❌ Actual Behavior

We have Celery tasks scheduled to clean up some tables, but they might not cover all the problematic tables (e.g., workflow_node_executions). As a result, the database size continues to grow.
Environment
Deployment Type: Docker Compose
Database: PostgreSQL
Version: 0.15.2


dosubot bot commented Jan 23, 2025

The workflow_node_executions table in Dify is indeed a significant contributor to database size growth, as it stores detailed execution logs for workflow nodes. Unfortunately, there is no specific retention policy or configuration in Dify to manage or limit the size of this table directly [1][2][3].

For managing database growth, you might consider implementing custom cleanup tasks. While Dify does have some Celery tasks for database cleanup, they may not target the workflow_node_executions table specifically [4]. You can extend the existing Celery tasks or create new ones to periodically delete old records from this table based on your retention requirements.
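
For illustration only, a custom retention task might look roughly like the sketch below. This is not Dify's actual cleanup code: the task name, the 30-day window, the batch size, and the assumption that workflow_node_executions has id and created_at columns are all hypothetical and would need to be checked against your schema.

```python
# Minimal sketch of a custom retention task. Broker URL, database DSN,
# retention window, and column names are assumptions, not Dify internals.
from datetime import datetime, timedelta

from celery import Celery
from sqlalchemy import create_engine, text

app = Celery("retention", broker="redis://localhost:6379/1")
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/dify")

RETENTION_DAYS = 30    # assumed retention window; tune to your requirements
BATCH_SIZE = 10_000    # delete in batches to keep transactions short


@app.task(name="retention.clean_workflow_node_executions")
def clean_workflow_node_executions() -> None:
    """Delete workflow_node_executions rows older than the retention window."""
    cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)
    while True:
        with engine.begin() as conn:
            result = conn.execute(
                text(
                    "DELETE FROM workflow_node_executions "
                    "WHERE id IN (SELECT id FROM workflow_node_executions "
                    "             WHERE created_at < :cutoff LIMIT :batch)"
                ),
                {"cutoff": cutoff, "batch": BATCH_SIZE},
            )
            deleted = result.rowcount
        if deleted < BATCH_SIZE:
            break


# Run it nightly via Celery beat.
app.conf.beat_schedule = {
    "clean-workflow-node-executions": {
        "task": "retention.clean_workflow_node_executions",
        "schedule": 24 * 60 * 60,  # once a day, in seconds
    },
}
```

Note that plain DELETEs only mark space as reusable inside the table; reclaiming disk space from an already-bloated table typically requires VACUUM FULL or a tool such as pg_repack.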

Additionally, monitoring the execution of these tasks can be done by configuring the Celery worker service to log detailed information, which can help in ensuring that cleanup tasks are running as expected [5][6].
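
As a sketch of what that monitoring could look like, standard Celery signals can log when cleanup tasks finish or fail; the handlers below are illustrative and not part of Dify.

```python
# Minimal sketch: log task outcomes via Celery signals so the worker logs
# show whether retention jobs actually ran. The logger name is arbitrary.
import logging

from celery.signals import task_failure, task_postrun

logger = logging.getLogger("retention.monitor")


@task_postrun.connect
def log_task_done(sender=None, task_id=None, state=None, **kwargs):
    logger.info("task %s (%s) finished with state %s", sender.name, task_id, state)


@task_failure.connect
def log_task_failed(sender=None, task_id=None, exception=None, **kwargs):
    logger.error("task %s (%s) failed: %r", sender.name, task_id, exception)
```

Running the worker with --loglevel=INFO (or DEBUG) then surfaces these messages in the container logs.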

If you need to optimize the database further, consider reviewing the indexes on the workflow_node_executions table to ensure they are efficient, although specific index optimizations were not found in the current context.
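
For the index review, the standard pg_stat_user_indexes view shows per-index size and scan counts; the following is a minimal sketch with placeholder connection details.

```python
# Minimal sketch: list index sizes and scan counts for workflow_node_executions
# using standard PostgreSQL catalog views. Connection details are placeholders.
import psycopg2

QUERY = """
SELECT indexrelname                                  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid))  AS index_size,
       idx_scan                                      AS scans
FROM pg_stat_user_indexes
WHERE relname = 'workflow_node_executions'
ORDER BY pg_relation_size(indexrelid) DESC;
"""

with psycopg2.connect(
    host="localhost", port=5432, dbname="dify", user="postgres", password="postgres"
) as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for index_name, index_size, scans in cur.fetchall():
            # Large indexes with zero scans are candidates for review.
            print(f"{index_name:45} size={index_size:>10} scans={scans}")
```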

To continue talking to Dosu, mention @dosu.


dosubot added the 🌊 feat:workflow Workflow related stuff. label on Jan 23, 2025