Race condition causing celerybeat and celeryworker to fail #2727

malefice · 2020-08-06T08:56:23Z

What happened?

There is a race condition where celerybeat and/or celeryworker services will fail to start if the django-celery-beat migrations have not run. Bringing up the stack up again once those migrations have run will resume normally as expected.

What should've happened instead?

While this can only occur under specific conditions (usually first run), it is best to have a guarantee that all services will be up and running 100% out-of-the-box. The celerybeat and/or celeryworker services should have a health check in its entrypoint script prior to running its command script. I personally have some code for this if you guys are interested, but I want to hear your thoughts first.

Steps to reproduce

Tested on Ubuntu 18.04.4 LTS, Docker Engine version 19.03.12, docker-compose version 1.17.1

Create a new project from this template as you normally would but with docker and celery support.
Build and bring the whole stack up. The race condition usually shows up for me on the first run.
If it did not trigger the first time, then drop the database volume, and bring up the stack. You may need to repeat this step multiple times.

Some screenshots

The text was updated successfully, but these errors were encountered:

browniebroke · 2020-08-06T09:23:12Z

Thanks for the great explanations in this bug report. It would be nice to fix it, yes. You're saying you have some code to fix it, I'd be interested in knowing more about your solution?

malefice · 2020-08-06T10:08:56Z

I am currently using manage.py showmigrations app_name to do this. It lists down all the migrations for the app, and a migration file has not yet been applied if [ ] is present like so:

To ensure that all the migrations for django_celery_beat have run properly, we can grep the output of the command. This was how I did it.

django_celery_migrations_complete() {
  count=$(python manage.py showmigrations django_celery_beat | grep '\[ \]' | wc -l)
  if [[ $count -eq 0 ]]; then
      return 0
  fi
    return 1
}

until django_celery_migrations_complete; do
  >&2 echo 'Waiting for Django Celery migrations to complete...'
  sleep 5
done
>&2 echo 'Django Celery migrations have been applied'

The only thing I cannot avoid is a little code duplication. Since this requires database access, the excerpt above must be preceded by the same postgres health check present in the django service's entrypoint script. I am not sure how to keep things DRY, so I implemented it such that I have a separate entrypoint script for the django service, and another entrypoint script for the celery services like so:

# Inside docker-compose file:

  celerybeat:
    <<: *django
    image: celerybeat
    depends_on:
      - redis
      - postgres
    ports: []
    entrypoint: /celery-entrypoint
    command: /start-celerybeat

  celeryworker:
    <<: *django
    image: celeryworker
    depends_on:
      - redis
      - postgres
    ports: []
    entrypoint: /celery-entrypoint
    command: /start-celeryworker

browniebroke · 2020-08-07T10:19:26Z

Just to clarify one small thing, are you having this issue locally on in production?

As a reminder/notes of things work locally:

We automatically run the migration from the django container:

https://github.com/pydanny/cookiecutter-django/blob/4978aa79135374c36ab1b9bcc07c6e4a3364eae8/%7B%7Bcookiecutter.project_slug%7D%7D/compose/local/django/start#L8
We don't have any dependencies between the django and celery services in the local.yml file:

https://github.com/pydanny/cookiecutter-django/blob/1b5997466c93339d411d4a11c7d733d4d2250fbd/%7B%7Bcookiecutter.project_slug%7D%7D/local.yml#L14-L18

https://github.com/pydanny/cookiecutter-django/blob/1b5997466c93339d411d4a11c7d733d4d2250fbd/%7B%7Bcookiecutter.project_slug%7D%7D/local.yml#L74-L79

I was initially thinking to make celery services depend on django in the docker-compose file. Not sure of the drawbacks of such solution though.

Another idea I've read about was to declare another service just for migrations and make all others depend on it.

The depends_on from docker-compose might not be enough though. I think that might start the celery containers too soon still, in which case we'll need to run something inside the start Celery scripts to wait for migrations, a bit like you do.

One other drawback I foresee with the solution is that it depends on the output of the presentational showmigrations command, which isn't ideal. If the Django project decides to reformat the output, this will break.

I've had some issues in the past when trying to parse that output, it was fine for my toy project, but not something I'm too keen to add to a popular open source project.

These are just my thoughts from a quick look, let me know if I've missed anything. Other ideas are welcome of course.

malefice · 2020-08-07T12:05:18Z

Just to clarify one small thing, are you having this issue locally on in production?

I am running this locally, and I am aware that migrations will be run automatically on the django container for local development.

Another idea I've read about was to declare another service just for migrations and make all others depend on it.

The depends_on from docker-compose might not be enough though. I think that might start the celery containers too soon still, in which case we'll need to run something inside the start Celery scripts to wait for migrations, a bit like you do.

I have tried using depends_on, and it does not work, because according to the docker docs, depends_on controls only the sequence in which the services are started by docker-compose, and it does not care about the results. On that note, delegating a separate container to handle migrations upon which the other django-based containers will depend will also not work. I tried that approach anyway just to gather these screenshots:

In the screenshots, django-bootstrap is the container responsible for running migrations, and the other django-ish containers "depends" on it in the compose file. You can clearly see that the start scripts do not run in the order one would expect. Only the "initial starting" of the containers will be in dependency order.

One other drawback I foresee with the solution is that it depends on the output of the presentational showmigrations command, which isn't ideal. If the Django project decides to reformat the output, this will break.

The project already pins a Django version, so in my opinion, that becomes less of an issue. An alternative is to issue raw SQL queries to verify that all the migrations for django-celery-beat have run. That is much more complicated VS fixing a much simpler parsing issue that may occur only when you change the pinned Django version, and based on this, it does not happen often, if at all at least for showmigrations.

It is also an alternative to just not fix this race condition at all. Just document it as such, and tell users to first build, then run docker-compose -f local.yml run --rm django python manage.py migrate, then bring the services up as part of first time setup.

Andrew-Chen-Wang · 2020-08-19T23:42:29Z

@malefice That's pretty much the reason for the Travis implementation for Docker. This initial migration issue that I wanted to fix (obviously failed) could avoid this condition if celerybeat had its own entrypoint file that we can specify in the compose file.

xjlin0 · 2022-01-16T15:09:35Z

if using docker-compose 1.29 or later, will the following condition work?

  depends_on: service_completed_successfully

Also, we can always run migration before starting the server.

browniebroke · 2024-01-20T14:24:07Z

Someone tried to fix that issue in #4459 and after taking further look at this, I'm not convinced it's a bug worth fixing. The error will happen once, the first time you start you server, and will pretty much go away for ever. The error being returned is quite clear, and fixing it properly would involve hiding the problem.

I'm marking it as won't fix, unless someone comes up with a simple and foolproof solution...

github-actions · 2024-05-13T17:31:15Z

As discussed, we won't be implementing this. Automatically closing.

browniebroke added the bug label Aug 6, 2020

browniebroke mentioned this issue Jul 11, 2023

include django-health-check #4428

Open

goatwu1993 mentioned this issue Jul 14, 2023

Fix celery beat startup without django_celery_beat tables #4459

Closed

2 tasks

browniebroke mentioned this issue Jul 26, 2023

cookiecutter-django template django.db.utils.ProgrammingError #4485

Closed

browniebroke added the docker label Nov 29, 2023

browniebroke added the wontfix Wait a 10 days days and automatically close label Jan 20, 2024

browniebroke added solved Wait a bit and close the issue if no response assuming it was solved. and removed solved Wait a bit and close the issue if no response assuming it was solved. labels Jan 20, 2024

github-actions bot removed the wontfix Wait a 10 days days and automatically close label Jan 25, 2024

browniebroke added the wontfix Wait a 10 days days and automatically close label May 13, 2024

github-actions bot closed this as completed May 13, 2024

browniebroke closed this as not planned Won't fix, can't repro, duplicate, stale May 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition causing celerybeat and celeryworker to fail #2727

Race condition causing celerybeat and celeryworker to fail #2727

malefice commented Aug 6, 2020 •

edited

Loading

browniebroke commented Aug 6, 2020

malefice commented Aug 6, 2020

browniebroke commented Aug 7, 2020

malefice commented Aug 7, 2020 •

edited

Loading

Andrew-Chen-Wang commented Aug 19, 2020

xjlin0 commented Jan 16, 2022 •

edited

Loading

browniebroke commented Jan 20, 2024

github-actions bot commented May 13, 2024

Race condition causing celerybeat and celeryworker to fail #2727

Race condition causing celerybeat and celeryworker to fail #2727

Comments

malefice commented Aug 6, 2020 • edited Loading

What happened?

What should've happened instead?

Steps to reproduce

Some screenshots

browniebroke commented Aug 6, 2020

malefice commented Aug 6, 2020

browniebroke commented Aug 7, 2020

malefice commented Aug 7, 2020 • edited Loading

Andrew-Chen-Wang commented Aug 19, 2020

xjlin0 commented Jan 16, 2022 • edited Loading

browniebroke commented Jan 20, 2024

github-actions bot commented May 13, 2024

malefice commented Aug 6, 2020 •

edited

Loading

malefice commented Aug 7, 2020 •

edited

Loading

xjlin0 commented Jan 16, 2022 •

edited

Loading