Guacamole user sync container failing and not syncing Entra with user db #2350

Open
5 tasks done
helendduncan opened this issue Jan 8, 2025 · 6 comments
Labels
bug Problem when deploying a Data Safe Haven.

Comments

@helendduncan

helendduncan commented Jan 8, 2025

✅ Checklist

  • I have searched open and closed issues for duplicates.
  • This is a problem observed when deploying a Data Safe Haven.
  • I can reproduce this with the latest version.
  • I have read through the documentation.
  • This isn't an open-ended question (open a discussion if it is).

💻 System information

  • Operating System:
  • Data Safe Haven version:

🚫 Describe the problem

Yesterday 55 users were added to the sbox123 SRE. One of the new users tried to log in but could not see any connections.

Users were added to the User group via Entra in batches of 10 (manually) over the course of ~ 20-30 minutes.

Users added prior to yesterday have access.

Checking the status of the guacamole user sync container, we can see that it is repeatedly terminating and restarting (presumably since yesterday; 130 times as of 5 PM Wednesday).

The DB never finishes synchronising: the container appears to time out and restart before the sync can complete.

🌳 Log messages

Relevant log messages

From: container logs for guacamole user sync container (50+ users assigned in Entra)

sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)
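
The error message reports a pool of size 5 with an overflow of 10 and a 30-second timeout. As a rough illustration of how a bounded pool like that can run dry when connections are checked out and never returned, here is a toy model using only the Python standard library. `ToyPool`, `PoolTimeout` and `leaky_sync` are hypothetical names made up for this sketch; they are not SQLAlchemy's implementation and not code from guacamole-user-sync, and the timeout is shortened so the example finishes quickly.

```python
import threading


class PoolTimeout(Exception):
    """Raised when no connection slot frees up within the timeout."""


class ToyPool:
    """Toy stand-in for a bounded pool (assumption: NOT SQLAlchemy's code)."""

    def __init__(self, pool_size=5, max_overflow=10, timeout=30.0):
        # Total connections allowed at once: base pool plus overflow.
        self.capacity = pool_size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.capacity)

    def checkout(self):
        # Wait for a free slot, giving up after `timeout` seconds.
        if not self._slots.acquire(timeout=self.timeout):
            raise PoolTimeout(f"limit of {self.capacity} reached")
        return object()  # placeholder for a real connection

    def checkin(self, _conn):
        # Return the slot so another caller can use it.
        self._slots.release()


def leaky_sync(pool, usernames):
    """Check out one connection per user and never return any of them."""
    held = []
    for _name in usernames:
        held.append(pool.checkout())
    return held


# Same size and overflow as in the log line; timeout shortened for the demo.
pool = ToyPool(pool_size=5, max_overflow=10, timeout=0.05)
users = [f"user{i}" for i in range(55)]
try:
    leaky_sync(pool, users)
    outcome = "ok"
except PoolTimeout as exc:
    outcome = str(exc)
print(outcome)
```

In this toy model the loop stalls once every slot is taken, which is the same shape of failure as the timeout in the log. Whether the real sync code leaks connections in this way is not confirmed by the logs shown here.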

♻️ To reproduce

helendduncan added the bug label Jan 8, 2025
@JimMadge
Member

JimMadge commented Jan 9, 2025

@jemrobinson any ideas?

@JimMadge
Member

JimMadge commented Jan 9, 2025

It is possible that this is a larger number of users than we have tried to sync before, and that this is what is causing problems. The timeout suggests to me that the server we are querying is stalling or not working correctly.

As the error happens in an SQLAlchemy command, presumably this is when interacting with the database of Guacamole users/connections.

@jemrobinson
Member

jemrobinson commented Jan 9, 2025

Can we get some more log messages? This would be helpful to narrow down where this is happening.

My guess is that this is due to a missing "close connection" in one of the files here (https://github.com/alan-turing-institute/guacamole-user-sync/blob/main/guacamole_user_sync/postgresql). If anyone has time to take a look, it should be a relatively quick fix.
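
For anyone picking this up, the general shape of a "close connection" guarantee can be sketched with a context manager. This is a standard-library illustration using `sqlite3` and a temporary file, not the PostgreSQL code in the linked repository; the `connection` helper, the `users` table and the counters are hypothetical and exist only to show that no connection stays open between iterations.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager

open_connections = 0  # connections currently open
peak_open = 0         # highest number open at the same time


@contextmanager
def connection(db_path):
    """Open a connection and always close it, even if the body raises."""
    global open_connections, peak_open
    conn = sqlite3.connect(db_path)
    open_connections += 1
    peak_open = max(peak_open, open_connections)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        # The step a leaky implementation would be missing.
        conn.close()
        open_connections -= 1


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.sqlite3")
    with connection(db_path) as conn:
        # Hypothetical table, not the Guacamole schema.
        conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
    for i in range(55):
        with connection(db_path) as conn:
            conn.execute("INSERT INTO users (name) VALUES (?)", (f"user{i}",))
    with connection(db_path) as conn:
        synced = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(synced, peak_open, open_connections)
```

Because each connection is released in the `finally` clause, the number open at once stays at one regardless of how many users are processed.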

@helendduncan
Author

After adding and removing users via Entra, it appears that the maximum number of users is 10.

@JimMadge
Member

JimMadge commented Jan 9, 2025

Default max number of connections is 5+5 so that makes sense.

The most likely explanation seems to be that guacamole-user-sync is making a connection for each user, which risks going over the limit.
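
If that is what is happening, one way to avoid it would be to reuse a single connection for the whole batch of users. The sketch below shows the idea with the standard-library `sqlite3` module and an in-memory database. The `sync_users` function and the `users` table are hypothetical, chosen for illustration only; this is not the actual guacamole-user-sync code or the Guacamole schema.

```python
import sqlite3
from contextlib import closing


def sync_users(conn, usernames):
    """Insert all usernames over ONE connection, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO users (name) VALUES (?)",
            [(name,) for name in usernames],
        )
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


# closing() makes sure the single connection is closed at the end.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
    total = sync_users(conn, [f"user{i}" for i in range(55)])

print(total)
```

With this shape, the number of connections in use does not grow with the number of users being synced.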

@helendduncan
Author

After a co-working hot fix, all 55 users have been successfully added and the sync container is running.
