[Support] Queens University Hub Outage #5332

Open
yuvipanda opened this issue Jan 6, 2025 · 6 comments
Labels
support Issues that track Freshdesk tickets

Comments

@yuvipanda
Member

The Freshdesk ticket link

https://2i2c.freshdesk.com/a/tickets/2645

Ticket request type

Something is not working

Ticket impact

🟥 Critical

Short ticket description

New nodes don't spin up, so no new users can log in

(Optional) Investigation results

No response

@yuvipanda added the support label Jan 6, 2025
@yuvipanda
Member Author

At first I couldn't log in, but that was because I hadn't set up MFA initially. Queens IT quickly resolved that, and we could continue.

@yuvipanda
Member Author

It looks like an upscaling operation on the existing user pool is 'stuck' somewhere, and nothing can happen on the cluster. I tried creating a new nodepool, which got stuck too. Same for trying to manually scale them.

I was able to abort the 'operation' with `az aks nodepool operation-abort --nodepool-name usere8sv5 --cluster-name hub-cluster --resource-group 2i2c-jupyterhub-prod`. Let's see if that helps.
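For reference, a minimal sketch of the abort-then-check sequence described above. The resource names are the ones from the command in this comment. The `az aks nodepool show` state check is an assumption added for illustration, not something run in this thread, and the function names are hypothetical.

```shell
# Names taken from the operation-abort command quoted in this comment.
RESOURCE_GROUP="2i2c-jupyterhub-prod"
CLUSTER="hub-cluster"
NODEPOOL="usere8sv5"

# Print the node pool's provisioning state.
# (Assumed follow-up check; not part of the original comment.)
nodepool_state() {
  az aks nodepool show \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER" \
    --name "$NODEPOOL" \
    --query provisioningState \
    --output tsv
}

# Abort the in-flight operation on the node pool, then report its state.
abort_stuck_operation() {
  az aks nodepool operation-abort \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER" \
    --nodepool-name "$NODEPOOL"
  nodepool_state
}
```

Calling `abort_stuck_operation` issues the abort and then prints the pool's state. A pool that still reports an in-progress state long after the abort would be consistent with the 'stuck' behaviour described here.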

@yuvipanda
Member Author

I created a VM with a public IP just to test that that works, and it does. So this isn't some new security policy (at least on the surface).

I'm now trying to 'unwedge' this.

@yuvipanda
Member Author

I tried to change the tier, but that too got stuck in 'updating' for far too long (~30 min). I'm cancelling that and trying to upgrade the master instead. If that doesn't get it unstuck, I'll bounce things back, as this now needs Azure support to be involved.

@yuvipanda
Member Author

ya, the cluster is wedged af. I've responded in https://2i2c.freshdesk.com/a/tickets/2645 to ask them to involve Azure Support.

@yuvipanda
Member Author

As a reminder, when something like this happened with UToronto a few years ago, things were down for like 48h until we just gave up and moved to CILogon.
