[OSS] Escalation mobile push fail with "HTTP client error 403" while test notif works #3206

Open
bmalynovytch opened this issue Oct 27, 2023 · 13 comments

Comments

@bmalynovytch

What went wrong?

What happened:

  • Created an alert group
  • Escalation chain triggered a mobile push notification
  • Timeline shows "failed to notify ... by mobile push important"
[Screenshot: oncall-mobile-push-failed]

What did you expect to happen:

  • Created an alert group
  • Escalation chain triggered a mobile push notification
  • Get notified

How do we reproduce it?

  1. Configure Grafana and OnCall OSS
  2. Plug to Grafana Cloud on EU region
  3. Create alert group and escalation chain to trigger mobile push notification
  4. 💥

Grafana OnCall Version

v1.3.47 (Docker)

Product Area

Alert Flow & Configuration, Mobile App

Grafana OnCall Platform?

Kubernetes

User's Browser?

No response

Anything else to add?

Test notifications sent from the user's profile work properly, and their logs appear in "engine", whereas the failing escalation notifications are processed in "celery".
There might be something wrong with GRAFANA_CLOUD_ONCALL_API_URL not being properly set or used in celery, which would make it authenticate against the default Grafana Cloud platform instead of https://oncall-prod-eu-west-0.grafana.net/oncall as it should.
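
The suspected failure mode can be illustrated with a minimal sketch. It assumes, without confirmation from the OnCall source, that each process resolves the URL from its own environment and falls back to a built-in default. `resolve_cloud_url` and `DEFAULT_CLOUD_URL` are hypothetical names, and the default URL is a placeholder, not the real one.

```python
from typing import Mapping

# Placeholder default; the real built-in default is not stated in this issue.
DEFAULT_CLOUD_URL = "https://default-region.example/oncall"
EU_CLOUD_URL = "https://oncall-prod-eu-west-0.grafana.net/oncall"


def resolve_cloud_url(env: Mapping[str, str]) -> str:
    """Return the URL a process would use, falling back to the default."""
    return env.get("GRAFANA_CLOUD_ONCALL_API_URL", DEFAULT_CLOUD_URL)


# Hypothetical situation: "engine" sees the variable, "celery" does not.
engine_env = {"GRAFANA_CLOUD_ONCALL_API_URL": EU_CLOUD_URL}
celery_env = {}

engine_url = resolve_cloud_url(engine_env)
celery_url = resolve_cloud_url(celery_env)
print("engine ->", engine_url)
print("celery ->", celery_url)
```

If the two processes resolved different URLs like this, a token issued for the EU region would be presented to another region, which would be consistent with a 403.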

@bmalynovytch
Author

I also had trouble configuring GRAFANA_CLOUD_ONCALL_API_URL using env variables and needed to enable FEATURE_LIVE_SETTINGS_ENABLED.
(See #1479)

@vstpme vstpme self-assigned this Nov 6, 2023
@vstpme
Member

vstpme commented Nov 6, 2023

Hi @bmalynovytch, thank you for opening an issue! I'm trying to reproduce this now. Could you please tell me more about how you passed the GRAFANA_CLOUD_ONCALL_API_URL env variable into your deployment?

There might be something wrong with GRAFANA_CLOUD_ONCALL_API_URL not being properly set/used in celery, which makes it auth on the default Grafana Cloud platform instead of https://oncall-prod-eu-west-0.grafana.net/oncall as it should.

Have you passed this env variable to both the release-oncall-engine and release-oncall-celery k8s deployments?
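
One way to answer this is to run the same small check in a shell of each deployment and compare the output. The following is only a sketch: `describe_env` is a hypothetical helper, and it inspects the process environment only, so it says nothing about values stored through LiveSettings.

```python
import os
from typing import Dict, Mapping, Sequence

NAMES = ("GRAFANA_CLOUD_ONCALL_API_URL", "GRAFANA_CLOUD_ONCALL_TOKEN")


def describe_env(env: Mapping[str, str], names: Sequence[str] = NAMES) -> Dict[str, str]:
    """Report each variable as '<unset>', a masked token, or its value."""
    report = {}
    for name in names:
        value = env.get(name)
        if value is None:
            report[name] = "<unset>"
        elif "TOKEN" in name:
            # Never print secrets; show only the length.
            report[name] = f"<set, {len(value)} chars>"
        else:
            report[name] = value
    return report


if __name__ == "__main__":
    for name, status in describe_env(os.environ).items():
        print(f"{name} = {status}")
```

Differing output between the engine and celery containers would point to the variable not reaching one of them.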

@bmalynovytch
Author

bmalynovytch commented Nov 6, 2023

Hi @vadimkerr
I deployed using the helm chart, which requires providing the env variable only once.
It's then pushed to both deployments and I can confirm that the env is available in the shell of Celery.

@bmalynovytch
Author

Sorry @vadimkerr
I was wrong, the setting is provided using LiveSetting, not env.

@bmalynovytch
Author

So ... I tried providing the token twice, using LiveSettings AND the env variable.
To do so, I needed to reset the token (I hadn't kept a copy of the previous one).
Now, notifications work properly 🤷

I tried removing the env variable, and I got the 403 error again.
The env variable is the magic that makes things work.

There seems to be a big mess with LiveSettings and env variables 😞

@vstpme
Member

vstpme commented Nov 7, 2023

Glad it's now working for you @bmalynovytch! I'll try to reproduce this and see if there's something we can do about it.

@TomasHradecky

TomasHradecky commented Dec 7, 2023

@bmalynovytch can you please share exactly what you configured for GRAFANA_CLOUD_ONCALL_API_URL, and how? I have OnCall in a k8s cluster, and with everything I've tried I still get "token is invalid".

@bmalynovytch
Author

bmalynovytch commented Dec 8, 2023

@bmalynovytch can you please share exactly what you configured for GRAFANA_CLOUD_ONCALL_API_URL, and how? I have OnCall in a k8s cluster, and with everything I've tried I still get "token is invalid".

The trick is that you need to provide the token twice: once with env variables and once with an override in LiveSettings.
There seems to be some code that uses the override properly but not the env variable, and another portion that uses only the env variable.

Here's the relevant section of the helm values:

env:
  GRAFANA_CLOUD_ONCALL_API_URL: https://oncall-prod-eu-west-0.grafana.net/oncall
  GRAFANA_CLOUD_ONCALL_TOKEN: ....
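
The behaviour described above can be modelled with a toy sketch. It is not taken from the OnCall code base: both lookup functions are hypothetical and only illustrate why setting the token in a single place could leave two code paths disagreeing.

```python
from typing import Mapping, Optional

NAME = "GRAFANA_CLOUD_ONCALL_TOKEN"


def read_with_override(live: Mapping[str, str], env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical path A: prefer the LiveSettings override, then the env."""
    return live.get(NAME, env.get(NAME))


def read_env_only(live: Mapping[str, str], env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical path B: ignore LiveSettings and read only the env."""
    return env.get(NAME)


def paths_agree(live: Mapping[str, str], env: Mapping[str, str]) -> bool:
    """True when both paths see the same, non-empty token."""
    a = read_with_override(live, env)
    b = read_env_only(live, env)
    return a is not None and a == b


# Token stored only in LiveSettings: path B sees nothing.
only_live = paths_agree({NAME: "tok"}, {})
# Token stored in both places: both paths see the same value.
both = paths_agree({NAME: "tok"}, {NAME: "tok"})
print("LiveSettings only:", only_live)
print("LiveSettings and env:", both)
```

Under this model, only the configuration with the token in both places is consistent, which matches the workaround reported in this thread.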

@TomasHradecky

Yesterday I tried adding the env variable on Grafana instead of OnCall, and voilà, the second click on "connect OnCall to cloud" went through. So if anyone has the same issue as me, try adding this for Grafana:

env:
  ONCALL_CLOUD_API_URL: "https://oncall-prod-eu-west-0.grafana.net/oncall/api/v1/integrations"
 

@bmalynovytch thanks for the info, I will try it to find out what the difference between the two is.

@Patrick-Remy

@TomasHradecky is this how you solved it? This is currently not working for us: we still get a 404 (403 if *_CLOUD_ONCALL_API_URL is unset) when we try to set the token via the Grafana admin > plugins UI.

services:
  grafana:
    environment:
      ONCALL_CLOUD_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall/api/v1/integrations' # we also tried without /api/v1/integrations here
  
  on-call-engine:
    environment:
      FEATURE_LIVE_SETTINGS_ENABLED: 'true' # should be default anyway
      GRAFANA_API_URL: 'http://grafana:3000' # local network url
      GRAFANA_CLOUD_ONCALL_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall'
      GRAFANA_CLOUD_ONCALL_TOKEN: '<our-cloud-token>'
      GRAFANA_CLOUD_NOTIFICATIONS_ENABLED: 'true'
     

@TomasHradecky

TomasHradecky commented Jan 9, 2024

@TomasHradecky is this how you solved it? This is currently not working for us, we still get 404 (403 if *_CLOUD_ONCALL_API_URL is unset) if we try to set the token via grafana admin > plugins UI

services:
  grafana:
    environment:
      ONCALL_CLOUD_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall/api/v1/integrations' # we also tried without /api/v1/integrations here
  
  on-call-engine:
    environment:
      FEATURE_LIVE_SETTINGS_ENABLED: 'true' # should be default anyway
      GRAFANA_API_URL: 'http://grafana:3000' # local network url
      GRAFANA_CLOUD_ONCALL_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall'
      GRAFANA_CLOUD_ONCALL_TOKEN: '<our-cloud-token>'
      GRAFANA_CLOUD_NOTIFICATIONS_ENABLED: 'true'
     

@Patrick-Remy
Is your Grafana deployed as part of the OnCall helm chart? If not, try setting GRAFANA_API_URL to the URL you use to access Grafana from outside the cluster.
Just to be sure, check in the OnCall settings on the Grafana Cloud portal that oncall-prod-eu-west-0 is the right domain for your OnCall.
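
As a quick aid for spotting this, here is a rough heuristic sketch that flags a GRAFANA_API_URL which only resolves inside a cluster or compose network. `looks_cluster_internal` is a hypothetical helper and the suffix list is an assumption. The thread does not establish why only the public endpoint works, so treat a positive result as a hint, not a diagnosis.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed suffixes for names that are typically not resolvable publicly.
INTERNAL_SUFFIXES = (".svc", ".cluster.local", ".local", ".internal")


def looks_cluster_internal(url: str) -> bool:
    """Heuristically decide whether a URL points at an internal-only host."""
    host = urlsplit(url).hostname or ""
    try:
        # Literal IP addresses: private ranges are treated as internal.
        return ipaddress.ip_address(host).is_private
    except ValueError:
        pass  # Not an IP literal; fall through to name-based checks.
    # Single-label names such as "grafana" come from compose or kube DNS.
    return "." not in host or host.endswith(INTERNAL_SUFFIXES)


for candidate in (
    "http://grafana:3000",
    "https://oncall-prod-eu-west-0.grafana.net/oncall",
):
    print(candidate, "->", looks_cluster_internal(candidate))
```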

@Patrick-Remy

Is your Grafana deployed as part of the OnCall helm chart? If not, try setting GRAFANA_API_URL to the URL you use to access Grafana from outside the cluster.

This was the issue: in the compose setup, only the publicly accessible endpoint worked! Afterwards we had to save the token twice in the settings. The first time it resulted in a 403 (?!), and just out of frustration we pressed the button again, and voilà, everything was connected.

This is so extremely weird and buggy, thanks a lot for your help!

@TomasHradecky

Is your Grafana deployed as part of the OnCall helm chart? If not, try setting GRAFANA_API_URL to the URL you use to access Grafana from outside the cluster.

This was the issue: in the compose setup, only the publicly accessible endpoint worked! Afterwards we had to save the token twice in the settings. The first time it resulted in a 403 (?!), and just out of frustration we pressed the button again, and voilà, everything was connected.

This is so extremely weird and buggy, thanks a lot for your help!

Happy to help. I saw exactly the same behavior; only frustration made it work.
I really do not understand why Grafana is reachable from OnCall only through the public domain. Even though the kube-dns records were OK on my cluster and OnCall knew the path to Grafana, the public endpoint was the only working solution.
