Grafana Oncall Plugin not connected #5100

Open
Ennakin opened this issue Sep 30, 2024 · 33 comments

@Ennakin

Ennakin commented Sep 30, 2024

What went wrong?

What happened:

  • After another update of the OnCall plugin, to version 1.9.30, I get a 500 status code every time I go to the OnCall section.
    Image
    The configuration looks OK, though.
    Image

I run Grafana in hobby mode in containers. The engine version is also 1.9.30, and the Grafana version is 11.3.0-76679. The integrations are still working and alerts are still being sent. There is an error in the Grafana container logs: "level=error msg="Request Completed" method=POST path=/api/ds/query status=500"

What did you expect to happen:

  • To have access to the OnCall section pages of the Grafana interface

How do we reproduce it?

  1. Open Grafana and go to OnCall section
  2. Now click any page
  3. Wait for the browser to crash. Error message says: "Grafana Oncall Plugin not connected"

Grafana OnCall Version

v.1.9.30

Product Area

Helm/Kubernetes/Docker

Grafana OnCall Platform?

Docker

User's Browser?

Google Chrome

Anything else to add?

No response

@felipevacar

I am getting the same error :(

curl -X GET 'https://my-user:<password>@<grafana_host>/api/plugins/grafana-oncall-app/resources/plugin/status'

"error setting up request headers: failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OrgUser "

Grafana
v10.2.2

Grafana OnCall Version
v.1.9.31
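
For anyone who wants to script the status check from the curl command above, here is a minimal Python sketch. The endpoint path is taken from this comment; the host and credentials in the usage comment are placeholders, and `build_status_request` and `fetch_status` are hypothetical helper names, not part of Grafana or OnCall.

```python
import base64
import urllib.error
import urllib.request

# Path taken from the curl command in this comment.
STATUS_PATH = "/api/plugins/grafana-oncall-app/resources/plugin/status"


def build_status_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the OnCall plugin status endpoint using Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + STATUS_PATH,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_status(req: urllib.request.Request) -> tuple[int, str]:
    """Send the request and return (HTTP status, body), also for error responses."""
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 500 from the plugin still carries the error text in the body.
        return err.code, err.read().decode("utf-8", errors="replace")


# Usage (placeholder host and credentials, adjust to your setup):
# status, body = fetch_status(build_status_request("https://grafana.example.com", "my-user", "secret"))
# print(status, body)
```

Returning the body for error responses as well is deliberate, since the messages quoted in this thread arrive with status 500.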

@Ennakin
Author

Ennakin commented Oct 1, 2024

I should mention that I have self-hosted Grafana with multiple organizations, so enabling externalServiceAccounts didn't work for me (as I learned from https://github.com/grafana/grafana-plugin-examples/blob/main/examples/app-with-service-account/README.md).
My error is "failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OnCallPermission body={"message":"Unlicensed","traceID":""}". I'm not running Grafana in Enterprise mode. What's with the license?

@mderynck
Contributor

mderynck commented Oct 2, 2024

@Ennakin Multiple organizations in Grafana is not supported by OnCall.

Is the accessControlOncall feature flag enabled? If you are not running Enterprise, it should not be on. This flag is what tells OnCall to check for RBAC permissions.

@Ennakin
Author

Ennakin commented Oct 2, 2024

> @Ennakin Multiple organizations in Grafana is not supported by OnCall.
>
> Is accessControlOncall feature flag enabled? if you are not running enterprise it should not be on. This is what is telling OnCall to check for RBAC permissions.

The accessControlOncall flag isn't enabled, neither through the config file nor through docker-compose env variables. The RBAC section in grafana.ini looks like this:
Image
The toggles section:
Image

So should the externalServiceAccounts toggle be enabled? If I enable it, I get a "PluginAppClientSecret not set in config" error.

@mderynck
Contributor

mderynck commented Oct 2, 2024

externalServiceAccounts should be on and accessControlOnCall should be off. You may want to double-check the feature_toggles section in the UI under Administration->General->Settings as well.
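
To check a grafana.ini file against the advice above, a small Python sketch like the following may help. It rests on several assumptions that are not confirmed in this thread: that `enable` is a comma- or whitespace-separated list, that an explicit `name = value` entry takes precedence over the `enable` list, and that the file parses with Python's `configparser`. The helper names are hypothetical. It only sees what is written in the file, not Grafana's built-in defaults, which is why it asks for accessControlOnCall to be disabled explicitly.

```python
import configparser
import re


def effective_toggles(ini_text: str) -> dict[str, bool]:
    """Collect the feature toggles that are set explicitly in grafana.ini text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep toggle names case-sensitive
    # grafana.ini may start with keys outside any section; give them a dummy one.
    parser.read_string("[__root__]\n" + ini_text)
    if not parser.has_section("feature_toggles"):
        return {}
    section = dict(parser.items("feature_toggles"))
    toggles: dict[str, bool] = {}
    # Assumption: `enable` lists toggle names separated by commas or whitespace.
    for name in re.split(r"[,\s]+", section.pop("enable", "").strip()):
        if name:
            toggles[name] = True
    # Assumption: explicit per-toggle entries override the `enable` list.
    for name, value in section.items():
        toggles[name] = value.strip().lower() == "true"
    return toggles


def oncall_toggle_problems(toggles: dict[str, bool]) -> list[str]:
    """Report deviations from 'externalServiceAccounts on, accessControlOnCall off'."""
    problems = []
    if not toggles.get("externalServiceAccounts", False):
        problems.append("externalServiceAccounts is not enabled")
    if toggles.get("accessControlOnCall") is not False:
        problems.append("accessControlOnCall is not explicitly disabled")
    return problems
```

Calling `oncall_toggle_problems(effective_toggles(open("grafana.ini").read()))` returns an empty list when the file matches the advice.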

@Ennakin
Author

Ennakin commented Oct 2, 2024

> externalServiceAccounts should be on accessControlOnCall should be off, may want to double check the feature_toggles section in the UI under Administration->General->Settings as well.

Thank you! I checked the settings area; externalServiceAccounts is the only feature that is on. What is PluginAppClientSecret?

@mderynck
Contributor

mderynck commented Oct 2, 2024

PluginAppClientSecret is the token for the external service account associated with the Plugin.
Under Administration->Users and Access->Service accounts there should be one called extsvc-grafana-oncall-app and it should have 1 token.

@Ennakin
Author

Ennakin commented Oct 2, 2024

There is only the sa-autogen-OnCall account, which was generated a few months ago when grafana-oncall was installed, and it has 1 token.

@mderynck
Contributor

mderynck commented Oct 2, 2024

Try going to Administration->Plugins and data->Plugins, open Grafana OnCall, and make sure there is an IAM tab on that screen. Also check the Grafana log file for any errors on startup regarding the plugin. That service account should get created when the plugin is loaded.

@Ennakin
Author

Ennakin commented Oct 3, 2024

IAM tab is in place. I still get 'msg="Request Completed" method=GET path=/api/plugins/grafana-oncall-app/resources/plugin/status status=500 ... msg="Error making sync request" error="error getting settings from context: PluginAppClientSecret not set in config "' in grafana logs

@Ennakin
Author

Ennakin commented Oct 7, 2024

Is there any chance I can use a post method to create this service account?

> Try going to Administration->Plugins and data->Plugins Grafana OnCall make sure there is an IAM tab on that screen and also check in the grafana log file if there is any errors on startup regarding the plugin. That service account should get created when the plugin is loaded.

@Ennakin
Author

Ennakin commented Oct 7, 2024

And if I switched to grafana-enterprise would I need a license to use oncall plugin?

@mderynck
Contributor

mderynck commented Oct 7, 2024

> Is there any chance I can use a post method to create this service account?

This service account can't be created by the user; it should be created automatically by the plugin.

> And if I switched to grafana-enterprise would I need a license to use oncall plugin?

You need a license to use all the features of grafana-enterprise. OnCall does not have a license of its own; it just conforms to the Grafana version it is installed on.

@Ennakin
Author

Ennakin commented Oct 8, 2024

In Enterprise mode, with the 'enable = externalServiceAccounts, accessControlOncall' setting, I still get the 'PluginAppClientSecret not set in config' error.
Please let me know if I missed anything.

@Ennakin
Author

Ennakin commented Oct 9, 2024

> This service account can't be created by the user it should be created automatically by the plugin.

Is it supposed to work even without Kubernetes? And why doesn't the OnCall plugin have permission to create a service account in the IAM section? Image

@ced455

ced455 commented Oct 24, 2024

Hello, I have the exact same issue. The most disturbing part is that it was working, but as soon as I restarted Grafana, the OnCall pages started displaying "Plugin not connected".

I tried to install and uninstall using the API, the UI and ansible without success.

Grafana Version 11.3.0
Grafana Oncall Version v1.11.5

Grafana logs when I hit the retry button:

logger=context userId=1 orgId=1 uname=admin t=2024-10-24T17:12:40.458090452+02:00 level=info msg="Request Completed" method=GET path=/api/live/ws status=-1 remote_addr=redacted_ip time_ms=4 duration=4.800207ms size=0 referer= handler=/api/live/ws status_source=server
logger=context userId=10 orgId=1 uname=sa-1-sa-autogen-oncall t=2024-10-24T17:12:40.59189954+02:00 level=info msg="Request Completed" method=GET path=/api/plugins/grafana-incident-app/settings status=404 remote_addr=redacted_ip time_ms=11 duration=11.167478ms size=64 referer= handler=/api/plugins/:pluginId/settings status_source=server
logger=plugin.grafana-oncall-app t=2024-10-24T17:12:40.593068999+02:00 level=error msg="getting incident plugin settings" error="request did not return 200: 404"
logger=context userId=10 orgId=1 uname=sa-1-sa-autogen-oncall t=2024-10-24T17:12:40.607834375+02:00 level=info msg="Request Completed" method=GET path=/api/plugins/grafana-labels-app/settings status=404 remote_addr=redacted_ip time_ms=8 duration=8.283405ms size=64 referer= handler=/api/plugins/:pluginId/settings status_source=server
logger=plugin.grafana-oncall-app t=2024-10-24T17:12:40.608603801+02:00 level=error msg="getting labels plugin settings" error="request did not return 200: 404"
logger=plugin.grafana-oncall-app t=2024-10-24T17:12:40.612673313+02:00 level=info msg=GetUser user="map[Email:admin@localhost Login:admin Name:admin Role:Admin]"
logger=context userId=10 orgId=1 uname=sa-1-sa-autogen-oncall t=2024-10-24T17:12:40.641276486+02:00 level=info msg="Request Completed" method=GET path=/api/access-control/users/1/permissions status=404 remote_addr=redacted_ip time_ms=8 duration=8.404385ms size=24 referer= handler=notfound status_source=server
logger=plugin.grafana-oncall-app t=2024-10-24T17:12:40.641949991+02:00 level=error msg="Error getting user" error="failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OnCallPermission body={\"message\":\"Not found\"}\n"
logger=plugin.grafana-oncall-app t=2024-10-24T17:12:40.642178723+02:00 level=error msg="Error validating oncall plugin settings" error="error setting up request headers: failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OnCallPermission body={\"message\":\"Not found\"}\n "
logger=context userId=1 orgId=1 uname=admin t=2024-10-24T17:12:40.642738448+02:00 level=error msg="Request Completed" method=GET path=/api/plugins/grafana-oncall-app/resources/plugin/status status=500 remote_addr=redacted_ip time_ms=77 duration=77.752268ms size=174 referer=https://REDACTED/a/grafana-oncall-app/alert-groups handler=/api/plugins/:pluginId/resources/* status_source=downstream

Edit: updated to v1.11.5, same issue.
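
Log excerpts like the one above are easier to scan once the info-level lines are filtered out. The following Python sketch parses the `key=value` format seen in these logs and keeps only error lines from the `plugin.grafana-oncall-app` logger. The regular expression is a simplification written for this thread's log lines, not a complete logfmt parser, and the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# key=value pairs; a value is either a double-quoted string (with \" escapes)
# or a run of non-space characters (possibly empty, as in `referer=`).
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_log_line(line: str) -> dict[str, str]:
    """Split one Grafana log line into a dict of its key=value fields."""
    fields: dict[str, str] = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1].replace('\\"', '"')  # unquote and unescape
        fields[key] = raw
    return fields


def oncall_plugin_errors(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (timestamp, msg, error) for error lines logged by the OnCall plugin."""
    for line in lines:
        fields = parse_log_line(line)
        if fields.get("level") == "error" and fields.get("logger") == "plugin.grafana-oncall-app":
            yield fields.get("t", ""), fields.get("msg", ""), fields.get("error", "")
```

Feeding it the lines of a saved log file, for example `oncall_plugin_errors(open("grafana.log"))`, leaves only the plugin's own error messages, such as the `[]plugin.OnCallPermission` parsing failures quoted above.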

@Ennakin
Author

Ennakin commented Oct 25, 2024

We rolled back to Grafana 11.1.1, OnCall 1.9.30 and OnCall plugin 1.9.26. This is the only configuration in which it works more or less fine.

@Ennakin
Author

Ennakin commented Oct 25, 2024

The second I wrote the previous comment, we faced another issue:

error setting up request headers: failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OrgUser body={"message":"Unauthorized","traceID":""}

It appears every time I try to connect to the plugin.

@seebag

seebag commented Oct 28, 2024

Got the same problem with

  • Grafana OSS v11.3.0
  • Grafana oncall 1.11.5
  • Grafana plugin 1.11.5

I added the following feature toggles, as described in the thread:

[feature_toggles]
enable = externalServiceAccounts
accessControlOnCall = false

I have the IAM tab in the plugin settings.

But it still does not work:

logger=plugin.grafana-oncall-app t=2024-10-28T16:09:01.132349586+01:00 level=error msg="Error getting settings from context" error="PluginAppClientSecret not set in config"

@RobinFrcd

RobinFrcd commented Oct 28, 2024

Same situation here with "PluginAppClientSecret not set in config", using GF_FEATURE_TOGGLES_ENABLE=externalServiceAccounts. I tried removing and reinstalling the plugin, but it does not create a service account automatically. If we can't create extsvc-grafana-oncall-app manually, how should we proceed?

As others stated, the create action has no scope here:
Image

EDIT: Downgrading from grafana v11.3 to v11.2.3, deleting the plugin and re-installing it does the job. The issue is clearly with grafana v11.3.

@bpedersen2

Grafana 11.3.0 is currently disabled in the e2e tests (#5207), so I guess OnCall is not compatible with it at the moment.

Looking at the changes in Grafana 11.3.0, grafana/grafana#93849 seems like a possible source of the problem.

@sunshine-luganodes

sunshine-luganodes commented Nov 1, 2024

Since Grafana has gained RBAC support for all editions, I assume it's safe to work with accessControlOnCall on OSS?
https://grafana.com/docs/grafana/latest/whatsnew/whats-new-in-v11-3/#developers-support-rbac-in-plugins

@bck01215
Contributor

bck01215 commented Nov 1, 2024

Also affected by the upgrade. In the future, release notes should indicate breaking changes such as major auth reconfigurations.

Grafana set accessControlOnCall to GA (on by default), which I assume is what caused all these issues. I haven't found anything in the docs on how to disable feature flags that are GA.

@Kuzbekov

Kuzbekov commented Nov 4, 2024

Disabling the "accessControlOnCall" feature toggle helped.
For a Helm deployment of Grafana, you need to add this value:

grafana.ini:
  feature_toggles:
    accessControlOnCall: 'false'

@tarvip

tarvip commented Nov 21, 2024

> disabling feature toggle "accessControlOnCall" helped for helm deployment of grafana you need to add value:
>
> grafana.ini:
>   feature_toggles:
>     accessControlOnCall: 'false'

It helped, but it was not enough.

I also had to add the GF_AUTH_MANAGED_SERVICE_ACCOUNTS_ENABLED=true env variable.
https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#managed_service_accounts_enabled
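
The variable name above follows Grafana's documented `GF_<SectionName>_<KeyName>` convention for overriding grafana.ini settings through the environment. As a small illustration, this Python sketch derives such names; `grafana_env_name` and `env_lines` are hypothetical helpers, and placing `managed_service_accounts_enabled` under the `[auth]` section is inferred from the variable name in this comment.

```python
import re


def grafana_env_name(section: str, key: str) -> str:
    """Map an ini section/key pair to a GF_<SECTION>_<KEY> variable name."""

    def normalize(part: str) -> str:
        # Dots and dashes become underscores; everything is uppercased.
        return re.sub(r"[.\-]", "_", part).upper()

    return f"GF_{normalize(section)}_{normalize(key)}"


def env_lines(settings: dict[tuple[str, str], str]) -> list[str]:
    """Render {(section, key): value} as NAME=value lines for an env file."""
    return [f"{grafana_env_name(s, k)}={v}" for (s, k), v in settings.items()]


# The two settings mentioned in this thread:
lines = env_lines({
    ("auth", "managed_service_accounts_enabled"): "true",
    ("feature_toggles", "enable"): "externalServiceAccounts",
})
```

Here `lines` holds `GF_AUTH_MANAGED_SERVICE_ACCOUNTS_ENABLED=true` and `GF_FEATURE_TOGGLES_ENABLE=externalServiceAccounts`, the two variables reported in this thread.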

@Smana

Smana commented Dec 2, 2024

Hi I finally managed to get OnCall to work using these commands here:

curl -X POST 'https://admin:<admin_password>@<grafana_host>/api/plugins/grafana-oncall-app/settings' -H "Content-Type: application/json" -d '{"enabled":true, "jsonData":{"stackId":5, "orgId":100, "onCallApiUrl":"http://oncall-engine:8080/", "grafanaUrl":"http://<grafana_address>/"}}'
curl -X POST 'https://admin:<admin_password>@<grafana_host>/api/plugins/grafana-oncall-app/resources/plugin/install'

Check that everything works properly

curl -X GET 'https://admin:<admin_password>@<grafana_host>/api/plugins/grafana-oncall-app/resources/plugin/status' | jq

However I'd like to avoid having to run commands. Could you please tell me how to configure it programmatically? (Using the Helm chart?)
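
This does not answer the Helm question, but the same two calls can at least be scripted, for example from a provisioning job. The Python sketch below mirrors the curl commands above: the endpoint paths and the JSON fields (including `stackId` 5 and `orgId` 100) are copied from them, while the function names, hostnames and credentials are placeholders chosen for illustration.

```python
import base64
import json
import urllib.request

PLUGIN_BASE = "/api/plugins/grafana-oncall-app"


def _basic_auth(user: str, password: str) -> str:
    return "Basic " + base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")


def build_settings_request(grafana_host: str, user: str, password: str,
                           oncall_api_url: str, grafana_url: str) -> urllib.request.Request:
    """POST that enables the plugin and sets its jsonData, as in the first curl call."""
    payload = {
        "enabled": True,
        "jsonData": {
            "stackId": 5,    # value copied from the curl command above
            "orgId": 100,    # value copied from the curl command above
            "onCallApiUrl": oncall_api_url,
            "grafanaUrl": grafana_url,
        },
    }
    return urllib.request.Request(
        grafana_host.rstrip("/") + PLUGIN_BASE + "/settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": _basic_auth(user, password)},
        method="POST",
    )


def build_install_request(grafana_host: str, user: str, password: str) -> urllib.request.Request:
    """POST to the plugin's install resource, as in the second curl call."""
    return urllib.request.Request(
        grafana_host.rstrip("/") + PLUGIN_BASE + "/resources/plugin/install",
        headers={"Authorization": _basic_auth(user, password)},
        method="POST",
    )


# Usage (placeholders; sends real requests):
# for req in (build_settings_request("https://grafana.example.com", "admin", "secret",
#                                    "http://oncall-engine:8080/", "http://grafana:3000/"),
#             build_install_request("https://grafana.example.com", "admin", "secret")):
#     with urllib.request.urlopen(req, timeout=10) as resp:
#         print(resp.status)
```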

@maffelbaffel

> > disabling feature toggle "accessControlOnCall" helped for helm deployment of grafana you need to add value:
> >
> > grafana.ini:
> >   feature_toggles:
> >     accessControlOnCall: 'false'
>
> It helped, but it was not enough.
>
> I had to add also GF_AUTH_MANAGED_SERVICE_ACCOUNTS_ENABLED=true env variable. https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#managed_service_accounts_enabled

This is how my configuration looks:

feature_toggles:
  enable: 'correlations autoMigrateOldPanels traceQLStreaming externalServiceAccounts'
  accessControlOnCall: 'false'

I also had to manually set GF_AUTH_MANAGED_SERVICE_ACCOUNTS_ENABLED=true in my helm chart for it to work.

@nthtrung09it

@maffelbaffel I followed your guide and it works perfectly. Thank you very much.

Do you encounter the parsing error in Grafana OnCall Insights dashboard?
Image

@PovilasV1

Having a similar issue:

curl https://$GRAFANA_AUTH@$GRAFANA_URL/api/plugins/grafana-oncall-app/resources/plugin/status
error setting up request headers: failed to parse JSON response: invalid character '<' looking for beginning of value body=<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>

I applied all of the suggestions, but still no luck. The issues started when we upgraded from v11.2.0 to v11.3.1. Downgrading now does not fix the issue either; we are currently running the latest Grafana and OnCall.
We have only 1 organization in Grafana.
What is also interesting is that the service account and token are created, but never used:
Image

@gabriel-suela
Contributor

> @maffelbaffel I followed your guide and it works perfectly. Thank you very much.
>
> Do you encounter the parsing error in Grafana OnCall Insights dashboard? Image

I have the same error here, did you find a fix?

@maffelbaffel

> > @maffelbaffel I followed your guide and it works perfectly. Thank you very much.
> > Do you encounter the parsing error in Grafana OnCall Insights dashboard? Image
>
> I have the same error here, did you find a fix?

This is because the queries in this dashboard are flawed.

round(delta(sum($alert_groups_total{slug=~"$instance", team=~"$team", integration=~"$integration", service_name=~"$service_name"})[$__range:])) >= 0

The dollar sign at $alert_groups_total should not be there. Re-Importing has no effect for me.
I have just copied the dashboard and fixed the errors by removing the dollar sign.

@gabriel-suela
Contributor

> > > @maffelbaffel I followed your guide and it works perfectly. Thank you very much.
> > > Do you encounter the parsing error in Grafana OnCall Insights dashboard? Image
> >
> > I have the same error here, did you find a fix?
>
> This is because the queries in this dashboard are flawed.
>
> round(delta(sum($alert_groups_total{slug=~"$instance", team=~"$team", integration=~"$integration", service_name=~"$service_name"})[$__range:])) >= 0
>
> The dollar sign at $alert_groups_total should not be there. Re-Importing has no effect for me. I have just copied the dashboard and fixed the errors by removing the dollar sign.

Did you change it in Settings > Variables > alert_groups_total, or in every visualization? Changing it in every visualization did not fix it for me (instance, team, integration, etc. still have parser problems), and changing the alert_groups variable makes the dashboard return no data.

@maffelbaffel

Oh wait, my memory tricked me here I guess. It was quite some time ago when I fixed that in my setup.

I think the error occurs when the alert_groups_total Prometheus metric does not exist.
Can you check if the metric exists in your case? In my case these values are returned:
Image

Are you using the Helm chart? If yes, have you enabled the exporter?

oncall:
  exporter:
    enabled: true

You also need to create a ServiceMonitor manually because the helm chart does not offer the functionality to create one yet. See this for examples.
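
One way to check whether the metric exists is to ask Prometheus directly through its HTTP API instant-query endpoint (`/api/v1/query`). The Python sketch below builds such a query URL and inspects the JSON reply. The metric name is used exactly as written in this comment, the Prometheus address is a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.parse


def instant_query_url(prometheus_url: str, expr: str) -> str:
    """Build the URL of a Prometheus instant query for the given expression."""
    return prometheus_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def has_series(response_body: str) -> bool:
    """Return True if a successful instant-query response contains at least one series."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        return False
    return len(doc.get("data", {}).get("result", [])) > 0


# Usage (placeholder address; sends a real request):
# import urllib.request
# url = instant_query_url("http://prometheus.example.com:9090", "alert_groups_total")
# with urllib.request.urlopen(url, timeout=10) as resp:
#     print(has_series(resp.read().decode("utf-8")))
```

An empty result suggests that Prometheus is not scraping the exporter, which would match the explanation given above for the dashboard's parse errors.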
