What happened:
We use Grafana's provisioning to provision alerts from a private Git repository. Recently, since around OnCall 1.8.x, my colleagues have reported from time to time that alerts completely re-appear as duplicates. Looking into the database itself, it seems that the alert group is completely re-created but matches the previous alerts.
Side-note: I "inherited" this setup from a former colleague who had less than five days to attempt to teach me how to manage and maintain this - and aside from me, nobody here really knows how to work with this kind of software (including MySQL and such), especially since it is all deployed in a Kubernetes (k3s) cluster. So, tl;dr: I have to somehow administer this all on my own, with no prior experience.
As you can see in the database output (attached as mysql-output.txt), the same alert group gets recreated several times.
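To make "recreated several times" concrete, here is a minimal sketch of how exported rows could be checked for duplicates. It assumes the rows have been dumped into plain dictionaries; the field names `id` and `grouping_key` and the sample values are hypothetical placeholders, not the real OnCall column names or our actual data.

```python
from collections import defaultdict


def find_duplicate_groups(rows):
    """Return {grouping_key: [ids]} for every key that maps to more than one alert group."""
    by_key = defaultdict(list)
    for row in rows:
        # "grouping_key" and "id" are hypothetical field names (assumption).
        by_key[row["grouping_key"]].append(row["id"])
    # Keep only the keys that occur more than once, i.e. the suspected duplicates.
    return {key: ids for key, ids in by_key.items() if len(ids) > 1}


# Made-up sample rows, for illustration only.
rows = [
    {"id": 101, "grouping_key": "disk-full:host-a"},
    {"id": 102, "grouping_key": "disk-full:host-a"},
    {"id": 103, "grouping_key": "cpu-high:host-b"},
]

duplicates = find_duplicate_groups(rows)
print(duplicates)
```

Any key that shows up with more than one id would correspond to an alert group that was created again instead of being reused.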
What did you expect to happen:
We expected that after adding a note to the alert group and choosing either Acknowledge or Resolve, the group would keep that state until a new alert matching the criteria comes in.
How do we reproduce it?
Unfortunately, I do not know how to reproduce this. All I know is that the issue started after the first time I upgraded OnCall to 1.8.x. Before each upgrade I read all the changelogs, but found nothing - neither here nor in Grafana's - that would indicate something had to be changed. Apologies for that. Though due to my visual impairment, I would not be surprised if I overlooked something - it happens sometimes...
Grafana OnCall Version
1.9.20
Product Area
Alert Flow & Configuration
Grafana OnCall Platform?
Kubernetes
User's Browser?
Happens in Chrome, Firefox, Edge, and Brave.
Anything else to add?
We deploy Grafana and OnCall in separate deployments:
root@senst-sv-k3s01 ~# kubectl get -n grafana all
NAME READY STATUS RESTARTS AGE
pod/grafana-5689bb9b5c-ssmc6 4/4 Running 1 (91m ago) 91m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/grafana ClusterIP 10.43.143.40 <none> 3000/TCP 418d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/grafana 1/1 1 1 418d
NAME DESIRED CURRENT READY AGE
replicaset.apps/grafana-5689bb9b5c 1 1 1 91m
replicaset.apps/grafana-59578dfdbd 0 0 0 2d19h
replicaset.apps/grafana-5b7cdd696d 0 0 0 2d21h
replicaset.apps/grafana-5cf74df48b 0 0 0 2d19h
replicaset.apps/grafana-5dbb58c767 0 0 0 2d19h
replicaset.apps/grafana-6546ffbb6b 0 0 0 2d21h
replicaset.apps/grafana-67977b5965 0 0 0 4h9m
replicaset.apps/grafana-7944d879c9 0 0 0 2d19h
replicaset.apps/grafana-7d68c77b4c 0 0 0 2d19h
replicaset.apps/grafana-95bcf4444 0 0 0 2d19h
replicaset.apps/grafana-99c7957b8 0 0 0 2d19h
root@senst-sv-k3s01 ~# kubectl get -n oncall all
NAME READY STATUS RESTARTS AGE
pod/oncall-6c84c58bc4-8bszx 5/5 Running 155 (46m ago) 6d18h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/oncall ClusterIP 10.43.130.73 <none> 3306/TCP,5672/TCP,8080/TCP 378d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/oncall 1/1 1 1 378d
NAME DESIRED CURRENT READY AGE
replicaset.apps/oncall-54cd95b4dc 0 0 0 6d20h
replicaset.apps/oncall-587969d66f 0 0 0 9d
replicaset.apps/oncall-59dd5884df 0 0 0 75d
replicaset.apps/oncall-6476568b55 0 0 0 35d
replicaset.apps/oncall-6b5c4c87fb 0 0 0 6d19h
replicaset.apps/oncall-6c84c58bc4 1 1 1 6d18h
replicaset.apps/oncall-6d4ff5946c 0 0 0 34d
replicaset.apps/oncall-796fccc755 0 0 0 38d
replicaset.apps/oncall-79f6d85476 0 0 0 75d
replicaset.apps/oncall-7c486b8d59 0 0 0 75d
replicaset.apps/oncall-844fc69fc7 0 0 0 75d
The OnCall deployment bundles RabbitMQ, MySQL, and Redis, while Grafana only has Postgres bundled.
A Helm chart is not used - my former colleague wrote the manifests by hand (and it shows...), so we update the versions manually by changing the version tag on the images. The cluster is built on three nodes.
I hope I provided all the information that there is - I looked around more but couldn't find anything else. Sorry if I overlooked something or forgot to add it - I'm trying my best to work with the situation I am in. :)