Strange: can't auto resolved by groupKey? #5091

Closed
mrbaowei opened this issue Sep 27, 2024 · 2 comments

@mrbaowei

What went wrong?

What happened:

  • I found many duplicate alert groups in the system that could not be auto-resolved. On further investigation, I discovered that they were not being grouped correctly, so I picked one of them for testing.
    payload:
{
    "receiver": "hermes-heartbeat-loss",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertgroup": "frp_alerts",
                "alertname": "downTimeAlert",
                "frp": "10859",
                "severity": "error",
                "type": "hermes-heartbeat"
            },
            "annotations": {
                "description": "host is down, frp: 10859",
                "summary": "downtime alert"
            },
            "startsAt": "2024-09-27T00:27:00+08:00",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://VM-182-5-ubuntu:9999/vmalert/alert?group_id=7480534740325218680&alert_id=13669664441645264980",
            "fingerprint": "1badb0c76cee4041"
        }
    ],
    "groupLabels": {
        "frp": "10859"
    },
    "commonLabels": {
        "alertgroup": "frp_alerts",
        "alertname": "downTimeAlert",
        "frp": "10859",
        "severity": "error",
        "type": "hermes-heartbeat"
    },
    "commonAnnotations": {
        "description": "host is down, frp: 10859",
        "summary": "downtime alert"
    },
    "externalURL": "http://VM-182-5-ubuntu:9093",
    "version": "4",
    "groupKey": "{}/{type=\"hermes-heartbeat\"}:{frp=\"10859\"}",
    "truncatedAlerts": 0,
    "numFiring": 1,
    "numResolved": 0
}

As a test, I changed the status fields in the payload to resolved, as shown below.

{
    "receiver": "hermes-heartbeat-loss",
    "status": "resolved",
    "alerts": [
        {
            "status": "resolved",
            "labels": {
                "alertgroup": "frp_alerts",
                "alertname": "downTimeAlert",
                "frp": "10859",
                "severity": "error",
                "type": "hermes-heartbeat"
            },
            "annotations": {
                "description": "host is down, frp: 10859",
                "summary": "downtime alert"
            },
            "startsAt": "2024-09-27T00:27:00+08:00",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://VM-182-5-ubuntu:9999/vmalert/alert?group_id=7480534740325218680&alert_id=13669664441645264980",
            "fingerprint": "1badb0c76cee4041"
        }
    ],
    "groupLabels": {
        "frp": "10859"
    },
    "commonLabels": {
        "alertgroup": "frp_alerts",
        "alertname": "downTimeAlert",
        "frp": "10859",
        "severity": "error",
        "type": "hermes-heartbeat"
    },
    "commonAnnotations": {
        "description": "host is down, frp: 10859",
        "summary": "downtime alert"
    },
    "externalURL": "http://VM-182-5-ubuntu:9093",
    "version": "4",
    "groupKey": "{}/{type=\"hermes-heartbeat\"}:{frp=\"10859\"}",
    "truncatedAlerts": 0,
    "numFiring": 1,
    "numResolved": 0
}

Unexpectedly, after I submitted this through the same integration, the original alert group was not automatically resolved. Instead, a new alert group was created, already marked as resolved.

However, when I submitted with the original payload and the status set to “firing,” it successfully added a new entry to the original alert group. This indicates that the groupKey grouping seems to be working correctly. The issue appears to be that the alert group cannot be automatically resolved.
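
To double-check that the two submissions differ only in their status fields, a small comparison along these lines could be used. This is only a sketch: `diff_paths` is a hypothetical helper, and the two dictionaries are abbreviated copies of the payloads above, not the full JSON.

```python
def diff_paths(a, b, prefix=""):
    """Return the dotted paths at which two JSON-like values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        paths = []
        for key in sorted(set(a) | set(b)):
            paths += diff_paths(a.get(key), b.get(key), f"{prefix}{key}.")
        return paths
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        paths = []
        for i, (x, y) in enumerate(zip(a, b)):
            paths += diff_paths(x, y, f"{prefix}{i}.")
        return paths
    return [] if a == b else [prefix.rstrip(".")]


GROUP_KEY = '{}/{type="hermes-heartbeat"}:{frp="10859"}'

# Abbreviated versions of the firing and resolved payloads shown above.
firing = {
    "status": "firing",
    "alerts": [{"status": "firing", "fingerprint": "1badb0c76cee4041"}],
    "groupKey": GROUP_KEY,
    "numFiring": 1,
    "numResolved": 0,
}
resolved = {
    "status": "resolved",
    "alerts": [{"status": "resolved", "fingerprint": "1badb0c76cee4041"}],
    "groupKey": GROUP_KEY,
    "numFiring": 1,
    "numResolved": 0,
}

changed = diff_paths(firing, resolved)
print(changed)  # ['alerts.0.status', 'status']
```

The groupKey and fingerprint are identical in both payloads, so the grouping key by itself does not explain why a separate alert group appears.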

Did I miss something?

What did you expect to happen:

  • The alert group can be automatically resolved by changing the status to ‘resolved.’

How do we reproduce it?

  1. Open Grafana OnCall and do X
  2. Now click button Y
  3. Wait for the browser to crash. Error message says: "Error..."

Grafana OnCall Version

latest, oss

Product Area

Alert Flow & Configuration

Grafana OnCall Platform?

None

User's Browser?

No response

Anything else to add?

No response

@mrbaowei

This issue occurred in our production environment. In the test environment, I triggered the alert using the firing payload and then resolved it using the resolved payload. Everything worked as expected, and the alert was automatically resolved.

This is a very strange scenario and may be difficult to reproduce, so if you need any additional information from me, please feel free to ask at any time.

thanks.

@mrbaowei

It’s resolved now. The issue was with my template configuration. I used a route template: if {{payload.status = firing}}.
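
For anyone hitting the same symptom, here is a toy model of how a route template that matches only firing payloads could produce it. It assumes alert groups are looked up per route as well as per grouping key; that is an assumption about how OnCall scopes groups, not something confirmed in this thread. The function names and route labels are hypothetical, and this is not Grafana OnCall code.

```python
# Toy model, not Grafana OnCall code: a payload is first assigned to a route,
# then alert groups are keyed by (route, grouping key). Both are assumptions.

def pick_route(payload):
    # Stand-in for a route template that matches only firing payloads.
    if payload.get("status") == "firing":
        return "firing-route"
    return "default-route"


def ingest(groups, payload):
    """Attach the payload to its (route, groupKey) group, creating it if needed."""
    key = (pick_route(payload), payload["groupKey"])
    group = groups.setdefault(key, {"alerts": 0, "resolved": False})
    group["alerts"] += 1
    if payload["status"] == "resolved":
        group["resolved"] = True
    return key


GROUP_KEY = '{}/{type="hermes-heartbeat"}:{frp="10859"}'
groups = {}

firing_key = ingest(groups, {"status": "firing", "groupKey": GROUP_KEY})
again_key = ingest(groups, {"status": "firing", "groupKey": GROUP_KEY})
resolved_key = ingest(groups, {"status": "resolved", "groupKey": GROUP_KEY})

print(len(groups))           # 2
print(groups[firing_key])    # {'alerts': 2, 'resolved': False}
print(groups[resolved_key])  # {'alerts': 1, 'resolved': True}
```

In this model the second firing payload joins the original group, while the resolved payload lands on a different route and opens a new group that is already resolved, which matches what I saw. Making the route condition independent of the status field keeps both payloads on the same route.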
