Wrong query result from aggregation Operators #703

paulojmdias · 2024-12-13T18:28:56Z

I have the following setup with 3 server groups:

promxy
  -> Grafana mimir dc1
      -> dc1
      -> google-us-dc1
      -> google-us-dc2
  -> Grafana mimir dc2
      -> dc2
      -> google-us-dc1
      -> google-us-dc2
  -> Grafana mimir dc3
      -> dc3
      -> google-eu-dc1
      -> google-eu-dc2

When we run the query count(up{label_key="label_value"}) by (region), we get the following results:

{region="dc1"}              342
{region="google-us-dc1"}    31
{region="google-us-dc2"}    31
{region="dc2"}              341
{region="dc3"}              30
{region="google-eu-dc1"}    25
{region="google-eu-dc2"}    24
{region="google-eu-dc3"}    29
{region="google-eu-dc4"}    36
{region="google-eu-dc5"}    29

If we drop the by (region) clause and run count(up{label_key="label_value"}), I expect to get 918 (the sum of the per-region counts above). Instead, promxy returns the maximum value across the 3 server groups, which is 404. In this case that value is the sum of the data that resides on Grafana mimir dc1:

{region="dc1"}              342
{region="google-us-dc1"}    31
{region="google-us-dc2"}    31
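
For anyone who wants to double-check the numbers, here is a small sketch in Python. It is only arithmetic over the per-region counts quoted above; the variable names are made up for illustration, and the list of regions behind "Grafana mimir dc1" is taken from the setup listed at the top.

```python
# Per-region counts copied from the `count(...) by (region)` result above.
per_region = {
    "dc1": 342,
    "google-us-dc1": 31,
    "google-us-dc2": 31,
    "dc2": 341,
    "dc3": 30,
    "google-eu-dc1": 25,
    "google-eu-dc2": 24,
    "google-eu-dc3": 29,
    "google-eu-dc4": 36,
    "google-eu-dc5": 29,
}

# Regions served by the "Grafana mimir dc1" server group, per the setup above.
mimir_dc1_regions = ["dc1", "google-us-dc1", "google-us-dc2"]

# Value expected from count(up{...}) without the by (region) clause.
expected_total = sum(per_region.values())

# Value actually returned: the subtotal of a single server group.
mimir_dc1_total = sum(per_region[r] for r in mimir_dc1_regions)

print(expected_total)   # 918
print(mimir_dc1_total)  # 404
```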

I also ran a test: I added a dedicated label, named __dc__, to each server group. With the query count(count(up{stack="persistence"}) without (__dc__)) we get the desired value, which is 918.
However, the query we actually want to run is count(up{stack="persistence"}), and that one returns 980. The series from google-us-dc1 and google-us-dc2 are counted twice, because by adding a custom label per server group we are declaring that the data in each server group is unique, which is not the case.
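
The three values (918, 980 and 404) can be reproduced with a toy model in Python. To be clear, this is not promxy's merge logic, just set arithmetic over made-up series identifiers. One assumption is needed: google-eu-dc3, google-eu-dc4 and google-eu-dc5 do not appear in the setup listing, so they are placed under "mimir-dc3" here purely for illustration.

```python
# Per-region counts copied from the `count(...) by (region)` result.
per_region = {
    "dc1": 342, "google-us-dc1": 31, "google-us-dc2": 31,
    "dc2": 341, "dc3": 30,
    "google-eu-dc1": 25, "google-eu-dc2": 24, "google-eu-dc3": 29,
    "google-eu-dc4": 36, "google-eu-dc5": 29,
}

# Which regions each server group can see. The placement of
# google-eu-dc3..5 under "mimir-dc3" is an ASSUMPTION (not in the listing).
server_groups = {
    "mimir-dc1": ["dc1", "google-us-dc1", "google-us-dc2"],
    "mimir-dc2": ["dc2", "google-us-dc1", "google-us-dc2"],
    "mimir-dc3": ["dc3", "google-eu-dc1", "google-eu-dc2",
                  "google-eu-dc3", "google-eu-dc4", "google-eu-dc5"],
}


def series_ids(region):
    """Hypothetical series identifiers: one (region, index) pair per series."""
    return {(region, i) for i in range(per_region[region])}


# Series visible through each server group.
group_series = {
    group: set().union(*(series_ids(r) for r in regions))
    for group, regions in server_groups.items()
}

# Case 1: a unique label per server group (like __dc__) makes every
# series distinct per group, so the overlapping regions count twice.
tagged = {(group, s) for group, ss in group_series.items() for s in ss}

# Case 2: identical series from different groups collapse into one.
merged = set().union(*group_series.values())

# Case 3: only the largest single group's count survives.
largest = max(len(ss) for ss in group_series.values())

print(len(tagged))  # 980
print(len(merged))  # 918
print(largest)      # 404
```

The gap between the first two cases is exactly the 62 series (31 + 31) from google-us-dc1 and google-us-dc2 that are visible through two server groups.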

Although we are using Mimir, in the end it is a Prometheus query API that we are talking to, so I don't think Mimir is related.

We are not overriding the prefer_max option and we are using the version v0.0.91.

I already tried to debug the Promxy code, but I ran out of ideas and decided to open this issue. I'm open to contributing either way if I find something 🙌


jacksontj · 2025-01-09T06:27:41Z

Thanks for reaching out, let's jump into it!

> I have the following setup with 3 server groups

I believe there may be a typo in this example; as described, this configuration has some overlapping DCs (google-us-dc1 is in both mimir dc1 and mimir dc2). Given that the example below has eu-dc1..5 -- I'm assuming mimir dc2 was supposed to be eu? (Otherwise I don't see where eu dc3, 4 and 5 come from.)

> but the truth is promxy are returning the max value from the 3 server groups we have,

This sounds like the servergroup configuration may not be quite right, since the NodeReplacer (which does the max/rewrite) runs at the top level, while all of the servergroup merging is done lower down. So this looks like an issue with the servergroup configuration rather than with the aggregation rewrite in NodeReplacer.

> Although we are using Mimir, in the end, is a Prometheus query API that we are using, so I don't feel it is related.

That seems correct; this looks like the promxy servergroup config not quite matching your setup.

> We are not overriding the prefer_max option and we are using the version v0.0.91.

If we are running into prefer_max we are definitely hitting a servergroup configuration issue. The prefer_max is intended to handle merging of data within a servergroup (defined as a set of API endpoints that "have the same data").

> I ran without ideas and I decided to open this issue

I'd be happy to give a hand here! Could you provide your promxy config, or at least the servergroup configuration? It would also help to restate the downstreams, the data each one holds, and the merging behavior you want. I think from there we'll be able to make some progress :)

paulojmdias linked a pull request Dec 16, 2024 that will close this issue

fix: wrong query result from aggregation Operators #704

Open

jacksontj added the question label Jan 9, 2025
