
grafana-agent doesn't support having multiple units running on the same machine #238

Closed
Vultaire opened this issue Dec 6, 2024 · 3 comments

Comments

@Vultaire

Vultaire commented Dec 6, 2024

Bug Description

I recently hit this during a cloud handover review. We had an environment with landscape-server and postgresql, both with grafana-agent subordinates related through the cos-agent relation.

It seems that grafana-agent does not support such deployments. In a 3-machine cluster, I saw that one machine had jobs for landscape-server while the other two had jobs for postgresql; none of them had jobs for both.

This is likely because only a single grafana-agent instance runs per machine, and both subordinate units render the same config file: /etc/grafana-agent.yaml

Unfortunately, I didn't see a clear way to get this working so that both apps' jobs would be included in grafana-agent.yaml. We needed to fall back to using the nrpe charm and cos-proxy to cover the alerts for one of the apps.

To Reproduce

Not providing the bundle since it's for a customer, but it's pretty simple:

  • Deploy a 3 unit postgresql cluster. This was tested with the 14/stable channel, rev 468.
  • Deploy a 3 unit landscape-server cluster, onto the same machines as the above cluster. This was tested with the latest/stable channel, rev 121.
  • Deploy the grafana-agent charm. This was tested with the latest/stable channel, rev 223.
  • Relate the grafana-agent charm to both of the other charms.
  • Observe the rendered /etc/grafana-agent.yaml file on each of the 3 machines. None of the machines will have jobs for both of the apps; it's either one or the other.
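
For illustration, here is roughly what the rendered file ends up looking like on an affected machine. This is a hand-written, heavily abbreviated sketch; the config name, target, and comments are illustrative and not taken from the actual environment:

  # /etc/grafana-agent.yaml on one machine (illustrative sketch, abbreviated)
  metrics:
    configs:
      - name: agent_scraper
        scrape_configs:
          - job_name: landscape-server        # jobs for one principal are present
            static_configs:
              - targets: ["localhost:9090"]   # placeholder target
          # ...but no "job_name: charmed-postgresql" entries, because the other
          # grafana-agent subordinate's render of this same file was overwritten
          # (last writer wins)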

Environment

This was tested on Juju 3.4.6 on Azure, although the cloud likely does not matter in this case.

Relevant log output

Just look at the /etc/grafana-agent.yaml file.  You can grep for these patterns:

  "job_name: charmed-postgresql"
  "job_name: landscape-server"

Based on the reproducer in this ticket, only one of those patterns will match, due to the race between the two grafana-agent subordinates running on the same machine.
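
A minimal way to run that check across the machines (a sketch; the machine numbers are illustrative):

  # Count matching scrape jobs on each machine; on an affected machine only one
  # of the two greps reports a non-zero count.
  for m in 0 1 2; do
    echo "=== machine $m ==="
    juju ssh "$m" 'grep -c "job_name: charmed-postgresql" /etc/grafana-agent.yaml;
                   grep -c "job_name: landscape-server"   /etc/grafana-agent.yaml'
  done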

Additional context

postgresql and landscape-server were intentionally put on the same machines to reduce how many Azure VMs were needed for the project. Separating them would require additional VMs and thus likely additional cost. (If this were a MAAS/LXD cloud instead, splitting them into separate containers would be the obvious workaround.)

@Vultaire Vultaire changed the title grafana-agent doesn't support having units running on the same machine grafana-agent doesn't support having multiple units running on the same machine Dec 6, 2024
@lucabello lucabello transferred this issue from canonical/grafana-agent-k8s-operator Jan 17, 2025
@lucabello
Contributor

You're correct, the grafana-agent charm doesn't support either scenario:

  • one grafana-agent app related to multiple principals on the same vm;
  • multiple grafana-agent apps on the same vm.

To learn more: https://discourse.charmhub.io/t/one-grafana-agent-charm-to-rule-them-all/16014

Are you able to separate postgres and landscape-server onto different VMs? That would likely solve your issue.

We're not planning to support this, since we'll be focusing our efforts on the OpenTelemetry Collector charm, which will replace grafana-agent.

We opened an issue in Juju to address this the correct way: juju/juju#18665

@lucabello lucabello closed this as not planned Jan 17, 2025
@sed-i
Contributor

sed-i commented Jan 17, 2025

Duplicates #11, #211

@Vultaire
Author

@lucabello Just as feedback: no. The environment deliberately had postgres and landscape-server colocated to keep the overall VM count down to reduce cost for a customer.

If this is a limitation we have to live with for this charm, fine, but whatever replaces it should keep this use case in mind - otherwise we'll simply be forced to continue relying on the nrpe charm and cos-proxy.
