LogDNA agent is failing with error ERROR logdna_agent::_main: bad request, check configuration: 400 Bad Request #617
Comments
Thanks for reporting @okarasov-sift. The error indicates a 400 response from our ingestion API, so it's a bit puzzling why this happens on a node pool with attached GPUs, and even more puzzling that the suggested workarounds help. We can extract a bit more information about the bad request by setting the agent to log in debug mode. Can you set the following environment variable on the container and share the resulting logs:
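A minimal sketch of what that could look like in the DaemonSet's container spec, assuming the agent honors the conventional RUST_LOG variable from the Rust logging ecosystem (the surrounding manifest is illustrative):

```yaml
# Assumption: RUST_LOG controls the agent's log verbosity, as is
# conventional for Rust services; names here are illustrative.
containers:
  - name: logdna-agent
    env:
      - name: RUST_LOG
        value: debug
```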
@jakedipity, thank you for your response. I found the following in the log:
It looks like the root cause is the /etc/hostname file. Our GKE clusters run Container-Optimized OS version 105. I checked a node where the LogDNA agent works: /etc/hostname is present there, but it is a folder. That is because there was no hostname file on the node at all, and, from my understanding, the volume is created at that path during LogDNA agent pod startup. It's not clear how to fix it.
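That matches Kubernetes hostPath behavior: when a hostPath volume's type is left unspecified and nothing exists at the host path, the kubelet creates an empty directory there, which would explain /etc/hostname appearing as a folder. A sketch of the kind of DaemonSet volume presumably involved (illustrative, not the agent's actual manifest):

```yaml
# Illustrative volume definition; the real manifest may differ.
# With an unspecified hostPath type, an empty directory is created at
# /etc/hostname on the node if nothing exists there yet.
volumes:
  - name: logdna-hostname
    hostPath:
      path: /etc/hostname
      # type: File  # would instead fail the mount when the file is missing
containers:
  - name: logdna-agent
    volumeMounts:
      - name: logdna-hostname
        mountPath: /etc/logdna-hostname
```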
@jakedipity, if /etc/logdna-hostname (the node's /etc/hostname) is absent, the LogDNA agent falls back to /etc/hostname inside the pod. That logic isn't quite correct, but it won't crash-loop. The problem here is that /etc/logdna-hostname is present but empty. I think the agent should also check whether /etc/logdna-hostname is empty and, if so, take the value from /etc/hostname; right now it only checks whether the file exists. logdna-agent-v2/common/config/src/lib.rs Line 459 in e58e05f
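A minimal sketch of that proposed fallback, assuming a helper along these lines (resolve_hostname and the candidate list are hypothetical, not the agent's actual API in common/config/src/lib.rs):

```rust
use std::fs;

// Hypothetical helper (not the agent's actual API): walk candidate
// paths and return the first hostname that is readable and non-empty,
// instead of stopping at the first path that merely exists.
fn resolve_hostname(candidates: &[&str]) -> Option<String> {
    candidates.iter().find_map(|path| {
        fs::read_to_string(path)
            .ok()
            .map(|contents| contents.trim().to_string())
            .filter(|hostname| !hostname.is_empty())
    })
}

fn main() {
    // Prefer the mounted node hostname, then fall back to the pod's own.
    let hostname = resolve_hostname(&["/etc/logdna-hostname", "/etc/hostname"]);
    println!("resolved hostname: {:?}", hostname);
}
```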
There are some downstream side effects tied to the node's hostname. We can definitely make the path check more robust and keep falling back until it finds a non-empty value, but it's also worth understanding why the node's /etc/hostname is empty in the first place.
Additionally, it may be more appropriate to fetch the node name directly from Kubernetes. We already populate the node name into an environment variable here, but such a change would be breaking.
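For reference, the usual way to expose the node name to a pod is the Kubernetes downward API; a sketch (the variable name here is illustrative, not necessarily the one the agent uses):

```yaml
# Downward API: inject the name of the node the pod is scheduled on.
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```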
@jakedipity, I see that you merged a commit related to /etc/hostname. Does it resolve the issue? When are you going to do an official release?
@okarasov-sift That commit doesn't alter the previous behavior of accepting an empty hostname as valid. This issue is minor, and we are currently beta testing the upcoming 3.10.0 release, so this is quite low on our priority list. You're welcome to make the change yourself. The logic isn't difficult; it just requires an adequate test case for the new behavior, and the recent commit should make testing the function straightforward. Additionally, I think we should still discuss what the appropriate source for the hostname (which is used as the cluster name downstream) is in container contexts.