Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CustomResourceDefinitions status fields cause spam of errors that cannot be fixed #2482

Open
roeoo opened this issue Aug 23, 2024 · 3 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@roeoo
Copy link

roeoo commented Aug 23, 2024

What happened: Spam of errors that look like this:

"kube_customresource_phase" err="[status,phase]: expected value for path to be string, got <nil>"

What you expected to happen:

There should be no errors logged. Status fields are not guaranteed to exist at resource creation. The behavior is not consistent with known types where a default value is taken.

How to reproduce it (as minimally and precisely as possible):

Create cr-config.yaml:

kind: CustomResourceStateMetrics
spec:
  resources:
  - groupVersionKind:
      group: samplecontroller.k8s.io
      kind: "Foo"
      version: v1alpha1
    labelsFromPath:
      name: [metadata, name]
      namespace: [metadata, namespace]
    metricNamePrefix: "cr"
    metrics:
    - name: replicas
      each:
        type: Gauge
        gauge:
          path: [status, availableReplicas]
          nilIsZero: true
    - name: test
      each:
        type: StateSet
        stateSet:
          labelName: phase
          path: [status, phase]
          list:
            - Pending
            - Provisioning
            - Provisioned
            - Running
            - Deleting
            - Deleted
            - Failed
            - Unknown

Create a CRD with status and a valid object. Do not run a controller (this is one of possible scenarios).

kubectl apply -f https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/example-foo.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/crd.yaml

Run:

go run main.go --custom-resource-state-only --custom-resource-state-config-file cr-config.yaml --kubeconfig ~/.kube/config

The error repeats for every instance of a resource, and there can be thousands of such resources.

registry_factory.go:685] "cr_test" err="[status,phase]: expected value for path to be string, got <nil>"

Anything else we need to know?:

I believe is a general problem for all CRDs and all status fields. Since there can be many differing objects, the error isn't helpful enough. Might be useful to log this only in verbose mode with resource name and kind.

Environment: kind or any other Kubernetes cluster

  • kube-state-metrics version: commit f7304dc
  • Kubernetes version (use kubectl version): Client Version: v1.30.2 (shouldn't matter)
  • Cloud provider or hardware configuration: kind Server Version: v1.30.0
  • Other info: n/a
@roeoo roeoo added the kind/bug Categorizes issue or PR as related to a bug. label Aug 23, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 23, 2024
roeoo pushed a commit to roeoo/kube-state-metrics that referenced this issue Aug 23, 2024
It's legitimate for new objects to not have status fields. Especially if
controller is unavailable. Such errors are printed for every resource
and can mask actual problems. Instead, log in verbose mode to allow
troubleshooting.

Fixes kubernetes#2482
@roeoo
Copy link
Author

roeoo commented Aug 23, 2024

Actually nilIsZero might be a better solution giving users some control. It would set all states to zero by default similarly to gauges.

@roeoo
Copy link
Author

roeoo commented Aug 28, 2024

After some more thinking, nilIsZero makes sense only when used with a singular gauge where all labels are known upfront. For generated labels, there is no way to create a sensible metric when some labels are missing.

Thus it should be a normal case for series to not exist, not an error.

@dashpole
Copy link
Contributor

dashpole commented Sep 5, 2024

/assign @rexagod
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants