
Probing CSI driver for readiness failed CSI driver probe failed: rpc error: code = FailedPrecondition desc = Failed to communicate with OpenStack BlockStorage API #2192

Closed
deba10106 opened this issue Apr 12, 2023 · 7 comments
Labels: lifecycle/stale

@deba10106

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
I deployed Kubernetes v1.26.3 on OpenStack VMs with Kubespray, using the external OpenStack cloud provider option. I see that the Cinder CSI plugins are installed, but the controller plugin and node plugins are in "CrashLoopBackOff" state. When I run kubectl logs, I see:

Probing CSI driver for readiness

CSI driver probe failed: rpc error: code = FailedPrecondition desc = Failed to communicate with OpenStack BlockStorage API

I checked the cinder_cloud_config on one of the nodes and it looked fine:

[Global]
auth-url="http://some_url:5000"
username="admin"
password="*******************"
region="RegionOne"
tenant-name="admin"
domain-name="Default"

[BlockStorage]
bs-version=v3
ignore-volume-az=True
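
As a sanity check, the same credentials can be exercised directly with the openstack CLI from one of the nodes (a sketch; the auth URL is the placeholder from the config above, and I'm assuming tenant-name/domain-name map to the project/domain variables):

export OS_AUTH_URL="http://some_url:5000"
export OS_IDENTITY_API_VERSION=3
export OS_USERNAME="admin"
export OS_PASSWORD="..."              # same password as in the config
export OS_PROJECT_NAME="admin"        # from tenant-name
export OS_USER_DOMAIN_NAME="Default"  # from domain-name
export OS_PROJECT_DOMAIN_NAME="Default"
export OS_REGION_NAME="RegionOne"

openstack token issue           # verifies Keystone authentication
openstack volume service list   # verifies the Block Storage (Cinder) API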

What you expected to happen:
I expect it to connect to the Cinder volume API. From Horizon, I can create and delete volumes without any issue.

How to reproduce it:

Anything else we need to know?:

Environment:

  • openstack-cloud-controller-manager (or other related binary) version:
  • OpenStack version: Antelope
  • Others:
@jichenjc
Contributor

Is some_url a hostname or an IP address? #1874 has a similar issue, but I didn't get an update from the original issue opener.

Can you check whether the URL is reachable from inside your cluster, and give an IP a try instead?
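
For example, reachability can be tested from inside the cluster with a throwaway pod (a sketch; the image and pod name are arbitrary choices):

kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sS -m 5 http://some_url:5000/v3

A healthy Keystone endpoint answers with a small JSON version document; a timeout or DNS failure here points at network policy or cluster DNS rather than credentials.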

@deba10106
Author

I checked further. I saw that the cloud controller manager has several errors like the following, and pods are also stuck in Pending due to an untolerated taint:

untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}

"Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer" I0415 07:21:54.889696 1 node_controller.go:401] Initializing node km1 with cloud provider E0415 07:21:56.299855 1 controller.go:310] error processing service test/nginx-service (will retry): failed to ensure load balancer: failed to find subnet "7abeaaa0-3819-46e6-901f-902524a35ad2": Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:21:56.299991 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to find subnet \"7abeaaa0-3819-46e6-901f-902524a35ad2\": Successfully re-authenticated, but got error executing request: Authentication failed" E0415 07:21:57.709432 1 node_controller.go:215] error syncing 'km1': failed to get provider ID for node km1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:21:57.709537 1 node_controller.go:401] Initializing node kw1 with cloud provider E0415 07:21:57.736390 1 node_controller.go:244] Error getting instance metadata for node addresses: error fetching node by provider ID: ProviderID "" didn't match expected format "openstack:///InstanceID", and error by node name: Successfully re-authenticated, but got error executing request: Authentication failed E0415 07:21:59.153763 1 controller.go:838] failed to check if load balancer exists for service test/nginx-service: Successfully re-authenticated, but got error executing request: Authentication failed E0415 07:21:59.153943 1 controller.go:777] failed to update load balancer hosts for service test/nginx-service: Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:21:59.154031 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="UpdateLoadBalancerFailed" message="Error updating load balancer with new hosts map[kw1:{} kw2:{}]: Successfully re-authenticated, but got error executing request: Authentication failed" E0415 07:21:59.166625 1 node_controller.go:215] error syncing 'kw1': failed to get provider ID for node kw1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:21:59.166763 1 node_controller.go:401] Initializing node kw2 with cloud provider E0415 07:22:00.560132 1 node_controller.go:244] Error getting instance metadata for node addresses: error fetching node by provider ID: ProviderID "" didn't match expected format "openstack:///InstanceID", and error by node name: Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:22:01.301136 1 loadbalancer.go:1956] "EnsureLoadBalancer" cluster="cluster.local" service="test/nginx-service" I0415 07:22:01.301281 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer" E0415 07:22:02.027060 1 node_controller.go:215] error syncing 'kw2': failed to get provider ID for node kw2 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got 
error executing request: Authentication failed, requeuing I0415 07:22:02.027126 1 node_controller.go:401] Initializing node km1 with cloud provider E0415 07:22:03.403700 1 controller.go:310] error processing service test/nginx-service (will retry): failed to ensure load balancer: failed to find subnet "7abeaaa0-3819-46e6-901f-902524a35ad2": Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:22:03.403964 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to find subnet \"7abeaaa0-3819-46e6-901f-902524a35ad2\": Successfully re-authenticated, but got error executing request: Authentication failed" E0415 07:22:03.436242 1 node_controller.go:244] Error getting instance metadata for node addresses: error fetching node by provider ID: ProviderID "" didn't match expected format "openstack:///InstanceID", and error by node name: Successfully re-authenticated, but got error executing request: Authentication failed E0415 07:22:04.862499 1 node_controller.go:215] error syncing 'km1': failed to get provider ID for node km1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:22:04.862618 1 node_controller.go:401] Initializing node kw1 with cloud provider E0415 07:22:06.314245 1 node_controller.go:215] error syncing 'kw1': failed to get provider ID for node kw1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:22:06.314439 1 node_controller.go:401] Initializing node kw2 with cloud provider E0415 07:22:07.787597 1 node_controller.go:215] error syncing 'kw2': failed to get provider ID for node kw2 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing
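
For context: the node.cloudprovider.kubernetes.io/uninitialized taint is only removed once the cloud controller manager has successfully initialized the node, so the repeated "Authentication failed" errors above would keep pods Pending. A quick way to inspect the taints and the config the CCM actually mounts (a sketch; the secret name external-openstack-cloud-config and key cloud.conf are my assumption of the kubespray defaults, so adjust to whatever your deployment created):

kubectl describe node km1 | grep -i taints
kubectl -n kube-system get secret external-openstack-cloud-config \
  -o jsonpath='{.data.cloud\.conf}' | base64 -d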

I've opened a new issue and will close this one once that is resolved, because I think this is the cause of the Cinder CSI issue. Any help will be appreciated. I will close both and update the solution in both places once resolved.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 14, 2023
@mdbooth
Contributor

mdbooth commented Jul 18, 2023

Looks like this was a config issue.
/close

@k8s-ci-robot
Contributor

@mdbooth: Closing this issue.

In response to this:

Looks like this was a config issue.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@deba10106
Author

OK, I was able to solve the tolerations issue, but I still see the same problem.

root@master1:/etc/kubernetes# kubectl logs pod/csi-cinder-controllerplugin-5f99cdb677-5m7xk -n kube-system
Defaulted container "csi-attacher" out of: csi-attacher, csi-provisioner, csi-snapshotter, csi-resizer, liveness-probe, cinder-csi-plugin
I0903 06:56:45.260682 1 main.go:99] Version: v3.3.0
I0903 06:56:45.264819 1 common.go:111] Probing CSI driver for readiness
E0903 06:56:49.363668 1 main.go:143] CSI driver probe failed: rpc error: code = FailedPrecondition desc = Failed to communicate with OpenStack BlockStorage API
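
Note that kubectl defaulted to the csi-attacher container, so the lines above come from a sidecar. The underlying OpenStack error should be visible in the cinder-csi-plugin container itself, e.g.:

kubectl -n kube-system logs pod/csi-cinder-controllerplugin-5f99cdb677-5m7xk -c cinder-csi-plugin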

From each of the Kubernetes nodes, I can get a response from openstack volume list and openstack volume service list.

Any suggestions on how to debug?

@sqaisar

sqaisar commented Apr 28, 2024

[INFO] 192.168.145.12:44558 - 55791 "A IN <my openstack url>.kube-system.svc.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000231634s
[INFO] 192.168.145.12:47886 - 48222 "AAAA IN <my openstack url>.kube-system.svc.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000184359s
[INFO] 192.168.145.12:44347 - 13713 "AAAA IN <my openstack url>.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.000144048s
[INFO] 192.168.145.12:43551 - 9405 "A IN openstack.im.pype.tech.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.000242885s
[INFO] 192.168.145.12:45532 - 12467 "A IN <my openstack url>.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000225486s
[INFO] 192.168.145.12:46515 - 14766 "A IN <my openstack url>.openstacklocal. udp 55 false 512" NXDOMAIN qr,rd,ra 130 0.001658468s

These log entries are from the CoreDNS pods after enabling query logging.
But the cloud-config I've provided has the correct URL. I even tried using an IP rather than DNS for OpenStack.

I'm not sure why it appends these local svc suffixes.
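
For what it's worth, those suffixes come from the pod's /etc/resolv.conf: with Kubernetes' default ndots:5, any name containing fewer than five dots is tried with each search domain (kube-system.svc.cluster.local, svc.cluster.local, cluster.local, plus node-level ones such as openstacklocal) before being resolved as an absolute name, which matches the pattern in the CoreDNS log. Those NXDOMAIN answers are harmless as long as the final absolute lookup succeeds. To confirm what a pod is actually using (a sketch, reusing the controller-plugin deployment name from earlier in the thread):

kubectl -n kube-system exec deploy/csi-cinder-controllerplugin \
  -c cinder-csi-plugin -- cat /etc/resolv.conf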
