
Probing CSI driver for readiness failed CSI driver probe failed: rpc error: code = FailedPrecondition desc = Failed to communicate with OpenStack BlockStorage API #2192

Closed
deba10106 opened this issue Apr 12, 2023 · 7 comments
Labels: lifecycle/stale

@deba10106

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
I deployed Kubernetes v1.26.3 on OpenStack VMs with Kubespray, using the external OpenStack cloud provider option. I see that the Cinder CSI plugins are installed, but the controller plugin and node plugins are in "CrashLoopBackOff" state. When I run kubectl logs, I see:

Probing CSI driver for readiness

CSI driver probe failed: rpc error: code = FailedPrecondition desc = Failed to communicate with OpenStack BlockStorage API

I checked the cinder_cloud_config on one of the nodes and it looked fine:

[Global]
auth-url="http://some_url:5000"
username="admin"
password="*******************"
region="RegionOne"
tenant-name="admin"
domain-name="Default"

[BlockStorage]
bs-version=v3
ignore-volume-az=True
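
As a sanity check, the same credentials can be exercised directly with the openstack CLI from one of the nodes (a sketch; the auth URL is the placeholder from the config above, and I'm assuming tenant-name/domain-name map to the project/domain variables):

export OS_AUTH_URL="http://some_url:5000"
export OS_IDENTITY_API_VERSION=3
export OS_USERNAME="admin"
export OS_PASSWORD="..."              # same password as in the config
export OS_PROJECT_NAME="admin"        # from tenant-name
export OS_USER_DOMAIN_NAME="Default"  # from domain-name
export OS_PROJECT_DOMAIN_NAME="Default"
export OS_REGION_NAME="RegionOne"

openstack token issue           # verifies Keystone authentication
openstack volume service list   # verifies the Block Storage (Cinder) API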

What you expected to happen:
I expect it to connect to the Cinder volume API. From Horizon, I can create and delete volumes without any issue.

How to reproduce it:

Anything else we need to know?:

Environment:

  • openstack-cloud-controller-manager (or other related binary) version:
  • OpenStack version: Antelope
  • Others:
@jichenjc
Contributor

Is some_url a hostname or an IP address? #1874 has a similar issue, but I didn't get an update from the original issue opener.

Can you check whether the URL is reachable from inside your cluster, and give an IP a try instead?
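
For example, reachability can be tested from inside the cluster with a throwaway pod (a sketch; the image and pod name are arbitrary choices):

kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sS -m 5 http://some_url:5000/v3

A healthy Keystone endpoint answers with a small JSON version document; a timeout or DNS failure here points at network policy or cluster DNS rather than credentials.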

@deba10106
Author

I checked further. I saw that the cloud controller manager has several errors like the following, and pods are also stuck in Pending due to an untolerated taint:

untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}

"Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer" I0415 07:21:54.889696 1 node_controller.go:401] Initializing node km1 with cloud provider E0415 07:21:56.299855 1 controller.go:310] error processing service test/nginx-service (will retry): failed to ensure load balancer: failed to find subnet "7abeaaa0-3819-46e6-901f-902524a35ad2": Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:21:56.299991 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to find subnet \"7abeaaa0-3819-46e6-901f-902524a35ad2\": Successfully re-authenticated, but got error executing request: Authentication failed" E0415 07:21:57.709432 1 node_controller.go:215] error syncing 'km1': failed to get provider ID for node km1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:21:57.709537 1 node_controller.go:401] Initializing node kw1 with cloud provider E0415 07:21:57.736390 1 node_controller.go:244] Error getting instance metadata for node addresses: error fetching node by provider ID: ProviderID "" didn't match expected format "openstack:///InstanceID", and error by node name: Successfully re-authenticated, but got error executing request: Authentication failed E0415 07:21:59.153763 1 controller.go:838] failed to check if load balancer exists for service test/nginx-service: Successfully re-authenticated, but got error executing request: Authentication failed E0415 07:21:59.153943 1 controller.go:777] failed to update load balancer hosts for service test/nginx-service: Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:21:59.154031 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="UpdateLoadBalancerFailed" message="Error updating load balancer with new hosts map[kw1:{} kw2:{}]: Successfully re-authenticated, but got error executing request: Authentication failed" E0415 07:21:59.166625 1 node_controller.go:215] error syncing 'kw1': failed to get provider ID for node kw1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:21:59.166763 1 node_controller.go:401] Initializing node kw2 with cloud provider E0415 07:22:00.560132 1 node_controller.go:244] Error getting instance metadata for node addresses: error fetching node by provider ID: ProviderID "" didn't match expected format "openstack:///InstanceID", and error by node name: Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:22:01.301136 1 loadbalancer.go:1956] "EnsureLoadBalancer" cluster="cluster.local" service="test/nginx-service" I0415 07:22:01.301281 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer" E0415 07:22:02.027060 1 node_controller.go:215] error syncing 'kw2': failed to get provider ID for node kw2 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got 
error executing request: Authentication failed, requeuing I0415 07:22:02.027126 1 node_controller.go:401] Initializing node km1 with cloud provider E0415 07:22:03.403700 1 controller.go:310] error processing service test/nginx-service (will retry): failed to ensure load balancer: failed to find subnet "7abeaaa0-3819-46e6-901f-902524a35ad2": Successfully re-authenticated, but got error executing request: Authentication failed I0415 07:22:03.403964 1 event.go:294] "Event occurred" object="test/nginx-service" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to find subnet \"7abeaaa0-3819-46e6-901f-902524a35ad2\": Successfully re-authenticated, but got error executing request: Authentication failed" E0415 07:22:03.436242 1 node_controller.go:244] Error getting instance metadata for node addresses: error fetching node by provider ID: ProviderID "" didn't match expected format "openstack:///InstanceID", and error by node name: Successfully re-authenticated, but got error executing request: Authentication failed E0415 07:22:04.862499 1 node_controller.go:215] error syncing 'km1': failed to get provider ID for node km1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:22:04.862618 1 node_controller.go:401] Initializing node kw1 with cloud provider E0415 07:22:06.314245 1 node_controller.go:215] error syncing 'kw1': failed to get provider ID for node kw1 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing I0415 07:22:06.314439 1 node_controller.go:401] Initializing node kw2 with cloud provider E0415 07:22:07.787597 1 node_controller.go:215] error syncing 'kw2': failed to get provider ID for node kw2 at cloudprovider: failed to get instance ID from cloud provider: Successfully re-authenticated, but got error executing request: Authentication failed, requeuing
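
For context: the node.cloudprovider.kubernetes.io/uninitialized taint is only removed once the cloud controller manager has successfully initialized the node, so the repeated "Authentication failed" errors above would keep pods Pending. A quick way to inspect the taints and the config the CCM actually mounts (a sketch; the secret name external-openstack-cloud-config and key cloud.conf are my assumption of the kubespray defaults, so adjust to whatever your deployment created):

kubectl describe node km1 | grep -i taints
kubectl -n kube-system get secret external-openstack-cloud-config \
  -o jsonpath='{.data.cloud\.conf}' | base64 -d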

I've opened a new issue and will close this one once that is resolved, because I think this is the cause of the Cinder CSI issue. Any help will be appreciated. I will close both and update the solution in both places once resolved.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 14, 2023
@mdbooth
Contributor

mdbooth commented Jul 18, 2023

Looks like this was a config issue.
/close

@k8s-ci-robot
Contributor

@mdbooth: Closing this issue.

In response to this:

Looks like this was a config issue.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@deba10106
Author

OK, I was able to solve the tolerations issue, but I still see the same problem.

root@master1:/etc/kubernetes# kubectl logs pod/csi-cinder-controllerplugin-5f99cdb677-5m7xk -n kube-system
Defaulted container "csi-attacher" out of: csi-attacher, csi-provisioner, csi-snapshotter, csi-resizer, liveness-probe, cinder-csi-plugin
I0903 06:56:45.260682 1 main.go:99] Version: v3.3.0
I0903 06:56:45.264819 1 common.go:111] Probing CSI driver for readiness
E0903 06:56:49.363668 1 main.go:143] CSI driver probe failed: rpc error: code = FailedPrecondition desc = Failed to communicate with OpenStack BlockStorage API
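
Note that kubectl defaulted to the csi-attacher container, so the lines above come from a sidecar. The underlying OpenStack error should be visible in the cinder-csi-plugin container itself, e.g.:

kubectl -n kube-system logs pod/csi-cinder-controllerplugin-5f99cdb677-5m7xk -c cinder-csi-plugin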

From each of the Kubernetes nodes, I can get a response from openstack volume list and openstack volume service list.

Any suggestions on how to debug?

@sqaisar

sqaisar commented Apr 28, 2024

[INFO] 192.168.145.12:44558 - 55791 "A IN <my openstack url>.kube-system.svc.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000231634s
[INFO] 192.168.145.12:47886 - 48222 "AAAA IN <my openstack url>.kube-system.svc.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000184359s
[INFO] 192.168.145.12:44347 - 13713 "AAAA IN <my openstack url>.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.000144048s
[INFO] 192.168.145.12:43551 - 9405 "A IN openstack.im.pype.tech.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd 151 0.000242885s
[INFO] 192.168.145.12:45532 - 12467 "A IN <my openstack url>.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000225486s
[INFO] 192.168.145.12:46515 - 14766 "A IN <my openstack url>.openstacklocal. udp 55 false 512" NXDOMAIN qr,rd,ra 130 0.001658468s

These log entries are from the CoreDNS pods after enabling query logging.
But the cloud-config I've provided has the correct URL. I even tried using an IP rather than DNS for OpenStack.

I'm not sure why it appends these local svc suffixes.
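
For what it's worth, those suffixes come from the pod's /etc/resolv.conf: with Kubernetes' default ndots:5, any name containing fewer than five dots is tried with each search domain (kube-system.svc.cluster.local, svc.cluster.local, cluster.local, plus node-level ones such as openstacklocal) before being resolved as an absolute name, which matches the pattern in the CoreDNS log. Those NXDOMAIN answers are harmless as long as the final absolute lookup succeeds. To confirm what a pod is actually using (a sketch, reusing the controller-plugin deployment name from earlier in the thread):

kubectl -n kube-system exec deploy/csi-cinder-controllerplugin \
  -c cinder-csi-plugin -- cat /etc/resolv.conf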
