[cinder-csi-plugin] [Bug] Failed to GetOpenStackProvider i/o timeout #1874
Comments
The current info seems too generic.
It looks like it's not the CPO CSI function that is failing; rather, the pod is unable to connect to keystone.
So more info, such as the real error you saw and the logs of the CSI pods (for example, collected with the commands below), would be helpful. |
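For reference, one way to collect those logs (the namespace, label, and container names below follow the standard cinder-csi-plugin manifests; adjust them to your deployment):

```sh
# List the controller-plugin pod, then dump the cinder-csi-plugin container log
# and the last lines from every sidecar in the pod.
kubectl -n kube-system get pods -l app=csi-cinder-controllerplugin
kubectl -n kube-system logs deploy/csi-cinder-controllerplugin -c cinder-csi-plugin
kubectl -n kube-system logs deploy/csi-cinder-controllerplugin --all-containers --tail=200
```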
I seem to be having a similar problem, so let me hopefully provide enough information to get somewhere with this. Cluster info: Kubernetes 1.24.1, CoreDNS 1.8.6, csi-cinder-plugin 1.22.0 (I also tested with 1.24.2). Cloud config for csi-cinder-plugin.
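(The actual cloud.conf is not reproduced here; the snippet below is only a minimal sketch of a typical cinder-csi-plugin [Global] section, with placeholder values rather than the reporter's real settings.)

```sh
# Hypothetical cloud.conf; every value is a placeholder.
cat <<'EOF' > cloud.conf
[Global]
auth-url = http://keystone.example.com:80/v3
username = csi-user
password = secret
tenant-name = k8s-project
domain-name = Default
region = RegionOne
EOF
```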
Logs from the csi-cinder-controllerplugin pod, container cinder-csi-plugin.
When looking at the network traffic from the cinder-csi-plugin, I see only DNS requests for A and AAAA records for this DNS name.
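A capture like that can be reproduced on the node hosting the plugin; the filter below is only an example (replace "keystone" with the host from your auth-url):

```sh
# Watch DNS queries leaving the node and filter for the keystone host.
sudo tcpdump -l -ni any udp port 53 | grep -i keystone
```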
So I see the same strange thing that was reported above. The container seems to be trying to resolve the address with ".kube-system.svc.cluster.local" appended to the valid auth url, and the keystone API is at that url. I don't think this is a networking issue, since I can access the API from other pods in the cluster (it's hard to check from the container itself, since it doesn't really have any tools and it restarts after about 20 seconds).
This configuration was working just fine on Kubernetes 1.22. I upgraded the cluster to 1.23.7 and then 1.24.1, and everything worked fine for about a week. Then, for unrelated reasons, I needed to restart the VMs in this cluster. After the restart I noticed this container wasn't ready, and all of my pods with Cinder-provided PVCs stopped working. The other containers in the pod just have logs like this, with "Still connecting" repeating about every ten minutes.
I was looking at the driver code and I don't see how it could be getting a different url in the driver itself. Could it be something from gophercloud? Tracing things back up the stack, it seems like that might be where this is happening, but I am not sure. I also tried setting os-endpoint-type to "internalURL", since about the only thing I could figure was that gophercloud was changing something about the url because of the endpoint type; this had no effect. I have also tried removing the :80 in the url, because why not; also no effect. I am going to try to downgrade my cluster and hope this starts working again with Kubernetes 1.23.7. |
From the context, it looks like the URL is being taken as a short service name, with the local cluster domain appended (a quick way to confirm this is sketched below). |
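That would match how the in-cluster resolver is configured: with dnsPolicy: ClusterFirst, kubelet writes a resolv.conf containing the cluster search domains and ndots:5, so any name with fewer than five dots is first tried with each search suffix appended, which would explain the .kube-system.svc.cluster.local queries. A quick way to check (container name as in the standard manifests, assuming the image ships cat):

```sh
# Inspect the resolver config the plugin container actually sees. Expect something like:
#   search kube-system.svc.cluster.local svc.cluster.local cluster.local
#   nameserver 10.96.0.10
#   options ndots:5
kubectl -n kube-system exec deploy/csi-cinder-controllerplugin -c cinder-csi-plugin -- cat /etc/resolv.conf
```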
Hello, I have exactly the same issue. Did you finally find a solution? Thanks, Jeff |
Are you able to connect to the OpenStack endpoint from your local machine, e.g. resolve the DNS name from the cloud.conf you used? (A couple of quick in-cluster checks are sketched below.) |
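For example, quick checks like these from inside the cluster (images and URL are placeholders):

```sh
# Resolve the keystone host from a throwaway pod, then hit the API directly.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup keystone.example.com
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sS -m 10 http://keystone.example.com/v3
```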
Yes, from the node it works; I don't understand why cinder-csi-plugin can't.
I0906 13:34:01.026232 1 openstack.go:89] Block storage opts: {0 false false}
The openstack-cloud-controller-manager-9kkt pod has no issue reaching the same endpoint. |
The original issue is that the name being resolved is incorrect. I suggest you try using the IP instead of the hostname for the OpenStack service and see whether that helps. |
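That would mean something like the following in cloud.conf (the address is a placeholder, and the secret name matches the standard manifests):

```sh
# In the [Global] section, point auth-url at the keystone IP instead of its hostname:
#   auth-url = http://192.0.2.10:80/v3
# Then recreate the cloud-config secret so the plugin picks it up.
kubectl -n kube-system create secret generic cloud-config --from-file=cloud.conf \
  --dry-run=client -o yaml | kubectl apply -f -
```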
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
These log entries are from the CoreDNS pods after enabling query logging. I'm not sure why it appends these local svc suffixes. |
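For reference, CoreDNS query logging is enabled by adding the log plugin to the server block in the coredns ConfigMap and restarting the deployment:

```sh
# Add "log" inside the .:53 { ... } block of the Corefile, then restart CoreDNS.
kubectl -n kube-system edit configmap coredns
kubectl -n kube-system rollout restart deployment coredns
```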
We had the same issue. In the end, the problem was that the csi plugin could not reach the CoreDNS pod. Due to containernetworking/cni#878, the early-scheduled pods, such as CoreDNS, were using a different subnet. Removing podman, removing |
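A rough way to check for that situation (commands are examples; node paths may differ by distro):

```sh
# Compare the IPs of early-scheduled pods (coredns) with later ones; an IP outside the
# node's pod CIDR points at stale CNI configuration.
kubectl -n kube-system get pods -o wide | grep -E 'coredns|csi-cinder'
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}{"\n"}'
# On the node itself: look for leftover CNI config files (e.g. from podman).
ls -l /etc/cni/net.d/
```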
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
When starting, the csi plugin is not able to communicate with keystone; it gets stuck with an i/o timeout.
What you expected to happen:
The plugin should talk to the API and start.
How to reproduce it:
I am running a Kubernetes 1.24.0 cluster with cinder-csi-plugin 1.23.0 and CoreDNS 1.9.2.
Anything else we need to know?:
A tcpdump capture suggests that the pod tries to resolve the wrong name: it tries to connect to ${URL}.kube-system.svc.cluster.local. The same version of the csi driver works on Kubernetes 1.23 with CoreDNS 1.8.7.
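One possible mitigation I have not verified would be to lower ndots for the controller-plugin pod via dnsConfig, so the resolver tries the absolute name first; a sketch (deployment name as in the standard manifests):

```sh
# Sketch: set ndots=1 on the controller-plugin pod spec.
kubectl -n kube-system patch deployment csi-cinder-controllerplugin --type merge -p \
  '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"ndots","value":"1"}]}}}}}'
```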
Environment: