Talos 1.9.0 on VMware does not configure network interfaces (both metal and vmware images) #9985

Open
Tracked by #9825
unya opened this issue Dec 18, 2024 · 7 comments

Comments

@unya

unya commented Dec 18, 2024

Bug Report

Description

I reset a previously running Talos VM in VMware Workstation (it previously ran metal.iso v1.8.3), then booted first from vmware.iso and then from metal.iso, both v1.9.0. Neither could acquire a network address from DHCP (standard "NAT" mode in VMware Workstation). Additionally, both took a very long time to boot, spending approximately 50s waiting on the udev event queue (visible in the attached logs).

Once booted, manually configuring a network address in the F3 menu of the dashboard was similarly ineffective: the dashboard shows a note about setting up a route, but only the gateway is assigned, not the IP address.

Logs

serial-output.log
talos-worker1.vmx

Environment

  • Talos version: 1.9.0
  • Platform: metal & vmware, all running on vmware
@unya
Author

unya commented Dec 18, 2024

Further investigation with alternative network setups points to the network working once I switched from the default NAT-ed one to a bridged one.

However, the long boot time remains.

@lwbt

lwbt commented Dec 18, 2024

I have the same message on an SBC cluster:

[   58.436538] [talos] service[udevd](Running): Health check failed: exit status 1: Failed to initialize SELinux labeling handle: No such file or directory
[   58.452813] Timed out for waiting the udev queue being empty.

Nodes were previously configured with:

overlay:
    image: siderolabs/sbc-raspberrypi
    name: rpi_generic
customization:
    extraKernelArgs:
        - net.ifnames=0

So my assumption after reading the changelog was that it should be fine, but it appears that it wasn't.

Edit: After disconnecting the device entirely from the power source (PoE in this case), it is able to boot successfully with the usual boot time, maybe a few seconds slower, but not by much if I recall correctly. Both of these messages are still there, though.

@smira
Member

smira commented Dec 18, 2024

[   53.154492] [talos] service[udevd](Running): Health check failed: exit status 1: Failed to initialize SELinux labeling handle: No such file or directory

We will look into this; the cause is not clear so far.

[  183.409903] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 5 vectors allocated
[  183.411196] vmxnet3 0000:0b:00.0 ens192: NIC Link is Down

Looks like the only interface you have has link down, so how would it have networking?

On VMware, it makes more sense to use the VMware platform image, but I don't think it would change anything with this issue.

You can try using e1000 emulation on the VMware side; VMXNet has other issues with packet encapsulation.
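
For anyone who wants to try the adapter change (or the NAT-to-bridged switch mentioned earlier in the thread) by editing the .vmx file directly, here is a minimal sketch. It assumes the NIC is the first adapter (`ethernet0`), which is a guess because the attached talos-worker1.vmx is not reproduced here, and it relies on the commonly used `ethernet0.virtualDev` and `ethernet0.connectionType` keys. The helper name `set_vmx_option` is hypothetical, and the VM should be powered off before its .vmx is modified.

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set.

    Replaces the first existing entry for `key`, or appends a new line
    if the key is not present. Hypothetical helper for illustration.
    """
    line = f'{key} = "{value}"'
    pattern = re.compile(rf'^[ \t]*{re.escape(key)}[ \t]*=.*$', re.MULTILINE)
    if pattern.search(vmx_text):
        # A callable replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _m: line, vmx_text, count=1)
    sep = "" if (not vmx_text or vmx_text.endswith("\n")) else "\n"
    return f"{vmx_text}{sep}{line}\n"


# Assumed sample content; a real .vmx file has many more entries.
sample = 'ethernet0.present = "TRUE"\nethernet0.virtualDev = "vmxnet3"\n'

updated = set_vmx_option(sample, "ethernet0.virtualDev", "e1000")
updated = set_vmx_option(updated, "ethernet0.connectionType", "bridged")
print(updated)
```

The same function can be pointed at the text of a real .vmx file read from disk; keeping a backup copy of the file first is advisable.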

@rgomezceis

> [   53.154492] [talos] service[udevd](Running): Health check failed: exit status 1: Failed to initialize SELinux labeling handle: No such file or directory
>
> We will look into this; the cause is not clear so far.
>
> [  183.409903] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 5 vectors allocated
> [  183.411196] vmxnet3 0000:0b:00.0 ens192: NIC Link is Down
>
> Looks like the only interface you have has link down, so how would it have networking?
>
> On VMware, it makes more sense to use the VMware platform image, but I don't think it would change anything with this issue.
>
> You can try using e1000 emulation on the VMware side; VMXNet has other issues with packet encapsulation.

Same using e1000 driver

@smira
Member

smira commented Dec 23, 2024

> Same using e1000 driver

I'm not sure what 'same' is.

@rgomezceis

> > Same using e1000 driver
>
> I'm not sure what 'same' is.

I get the same error (udevd timeout) using the e1000 driver instead of VMXNET.
#9994

@smira
Member

smira commented Dec 26, 2024

That error itself is a red herring. Yes, udevd takes a long time to settle, but this is not fatal. Is there any problem after it?
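
For anyone checking their own serial log for what happens after the udevd message, here is a minimal sketch that pulls out the timestamped lines of interest. The sample lines are copied from this thread; the function name `find_events` and the chosen search strings are assumptions for illustration, not part of any Talos tooling.

```python
import re

# Matches kernel-style lines such as "[  183.411196] message".
LINE_RE = re.compile(r'^\[\s*(\d+\.\d+)\]\s*(.*)$')


def find_events(log_text, needles):
    """Return (timestamp, needle, message) for each log line containing a needle."""
    events = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip lines without a leading [timestamp]
        timestamp, message = float(match.group(1)), match.group(2)
        for needle in needles:
            if needle in message:
                events.append((timestamp, needle, message))
                break  # record each line once
    return events


sample_log = """\
[   53.154492] [talos] service[udevd](Running): Health check failed: exit status 1: Failed to initialize SELinux labeling handle: No such file or directory
[  183.409903] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 5 vectors allocated
[  183.411196] vmxnet3 0000:0b:00.0 ens192: NIC Link is Down
"""

events = find_events(sample_log, ["Health check failed", "udev queue", "NIC Link is"])
for timestamp, needle, message in events:
    print(f"{timestamp:>12.6f}  {needle}: {message}")
```

Lines that follow the udevd health check, such as a "NIC Link is Down" entry, are the ones worth including when reporting whether anything actually fails afterwards.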

4 participants