"Replace a VM in staging" fails on "delete NIC" step #5668

Open
sandbergja opened this issue Dec 24, 2024 · 1 comment

@sandbergja
Member

Expected behavior

The "Replace a VM in staging" playbook runs successfully from Tower.

Actual behavior

In several cases, it fails with this error:

pyVmomi.VmomiSupport.vim.fault.GenericVmConfigFault: (vim.fault.GenericVmConfigFault) {
   dynamicType = <unset>,
   dynamicProperty = (vmodl.DynamicProperty) [],
   msg = 'The guest operating system did not respond to a hot-remove request for device ethernet0 in a timely manner.',
   faultCause = <unset>,
   faultMessage = (vmodl.LocalizableMessage) [
      (vmodl.LocalizableMessage) {
         dynamicType = <unset>,
         dynamicProperty = (vmodl.DynamicProperty) [],
         key = 'msg.vigor.hotRemoveStillExists',
         arg = (vmodl.KeyAnyValue) [
            (vmodl.KeyAnyValue) {
               dynamicType = <unset>,
               dynamicProperty = (vmodl.DynamicProperty) [],
               key = '1',
               value = 'ethernet0'
            }
         ],
         message = 'The guest operating system did not respond to a hot-remove request for device ethernet0 in a timely manner.'
      }
   ],
   reason = 'The guest operating system did not respond to a hot-remove request for device ethernet0 in a timely manner.'
}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 107, in <module>
  File "<stdin>", line 99, in _ansiballz_main
  File "<stdin>", line 47, in invoke_module
  File "/usr/lib64/python3.9/runpy.py", line 225, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/ansible_community.vmware.vmware_guest_network_payload_2swyahpq/ansible_community.vmware.vmware_guest_network_payload.zip/ansible_collections/community/vmware/plugins/modules/vmware_guest_network.py", line 829, in <module>
  File "/tmp/ansible_community.vmware.vmware_guest_network_payload_2swyahpq/ansible_community.vmware.vmware_guest_network_payload.zip/ansible_collections/community/vmware/plugins/modules/vmware_guest_network.py", line 823, in main
  File "/tmp/ansible_community.vmware.vmware_guest_network_payload_2swyahpq/ansible_community.vmware.vmware_guest_network_payload.zip/ansible_collections/community/vmware/plugins/modules/vmware_guest_network.py", line 610, in _nic_absent
  File "/tmp/ansible_community.vmware.vmware_guest_network_payload_2swyahpq/ansible_community.vmware.vmware_guest_network_payload.zip/ansible_collections/community/vmware/plugins/module_utils/vmware.py", line 158, in wait_for_task
  File "<string>", line 3, in raise_from
ansible_collections.community.vmware.plugins.module_utils.vmware.TaskError: ('The guest operating system did not respond to a hot-remove request for device ethernet0 in a timely manner.', None)
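Since the failure is intermittent, one stopgap while the root cause is investigated could be to retry only this specific fault. The sketch below is a hypothetical Python helper (`retry_on_hot_remove_timeout` is not part of community.vmware); it assumes the NIC-removal step can be wrapped as a callable and matches on the fault text shown in the traceback. In a playbook, Ansible's task-level `until`/`retries`/`delay` keywords could serve a similar purpose.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the vSphere fault message seen in the TaskError above.
HOT_REMOVE_TIMEOUT = "did not respond to a hot-remove request"


def retry_on_hot_remove_timeout(
    action: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action(), retrying only when it fails with the hot-remove timeout.

    Hypothetical helper: any other exception, or the final failed attempt,
    is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # the Ansible module raises TaskError; kept generic here
            if HOT_REMOVE_TIMEOUT not in str(exc) or attempt == attempts:
                raise
            sleep(delay_seconds)  # give the guest time before asking again
    raise AssertionError("unreachable: loop always returns or raises")
```

Matching on the message rather than retrying every error keeps unrelated failures (bad credentials, a missing network) from being masked by the retry.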

After this happens, when we look in vSphere, we can see that:

  • The new VM did not get an IP address.
  • The new VM did not get the MAC address from the old VM (which is in the ToBeDeleted folder).

Steps to replicate

  1. Go to Tower.
  2. Run the "Replace a VM in staging" template on a staging VM.
  3. If it succeeds, run it again on another VM until it fails.

Impact of this bug

Devs can't reliably replace VMs without asking ops to fix them in vSphere.

Instances of this failure

@acozine
Contributor

acozine commented Jan 2, 2025

Looking at the failed runs, I see they all use the Jammy template. However, we've had successful runs with the Jammy template in between, so it isn't a consistent failure.

All the failures have happened since we merged #5547. That PR streamlined the playbook. Before that change, we brought the new VM up, then powered it down to replace the MAC address; since that change, we create the new VM in a powered-off state and try to replace the MAC address immediately. This change may be causing the issue.
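A hot-remove request is what vSphere issues when a device is removed from a powered-on VM, so one way to test the hypothesis above is to confirm that the VM actually reports itself as powered off before the NIC step runs. This is a minimal sketch under stated assumptions: `wait_for_power_state` is a hypothetical helper, and it assumes a `get_power_state` callable. With pyVmomi, that callable could be `lambda: vm.runtime.powerState`, whose values include `"poweredOn"` and `"poweredOff"`.

```python
import time
from typing import Callable


def wait_for_power_state(
    get_power_state: Callable[[], str],
    expected: str = "poweredOff",
    timeout_seconds: float = 300.0,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until get_power_state() returns `expected`, or raise TimeoutError.

    Hypothetical helper; clock and sleep are injectable so it can be tested
    without a real vCenter.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = get_power_state()
        if state == expected:
            return
        if clock() >= deadline:
            raise TimeoutError(f"VM still {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

If this check ever times out right before the NIC removal, that would support the idea that the VM is running at that point and the removal is being treated as a hot-remove.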
