-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Link MiqWorker record to a running pod when not created using run_single_worker.rb #23112
Link MiqWorker record to a running pod when not created using run_single_worker.rb #23112
Conversation
9f162cb
to
7879964
Compare
23332aa
to
d4173ca
Compare
I added some debug logging showing when we find a worker without a system_uid and a pod that matches the worker class:
|
d4173ca
to
dc13bc3
Compare
@agrare are these legit test errors? Is they related to the ansible runner specs that Jason added recently? |
dc13bc3
to
3e6c113
Compare
@jrafanie the test failures look unrelated, I'll rebase and kick the tests |
TODO exclude starting workers from cleanup orphaned worker rows and confirm that miq_workers are marked failed after the starting timeout on podified. Handle the case where a miq_worker record is created but the pod hasn't started yet. |
@agrare and I discussed over video and I am concerned there is a race condition between actually starting the worker and |
712bbed
to
b9996fb
Compare
b9996fb
to
99fc10a
Compare
Okay I've done a live test where I manually skip any opentofu-runner pods to force the case where the record has been created but the pod is not created yet. Confirmed that we are not deleting the worker record during this period, pending the check on deleting the worker record after the 10 minute startup time. |
Okay confirmed that the "worker starting but no pod is ever created" case behaves correctly: |
99fc10a
to
2bfa69f
Compare
2bfa69f
to
d7a2b57
Compare
it "marks the worker as not responding" do | ||
# Make sure that #find_worker returns our instance of worker that | ||
# that stubs the #stop_container method. | ||
expect(server.worker_manager).to receive(:find_worker).with(worker).and_return(worker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE it looks a little weird to have .with(worker).and_return(worker)
here but that is just the way the matcher works, without this line the worker object passed in to stop_worker
does not have the mock expect(worker).to receive(:stop_container)
applied and it actually drops into ContainerOrchestrator to try to delete the container. This ensures that stop_worker
has the right object with the stubbed methods.
Okay live test forcing the race complete and specs should be green, taking out of WIP |
Checked commits agrare/manageiq@4f070f7~...d7a2b57 with ruby 3.1.5, rubocop 1.56.3, haml-lint 0.51.0, and yamllint |
Backported to
|
…bernetes_non_rails_system_uid Link MiqWorker record to a running pod when not created using run_single_worker.rb (cherry picked from commit a04c02d)
Depends on:
Fixes ManageIQ/manageiq-providers-embedded_terraform#67