PYTHON-5044 - Fix successive AsyncMongoClients on a single loop always timeout on server selection #2065
The test failures are due to unittest's handling of async, so these changes will have to be included as part of the migration to pytest.
pymongo/asynchronous/mongo_client.py
Outdated
@@ -1565,6 +1565,8 @@ async def close(self) -> None:
# TODO: PYTHON-1921 Encrypted MongoClients cannot be re-opened.
await self._encrypter.close()
self._closed = True
# Yield to the asyncio event loop so all executor tasks properly exit after cancellation
await asyncio.sleep(0)
Yielding here doesn't guarantee anything about task cleanup. Shouldn't we actually await the tasks to ensure they are cleaned up properly?
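To illustrate the concern, here is a minimal sketch (not PyMongo code; `worker`, `close_with_yield`, and `close_with_await` are hypothetical names) comparing a single `asyncio.sleep(0)` after `cancel()` with actually awaiting the cancelled task. It assumes the task's cancellation handler needs more than one pass through the event loop to finish its cleanup.

```python
import asyncio


async def worker(log: list) -> None:
    try:
        await asyncio.sleep(3600)  # stand-in for a long-running background job
    except asyncio.CancelledError:
        # Cleanup that itself needs further trips through the event loop.
        await asyncio.sleep(0)
        await asyncio.sleep(0)
        log.append("cleaned up")
        raise


async def close_with_yield() -> bool:
    log: list = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)  # let the worker start
    task.cancel()
    await asyncio.sleep(0)  # a single yield, as in the diff above
    finished = task.done()
    # Drain the task so the demo leaves nothing pending.
    await asyncio.gather(task, return_exceptions=True)
    return finished


async def close_with_await() -> bool:
    log: list = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)  # let the worker start
    task.cancel()
    # Awaiting the task waits for however many loop iterations cleanup needs.
    await asyncio.gather(task, return_exceptions=True)
    return task.done() and log == ["cleaned up"]


yield_finished = asyncio.run(close_with_yield())
await_finished = asyncio.run(close_with_await())
```

In this sketch the single yield returns while the worker is still cleaning up, whereas awaiting the task returns only once it has finished.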
pymongo/periodic_executor.py
Outdated
@@ -75,6 +75,8 @@ def close(self, dummy: Any = None) -> None:
callback; see monitor.py.
"""
self._stopped = True
if self._task:
self._task.cancel()
I'm wondering how this relates to the issue described in DRIVERS-3076. Will calling cancel here change the user-visible events a Monitor emits on close()?
Ping.
When a task is cancelled, it should stop executing on the next iteration of the event loop. Since I believe the CancelledError is thrown from the next await call inside the cancelled task, it's possible that the Monitor emits events differently between cancellations.
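A small sketch of that behavior, using a hypothetical `monitor` coroutine rather than PyMongo's real Monitor: `cancel()` only requests cancellation, and the `CancelledError` surfaces at the `await` where the task is suspended, so anything after that `await` (such as emitting a follow-up event) never runs.

```python
import asyncio


async def monitor(events: list) -> None:
    try:
        events.append("heartbeat started")
        await asyncio.sleep(3600)  # stand-in for a blocking network read
        events.append("heartbeat succeeded")  # skipped when cancelled
    except asyncio.CancelledError:
        events.append("cancelled at await")
        raise


async def main() -> list:
    events: list = []
    task = asyncio.create_task(monitor(events))
    await asyncio.sleep(0)  # let the monitor reach its await
    task.cancel()  # only requests cancellation; nothing is raised yet
    events.append("cancel requested")
    await asyncio.gather(task, return_exceptions=True)
    return events


events = asyncio.run(main())
```

Here `events` ends up recording the cancellation request before the task observes the `CancelledError`, and no "heartbeat succeeded" entry is ever appended.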
I see, if we use this approach then we won't emit the expected ServerHeartbeatFailedEvent on cancellation. Do we need this change in this PR anymore? Can we defer it?
Good catch. Now that we aren't awaiting background tasks on close in this PR, this change is unneeded.
How does this cause a 20 second delay? Which task is blocking the loop?
Ah, we identified the root cause now. The loop blocks because after the wait() raises CancelledError, the loop.sock_recv_into task remains running, then we call [...]. So this fix does work because it ensures we cancel the loop.sock_recv_into task before updating the socket timeout.
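The general pattern can be sketched as follows. This is not the actual network_layer code: `fake_recv` is a hypothetical stand-in for the `loop.sock_recv_into` task, and the reader functions are illustrative. The point it shows is that cancelling a coroutine blocked in `asyncio.wait()` does not cancel the tasks being waited on, so they must be cancelled explicitly.

```python
import asyncio


async def fake_recv() -> None:
    # Hypothetical stand-in for loop.sock_recv_into(); never completes.
    await asyncio.Event().wait()


async def read_leaky(holder: dict) -> None:
    task = asyncio.ensure_future(fake_recv())
    holder["task"] = task
    # If this coroutine is cancelled here, `task` keeps running.
    await asyncio.wait([task])


async def read_with_cleanup(holder: dict) -> None:
    task = asyncio.ensure_future(fake_recv())
    holder["task"] = task
    try:
        await asyncio.wait([task])
    except asyncio.CancelledError:
        # Cancel the inner read and wait for it to exit before propagating.
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        raise


async def cancel_reader(reader) -> bool:
    holder: dict = {}
    outer = asyncio.create_task(reader(holder))
    await asyncio.sleep(0)  # let the reader start waiting
    outer.cancel()
    await asyncio.gather(outer, return_exceptions=True)
    inner = holder["task"]
    still_running = not inner.done()
    # Tidy up so the demo leaves nothing pending.
    inner.cancel()
    await asyncio.gather(inner, return_exceptions=True)
    return still_running


leaky_still_running = asyncio.run(cancel_reader(read_leaky))
cleaned_still_running = asyncio.run(cancel_reader(read_with_cleanup))
```

With the explicit cleanup, the inner task is finished by the time the cancellation propagates, so later socket operations do not race with a leftover read.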
The failures are due to our spec runner calling
pymongo/asynchronous/monitor.py
Outdated
@@ -191,6 +191,8 @@ def gc_safe_close(self) -> None:

async def close(self) -> None:
self.gc_safe_close()
if not _IS_SYNC:
await self._executor.join()
I'm not sure we can call join here because there are cases where close() gets called by the monitor thread/task itself. Joining on yourself will cause the thread/task to hang.
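One common way to avoid the self-join problem is to skip the wait when `close()` is running inside the task it would otherwise wait for. The sketch below uses a hypothetical `Worker` class, not PyMongo's executor or monitor, and only illustrates the guard using `asyncio.current_task()`.

```python
import asyncio
from typing import Optional


class Worker:
    """Hypothetical background worker; not PyMongo's actual executor."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None
        self.closed_from_own_task = False

    def open(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        # Simulate the worker deciding to shut itself down, e.g. after an error.
        await self.close()

    async def close(self) -> None:
        task = self._task
        if task is None:
            return
        if task is asyncio.current_task():
            # Waiting here for our own task to finish would never complete.
            self.closed_from_own_task = True
            return
        await asyncio.wait([task])


async def main() -> tuple:
    worker = Worker()
    worker.open()
    await asyncio.sleep(0)  # the worker runs and closes itself
    await worker.close()  # external close can safely wait for the task
    return worker.closed_from_own_task, worker._task.done()


self_close_skipped, task_done = asyncio.run(main())
```

Whether such a guard fits the real Monitor depends on how its close paths are structured, which this sketch does not attempt to model.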
How about we remove these changes and open a new issue to track improving the cleanup behavior? Then this PR can be focused on just the network_layer changes.
I'd rather not invest time in fixing the cleanup behavior for a test suite we're already working on refactoring. If we're fine with the tests throwing some warnings during the conversion to pytest, I'd prefer to just let them throw.
I don't understand your comment. Don't we eventually need client.close() to await all the background tasks? That's what I think we need a new ticket to track. This isn't really a test suite issue, it's something end users will encounter when closing clients too.
I was more referring to the work required to get the correct cleanup behavior functioning in our existing test suite. We already have quite a few workarounds for the async test suite to work within the current structure. I expect fixing this issue will only add onto that burden at the same time we're also refactoring the suite entirely.
I agree that the core issue of AsyncMongoClient.close() not awaiting all its background tasks needs to be addressed. I'm worried that in doing so with our current test suite, we'll be doing significant additional work that will be thrown out as part of the pytest refactor.
Oh I see. Yes I agree with you on the unittest specific changes.
I'll open a separate ticket for the AsyncMongoClient.close() changes, but we'll need to decide what to do if the unittest suite doesn't work with them.
LGTM, once the tests pass.