-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PYTHON-4264 Async PyMongo Beta #1629
Conversation
@NoahStapp we've picked up conflicts from the delayed import PR |
Resynced and fixed. |
pymongo/__init__.py
Outdated
from pymongo.operations import ( | ||
from pymongo.cursor_shared import CursorType | ||
|
||
# isort: off |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the isort disable? Does it cause import cycles? Could you add a comment explaining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it causes import cycles, will add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out I fixed that issue with the refactoring, so we can remove this entirely.
pymongo/__init__.py
Outdated
from pymongo.cursor import CursorType | ||
from pymongo.mongo_client import MongoClient | ||
from pymongo.operations import ( | ||
from pymongo.cursor_shared import CursorType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CursorType should still be importable from pymongo.cursor
, is that tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed.
from pymongo.synchronous.collection import ReturnDocument | ||
from pymongo.synchronous.common import MAX_SUPPORTED_WIRE_VERSION, MIN_SUPPORTED_WIRE_VERSION | ||
from pymongo.synchronous.mongo_client import MongoClient | ||
from pymongo.synchronous.operations import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have multiple operations classes? IMO there should only be one set of InsertOne, etc... operations which can be passed to either sync or async apis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a consequence of the mixing of data and IO operations in much of our codebase. I completely agree we should only have a single set of these operations classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could decide to do that refactoring and further separation now, but that would take a significant amount of time and delay this beta release. We'd also continue to have the constant merge conflicts and rebasing issues on this PR as more commits are merged to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, PYTHON-4476 is tracking this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay as long as PYTHON-4476 is implemented before we release anything non-alpha (betas aren't intended to have breaking changes either).
@@ -26,7 +26,7 @@ | |||
|
|||
from pymongo import MongoClient | |||
from pymongo.errors import OperationFailure | |||
from pymongo.uri_parser import parse_uri | |||
from pymongo.synchronous.uri_parser import parse_uri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think these unneeded test import changes should be reverted, even if only to make this change easier to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue they're necessary, as we'll need these changes once we rewrite our test suite to be async and generate the synchronous tests with unasync
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second this, the work is already done.
@@ -192,17 +192,15 @@ def test_create(self): | |||
lambda: "create_test_no_wc" not in db.list_collection_names(), | |||
"drop create_test_no_wc collection", | |||
) | |||
Collection(db, name="create_test_no_wc", create=True) | |||
db.create_collection("create_test_no_wc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we remove support for Collection(db, name="create_test_no_wc", create=True)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we decided to remove that option entirely from the driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then this can't be merged in the 4.x branch since it's a major breaking change. So we either need to bring it back or change the plan on when we can release this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we make the full release of the async API the 5.0 release? Then this and any other breaking changes we decide to do are safely wrapped up in a major release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds simpler to add this option back in for the sync api. Isn't that what _IS_SYNC is for?:
def __init__(...):
...
if _IS_SYNC and <should auto create>:
await self._create()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a list of other known breaking changes or is this the only one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of having two arguments (create
and session
) for AsyncCollection
that aren't valid. Doing so requires either keyword arguments everywhere we construct an AsyncCollection
to keep the existing argument order or just passing create=False
everywhere.
My recollection of the original reasoning behind this change is that users shouldn't be using create=True
in the first place. If that's accurate, I'd much rather remove it and have a breaking change than continue to support something we don't think should be supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall any other breaking changes. I would expect our tests to catch all breaking changes for the synchronous API. Is that a fair assumption?
pyproject.toml
Outdated
@@ -1,5 +1,5 @@ | |||
[build-system] | |||
requires = ["setuptools>=63.0"] | |||
requires = ["setuptools>=63.0", "unasync"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need unasync to build a wheel, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're using pre-commit for the synchro script, yup! Good catch.
pymongo/asynchronous/database.py
Outdated
read_concern, | ||
**kwargs, | ||
) | ||
await coll._create(kwargs, s, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kwargs should only be passed once here.
@@ -192,17 +192,15 @@ def test_create(self): | |||
lambda: "create_test_no_wc" not in db.list_collection_names(), | |||
"drop create_test_no_wc collection", | |||
) | |||
Collection(db, name="create_test_no_wc", create=True) | |||
db.create_collection("create_test_no_wc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds simpler to add this option back in for the sync api. Isn't that what _IS_SYNC is for?:
def __init__(...):
...
if _IS_SYNC and <should auto create>:
await self._create()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a few test regressions:
[2024/06/05 16:22:30.833] =================================== FAILURES ===================================
[2024/06/05 16:22:30.833] _____________________ TestCollation.test_create_collection _____________________
[2024/06/05 16:22:30.833] self = <test.test_collation.TestCollation testMethod=test_create_collection>
[2024/06/05 16:22:30.833] def test_create_collection(self):
[2024/06/05 16:22:30.833] self.db.test.drop()
[2024/06/05 16:22:30.833] > self.db.create_collection("test", collation=self.collation)
[2024/06/05 16:22:30.833] test/test_collation.py:128:
[2024/06/05 16:22:30.833] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024/06/05 16:22:30.833] pymongo/_csot.py:120: in csot_wrapper
[2024/06/05 16:22:30.833] return func(self, *args, **kwargs)
[2024/06/05 16:22:30.833] pymongo/synchronous/database.py:606: in create_collection
[2024/06/05 16:22:30.833] coll._create(kwargs, s)
[2024/06/05 16:22:30.833] pymongo/synchronous/collection.py:664: in _create
[2024/06/05 16:22:30.833] self._create_helper(self._name, options, collation, session)
[2024/06/05 16:22:30.833] pymongo/synchronous/collection.py:632: in _create_helper
[2024/06/05 16:22:30.833] self._command(
[2024/06/05 16:22:30.833] pymongo/synchronous/collection.py:589: in _command
[2024/06/05 16:22:30.833] return conn.command(
[2024/06/05 16:22:30.833] pymongo/synchronous/helpers.py:290: in inner
[2024/06/05 16:22:30.833] return func(*args, **kwargs)
[2024/06/05 16:22:30.833] pymongo/synchronous/pool.py:992: in command
[2024/06/05 16:22:30.833] return command(
[2024/06/05 16:22:30.833] pymongo/synchronous/network.py:222: in command
[2024/06/05 16:22:30.833] _async_helpers._check_command_response(
[2024/06/05 16:22:30.833] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024/06/05 16:22:30.833] response = {'code': 48, 'codeName': 'NamespaceExists', 'errmsg': 'Collection pymongo_test.test already exists.', 'ok': 0.0}
[2024/06/05 16:22:30.833] max_wire_version = 17, allowable_errors = None, parse_write_concern_error = True
[2024/06/05 16:22:30.833] def _check_command_response(
[2024/06/05 16:22:30.833] response: _DocumentOut,
[2024/06/05 16:22:30.833] max_wire_version: Optional[int],
[2024/06/05 16:22:30.833] allowable_errors: Optional[Container[Union[int, str]]] = None,
[2024/06/05 16:22:30.833] parse_write_concern_error: bool = False,
[2024/06/05 16:22:30.833] ) -> None:
[2024/06/05 16:22:30.833] """Check the response to a command for errors."""
[2024/06/05 16:22:30.833] if "ok" not in response:
[2024/06/05 16:22:30.833] # Server didn't recognize our message as a command.
[2024/06/05 16:22:30.833] raise OperationFailure(
[2024/06/05 16:22:30.833] response.get("$err"), # type: ignore[arg-type]
[2024/06/05 16:22:30.833] response.get("code"),
[2024/06/05 16:22:30.833] response,
[2024/06/05 16:22:30.833] max_wire_version,
[2024/06/05 16:22:30.833] )
[2024/06/05 16:22:30.833] if parse_write_concern_error and "writeConcernError" in response:
[2024/06/05 16:22:30.833] _error = response["writeConcernError"]
[2024/06/05 16:22:30.833] _labels = response.get("errorLabels")
[2024/06/05 16:22:30.833] if _labels:
[2024/06/05 16:22:30.833] _error.update({"errorLabels": _labels})
[2024/06/05 16:22:30.833] _raise_write_concern_error(_error)
[2024/06/05 16:22:30.833] if response["ok"]:
[2024/06/05 16:22:30.833] return
[2024/06/05 16:22:30.833] details = response
[2024/06/05 16:22:30.833] # Mongos returns the error details in a 'raw' object
[2024/06/05 16:22:30.833] # for some errors.
[2024/06/05 16:22:30.834] if "raw" in response:
[2024/06/05 16:22:30.834] for shard in response["raw"].values():
[2024/06/05 16:22:30.834] # Grab the first non-empty raw error from a shard.
[2024/06/05 16:22:30.834] if shard.get("errmsg") and not shard.get("ok"):
[2024/06/05 16:22:30.834] details = shard
[2024/06/05 16:22:30.834] break
[2024/06/05 16:22:30.834] errmsg = details["errmsg"]
[2024/06/05 16:22:30.834] code = details.get("code")
[2024/06/05 16:22:30.834] # For allowable errors, only check for error messages when the code is not
[2024/06/05 16:22:30.834] # included.
[2024/06/05 16:22:30.834] if allowable_errors:
[2024/06/05 16:22:30.834] if code is not None:
[2024/06/05 16:22:30.834] if code in allowable_errors:
[2024/06/05 16:22:30.834] return
[2024/06/05 16:22:30.834] elif errmsg in allowable_errors:
[2024/06/05 16:22:30.834] return
[2024/06/05 16:22:30.834] # Server is "not primary" or "recovering"
[2024/06/05 16:22:30.834] if code is not None:
[2024/06/05 16:22:30.834] if code in _NOT_PRIMARY_CODES:
[2024/06/05 16:22:30.834] raise NotPrimaryError(errmsg, response)
[2024/06/05 16:22:30.834] elif HelloCompat.LEGACY_ERROR in errmsg or "node is recovering" in errmsg:
[2024/06/05 16:22:30.834] raise NotPrimaryError(errmsg, response)
[2024/06/05 16:22:30.834] # Other errors
[2024/06/05 16:22:30.834] # findAndModify with upsert can raise duplicate key error
[2024/06/05 16:22:30.834] if code in (11000, 11001, 12582):
[2024/06/05 16:22:30.834] raise DuplicateKeyError(errmsg, code, response, max_wire_version)
[2024/06/05 16:22:30.834] elif code == 50:
[2024/06/05 16:22:30.834] raise ExecutionTimeout(errmsg, code, response, max_wire_version)
[2024/06/05 16:22:30.834] elif code == 43:
[2024/06/05 16:22:30.834] raise CursorNotFound(errmsg, code, response, max_wire_version)
[2024/06/05 16:22:30.834] > raise OperationFailure(errmsg, code, response, max_wire_version)
[2024/06/05 16:22:30.834] E pymongo.errors.OperationFailure: Collection pymongo_test.test already exists., full error: {'ok': 0.0, 'errmsg': 'Collection pymongo_test.test already exists.', 'code': 48, 'codeName': 'NamespaceExists'}
[2024/06/05 16:22:30.834] pymongo/synchronous/helpers.py:196: OperationFailure
[2024/06/05 16:22:30.834] __________ TestCollection.test_bypass_document_validation_bulk_write ___________
[2024/06/05 16:22:30.834] self = <test.test_collection.TestCollection testMethod=test_bypass_document_validation_bulk_write>
...
Broke it with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remaining work to be done before beta release (in no particular order):
no module named pymongo.asynchronous/synchronous
errors in release scriptsslotscheck
ERROR: Module 'grid_file' not found
errorsynchro.py
running as part of pre-commit hookSince there are over 30k lines changed, reviewing the changes line-by-line is not feasible. I've added several comments to files in this PR to explain choices or the important parts to look over closely, but we'll have to rely on our test suite to do the bulk of ensuring correctness.
Additionally, as shown by the list above, there is still remaining work to be done before the beta is fully ready for release. I've elected to mark this PR as ready for review now to give people plenty of time now that the tests are fully passing. There aren't any planned fundamental changes remaining, only fixing the last few parts of our infrastructure and related tooling.
I plan to start writing up the design document this week, which should help answer any additional questions raised as part of the review process.
For reviewers
The most important code to look at closely is
pymongo/network.py
,pymongo/asynchronous/mongo_client.py
,pymongo/asynchronous/cursor.py
, andpymongo/asynchronous/command_cursor.py
. The first is one of the only low-level components that contains both async and sync code, while the others all useIS_SYNC
for behavior unsupported by magic methods like__init__
. Reading those files will give a good understanding of how the rest of the async code is structured.The goal of the
asynchronous
sub-module is to be at feature parity with Motor, while thesynchronous
sub-module should be at parity with the existing PyMongo. Our test suite should cover almost all of the work of ensuring parity, so my biggest concern is code readability and maintainability.All of our work going forward will have to be asynchronous, adding another layer of complexity and verbosity. Minimizing this additional overhead is a priority whenever possible, especially since it can be unclear when a method does or does not need to be asynchronous.