-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-47770: Fix threadsafety of sqlalchemy MetaData access #1132
Conversation
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #1132 +/- ##
==========================================
- Coverage 89.50% 89.50% -0.01%
==========================================
Files 366 366
Lines 48998 49007 +9
Branches 5943 5941 -2
==========================================
+ Hits 43855 43863 +8
Misses 3729 3729
- Partials 1414 1415 +1 ☔ View full report in Codecov by Sentry. |
e3b4297
to
56f4854
Compare
0180196
to
4736b46
Compare
Wrap all accesses to sqlalchemy.MetaData with a lock to avoid concurrency issues. sqlalchemy.MetaData is documented to be threadsafe for reads, but not with concurrent modifications. We add tables dynamically at runtime, and the MetaData object is shared by all Database instances sharing the same connection pool. Prior to adding the lock, Butler server database calls that added table definitions dynamically were sometimes failing with InvalidRequestError exceptions complaining about inconsistency of table definitions.
Fix an issue where a test for equality of filenames in the system temp directory were failing. On MacOS, /private/tmp and /tmp refer to the same directory.
4736b46
to
4c0370c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, but I'm not sure that name mangling is done consistently.
@@ -164,13 +165,9 @@ def addTable(self, name: str, spec: ddl.TableSpec) -> sqlalchemy.schema.Table: | |||
to be declared in any order even in the presence of foreign key | |||
relationships. | |||
""" | |||
name = self._db._mangleTableName(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused, do you want DatabaseMetadata
to handle table name mangling internally? I do not think it is done consistently now, you call add_table
with the original name, but get_table
is called with mangled name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just going to remove it entirely. It's a no-op -- the only implementation returns the name unchanged. I think it is a holdover from the Oracle implementation.
The existing implementation was already incorrect because there were several places where name mangling was applied multiple times. I noticed that the two cases I already removed were double-applying the mangling (because it is done internally in _convertTableSpec) so I had removed those, but apparently the logic is also wrong in a lot of other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My recollection is that it was PostgreSQL, not Oracle, that needed name shrinking. It may have been index and constraint names more than table names where it was important, but I'm pretty sure PostgreSQL is the one with a 64-char limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two different things. The name shrinking logic is still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry for the noise.
@@ -1192,7 +1189,7 @@ def getExistingTable(self, name: str, spec: ddl.TableSpec) -> sqlalchemy.schema. | |||
""" | |||
assert self._metadata is not None, "Static tables must be declared before dynamic tables." | |||
name = self._mangleTableName(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this one needs to be dropped too?
@@ -1206,10 +1203,7 @@ def getExistingTable(self, name: str, spec: ddl.TableSpec) -> sqlalchemy.schema. | |||
) | |||
if name in inspector.get_table_names(schema=self.namespace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but this won't work if name is not mangled?
This table name mangling logic is a holdover from the Oracle implementation and the current implementation is a no-op. The logic was being inconsistently applied so none of this would have worked even in a new implementation that had a real implementation of the mangling function.
Wrap all accesses to sqlalchemy.MetaData with a lock to avoid concurrency issues.
sqlalchemy.MetaData is documented to be threadsafe for reads, but not with concurrent modifications. We add tables dynamically at runtime, and the MetaData object is shared by all Database instances sharing the same connection pool.
Prior to adding the lock, Butler server database calls that added table definitions dynamically were sometimes failing with InvalidRequestError exceptions complaining about inconsistency of table definitions.
Checklist
doc/changes
configs/old_dimensions