-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-48288: Do not cache collection summaries by default in queries #1138
Conversation
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #1138 +/- ##
=======================================
Coverage 89.50% 89.50%
=======================================
Files 366 366
Lines 49007 49015 +8
Branches 5941 5941
=======================================
+ Hits 43863 43872 +9
+ Misses 3729 3728 -1
Partials 1415 1415 ☔ View full report in Codecov by Sentry. |
bc957cd
to
bc14e25
Compare
bc14e25
to
f6e4480
Compare
f6e4480
to
25f4e5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one comment.
self._depth -= 1 | ||
else: | ||
raise AssertionError("Bad caching context management detected.") | ||
return self._collection_records.enable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be _collection_summaries
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.... yes it should.
We now enable the collection summary cache separately from the collection record cache. In the Butler.query() context, only the collection record cache is enabled by default. More often than not, the query context manager is used to execute one query and then exited. The collection summary cache was often a pessimization in this case -- fetching tens of thousands of rows when only a few were needed.
Enable the caching context a few places where we do repeated dataset queries over the same collections. This allows us to avoid repeated collection summary lookups.
25f4e5f
to
409b870
Compare
We now enable the collection summary cache separately from the collection record cache. In the Butler.query() context, only the collection record cache is now enabled by default. When Butler.registry.caching_context() is called, both caches are enabled (as before.)
More often than not, the query context manager is used to execute one query and then exited. The collection summary cache was often a pessimization in this case -- fetching tens of thousands of rows when only a few were needed.
Also added explicit caching context calls to a few places in the Butler internals where we are doing repeated dataset queries that will benefit from the cache.
Checklist
doc/changes
configs/old_dimensions