Speedup page class attributes accesses with shared lru_cache #1528
base: develop
Conversation
This one looks ok to me, but I'd like a second opinion from @atcuno or @gcmoreira, who understand the linux side of things better. If these values are per layer, then shouldn't the caller cache them? I'm just wary of over-engineering the solution, and `__eq__` and `__hash__` need careful planning when being overridden to make sure unexpected consequences don't crop up. I would think getting the module from the object in the hash would be very slow? If the caching isn't helping, how does using `lru_cache` improve the situation?
```python
page1 = page(vmlinux=myvmlinux)
# no cache hit, ~30 ms
page1._intel_vmemmap_start
# cache hit thanks to functools.cached_property, instant result
page1._intel_vmemmap_start

page2 = page(vmlinux=myvmlinux)
# no cache hit, ~30 ms
page2._intel_vmemmap_start
# cache hit thanks to functools.cached_property, instant result
page2._intel_vmemmap_start
```

As you can see, the cache only applies to a single `page` instance, but we know for sure that the value only depends on the backing vmlinux. With this patch:

```python
page1 = page(vmlinux=myvmlinux)
# no cache hit, ~30 ms
page1._intel_vmemmap_start
# cache hit thanks to functools.lru_cache, instant result
page1._intel_vmemmap_start

page2 = page(vmlinux=myvmlinux)
# cache hit thanks to functools.lru_cache, instant result <--
page2._intel_vmemmap_start

# Different vmlinux
page3 = page(vmlinux=myvmlinux_2)
# no cache hit, ~30 ms
page3._intel_vmemmap_start
```
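The mechanism behind the example above can be demonstrated with a self-contained sketch. Note that `Kernel`, `Page`, and `vmemmap_start` below are toy stand-ins, not Volatility3 API: the point is only that `functools.lru_cache` keys on its arguments, so when `__hash__`/`__eq__` delegate to the backing kernel, two distinct `Page` instances built on the same kernel share one cache entry.

```python
import functools

class Kernel:
    """Hypothetical stand-in for a vmlinux module context."""
    def __init__(self, name):
        self.name = name

CALLS = {"count": 0}  # how many times the expensive body actually ran

class Page:
    """Toy page: hash/eq delegate to the backing kernel, as in the PR."""
    def __init__(self, vmlinux):
        self.vmlinux = vmlinux

    def __hash__(self):
        return hash(self.vmlinux.name)

    def __eq__(self, other):
        return isinstance(other, Page) and hash(self) == hash(other)

    @property
    @functools.lru_cache
    def vmemmap_start(self):
        # an expensive symbol-table lookup would happen here
        CALLS["count"] += 1
        return f"vmemmap for {self.vmlinux.name}"

k1, k2 = Kernel("vmlinux_a"), Kernel("vmlinux_b")
p1, p2, p3 = Page(k1), Page(k1), Page(k2)

p1.vmemmap_start  # miss: computed
p2.vmemmap_start  # hit: p2 hashes and compares equal to p1
p3.vmemmap_start  # miss: different kernel
print(CALLS["count"])  # 2
```

The cache lookup uses `(self,)` as the key, so the hit on `p2` happens only because the overridden `__hash__`/`__eq__` make it indistinguishable from `p1`, which is exactly the behaviour the reviewer objects to below.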
But in your example, `page1 == page2` would then be true, wouldn't it? That seems very bad to me. Surely the caller can cache that more effectively than altering the reality of equality within the framework for that one type of object, just to save some milliseconds? If the value isn't going to change, then looking it up once per kernel should be fine. It probably also shouldn't be part of the object at all, the giveaway being that `self` is only used within the method to recover the kernel. This is another reason why giving object methods they don't need isn't always useful even if it's convenient (regarding our conversations around
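The reviewer's alternative, caching once per kernel without touching `page` equality, can be sketched as a module-level function. The name `vmemmap_start` and the `Kernel` class are hypothetical; the point is that `lru_cache` on a free function keys on the kernel object's default identity hash, so no `__eq__`/`__hash__` overrides are needed anywhere.

```python
import functools

class Kernel:
    """Hypothetical stand-in for a vmlinux module context."""
    pass

@functools.lru_cache(maxsize=None)
def vmemmap_start(kernel):
    # the expensive symbol-table lookup would happen here;
    # cached once per kernel object via its default identity hash
    return object()

k1, k2 = Kernel(), Kernel()
print(vmemmap_start(k1) is vmemmap_start(k1))  # True: cached per kernel
print(vmemmap_start(k1) is vmemmap_start(k2))  # False: separate kernels
```

This keeps the caching concern out of the `page` type entirely, which also matches the later suggestion of moving the lookup into a utility that takes a kernel.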
I can't let this go in, there are too many things that don't work about it, sorry... :S
```python
    return isinstance(other, page) and self.__hash__() == other.__hash__()

# Use the vmlinux to identify cache pools.
def __hash__(self):
```
Redefining these is bad (see main comments).
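The "unexpected consequences" can be shown concretely. Below is a toy model (not the PR's actual class) of what happens once equality is tied to the kernel alone: two pages describing different offsets compare equal, and hash-based containers silently collapse them.

```python
class Page:
    """Toy model of the objection: equality tied to the kernel alone."""
    def __init__(self, vmlinux, offset):
        self.vmlinux, self.offset = vmlinux, offset

    def __hash__(self):
        return hash(self.vmlinux)

    def __eq__(self, other):
        return isinstance(other, Page) and hash(self) == hash(other)

p1 = Page("vmlinux_a", 0x1000)
p2 = Page("vmlinux_a", 0x2000)
print(p1 == p2)       # True, despite describing different pages
print(len({p1, p2}))  # 1: a set silently collapses distinct pages
```

Any framework code that deduplicates pages in a `set` or uses them as `dict` keys would now conflate every page of a kernel, which is why redefining `__eq__`/`__hash__` for a caching shortcut is risky.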
```diff
@@ -2545,7 +2566,8 @@ def pageflags_enum(self) -> Dict:
         return pageflags_enum

-    @functools.cached_property
+    @property
+    @functools.lru_cache
```
This method doesn't need to be a property at all; it only uses `self` to find the kernel. If anything, this should be a LinuxUtility that takes in a kernel, or live somewhere else. There's no benefit to it living here.

I understand, I will think about moving it somewhere else, maybe directly in this PR. It could end up being a good example of what a first idea looks like and how it is finally implemented.
Hi,

While working on new capabilities involving the instantiation of thousands of `page` objects, I noticed that `_intel_vmemmap_start` was a severe bottleneck in the `get_contents` call:

- `get_contents` time: ~31 ms
- `_intel_vmemmap_start`: ~30 ms

While we were previously using `functools.cached_property`, it was only efficient for a single `page` instance, which was not relevant, as most of the time we only dump its content before discarding it. To enhance the design, I tied each `page` object's `hash` to its `vmlinux` hash, which is checked by `functools` to decide whether to hit the cache. This prevents overlaps between multiple vmlinux in the same sample, and only applies to the needed attributes.

After this patch, `get_contents` execution time dropped to ~1 ms (on cache hits after the first call).
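One way to verify hit rates like these without wall-clock timing is `lru_cache`'s built-in statistics. The function below is a trivial stand-in for the cached lookup; the pattern of inspecting `cache_info()` is what matters.

```python
import functools

@functools.lru_cache(maxsize=None)
def lookup(key):
    # stand-in for the expensive per-kernel computation
    return key.upper()

lookup("vmlinux")  # miss: computed
lookup("vmlinux")  # hit: served from the cache
info = lookup.cache_info()
print(info.hits, info.misses)  # 1 1
```

`cache_info()` also reports `maxsize` and `currsize`, which helps confirm that a shared cache is actually being reused across calls rather than repopulated.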