
Speedup page class attributes accesses with shared lru_cache #1528

Open
Wants to merge 1 commit into develop.

Conversation

Abyss-W4tcher (Contributor)

Hi,

While working on new capabilities involving the instantiation of thousands of page objects, I noticed that _intel_vmemmap_start was a severe bottleneck in the get_contents call:

# Nice tool to trace execution and spot functions execution time
$ viztracer --log_func_exec "get_contents" -- vol.py -f sample.bin linux.pagecache.DumpFs
  • get_contents time: ~31 ms
    • _intel_vmemmap_start: ~30ms

We were previously using functools.cached_property, but it is only effective per page instance, which doesn't help here: most of the time we only dump a page's contents before discarding the object. To improve the design, I tied each page object's hash to its vmlinux hash, which functools checks to decide whether to hit the cache. This prevents overlaps between multiple vmlinux in the same sample, and only applies to the needed attributes.

After this patch, get_contents execution time dropped to ~1 ms (on cache hits after the first call).
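In lru_cache terms, the idea can be sketched as follows. This is a minimal standalone sketch: Page, vmemmap_start, and the stand-in computation are illustrative names, not the real volatility3 API.

```python
import functools

# Count how many times the expensive computation actually runs.
CALLS = {"count": 0}

class Page:
    """Illustrative page object whose hash is tied to its vmlinux."""

    def __init__(self, vmlinux):
        self.vmlinux = vmlinux

    def __hash__(self):
        # Pages from the same vmlinux hash identically, so they share
        # one lru_cache slot.
        return hash(self.vmlinux)

    def __eq__(self, other):
        return isinstance(other, Page) and hash(self) == hash(other)

    @property
    @functools.lru_cache
    def vmemmap_start(self):
        # Stand-in for the expensive (~30 ms) _intel_vmemmap_start scan.
        CALLS["count"] += 1
        return sum(ord(c) for c in self.vmlinux) * 0x1000

page1 = Page("vmlinux-A")
page2 = Page("vmlinux-A")   # same kernel -> shared cache hit
page3 = Page("vmlinux-B")   # different kernel -> recomputed

_ = page1.vmemmap_start, page2.vmemmap_start, page3.vmemmap_start
print(CALLS["count"])  # 2: one computation per distinct vmlinux
```

Note that lru_cache keys on the instance itself, so the overridden __hash__ and __eq__ are exactly what make the cache shared across instances.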

@Abyss-W4tcher Abyss-W4tcher changed the title Speedup page attributes accesses with shared lru_cache Speedup page class attributes accesses with shared lru_cache Jan 4, 2025
@ikelos (Member) left a comment


This one looks OK to me, but I'd like a second opinion from @atcuno or @gcmoreira, who understand the Linux side of things better. If these values are per layer, then shouldn't the caller cache them? I'm just wary of over-engineering the solution, and __eq__ and __hash__ need careful planning when being overridden to make sure unexpected consequences don't crop up. I would think getting the module from the object in the hash would be very slow? If the caching isn't helping, how does using lru_cache improve the situation?

@ikelos ikelos requested review from atcuno and gcmoreira January 5, 2025 21:29
Abyss-W4tcher (Contributor, Author) commented Jan 5, 2025

  • This design is a glorified PAGE_ATTRS_GLOBAL_CACHE = {} setup, in which we could have {vmlinux.__hash__(): {"vmemmap_start": XXX, "pageflags_enum": YYY}}. I understand the need for careful planning, as overriding __hash__ and __eq__ could be problematic if someone tried to compare page instances as Python objects (instead of by my_page.vol.offset, which is the right way to do it).

  • As observed with viztracer, the call to retrieve the vmlinux module took on the order of microseconds. Indeed, if we look at get_module_from_volobj_type, it only iterates over self._modules and does simple attribute accesses.

  • Here is the current caching mechanism (pseudo-code ish):

page1 = page(vmlinux=myvmlinux)
# no cache hit, ~30 ms
page1._intel_vmemmap_start
# cache hit thanks to functools.cached_property, instant result
page1._intel_vmemmap_start

page2 = page(vmlinux=myvmlinux)
# no cache hit, ~30 ms
page2._intel_vmemmap_start
# cache hit thanks to functools.cached_property, instant result
page2._intel_vmemmap_start

As you can see, the cache only applies to a single page instance, but we know for sure that _intel_vmemmap_start will be identical for every page tied to a specific vmlinux. So, in the end, with the proposed tweak and functools.lru_cache:

page1 = page(vmlinux=myvmlinux)
# no cache hit, ~30 ms
page1._intel_vmemmap_start
# cache hit thanks to functools.lru_cache, instant result
page1._intel_vmemmap_start

page2 = page(vmlinux=myvmlinux)
# cache hit thanks to functools.lru_cache, instant result <--
page2._intel_vmemmap_start

# Different vmlinux
page3 = page(vmlinux=myvmlinux_2)
# no cache hit, ~30 ms
page3._intel_vmemmap_start
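The "glorified PAGE_ATTRS_GLOBAL_CACHE" analogy mentioned above can be written out as a plain dict keyed on the vmlinux hash. Illustrative names and values, not real kernel data:

```python
# Dict-based equivalent of what the shared lru_cache achieves:
# one entry per distinct vmlinux, holding the cached attributes.
PAGE_ATTRS_GLOBAL_CACHE = {}

def _expensive_scan(vmlinux):
    # Stand-in for the ~30 ms vmemmap_start computation.
    return sum(ord(c) for c in vmlinux) * 0x1000

def vmemmap_start_for(vmlinux):
    entry = PAGE_ATTRS_GLOBAL_CACHE.setdefault(hash(vmlinux), {})
    if "vmemmap_start" not in entry:
        # Only the first lookup per vmlinux pays the full cost.
        entry["vmemmap_start"] = _expensive_scan(vmlinux)
    return entry["vmemmap_start"]

a = vmemmap_start_for("vmlinux-A")  # computed
b = vmemmap_start_for("vmlinux-A")  # cache hit
print(a == b)  # True
```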

@ikelos (Member) commented Jan 7, 2025

But in your example, page1 == page2 is True, isn't it? That seems very bad to me. Overriding __eq__ for a slight speed-up, so that equality no longer means what it's supposed to mean, is very bad. No other object in the framework explicitly requires that you verify the offsets match, because that's what __eq__ is supposed to do!

Surely the caller can cache that more effectively than altering the reality of equality within the framework for that type of object alone, to save some milliseconds? If it isn't going to change, then looking it up once for the kernel should be fine?

It probably also shouldn't be part of the object at all; the giveaway is that self is only used within the method to recover the kernel. This is another reason why giving objects methods they don't need isn't always useful, even if it's convenient (regarding our conversations around module). So the more I look at this, the more I think that method is completely misplaced and needs moving, rather than shuffling around the equality and hashing of the object...
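The equality concern is easy to reproduce in isolation. A hypothetical minimal Page class mirroring the PR's __hash__/__eq__ override:

```python
# Sketch of the equality hazard: with __eq__ keyed on the vmlinux
# rather than the page offset, two distinct pages compare equal.
class Page:
    def __init__(self, vmlinux, offset):
        self.vmlinux = vmlinux
        self.offset = offset

    def __hash__(self):
        return hash(self.vmlinux)

    def __eq__(self, other):
        return isinstance(other, Page) and hash(self) == hash(other)

p1 = Page("vmlinux-A", offset=0x1000)
p2 = Page("vmlinux-A", offset=0x2000)
print(p1 == p2)  # True, even though the pages live at different offsets
```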

@ikelos ikelos self-requested a review January 7, 2025 00:59
@ikelos (Member) left a comment

I can't let this go in, there are too many things that don't work about it, sorry... 5:S

return isinstance(other, page) and self.__hash__() == other.__hash__()

# Use the vmlinux to identify cache pools.
def __hash__(self):
ikelos (Member):
Redefining these is bad (see main comments).

@@ -2545,7 +2566,8 @@ def pageflags_enum(self) -> Dict:

return pageflags_enum

@functools.cached_property
@property
@functools.lru_cache
ikelos (Member):
This method doesn't need to be a property at all, it only uses self to find the kernel, if anything this should be a LinuxUtility that takes in a kernel, or live somewhere else, there's no benefit to it living here.
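A sketch of that alternative, assuming a free function keyed on the kernel object (illustrative names, not the real volatility3 API): lru_cache then keys on the kernel itself, and page objects keep their default equality and hashing semantics.

```python
import functools

# Track how many times the expensive scan actually runs.
calls = []

def _expensive_scan(vmlinux):
    # Stand-in for the real ~30 ms vmemmap_start computation.
    calls.append(vmlinux)
    return hash(vmlinux) & ~0xFFF

@functools.lru_cache(maxsize=None)
def intel_vmemmap_start(vmlinux):
    # Cached once per distinct kernel object; no __eq__/__hash__
    # overrides needed anywhere.
    return _expensive_scan(vmlinux)

a = intel_vmemmap_start("vmlinux-A")  # computed
b = intel_vmemmap_start("vmlinux-A")  # cache hit, no second scan
print(len(calls))  # 1
```

Callers pass the kernel explicitly, which also makes the per-kernel lifetime of the cached value obvious at the call site.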

Abyss-W4tcher (Contributor, Author) commented Jan 7, 2025

I understand, I will think about moving it somewhere else, maybe directly in this PR. It could end up being a good example of what a first idea looks like and how it is finally implemented.
