trie: reduce allocations in stacktrie #30743
I believe that there is an opportunity to reduce the number of allocations even further:
Advantages and issues of that approach:
The stacktrie is already nearly alloc-free. I think you can hit it with a billion keys and not have any allocations after the first 100K elements or so. I might be wrong, but I don't think there's any room for meaningful further improvement (by which I mean: other than changing a constant alloc-count to a different constant).
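For reference, a claim like that can be checked with `b.ReportAllocs`. A minimal benchmark sketch, assuming go-ethereum's `NewStackTrie`/`Update` API (the key/value contents are made up; a stacktrie requires keys in ascending order):

```go
package trie

import (
	"encoding/binary"
	"testing"
)

// BenchmarkStackTrieInsert reports steady-state allocations of the
// stacktrie; allocs/op should stay near zero once the internal pools
// have warmed up.
func BenchmarkStackTrieInsert(b *testing.B) {
	st := NewStackTrie(nil) // nil: hash only, don't commit nodes
	key := make([]byte, 8)
	val := []byte("some-value")
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Ascending keys, as the stacktrie requires.
		binary.BigEndian.PutUint64(key, uint64(i))
		if err := st.Update(key, val); err != nil {
			b.Fatal(err)
		}
	}
}
```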
Can you estimate the performance and allocation differences between using this encoder and the standard full node encoder?
The primary distinction seems to be that this encoder inlines the children encoding directly, rather than invoking the children encoder recursively. I’m curious about the performance implications of this approach.
Sure, so if I change it back (adding back `rawNode`, and then switching back the `case branchNode:` so that it looks like it did earlier), then the difference is: when we build the children struct, all the values are copied, as opposed to when we just use the encoder type, which uses the same child without triggering a copy.
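To make the distinction concrete, here is a hedged sketch (not the PR's actual types; `fullNode`'s layout and the byte-writing are simplified stand-ins for real RLP encoding). One path assembles a children struct by copying, the other encodes in place:

```go
package main

import (
	"bytes"
	"fmt"
)

// Old-style path (hypothetical): gather the children into a node
// struct before encoding. Every assignment copies a slice header, and
// the struct tends to escape once handed to a generic encoder.
type fullNode struct {
	children [17][]byte
}

func encodeViaStruct(kids [][]byte) []byte {
	var fn fullNode
	copy(fn.children[:], kids) // copies one header per child
	var buf bytes.Buffer
	for _, c := range fn.children {
		buf.Write(c) // stand-in for per-item RLP encoding
	}
	return buf.Bytes()
}

// Inlined path (hypothetical): walk the existing children directly,
// never materializing an intermediate node struct, so nothing is
// copied.
func encodeInline(kids [][]byte, buf *bytes.Buffer) {
	for _, c := range kids {
		buf.Write(c) // same stand-in encoding, zero extra copies
	}
}

func main() {
	kids := [][]byte{{0x01}, {0x02}}
	var buf bytes.Buffer
	encodeInline(kids, &buf)
	fmt.Println(bytes.Equal(encodeViaStruct(kids), buf.Bytes())) // true
}
```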
Any particular reason to not use `sync.Pool`?
Yes! You'd be surprised, but the whole problem I had with the extra alloc: I solved that by not using a `sync.Pool`. I suspect it's due to the interface conversion, but I don't know any deeper details about the why of it.
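For what it's worth, the suspected interface-conversion alloc can be reproduced with a plain `[]byte` pool. A minimal sketch (this is the classic staticcheck SA6002 pitfall, offered here as a guess at the cause, not as the PR's code):

```go
package main

import "sync"

var pool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 32) },
}

// Put takes an interface{}. Converting a []byte (a three-word slice
// header) to interface{} boxes the header on the heap, so every
// round-trip through the pool costs one small allocation.
func recycle(b []byte) {
	pool.Put(b[:0]) // allocates: slice header escapes into interface
}

func main() {
	b := pool.Get().([]byte)
	b = append(b, 0x01)
	recycle(b)
}
```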
We can try to use the pool with a slice pointer, e.g.
Okay, it doesn't work
bench1: channel
bench2: slice pointer pool
The issue with using the native sync pool is the key difference between the two: in the channel approach, the slice is passed as a reference, with no descriptor construction involved; in the sync.Pool approach, the slice is passed as a value, and the descriptor must be re-created for every Get.
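A hedged sketch of the two strategies under comparison (all names are made up; the comments mark where the extra work happens in each):

```go
package main

import "sync"

const bufSize = 32

// Channel-based free list: the []byte travels through the channel
// as-is, so no interface boxing or descriptor rebuilding is involved.
var bufCh = make(chan []byte, 1024)

func getChan() []byte {
	select {
	case b := <-bufCh:
		return b
	default:
		return make([]byte, 0, bufSize)
	}
}

func putChan(b []byte) {
	select {
	case bufCh <- b[:0]:
	default: // free list full; let the GC take it
	}
}

// sync.Pool variant with a slice pointer: storing *[]byte avoids
// boxing the slice header itself, but &b below still escapes (one
// alloc per Put), and every Get has to dereference the pointer and
// re-create the slice descriptor as a value.
var bufPool = sync.Pool{
	New: func() interface{} { p := make([]byte, 0, bufSize); return &p },
}

func getPool() []byte  { return (*bufPool.Get().(*[]byte))[:0] }
func putPool(b []byte) { bufPool.Put(&b) } // &b escapes to the heap

func main() {
	b := getChan()
	b = append(b, 0xff)
	putChan(b)
	putPool(getPool())
}
```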
What's the difference between `hashedNode` and an embedded tiny node?
For an embedded node, we also allocate the buffer from the bPool, and the buffer is owned by the node itself, right?
Well, the thing is that there's only one place in the stacktrie where we convert a node into a `hashedNode`. This is the only place. So we know that once something has been turned into a `hashedNode`, the `val` is never externally held, and thus it can be returned to the pool. The big problem we had earlier is that we need to overwrite the `val` with a hash, but we are not allowed to mutate `val`. So this was a big cause of runaway allocs. But now we reclaim those values-which-are-hashes, since we know that they're "ours".
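Roughly, the ownership rule could be sketched like this (illustrative types only, not the PR's actual code; `sha256` stands in for keccak256, and `getBuf` stands in for the PR's byte pool):

```go
package main

import "crypto/sha256"

type nodeType int

const hashedNode nodeType = iota // other node types elided

type stNode struct {
	typ nodeType
	val []byte
}

// bufPool is a stand-in for the stacktrie's byte-slice pool.
var bufPool = make(chan []byte, 128)

func getBuf() []byte {
	select {
	case b := <-bufPool:
		return b[:0]
	default:
		return make([]byte, 0, 32)
	}
}

// toHashedNode sketches the single site where a node becomes a
// hashedNode. The incoming st.val may be externally held, so it is
// never mutated; the hash is written into a pooled buffer instead.
// From here on, st.val is "ours" and may be recycled on reset.
func toHashedNode(st *stNode, encoded []byte) {
	if st.typ == hashedNode {
		return // already hashed: exit early, as the thread mentions
	}
	h := sha256.Sum256(encoded) // stand-in for keccak256
	st.val = append(getBuf(), h[:]...)
	st.typ = hashedNode
}

func main() {
	st := &stNode{val: []byte("externally-held value")}
	toHashedNode(st, []byte("rlp-encoded node"))
	_ = st
}
```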
To answer your question: yes. So from the perspective of slice-reuse, there is zero difference.
Btw, in my follow-up PR to reduce allocs in derivesha (#30747), I remove this trick again. In that PR, I always copy the value as it enters the stacktrie. So we always own the `val` slice, and are free to reuse it via pooling. Doing so is less hacky: we get rid of the special case "val is externally held unless it's a `hashedNode`, because then we own it". That also makes it possible for the derivesha method to reuse the input buffer.
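The copy-on-entry approach could look something like this (a hedged sketch with made-up types, not the code of #30747):

```go
package main

// sketchTrie is a pared-down stand-in for the real StackTrie.
type sketchTrie struct {
	pool chan []byte
	vals [][]byte // stand-in for the node stack
}

func (t *sketchTrie) getBuf() []byte {
	select {
	case b := <-t.pool:
		return b[:0]
	default:
		return nil // append below will allocate a fresh buffer
	}
}

// Update copies the value on entry, so every val the trie holds is
// owned by the trie and can be recycled unconditionally; the caller
// may reuse its input buffer as soon as Update returns.
func (t *sketchTrie) Update(key, value []byte) {
	owned := append(t.getBuf(), value...) // private, pool-backed copy
	t.vals = append(t.vals, owned)
	_ = key // key handling elided in this sketch
}

func main() {
	t := &sketchTrie{pool: make(chan []byte, 16)}
	buf := []byte("value-1")
	t.Update([]byte{0x01}, buf)
	copy(buf, "value-2") // safe: the trie kept its own copy
	t.Update([]byte{0x02}, buf)
}
```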
But if it's created e.g. on L400, it should be returned to the pool. I'm not worried that it will leak, since there is a garbage collector, but it feels like we could avoid extra allocations by maybe ensuring that the passed values are always allocated with the bytes pool?
Note line 392. That is the only place we ever set the type to be a `hashedNode`. Only on `hashedNode` types do we "own" the `val`. And if the type was already `hashedNode`, it will exit early (at line 334). For `hashedNode`s, the return-to-pool happens during reset.
You mean `leafNode` on line 382? Nvm, I see what you mean.
The point I'm making only stands if you're also using the pool between calls to Update. And it seems that this is already what you're saying here: #30743 (comment)