-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify profile stack trace representation #615
base: main
Are you sure you want to change the base?
Conversation
- Introduce a first-class Stack message type and lookup table. - Replace location index range based stack trace encoding on Sample with a single stack_index reference. - Remove the location_indices lookup table. The primary motivation is laying the ground work for [timestamp based profiling][timestamp proposal] where the same stack trace needs to be referenced much more frequently compared to aggregation based on low cardinality attributes. Timestamp based profiling is also expected to be used with the the upcoming [Off-CPU profiling][off-cpu pr] feature in the eBPF profiler. Off-CPU stack traces have a different distribution compared to CPU samples. In particular stack traces are much more repetitive because they only occur at call sites such as syscalls. For the same reason it is also uncommon to see a stack trace are a root-prefix of a previously observed stack trace. We might need to revisit the previous [previous benchmarks][benchmarks] to confirm these claims. The secondary motivation is simplicitly. Arguably the proposed change here will make it easier to write exporters, processors as well as receivers. It seems like we had rough consensus around this change in previous SIG meetings, and it seems like a good incremental step to make progress on the timestamp proposal. [timestamp proposal]: #594 [off-cpu pr]: open-telemetry/opentelemetry-ebpf-profiler#196 [benchmarks]: https://docs.google.com/spreadsheets/d/1Q-6MlegV8xLYdz5WD5iPxQU2tsfodX1-CDV1WeGzyQ0/edit?gid=2069300294#gid=2069300294
// Supersedes location_index. | ||
int32 locations_length = 2; | ||
// Reference to stack in Profile.stack_table. | ||
int32 stack_index = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have an optional int64 instruction_ip property here that would take the common stack at stack_index, but also add the instruction_ip to it. This would allow for common on-CPU callstacks to be respresented cheaply that are in the same function but hit different parts of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be just int64 since that won't play well with symbolization and inlines, would have to be a location ID.
Also, this optimization would only be most useful to profile types like CPU sampling where the leaf frame is indeed very random. For profiles like heap profiling it would not be as valuable since there even the leaf is a callsite, not a sampled IP.
And if we are looking at such "encode the tree more efficiently" optimizations then we should probably look at using the actual tree encoding as discussed a bit here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it's only useful for on-CPU sampling with skids, etc. So I don't think it should be required, but definitely useful when needed as an option.
The primary motivation is laying the ground work for timestamp based profiling where the same stack trace needs to be referenced much more frequently compared to aggregation based on low
cardinality attributes.
Timestamp based profiling is also expected to be used with the upcoming Off-CPU profiling feature in the eBPF profiler. Off-CPU stack traces have a different distribution compared to CPU stack traces. In particular Off-CPU stack traces are much more repetitive because they typically occur at special leaf functions such as leaf (async preemption being a notable exception). For the same reason it is also uncommon to see a stack trace that are a root-prefix of a previously observed stack trace.
We might need to revisit the previous previous benchmarks to confirm these claims, as the previous analysis seems to have shown that the location range based encoding is always either comparable or better than the mechanism proposed here.
The secondary motivation is simplicitly. Arguably the proposed change here will make it easier to write exporters, processors as well as receivers.
It seems like we had rough consensus around this change in previous SIG meetings, and it seems like a good incremental step to make progress on the timestamp proposal.