
Improve performance of Java agent #134

Open · wants to merge 14 commits into develop from async-hashing

Conversation


@kusalk kusalk commented Oct 11, 2024

Hi @sohyun-ku, @taeyeon-Kim and others 👋🏾

I've been working on some performance optimisations for the Java agent. I'd be keen to get your thoughts on these, and if you think they are worthwhile.

The optimisations in this PR consist of the following:

  1. MethodRegistry#extractSignature algorithm optimisation
  2. Filter synthetic class signatures during ByteBuddy advice installation (instead of on invocation); see the sketch after this list
  3. Rewrite InvocationRegistry to use ConcurrentHashMap (rather than LinkedBlockingQueue) and remove worker thread
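
To illustrate the idea behind (2), here is a minimal, hypothetical sketch of filtering synthetic types and methods at advice-installation time with ByteBuddy; the package prefix `com.example` and the advice class `com.example.TrackingAdvice` are placeholders, not the agent's actual classes.

```java
import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;
import net.bytebuddy.matcher.ElementMatchers;

import java.lang.instrument.Instrumentation;

public final class SyntheticFilteringInstaller {

    public static void install(Instrumentation instrumentation) {
        // Exclude synthetic classes (lambda classes, bridges, ...) up front.
        ElementMatcher.Junction<TypeDescription> typeMatcher =
                ElementMatchers.<TypeDescription>nameStartsWith("com.example")
                        .and(ElementMatchers.not(ElementMatchers.isSynthetic()));

        // Likewise, only weave advice into non-synthetic methods, so synthetic
        // signatures never reach the invocation-tracking hot path.
        ElementMatcher.Junction<MethodDescription> methodMatcher =
                ElementMatchers.<MethodDescription>isMethod()
                        .and(ElementMatchers.not(ElementMatchers.isSynthetic()));

        new AgentBuilder.Default()
                .type(typeMatcher)
                .transform(new AgentBuilder.Transformer.ForAdvice()
                        .advice(methodMatcher, "com.example.TrackingAdvice"))
                .installOn(instrumentation);
    }
}
```

Doing the filtering once, at installation, means the per-invocation code never has to inspect or cache synthetic signatures at all.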

I've also made the following additional changes:

  1. Use JDK 21 with target level 8 for the Java agent
    1. Retain target level 21 for the unit test module
  2. Upgrade the Mockito test dependency to the latest version (unblocked by 1.i)
  3. Upgrade Gradle Shadow plugin to latest version
  4. Extract common WireMock integration test setup into abstract class
  5. Add JMH benchmark runner integration test
  6. Rework dependency injection for InvocationTracker and Scheduler
  7. Isolate SchedulerTest unit tests so they test only Scheduler (unblocked by 6.)
  8. Do not log exception stack trace if collector service unavailable

These changes can mostly be reviewed commit by commit but I can also split them into multiple PRs if it would be easier to review.

Whilst developing and finalising these optimisations, I ran some JMH benchmarks on my local machine to verify and measure the performance improvements.

Whilst I have limited experience writing and running such benchmarks, I took some care to avoid the most common pitfalls. I collected these results using Java 21 and the compiler blackhole configuration. That said, there are some discrepancies that I wasn't able to completely explain, so whilst I'm not fully confident in the absolute numbers, I have a reasonable level of confidence that the ordering of the results is correct.
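
For context, a minimal JMH microbenchmark in this style might look like the sketch below; the class name and the hashed signature string are illustrative, not the PR's actual benchmark. Returning the computed value from the @Benchmark method lets JMH consume it via a blackhole (the compiler-assisted variant where available), so the measured work isn't optimised away.

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MethodHashBenchmark {

    // A representative method signature; a real benchmark would vary the input.
    private final String signature =
            "public java.lang.String com.example.OrderService.findOrder(long, java.util.Locale)";

    @Benchmark
    public int hashSignature() {
        // The returned value is fed to JMH's blackhole, preventing dead-code elimination.
        return signature.hashCode();
    }
}
```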

Method hashing - MethodRegistry#getHash (ConcurrentHashMap cache disabled)

| Implementation | Measurement (ns/op) |
| --- | --- |
| Before | 1016.663 ± 146.226 |
| extractSignature optimised | 865.494 ± 71.256 |
| Above + RegEx moved | 581.834 ± 46.560 |
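
The 'RegEx moved' row presumably captures moving regular-expression work off the per-call path. As a generic illustration of that kind of change (not the actual MethodRegistry#extractSignature code; the pattern and the transformation here are assumptions), hoisting the compiled Pattern into a static constant removes Pattern.compile from the hot path:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SignatureNormalizer {

    // Compiling the pattern once, as a constant, removes Pattern.compile
    // from the per-call cost of extracting a signature.
    private static final Pattern MODIFIERS =
            Pattern.compile("\\b(?:public|protected|private|final|synchronized|native)\\s+");

    /** Hypothetical helper: strips modifier keywords from a textual method signature. */
    static String normalize(String signature) {
        Matcher matcher = MODIFIERS.matcher(signature);
        return matcher.replaceAll("");
    }
}
```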

Invocation registering - InvocationRegistry#register

In the following benchmarks, 'not in buffer' refers to the state of the invocations buffer immediately upon agent startup, whereas 'reset buffer' refers to the state immediately after invocation data is published and the buffer is cleared.

| Implementation | Not in buffer (ns/op) | In buffer (ns/op) | Reset buffer (ns/op) |
| --- | --- | --- | --- |
| Before | 261.195 ± 17.170 (+~220 async) | 3.027 ± 0.117 | Same as not in buffer |
| ConcurrentSet* | 190.002 ± 18.514 | 7.155 ± 0.237 | Same as not in buffer |
| ConcurrentMap | 210.502 ± 6.470 | 6.298 ± 0.779 | 152.111 ± 22.679 |

*This was my initial alternative implementation (44c2418); I ended up settling on a slightly different approach that offers better 'reset buffer' performance and is arguably simpler.

Overall - InvocationTracker#onInvocation

| Scenario | Before PR (ns/op) | After PR (ns/op) |
| --- | --- | --- |
| Hash uncached, reset buffer | 2540.221 ± 418.102 (+~220 async) | 748.109 ± 58.270 |
| Hash cached, reset buffer | 267.910 ± 4.762 (+~220 async) | 199.192 ± 9.958 |
| Hash cached, in buffer | 10.033 ± 0.389 | 12.836 ± 0.120 |

In a real-world scenario, the vast majority of invocations fall into the 'hash cached, in buffer' scenario, where the latency is minimal irrespective of these optimisations. However, latency spikes on application startup and momentarily after every publication. Notably, the worker thread is also eliminated, which yields further indirect performance gains. In complex web applications with many tracked methods, there can be 10,000+ tracked invocations per served request; this can add up to a delay on the order of milliseconds, so these optimisations should measurably reduce the worst-case overhead of the agent.
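
The 'hash cached' scenarios rely on MethodRegistry memoising signature hashes. As a generic illustration only (the real cache and hashing function are not shown in this PR description, so both are assumptions here), a ConcurrentHashMap-backed memo keeps repeat lookups down to a single map read:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class HashCache {

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> hashFunction;

    HashCache(Function<String, String> hashFunction) {
        this.hashFunction = hashFunction;
    }

    /** First call per signature pays the full hashing cost ('hash uncached');
     *  every subsequent call is a single map lookup ('hash cached'). */
    String getHash(String signature) {
        return cache.computeIfAbsent(signature, hashFunction);
    }
}
```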


Following a discussion in the comments, the change of the hashing algorithm from MD5 to MurmurHash was extracted into a separate PR, #142.


Memory profiling

The Scavenger agent was attached to a Java application and a sequence of E2E tests was then executed against the application, in an automated and reproducible manner. A memory snapshot was taken prior to the first publication of invocations. Despite efforts to control the environment, there is no guarantee that the volume and sequence of the corresponding Java method invocations are identical between independent runs, so the following profiling results should be taken with a grain of salt.

Before PR: [memory profiler snapshot]

After PR: [memory profiler snapshot]

Comparison

| Object | Before PR (kB retained) | After PR (kB retained) |
| --- | --- | --- |
| MethodRegistry | 762 | 660 |
| InvocationRegistry | 660 | 923 |

Analysis
The memory usage of the MethodRegistry has seemingly decreased. This is likely due to optimisation (2) where we no longer cache synthetic method signatures, instead excluding them on advice installation.

The memory usage of the InvocationRegistry has seemingly increased. This is expected, as we are now using a more complex data structure comprising a ConcurrentHashMap and BooleanHolders rather than just a HashSet. The extra memory buys the improved invocation-registering performance on startup and immediately after a publication (refer to the 'Invocation registering' benchmarks above).
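
To make that trade-off concrete, here is a minimal sketch, not the PR's actual InvocationRegistry, of a register path built on a ConcurrentHashMap of mutable boolean holders. Registering a brand-new hash pays the map-insert cost ('not in buffer'), re-registering a known hash is a cheap read ('in buffer'), and resetting after a publication only flips flags instead of clearing and repopulating the map ('reset buffer').

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SimpleInvocationRegistry {

    /** Mutable flag; one holder per known method hash. */
    static final class BooleanHolder {
        volatile boolean invoked;

        BooleanHolder(boolean invoked) {
            this.invoked = invoked;
        }
    }

    private final ConcurrentHashMap<String, BooleanHolder> invocations = new ConcurrentHashMap<>();

    /** Hot path: a plain map read when the hash is already buffered. */
    void register(String hash) {
        BooleanHolder holder = invocations.get(hash);
        if (holder == null) {
            // 'Not in buffer': first sighting of this hash since agent startup.
            invocations.putIfAbsent(hash, new BooleanHolder(true));
        } else if (!holder.invoked) {
            // 'Reset buffer': hash seen in a previous interval; just flip the flag.
            holder.invoked = true;
        }
        // 'In buffer': holder exists and is already true, so nothing to do.
    }

    /** Called after publishing: keep the keys, clear only the flags. */
    void reset() {
        for (Map.Entry<String, BooleanHolder> entry : invocations.entrySet()) {
            entry.getValue().invoked = false;
        }
    }
}
```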

The total memory usage of these two objects is more difficult to ascertain, as the method signature hash Strings are shared between them, so the total should be less than the sum of their individual retained sizes. Overall, I am not too concerned about the changes to memory usage here: growth is still bounded by the number of tracked methods, and the absolute memory usage remains relatively low.

@kusalk kusalk requested review from junoyoon and a team as code owners October 11, 2024 11:27
Contributor

@taeyeon-Kim taeyeon-Kim left a comment


First, thank you for contributing such a great PR.
An agent is characterized by the fact that it runs on the client's resources.
If the Scavenger agent uses more of the user's resources (CPU, memory, etc.) than before, I think that's a bigger issue than the performance improvement.
Is it possible to measure this as well?
(It doesn't seem to be an issue from the looks of it.)

cc. @sohyun-ku @kojandy

Author

kusalk commented Oct 15, 2024

I'll see what I can do :)

@kusalk kusalk force-pushed the async-hashing branch 2 times, most recently from d3cf1fa to 6c65b22 Compare December 24, 2024 05:24
@kusalk kusalk force-pushed the async-hashing branch 2 times, most recently from 41a92ee to 8f92a70 Compare December 24, 2024 06:25

github-actions bot commented Dec 24, 2024

Scavenger Test Results

167 files, 167 suites, 1m 39s ⏱️
277 tests: 271 ✅, 6 💤, 0 ❌
299 runs: 293 ✅, 6 💤, 0 ❌

Results for commit 135a84e.

♻️ This comment has been updated with latest results.

@taeyeon-Kim taeyeon-Kim added the enhancement New feature or request label Dec 30, 2024
Author

kusalk commented Jan 7, 2025

Hi @taeyeon-Kim

I've added memory profiling results to the PR description.

However, despite my best efforts, I was unable to obtain consistent CPU profiling results from which to make a meaningful comparison between before and after my changes. I suspect sampling-based profiling is not well suited to measuring methods whose invocation time is on the order of nanoseconds. From the rough testing I have done, the overall impact on a running Java application is essentially negligible both before and after these changes.
