Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Places to potentially find startup performance improvements #45436

Open
geoand opened this issue Jan 8, 2025 · 25 comments
Open

Places to potentially find startup performance improvements #45436

geoand opened this issue Jan 8, 2025 · 25 comments
Labels
kind/enhancement New feature or request

Comments

@geoand
Copy link
Contributor

geoand commented Jan 8, 2025

Description

These ideas come from looking at this flamegraph

@geoand geoand added the kind/enhancement New feature or request label Jan 8, 2025
@franz1981
Copy link
Contributor

@geoand I would add #43022 as well
since, with 2 or more event loops, the "new" concurrency allowed by the CL make it load more byte[] and fail on definition later on during class loading. This is both impacting memory and time.

@geoand
Copy link
Contributor Author

geoand commented Jan 8, 2025

ExtLogRecord looking up HostName and Process related data that aren't needed by the default logging pattern

@dmlloyd I wanted to move the relevant fields into a holder, but the need to support serialization prevents this approach.

@franz1981
Copy link
Contributor

franz1981 commented Jan 8, 2025

Another "big" (relatively) one @geoand is this -> smallrye/smallrye-common#370

this is very easy: just use 64 bytes cache line as other known projects does i.e. https://github.com/ziglang/zig/blob/master/lib/std/atomic.zig#L429 (which for both arm and x86 uses 2 * 64 bytes due to jbossas/jboss-threads#191)

We are reading from a proc file, in linux, performing both a syscall, parsing and creating garbage for such. It looks an easy win with low impact.

@geoand
Copy link
Contributor Author

geoand commented Jan 8, 2025

Just for posterity, the comment above should be handled by jbossas/jboss-threads#216

@radcortez
Copy link
Member

It also has to be handled here:
https://github.com/smallrye/smallrye-common/blob/e43dfdf9b512185fd8c96ad720fcd4070ff12030/cpu/src/main/java/io/smallrye/common/cpu/CacheInfo.java#L80-L155

I did a quick return with the expected bytes, and on my Mac I noticed around 20 ms improvement in the JVM startup time.

@radcortez
Copy link
Member

  • ExtLogRecord looking up HostName and Process related data that aren't needed by the default logging pattern

What's the plan with HostName? After #41664, I'm getting 2 / 3 ms increase in the native image startup time.

@geoand
Copy link
Contributor Author

geoand commented Jan 10, 2025

It also has to be handled here: https://github.com/smallrye/smallrye-common/blob/e43dfdf9b512185fd8c96ad720fcd4070ff12030/cpu/src/main/java/io/smallrye/common/cpu/CacheInfo.java#L80-L155

I did a quick return with the expected bytes, and on my Mac I noticed around 20 ms improvement in the JVM startup time.

With the change from @franz1981, I think that CacheInfo won't be used/loaded at all in Quarkus. Did you see otherwise @radcortez ?

@geoand
Copy link
Contributor Author

geoand commented Jan 10, 2025

  • ExtLogRecord looking up HostName and Process related data that aren't needed by the default logging pattern

What's the plan with HostName? After #41664, I'm getting 2 / 3 ms increase in the native image startup time.

I was hoping @dmlloyd has an idea as my original idea won't work.

@Sanne
Copy link
Member

Sanne commented Jan 10, 2025

Offload Vertx / Netty class loading to other thread (like we do for DefaultChannelId)

Regarding this, we also do something for async initialize the JPA components. Should we have a unified/consistent approach? Just flagging it for awareness.

@geoand
Copy link
Contributor Author

geoand commented Jan 10, 2025

Regarding this, we also do something for async initialize the JPA components. Should we have a unified/consistent approach? Just flagging it for awareness.

Right, that thought did cross my mind

@dmlloyd
Copy link
Member

dmlloyd commented Jan 10, 2025

  • ExtLogRecord looking up HostName and Process related data that aren't needed by the default logging pattern

What's the plan with HostName? After #41664, I'm getting 2 / 3 ms increase in the native image startup time.

2/3 ms for host name or for process name? or both? That seems like a pretty crazy number. Perhaps for host name, we should track the time and print a warning if it seems like DNS may be misconfigured (check your /etc/hosts). The user also has the option of setting the HOSTNAME environment variable, which bypasses some work.

@geoand
Copy link
Contributor Author

geoand commented Jan 14, 2025

application.properties / META-INF/microprofile-config.properties reading from classpath

@radcortez do you plan on taking this one on soon? Asking because it definitely has a measurable impact.

@geoand
Copy link
Contributor Author

geoand commented Jan 14, 2025

ExtLogRecord looking up HostName and Process related data that aren't needed by the default logging pattern

@dmlloyd any ideas about tackling this one?

@radcortez
Copy link
Member

application.properties / META-INF/microprofile-config.properties reading from classpath

@radcortez do you plan on taking this one on soon? Asking because it definitely has a measurable impact.

Yes, but only next week.

@geoand
Copy link
Contributor Author

geoand commented Jan 14, 2025

🙏

@dmlloyd
Copy link
Member

dmlloyd commented Jan 15, 2025

ExtLogRecord looking up HostName and Process related data that aren't needed by the default logging pattern

@dmlloyd any ideas about tackling this one?

I did mention this:

What's the plan with HostName? After #41664, I'm getting 2 / 3 ms increase in the native image startup time.

2/3 ms for host name or for process name? or both? That seems like a pretty crazy number. Perhaps for host name, we should track the time and print a warning if it seems like DNS may be misconfigured (check your /etc/hosts). The user also has the option of setting the HOSTNAME environment variable, which bypasses some work.

Another idea is to add access to the gethostname system call in Java 22/23/24+ using FFM which might be fast (it does rely on method handle setup though, and a couple of other things).

@geoand
Copy link
Contributor Author

geoand commented Jan 16, 2025

But if I'm not mistaken, the default Quarkus logging pattern does not use the hostname (or any process info), so wouldn't it be better if we didn't try to load it eagerly?

@dmlloyd
Copy link
Member

dmlloyd commented Jan 16, 2025

That is an option. But I think it works be best to understand why it is slow and see if it can be fixed before jumping to that.

@geoand
Copy link
Contributor Author

geoand commented Jan 16, 2025

On my machine (Linux), it definitely does not take nearly as long as @radcortez, but both the call to HostName and Process are noticeable:

Image

Looking at the flamegraph and the code, there don't seem to be any obvious wins for my case.
Sure setting the system properties would avoid the hostname cost, by I don't see why one would have to do that. Getting rid of the method references might also help a bit, but we are still paying the (admittedly small) cost for something that in most cases will not be used

@radcortez
Copy link
Member

I don't see anything unusual in my /etc/hosts. The interesting thing is that the change has been available since Quarkus 3.8.6, and I've measured the time with 3.9, 3.10, 3.11, and 3.12 and I didn't notice any specific degradation coming from the Hostname. Setting the HOSTNAME env var didn't have any impact either.

Starting with 3.13, including the current main, I notice the startup degradation, and setting HOSTNAME does make it go away. Used same terminal, same settings, same JVM.

Looking into the flamegraphs, I didn't notice anything unusual. Maybe it is something that only applies to native mode?

@gsmet
Copy link
Member

gsmet commented Jan 21, 2025

This is odd, maybe if you preferred two binaries, someone from the Mandrel team could help figure out what's going on?

If you can pinpoint the exact version, I can also check if something rings a bell.

@geoand
Copy link
Contributor Author

geoand commented Jan 21, 2025

Offload Vertx / Netty class loading to other thread (like we do for DefaultChannelId)

Regarding this, we also do something for async initialize the JPA components. Should we have a unified/consistent approach? Just flagging it for awareness.

I didn't do anything unified, but this test does yield something like a 5ms improvement on JVM startup time for me, however I am not sure it's worth the memory cost of starting new threads

@radcortez
Copy link
Member

This is odd, maybe if you preferred two binaries, someone from the Mandrel team could help figure out what's going on?

If you can pinpoint the exact version, I can also check if something rings a bell.

This is just using config-quickstart and updating the version. I notice the increased degradation moving from 3.12.0 to 3.13.0. I didn't check specific fix versions.

@radcortez
Copy link
Member

radcortez commented Jan 21, 2025

Another thing that I've noticed is that previous versions (<= 3.12.0) were more stable at the startup time. Startup time would measure between 18-20 ms with an ocassional 16 ms.

After that, the range got higher 18-24 ms, and less consistent on the startup time, with a tendency to the higher side of the bracket.

@radcortez
Copy link
Member

After some digging, I guess the reason it didn't happen earlier was that the Wildfly Common class was not used anywhere, and no reinitialization was required. Once it was added by #41664, it requires that extra work which adds 3-4 ms startup time. Also, notice that there is a substitution for the native image: https://github.com/smallrye/smallrye-common/blob/062dec69c655fc49c60ecc20283fed3f8fc1c312/net/src/main/java/io/smallrye/common/net/Substitutions.java#L20-L104

I did measure that code, and it accounts for the 3-4 ms that I'm observing in the startup time. I chatted with @geoand, and apparently, it doesn't notice such a delay, so maybe it is something specific for Mac? Strange because the code should be the same for both.

I also tried to measure the cost without substitution, and it is mostly the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants