Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak on Karaf when using the command "shutdown --reboot" #561

Open
j3rem1e opened this issue Jan 21, 2025 · 17 comments
Open

Memory leak on Karaf when using the command "shutdown --reboot" #561

j3rem1e opened this issue Jan 21, 2025 · 17 comments

Comments

@j3rem1e
Copy link

j3rem1e commented Jan 21, 2025

I have noticed a memory leak in pax-logging in Karaf 4.4.6 when using the command shutdown --reboot. I have millions of loggers stored in the static variable m_loggers.

After investigation, it seems that I have two pax-logging-api bundles: the first one is in the 'uninstalled' state and the second one is 'started'. It is this first bundle that holds the affected collection of m_loggers.

The initial issue likely originates from the shutdown command, or from references preventing the bundles from unloading completely, but perhaps we should add a safeguard here to handle this scenario? the command seems like to work correctly, but the jvm crashed severals days after because of this memory leak.

@jbonofre
Copy link
Member

I guess you have two different versions of pax-logging-api bundle right ?

Is it a custom distribution you have or the karaf standard one (after installing additional features) ?

@grgrzybek
Copy link
Member

grgrzybek commented Jan 21, 2025 via email

@j3rem1e
Copy link
Author

j3rem1e commented Jan 21, 2025

I dont' have two versions of pax logging, only one (2.2.7).

I see two instances of the bundle by analysing the heapdump. the bundle "uninstalled" is not visible.
the two instances target the same jar ("/data/cache/org.eclipse.osgi/4/0/bundleFile") with the same revision number (0). I use Equinox.

@j3rem1e
Copy link
Author

j3rem1e commented Jan 21, 2025

The root cause of this issue seems to be a camel route/thread which have not be stopped when the shutdown occured, and this thread keeps references to a logger/classloader.

@mattrpav
Copy link
Member

Can you share the thread dump and heap dump?

Note: Joining the #karaf Slack channel on Apache’s Slack server would be a way to securely share heap dump with committees without a public link

@j3rem1e
Copy link
Author

j3rem1e commented Jan 21, 2025

I'm afraid I can't share a heapdump. However, I'll try to create a test case.

probably something like :

  • In a BundleActivator.start(), create a ScheduledexecutorService, and schedule a task doing only Logger.getLogger(..)
  • Don't shutdown the ScheduledExecutorService in the BundleActivator.stop()

@grgrzybek
Copy link
Member

Don't shutdown the ScheduledExecutorService in the BundleActivator.stop()

So you'll get a memory leak ;)

@grgrzybek
Copy link
Member

Please check https://github.com/ops4j/org.ops4j.pax.logging/blob/main/pax-logging-it/src/test/java/org/ops4j/pax/logging/it/Log4J2MemoryIntegrationTest.java where I obtain tens of thousands of loggers under strict memory conditions:

-Xmx64M -XX:+HeapDumpOnOutOfMemoryError

Initially this test was failing, but after fixing #374 it was fine.

@mattrpav
Copy link
Member

Don't shutdown the ScheduledExecutorService in the BundleActivator.stop()

So you'll get a memory leak ;)

@j3rem1e I think it helps to understand that bundle .stop() is a lifecycle hook, but not a process control that has the ability to forcible stop anything started. When threads or other things are started in the bundle, they must adhere to the lifecycle or it will lead to leaks.

@j3rem1e
Copy link
Author

j3rem1e commented Jan 21, 2025

Yes, you're right, the root cause is not in pax-logging, like I said in my original report.

In my case, it is a weird bugs inside a FileConsumer in a route in camel. However, the memory leak in camel is harmless, but camel get a dynamic logger, and because it references a closed bundleContext, the logger is added to a list which are never cleaned. There is probably ThreadLocal leaking somewhere too.

That's why I said we can add a safeguard here : if the Activator is closed, then ignore the created logger and don't add it to the m_loggers property.

@mattrpav
Copy link
Member

@j3rem1e A bundle Activator is only one of several ways for there to be activated Java class instances running in a bundle-- in short, this wouldn't be a quick or comprehensive fix.

Does the Camel route use any Java processors or externally created thread pools or connection pools? It is common 'miss' for developers to start a pooled ActiveMQ connection factory (or JDBC pooled data source), but not call the stop on shutdown.

@j3rem1e
Copy link
Author

j3rem1e commented Jan 21, 2025

The camel routes is only a simple FileConsumer but it couldn't be initialized because of a duplicate endpoint id. It was a user mistake in a configuration file. However, the server crashed severals days after a "shutdown --reboot", with a 98% memory allocation attributed to pax-logging.

@mattrpav
Copy link
Member

Sounds like pax-logging is bubbling up the symptoms or signs of the problem, but isn't the root cause?

@j3rem1e
Copy link
Author

j3rem1e commented Jan 21, 2025

yes, that what I try to said.

Imo as a low level logging library pax-logging should deal with errors caused by other lib in order to be more resilient, but I understand your point. feel free to close this ticket, I can use a local patch.

@grgrzybek
Copy link
Member

That looks similar to classical ClassLoader leak - you leak one class, it references a classloader which references all classes loaded earlier.
This time in our great OSGi world...
Many years ago I was looking for classloader leaks in OSGi finding tens of leaks when OSGi hosted bundles like Jetty, JGit, Zookeeper, ...
If you can't share the heap dump, you could try this special OQL query within VisualVM:

var WIRINGS_COUNT = 1;
var revisions = heap.objects("org.apache.felix.framework.BundleRevisionImpl", false);
var revisionsAndWirings = [];
while (revisions.hasMoreElements()) {
  var el = revisions.nextElement()
  var wirings = toArray(filter(referrers(el), 'classof(it).name == "org.apache.felix.framework.BundleWiringImpl"'));
  revisionsAndWirings.push({revision: el, sn: el.m_symbolicName, wirings: wirings});
}

map(filter(revisionsAndWirings, 'count(it.wirings)>WIRINGS_COUNT'), '{r: it.revision, sn: it.sn.toString(), w: toHtml(it.wirings)}')

This looks for BundleRevisions which have more than WIRINGS_COUNT number of BundleWiring instances referring to this revision - which effectively looks for these refreshed bundles that didn't fully drop their previous wiring (which is 1:1 with BundleClassLoader).

Without checking your idea that "get logger" should do some check like "is the bundle active?" but the q. is - which bundle? Pax Logging may not get refreshed, only the referring bundle (bundle using loggers) may be stale. In other words - not that easy...

@j3rem1e
Copy link
Author

j3rem1e commented Jan 22, 2025

Yeah you're right ^^

I work with OSGI since 20 years and 15 with servicemix/karaf, and I only discovered the "shutdown --reboot" commands weeks ago. At first, I thought it was a great command because it allowed me to reboot without connecting to the host server. However, I realize now that shutting down and restarting an OSGI framework without stopping the JVM is not well tested, at least in my context. I have some leaks, and most of them are in 3rd party libs. The directive here is to not use this command for now.

I'm not sure to see your point of the refreshed bundle however. To me, this leak can only occurs when the Activator.stop() method has been called (and so setPaxLoggingManager(null) has been called to the factories). Having a static flags "stopped" which is reset in the start method should do the job ?

@grgrzybek
Copy link
Member

@j3rem1e I'd simply have to have a look at your configuration and heap... So treat my comments as polite responses without strict recommendations ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants