[ODLPARENT-176] Reduce GC pressure and lock contention by ditching the PaxOsgi log appender Created: 13/Nov/18 Updated: 15/Mar/19 Resolved: 15/Mar/19 |
|
| Status: | Resolved |
| Project: | odlparent |
| Component/s: | None |
| Affects Version/s: | 3.1.4 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Medium |
| Reporter: | Michael Vorburger | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I'm looking at a Java Flight Recording obtained from (internal) scale lab testing, based on Oxygen SR3 code, and thought it was curious / interesting that the following code paths appears very frequently as a big "TLAB allocation". What's interesting is that in a real world scale test this is the only log that shows up as a big "TLAB allocation" - so it seems to me that a closer look at this code is worth it. Perhaps it's even a simple case of reducing an over INFO to a DEBUG log. StackTraceElement java.lang.Throwable.getStackTraceElement(int) 3875 StackTraceElement[] java.lang.Throwable.getOurStackTrace() 3875 StackTraceElement[] java.lang.Throwable.getStackTrace() 3875 StackTraceElement org.apache.logging.log4j.core.impl.Log4jLogEvent.calcLocation(String) 3771 StackTraceElement org.apache.logging.log4j.core.impl.MutableLogEvent.getSource() 3771 PaxLocationInfo org.ops4j.pax.logging.log4j2.internal.PaxLoggingEventImpl.getLocationInformation() 3771 void org.ops4j.pax.logging.log4j2.internal.PaxAppenderProxy.doAppend(PaxLoggingEvent) 3771 void org.ops4j.pax.logging.log4j2.appender.PaxOsgiAppender.append(LogEvent) 3771 void org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(LogEvent) 3771 void org.apache.logging.log4j.core.config.AppenderControl.callAppender0(LogEvent) 3771 void org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(LogEvent) 3771 void org.apache.logging.log4j.core.config.AppenderControl.callAppender(LogEvent) 3771 void org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LogEvent) 3771 void org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LogEvent) 3771 void org.apache.logging.log4j.core.config.LoggerConfig.log(LogEvent) 3771 void org.apache.logging.log4j.core.config.LoggerConfig.log(String, String, Marker, Level, Message, Throwable) 3771 void org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(Supplier, String, String, Marker, Level, Message, Throwable) 3771 void org.apache.logging.log4j.core.Logger.logMessage(String, Level, Marker, Message, Throwable) 3771 void org.ops4j.pax.logging.log4j2.internal.PaxLoggerImpl.doLog0(Level, int, String, String, Throwable) 3771 void org.ops4j.pax.logging.log4j2.internal.PaxLoggerImpl.doLog(Level, int, String, String, Throwable) 3771 void org.ops4j.pax.logging.log4j2.internal.PaxLoggerImpl.inform(String, Throwable) 3646 void org.ops4j.pax.logging.internal.TrackingLogger.inform(String, Throwable) 3646 void org.ops4j.pax.logging.slf4j.Slf4jLogger.info(String, Object) 3542 List org.opendaylight.netvirt.vpnmanager.VpnUtil.getRts(VpnInstanceOpDataEntry, VpnTarget$VrfRTType) 3466 boolean org.opendaylight.netvirt.vpnmanager.VpnUtil.lambda$getVpnsImportingMyRoute$10(VpnInstanceOpDataEntry, VpnInstanceOpDataEntry) 3466 boolean org.opendaylight.netvirt.vpnmanager.VpnUtil$$Lambda$1872.1909057161.test(Object) 3466 void java.util.stream.ReferencePipeline$2$1.accept(Object) 3466 void java.util.stream.ReferencePipeline$2$1.accept(Object) 3466 void java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Consumer) 3466 void java.util.stream.AbstractPipeline.copyInto(Sink, Spliterator) 3466 Sink java.util.stream.AbstractPipeline.wrapAndCopyInto(Sink, Spliterator) 3466 Object java.util.stream.ReduceOps$ReduceOp.evaluateSequential(PipelineHelper, Spliterator) 3466 Object java.util.stream.AbstractPipeline.evaluate(TerminalOp) 3466 Object java.util.stream.ReferencePipeline.collect(Collector) 3466 List org.opendaylight.netvirt.vpnmanager.VpnUtil.getVpnsImportingMyRoute(DataBroker, String) 3466 void org.opendaylight.netvirt.vpnmanager.populator.impl.L3vpnOverMplsGrePopulator.populateFib(L3vpnInput, WriteTransaction) 3466 void org.opendaylight.netvirt.vpnmanager.VpnInterfaceManager.processVpnInterfaceAdjacencies(BigInteger, int, String, String, String, long, WriteTransaction, WriteTransaction, WriteTransaction, Interface) 3466 void org.opendaylight.netvirt.vpnmanager.VpnInterfaceManager.processVpnInterfaceUp(BigInteger, VpnInterface, String, int, boolean, WriteTransaction, WriteTransaction, WriteTransaction, Interface, String) 3466 void org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener.lambda$null$0(Interface, WriteTransaction, WriteTransaction, WriteTransaction) 3461 void org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener$$Lambda$1690.1216649502.accept(Object) 3461 ListenableFuture org.opendaylight.genius.infra.ManagedNewTransactionRunnerImpl.callWithNewWriteOnlyTransactionAndSubmit(CheckedConsumer) 3461 void org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener.lambda$null$1(Interface, WriteTransaction, List, WriteTransaction) 3461 void org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener$$Lambda$1673.843807247.accept(Object) 3461 ListenableFuture org.opendaylight.genius.infra.ManagedNewTransactionRunnerImpl.callWithNewWriteOnlyTransactionAndSubmit(CheckedConsumer) 3461 void org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener.lambda$null$2(Interface, List, WriteTransaction) 3461 void org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener$$Lambda$1660.1073420380.accept(Object) 3461 ListenableFuture org.opendaylight.genius.infra.ManagedNewTransactionRunnerImpl.callWithNewWriteOnlyTransactionAndSubmit(CheckedConsumer) 3461 List org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener.lambda$add$3(Interface) 3461 Object org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener$$Lambda$1654.1406813511.call() 3461 void org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging() 3461 void org.opendaylight.infrautils.utils.concurrent.LoggingUncaughtThreadDeathContextRunnable.run() 3461 boolean java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec() 3461 int java.util.concurrent.ForkJoinTask.doExec() 3461 void java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinTask) 3461 void java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool$WorkQueue) 3461 void java.util.concurrent.ForkJoinWorkerThread.run() 3461 |
| Comments |
| Comment by Michael Vorburger [ 13/Nov/18 ] |
|
Hm, I've just had a quick closer look at VpnUtil (not exactly the code used in this test; but doubtful something signifant related to this changed) and it looks perfectly harmless, nothing obvious jumps out:
getRts(...) {
String name = vpnInstance.getVpnInstanceName();
...
LOG.info("getRts: vpn targets not available for {}", name);
So this is ... "normal", as in not an application specific issue? It's just a bit curious it only shows up here. I guess the real question here actually is why Pax Logging is doing a Throwable.getStackTraceElement() for an INFO when we (hopefully!) don't have have that (expensive) option, the thing which provides stack trace locations logging switched on - that'a a bit surprising, do you agree skitt and tpantelis ? We wouldn't, shouldn't (that's another story) run with INFO logging enabled in production scale tests, so I'm tempted to just close this issue - it's probably not the biggest issue in object allocation and GC -we've got bigger fish to fry. |
| Comment by Robert Varga [ 13/Nov/18 ] |
|
pax-logging is using it to determine which bundle the call comes from. Change logging formatter to not include it and it's gonna disappear. |
| Comment by Tom Pantelis [ 13/Nov/18 ] |
|
It's getting location info - for line # maybe? I assume it does that for any log operation independent of level. It doesn't seem like that log message should be info - seems more like a warn - but if it occurs a lot and/or is expected to occur then it really should be debug. |
| Comment by Michael Vorburger [ 14/Nov/18 ] |
|
rovarga, tpantelis and skitt Right, now I see... PaxOsgiAppender it is. Let me move this issue from netvirt to odlparent, to make it more general. – So, if there is even the slightest chance that ditching that OSGi specific appender helps with any runtime performance and garbagge collection overhead, then I'd say why don't we do that. The loss of the Bundle in each log message won't be missed by anyone, I think; in particular because in ODL we have clean package names - so the Bundle & package information really is redundant with the package information. Therefore just so proposed in c/77790 ... |
| Comment by Stephen Kitt [ 15/Nov/18 ] |
If it’s a TLAB allocation, it isn’t big. |
| Comment by Robert Varga [ 15/Nov/18 ] |
|
I tend to agree: if it's worth emitting into logs (and thereby doing IO) it's probably not a huge deal. If this is showing up, it's probably because VpnUtil is being noisy – dial its logging down (in code or in config) instead... |
| Comment by Michael Vorburger [ 16/Nov/18 ] |
|
I beg to differ.. it did show up, and it's simple enough to change, and we don't loose any (useful) functionality, so even there is even a chance that this helps reduce objection allocation and GC pressure, I really don't see a reason why we would not do this. Due to rovarga -1 on https://git.opendaylight.org/gerrit/#/c/77790/ (quote: "I am not against this, just against it being done without a major version bump. Downstream deployments can (and really are) expected to tweak logging anyway.") I'm abandoning the proposed change for the short term and moving this issue to Fix Version odlparent 5. |
| Comment by Michael Vorburger [ 16/Nov/18 ] |
|
Interesting update: Upon looking at a JFR again, I've now noticed (missed this before) that our friend PaxOsgiAppender also shows up as a problem in a completely different place, totally unrelated to Memory & GC: In JDK Mission Control's "Lock Instances" I have a big red exclamation mark and this text:
The top #1 entry is something unrelated (Jetty; that may be normal), but the #2 & #4 spots are both ... PaxOsgiAppender. Another good reason to get rid of this IMHO. (The #3 spot is log4j.core.appender.rolling.RollingFileManager.checkRollover.) |
| Comment by Tom Pantelis [ 16/Nov/18 ] |
|
Maybe but, as Robert alluded to, if the PaxOsgiAppender is causing issues wrt load then we're probably logging too much unnecessarily which can lead to other issues. Let's first dial down VpnUtil logging. I'll push a patch for that. |
| Comment by Tom Pantelis [ 16/Nov/18 ] |
| Comment by Robert Varga [ 15/Mar/19 ] |
|
As noted, downstreams are in control of how much they log, which should be reasonable amounts. Packagers are free to reconfigure logging (and will probably use something different than plain files just to allow automatic analysis). Let's leave things as they are. |