Thursday, February 13, 2014

.NET low latency logging. Part 7 - Improving performnace of asynchronous log4net solutions

Investigation of log4net performance issues

Since my previous post I made some experiments to improve the log4j2 performance in case of fast asynchronous appenders.

1) First of all I must say that using ForwardingAppender and its base class AppenderSkeleton for async forwarding appenders is a bad idea. I made some profiling investigations nad found that the following code fragment in AppenderSkeleton works too slow under a heavy load because of 'lock(this)' statement:

2) My second small improvement is to get rid of FixFlags.Message from FixFlags. In this case logging messages stay unchanged in log files, but performance will be improved cause of removing unnecessary string format during fixing the LoggingEvent. My current optimized fix flags configuration for all async appenders - FixFlags.ThreadName | FixFlags.Exception

3) Also I found that FileAppender and RollingFileAppender have an option ImmediateFlush which is enabled by default. It means that every logging event append operation into this appenders will be followed by slow Flush() call. Therefore I turned off this option in my FileAppender & RollingFileAppender configuration - <immediateFlush value="false" />

4) I must say that after all my optimizations BlockingCollection showed dramatically bad performance for the big count of producing threads (in my case > 16) and only one consuming thread. Profiler showed that problem in the BlockingCollection.Add() method.

NullAppenderEx

There is an improved version of NullAppender:

Performance of the NullAppenderEx

Let's compare its performance to the previous version:

AsyncForwardingAppenderBlockingEx, AsyncForwardingAppenderHotSwapEx, AsyncForwardingAppenderDisruptorEx

There are improved version of all my async log4net solutions:
The current configuration file is the following:

Performance of the AsyncForwardingAppenderHotSwapEx, AsyncForwardingAppenderDisruptorEx with the NullAppenderEx

Let's compare performance of the AsyncForwardingAppenderHotSwapEx, AsyncForwardingAppenderDisruptorEx with the NullAppenderEx to the previous version. Note that I completely removed AsyncForwardingAppenderBlockingEx from the result because is performs bad and extremely bad for many producing threads!

Performance of the AsyncForwardingAppenderHotSwapEx, AsyncForwardingAppenderDisruptorEx with the RollingFileAppender

Let's compare performance of the AsyncForwardingAppenderHotSwapEx, AsyncForwardingAppenderDisruptorEx with the RollingFileAppender to the previous version. Note that I completely removed AsyncForwardingAppenderBlockingEx from the result because is performs bad and extremely bad for many producing threads!

Conclusions

  • ForwardingAppender and its base class AppenderSkeleton are not fast enough to be used in low latency logging. I removed them from all my async appenders by explicit inheriting from core log4net interfaces - IAppender, IBulkAppender, IOptionHandler
  • I removed FixFlags.Message fix flag (see FixFlags) from my async logging configuration as it is not necessary for logging message representation, but wastes extra time on string format. My current optimized fix flags configuration for all async appenders - FixFlags.ThreadName | FixFlags.Exception
  • Performance of the log4net file appenders (FileAppender and RollingFileAppender) depends on ImmediateFlush option, which is turned on by default. Therefore I turned off this option in my FileAppender & RollingFileAppender configuration - <immediateFlush value="false" />
  • AsyncForwardingAppenderBlockingEx lost in the competition because of very bad performance of the BlockingCollection in case of many producing threads (in my case > 16)
  • I managed to get very good performance of optimized AsyncForwardingAppenderHotSwapEx and AsyncForwardingAppenderDisruptorEx appenders with fast NullAppenderEx
  • I still believe there is a good point to optimize the very slow FileAppender and RollingFileAppender in a way of using fast files to get similar performance as RandomAccessFileAppender shows in log4j2

Thursday, February 6, 2014

.NET low latency logging. Part 6 - Common.Logging abstraction

Common.Logging

Common.Logging is a logging facade that provides similar interface for creating loggers and perform logging operations in a unified way. It could be configured to use log4net, NLog, Enterprise Library 3.1 Logging Adapter as a back-end loggers. The question is how Common.Logging facade influence on the final logging performance? I'll test Common.Logging over log4net for different appenders - NullAppender, FileAppender, RollingFileAppender.

Add Common.Logging to the sample application

It is very simple to integrate Common.Logging into the sample application. Just add Common.Logging and log4net packages (current version is 1.2.13) with a NuGet to the project. Add to the project log4net.config file:
Initialize logging manager and create a logger in code:

Performance influence of the Common.Logging


Conclusions


Wednesday, February 5, 2014

.NET low latency logging. Part 5 - Asynchronous log4net solutions

FixFlags and log4net performance issues

When I started the implementation of the first asynchronous log4net solution I met with FixFlags problem that is well described in the topic on StackOverflow - log4net BufferingForwardingAppender performance issue. In two words the problem is related to the LoggingEvent generation and processing. When some logging appender is not processing particular LoggingEvent synchronously, but store it somewhere for further processing (e.g. BufferingForwardingAppender) than you should choose which logging events parameters should be fixed. The more you choose the less performance you'll get:
  • Choosing FixFlags.None gives the best performance, but you'll miss some necessary fields such as Exception, ThreadName, Message, etc
  • Choosing FixFlags.All gives the worst performance, but you'll get everything. Note that getting Identity may dramatically decrease your logging performance!
  • Choosing FixFlags.Message|FixFlags.ThreadName|FixFlags.Exception (268) gives the good balanced performance and covers most important fields. This is a best configuration in my case!

The following table shows how FixFlags combination influences the performance of my first asynchronous log4net solution with NullAppender (1000000 items in one thread):
FixFlags Mean latency (mcs) Producing time (s) Throughput (items/s)
FixFlags.None 0.877 0.915 1092207.476
268 1.677 1.714 583428.927
FixFlags.All 214.662 214.752 4656.522

The following table shows how FixFlags combination influences the performance of my first asynchronous log4net solution with RollingFileAppender (1000000 items in one thread):
FixFlags Mean latency (mcs) Producing time (s) Throughput (items/s)
FixFlags.None 1.214 1.339 746520.684
268 3.094 3.160 316390.130
FixFlags.All 208.232 208.320 4800.285

All next experiments with asynchronous log4net solutions will be performed with FixFlags=268 (FixFlags.Message|FixFlags.ThreadName|FixFlags.Exception).

Asynchronous log4net solution 1: BlockingCollection

The first solution of asynchronous logging is based on BlockingCollection from System.Collections.Concurrent namespace. BlockingCollection is a thread-safe container that implements producer-consumer pattern.

The implementation of the AsyncForwardingAppenderBlocking is based on ForwardingAppender with the following configuration options:
  • BlockType - type of the blocking strategy. The following types are supported:
    • Error - Error on buffer limit (default)
    • Skip - Skip on buffer limit
    • Grow - Grow on buffer limit
    • Block - Block on buffer limit
  • BlockLimit - Blocking buffer limit (default is 3000000)
  • Fix - Fields that will be fixed in the event (default is FixFlags.Message | FixFlags.ThreadName | FixFlags.Exception)

There is an implementation of the AsyncForwardingAppenderBlocking under the spoiler:

Asynchronous log4net solution 2: Hot swap technique

The second solution of asynchronous logging is based on Hot swap technique. The idea is to have two collections - one for appending, one for processing, and swap them under the lock when all data is processed and new data is available.

This solution has one problem that can be harmful in case of huge amount of logging events are generated and slow processing - unlimited growing of collections and possible OutOfMemoryException. For this reason if you have unlimited growing strategy pay attention on throughput, because if logging event generation speed is much higher than processing speed then collections will grow!

The implementation of the AsyncForwardingAppenderHotSwap is has the following configuration options:
  • GrowType - type of the growing strategy. The following types are supported:
    • Error - Error on buffer limit (default)
    • Skip - Skip on buffer limit
    • Grow - Grow on buffer limit
  • GrowLimit - Growing limit (default is 3000000)
  • Fix - Fields that will be fixed in the event (default is FixFlags.Message | FixFlags.ThreadName | FixFlags.Exception)

There is an implementation of the AsyncForwardingAppenderHotSwap under the spoiler:

Asynchronous log4net solution 3: .NET port of LMAX Disruptor

The third solution of asynchronous logging is based on LMAX Disruptor which was ported to the .NET Framework (see disruptor-net). For more information on LMAX Disruptor pattern you may refer to the Martin Fowler's article - The LMAX Architecture.

The implementation of the AsyncForwardingAppenderDisruptor is has the following configuration options:
  • ClaimStrategy - type of the claim strategy. The following types are supported:
    • SingleThreaded - Optimized strategy can be used when there is a single publisher thread claiming sequences. This strategy must not be used when multiple threads are used for publishing concurrently on the same Sequencer.
    • MultiThreaded - Strategy to be used when there are multiple publisher threads claiming sequences. This strategy is reasonably forgiving when the multiple publisher threads are highly contended or working in an environment where there is insufficient CPUs to handle multiple publisher threads. It requires 2 CAS operations for a single publisher, compared to the MultithreadedLowContention strategy which needs only a single CAS and a lazySet per publication. Default strategy.
    • MultiThreadedLowContention - Strategy to be used when there are multiple producer threads claiming sequences. This strategy requires sufficient cores to allow multiple publishers to be concurrently claiming sequences.
  • WaitStrategy - type of the wait strategy. The following types are supported:
    • Blocking - Blocking strategy that uses a lock and condition variable for waiting on a barrier. This strategy should be used when performance and low-latency are not as important as CPU resource.
    • BusySpin - Busy Spin strategy that uses a busy spin loop for waiting on a barrier. This strategy will use CPU resource to avoid syscalls which can introduce latency jitter. It is best used when threads can be bound to specific CPU cores.
    • Sleeping - Sleeping strategy that uses a System.Threading.SpinWait while the are waiting on a barrier. This strategy is a good compromise between performance and CPU resource. Latency spikes can occur after quiet periods.
    • Yielding - Yielding strategy that uses a Thread.Sleep(0) for waiting on a barrier after an initially spinning. This strategy is a good compromise between performance and CPU resource without incurring significant latency spikes. Default strategy.
  • BufferSize - Disruptor buffer size (default is 1024)
  • Fix - Fields that will be fixed in the event (default is FixFlags.Message | FixFlags.ThreadName | FixFlags.Exception)

There is an implementation of the AsyncForwardingAppenderDisruptor under the spoiler:

Updated log4net configuration

Under the spoiler you'll find an updated log4net configuration file with new async appenders:

Performance results of asynchronous log4net solutions with NullAppender


Performance results of asynchronous log4net solutions with RollingFileAppender


Conclusions

  • log4j2 still the fastest logging solution! But it's for Java platform...
  • AsyncForwardingAppenderBlocking solution based on BlockingCollection performs very bad, especially for several logging producers.
  • AsyncForwardingAppenderHotSwap solution based on Hot swap technique performs the best. However it might requires additional memory and even causes OutOfMemoryException when logging events are produced very fast!
  • AsyncForwardingAppenderDisruptor solution based on .NET port of LMAX Disruptor performs good, but it is still far from log4j2 even with NullAppender. It is a good point to investigate possible performance issues in disruptor-net or its wrong usage for the logging case.
  • RollingFileAppender which is responsible for file IO operations in log4net is much slower than log4j2 analog - RandomAccessFileAppender. It is also a good point of future investigation.