Wednesday, February 5, 2014

.NET low latency logging. Part 5 - Asynchronous log4net solutions

FixFlags and log4net performance issues

When I started the implementation of the first asynchronous log4net solution I met with FixFlags problem that is well described in the topic on StackOverflow - log4net BufferingForwardingAppender performance issue. In two words the problem is related to the LoggingEvent generation and processing. When some logging appender is not processing particular LoggingEvent synchronously, but store it somewhere for further processing (e.g. BufferingForwardingAppender) than you should choose which logging events parameters should be fixed. The more you choose the less performance you'll get:
  • Choosing FixFlags.None gives the best performance, but you'll miss some necessary fields such as Exception, ThreadName, Message, etc
  • Choosing FixFlags.All gives the worst performance, but you'll get everything. Note that getting Identity may dramatically decrease your logging performance!
  • Choosing FixFlags.Message|FixFlags.ThreadName|FixFlags.Exception (268) gives the good balanced performance and covers most important fields. This is a best configuration in my case!

The following table shows how FixFlags combination influences the performance of my first asynchronous log4net solution with NullAppender (1000000 items in one thread):
FixFlags Mean latency (mcs) Producing time (s) Throughput (items/s)
FixFlags.None 0.877 0.915 1092207.476
268 1.677 1.714 583428.927
FixFlags.All 214.662 214.752 4656.522

The following table shows how FixFlags combination influences the performance of my first asynchronous log4net solution with RollingFileAppender (1000000 items in one thread):
FixFlags Mean latency (mcs) Producing time (s) Throughput (items/s)
FixFlags.None 1.214 1.339 746520.684
268 3.094 3.160 316390.130
FixFlags.All 208.232 208.320 4800.285

All next experiments with asynchronous log4net solutions will be performed with FixFlags=268 (FixFlags.Message|FixFlags.ThreadName|FixFlags.Exception).

Asynchronous log4net solution 1: BlockingCollection

The first solution of asynchronous logging is based on BlockingCollection from System.Collections.Concurrent namespace. BlockingCollection is a thread-safe container that implements producer-consumer pattern.

The implementation of the AsyncForwardingAppenderBlocking is based on ForwardingAppender with the following configuration options:
  • BlockType - type of the blocking strategy. The following types are supported:
    • Error - Error on buffer limit (default)
    • Skip - Skip on buffer limit
    • Grow - Grow on buffer limit
    • Block - Block on buffer limit
  • BlockLimit - Blocking buffer limit (default is 3000000)
  • Fix - Fields that will be fixed in the event (default is FixFlags.Message | FixFlags.ThreadName | FixFlags.Exception)

There is an implementation of the AsyncForwardingAppenderBlocking under the spoiler:

Asynchronous log4net solution 2: Hot swap technique

The second solution of asynchronous logging is based on Hot swap technique. The idea is to have two collections - one for appending, one for processing, and swap them under the lock when all data is processed and new data is available.

This solution has one problem that can be harmful in case of huge amount of logging events are generated and slow processing - unlimited growing of collections and possible OutOfMemoryException. For this reason if you have unlimited growing strategy pay attention on throughput, because if logging event generation speed is much higher than processing speed then collections will grow!

The implementation of the AsyncForwardingAppenderHotSwap is has the following configuration options:
  • GrowType - type of the growing strategy. The following types are supported:
    • Error - Error on buffer limit (default)
    • Skip - Skip on buffer limit
    • Grow - Grow on buffer limit
  • GrowLimit - Growing limit (default is 3000000)
  • Fix - Fields that will be fixed in the event (default is FixFlags.Message | FixFlags.ThreadName | FixFlags.Exception)

There is an implementation of the AsyncForwardingAppenderHotSwap under the spoiler:

Asynchronous log4net solution 3: .NET port of LMAX Disruptor

The third solution of asynchronous logging is based on LMAX Disruptor which was ported to the .NET Framework (see disruptor-net). For more information on LMAX Disruptor pattern you may refer to the Martin Fowler's article - The LMAX Architecture.

The implementation of the AsyncForwardingAppenderDisruptor is has the following configuration options:
  • ClaimStrategy - type of the claim strategy. The following types are supported:
    • SingleThreaded - Optimized strategy can be used when there is a single publisher thread claiming sequences. This strategy must not be used when multiple threads are used for publishing concurrently on the same Sequencer.
    • MultiThreaded - Strategy to be used when there are multiple publisher threads claiming sequences. This strategy is reasonably forgiving when the multiple publisher threads are highly contended or working in an environment where there is insufficient CPUs to handle multiple publisher threads. It requires 2 CAS operations for a single publisher, compared to the MultithreadedLowContention strategy which needs only a single CAS and a lazySet per publication. Default strategy.
    • MultiThreadedLowContention - Strategy to be used when there are multiple producer threads claiming sequences. This strategy requires sufficient cores to allow multiple publishers to be concurrently claiming sequences.
  • WaitStrategy - type of the wait strategy. The following types are supported:
    • Blocking - Blocking strategy that uses a lock and condition variable for waiting on a barrier. This strategy should be used when performance and low-latency are not as important as CPU resource.
    • BusySpin - Busy Spin strategy that uses a busy spin loop for waiting on a barrier. This strategy will use CPU resource to avoid syscalls which can introduce latency jitter. It is best used when threads can be bound to specific CPU cores.
    • Sleeping - Sleeping strategy that uses a System.Threading.SpinWait while the are waiting on a barrier. This strategy is a good compromise between performance and CPU resource. Latency spikes can occur after quiet periods.
    • Yielding - Yielding strategy that uses a Thread.Sleep(0) for waiting on a barrier after an initially spinning. This strategy is a good compromise between performance and CPU resource without incurring significant latency spikes. Default strategy.
  • BufferSize - Disruptor buffer size (default is 1024)
  • Fix - Fields that will be fixed in the event (default is FixFlags.Message | FixFlags.ThreadName | FixFlags.Exception)

There is an implementation of the AsyncForwardingAppenderDisruptor under the spoiler:

Updated log4net configuration

Under the spoiler you'll find an updated log4net configuration file with new async appenders:

Performance results of asynchronous log4net solutions with NullAppender


Performance results of asynchronous log4net solutions with RollingFileAppender


Conclusions

  • log4j2 still the fastest logging solution! But it's for Java platform...
  • AsyncForwardingAppenderBlocking solution based on BlockingCollection performs very bad, especially for several logging producers.
  • AsyncForwardingAppenderHotSwap solution based on Hot swap technique performs the best. However it might requires additional memory and even causes OutOfMemoryException when logging events are produced very fast!
  • AsyncForwardingAppenderDisruptor solution based on .NET port of LMAX Disruptor performs good, but it is still far from log4j2 even with NullAppender. It is a good point to investigate possible performance issues in disruptor-net or its wrong usage for the logging case.
  • RollingFileAppender which is responsible for file IO operations in log4net is much slower than log4j2 analog - RandomAccessFileAppender. It is also a good point of future investigation.

3 comments:

  1. Thanks for the analysis. There is a huge problem with the BlockingCollection solution that you have used. In the FlushThreadMethod() inside the for loop if you introduce an artificial delay it totally blocks addition of items to the _flushingCollection inside AppendInternal method. So any part of the application that tries to log items just keeps waiting until the for loop completes. I ended up using your hot swap solution.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Interesting analysis, here's another high performance implementation it would be interesting if you could add this to your analysis for comparison. [http://www.nimaara.com/2016/01/01/high-performance-logging-log4net/]

    ReplyDelete