Wednesday, January 29, 2014

.NET low latency logging. Part 2 - NLog performance

Add NLog to the sample application

Integrating with NLog is simple. Just add NLog package (current version is 2.1.0) with a NuGet to the project.

Add <nlog> section to the App.config file

Create a logger in code.

Performance hints

My experiments showed that the following options influence on NLog performance:
  • Null target is the fastest, especially with formatMessage="false" option
  • Null target with formatMessage="true" option highly depends on the layout property with ${message}, ${longdate}, ${threadid} placeholders
  • File target is much slower, but can be tuned with autoFlush="false" and keepFileOpen="true"options
  • AsyncWrapper target could help to increase latency and logging speed by performing IO operation in a separate thread. But you should be careful with its options:
    • overflowAction="Discard" (which is default) will skip some log messages when buffer overflows
    • overflowAction="Grow" could cause an OutOfMemoryException on huge batch of log messages
    • overflowAction="Block" with default settings could dramatically decrease performance of the main working thread and increase logging latency when buffer overflows. Performance could be improved by playing with queueLimit, batchSize and timeToSleepBetweenBatches options.

Performance of the Null target

Here comes test results of the Null target for NLog in three cases:
  • Without any format of the log message (the fastest case)
  • With simple format of the log message which contains only ${message} placeholder
  • With full format of the log message which contains the most important placeholders (${message}, ${longdate}, ${threadid}, etc)

Performance of the File target

Here comes test results of the File target for NLog with autoFlush="false" and keepFileOpen="true"options. You can see the very bad latency and overall performance caused by the sync IO operations during the logging.

Performance of the AsyncWrapper + File target

Here comes test results of the AsyncWrapper target for NLog with File target in three cases:
  • With overflowAction="Discard" option. Is not very interesting case because it is not good to lost some logging messages on a way.
  • With overflowAction="Grow" option. Less interesting case because OutOfMemoryException could be met very quickly.
  • With overflowAction="Block" with queueLimit="1000000", batchSize="1000" and timeToSleepBetweenBatches="0" options. The most interesting case!

Conclusions

  • NLog supports async logging and could be used for tasks were high throughput and low latency is important
  • Null target performance shows how fast NLog internals works
  • Formatting logging messages with ${message}, ${longdate}, ${threadid} placeholders slow down NLog performance
  • AsyncWrapper target performs 3 times faster than sync version of the File target
  • AsyncWrapper target throughput 10 times greater than sync version of the File target
  • AsyncWrapper target with overflowAction="Block", queueLimit="1000000", batchSize="1000" and timeToSleepBetweenBatches="0" options shows very good latency, time and throughput results!

3 comments:

  1. Setting timeToSleepBetweenBatches to "0" cause the CPU to be always active and therefore it should be avoided. Here's the proof https://github.com/NLog/NLog/issues/1069

    ReplyDelete
  2. Have tried your performance test with NLog 4.3.11 and get 500.000 msgs/sec when using your async-config (timeToSleepBetweenBatches = 0 is now possible without heavy CPU usage). When having 8 threads, then it gives 250.000 msgs/sec.

    There is a concurrency-bottleneck in NLog where the target is responsible for formatting the log-message. Thus multiple logger-threads writing to the same target are competing.

    So without any special appenders/targets then NLog is now faster than log4net, when 4 threads or less are writing to the same target.

    What is more important when writing low-latency applications is the amount of garbage produced by the logging-framework. Could be nice to see how many garbage-collections (GC0, GC1, GC2) that each performance test produce.

    It could also be nice if your performance test code had a warmup operation before starting to do the actual measurement.

    ReplyDelete