Thursday, January 30, 2014

.NET low latency logging. Part 3 - log4net performance

Add log4net to the sample application

It is also very simple to integrate log4net into the sample application. Just add log4net package (current version is 1.2.13) with a NuGet to the project.

Add <log4net> section to the App.config file

Create a logger in code.

NullAppender

Note that log4net doesn't have Null appender out from the box. But it is not difficult to create it by inheriting from AppenderSkeleton with an empty void Append(LoggingEvent loggingEvent) override method:

Performance hints

My experiments showed the following key options that influence on log4net performance out of the box:
  • NullAppender that was implemented above is the fastest appender. It shows the general performance of logging internals and can be used to compare log4net with other similar components (e.g. NLog).
  • FileAppender is used to store logging messages to the specified file, but performs much slower than NullAppender
  • RollingFileAppender is usefull to separate logging messages by date, time or file size attribute, but performs a little bit slower than FileAppender

Performance of the NullAppender


Performance of the FileAppender and RollingFileAppender


Conclusions


Wednesday, January 29, 2014

.NET low latency logging. Part 2 - NLog performance

Add NLog to the sample application

Integrating with NLog is simple. Just add NLog package (current version is 2.1.0) with a NuGet to the project.

Add <nlog> section to the App.config file

Create a logger in code.

Performance hints

My experiments showed that the following options influence on NLog performance:
  • Null target is the fastest, especially with formatMessage="false" option
  • Null target with formatMessage="true" option highly depends on the layout property with ${message}, ${longdate}, ${threadid} placeholders
  • File target is much slower, but can be tuned with autoFlush="false" and keepFileOpen="true"options
  • AsyncWrapper target could help to increase latency and logging speed by performing IO operation in a separate thread. But you should be careful with its options:
    • overflowAction="Discard" (which is default) will skip some log messages when buffer overflows
    • overflowAction="Grow" could cause an OutOfMemoryException on huge batch of log messages
    • overflowAction="Block" with default settings could dramatically decrease performance of the main working thread and increase logging latency when buffer overflows. Performance could be improved by playing with queueLimit, batchSize and timeToSleepBetweenBatches options.

Performance of the Null target

Here comes test results of the Null target for NLog in three cases:
  • Without any format of the log message (the fastest case)
  • With simple format of the log message which contains only ${message} placeholder
  • With full format of the log message which contains the most important placeholders (${message}, ${longdate}, ${threadid}, etc)

Performance of the File target

Here comes test results of the File target for NLog with autoFlush="false" and keepFileOpen="true"options. You can see the very bad latency and overall performance caused by the sync IO operations during the logging.

Performance of the AsyncWrapper + File target

Here comes test results of the AsyncWrapper target for NLog with File target in three cases:
  • With overflowAction="Discard" option. Is not very interesting case because it is not good to lost some logging messages on a way.
  • With overflowAction="Grow" option. Less interesting case because OutOfMemoryException could be met very quickly.
  • With overflowAction="Block" with queueLimit="1000000", batchSize="1000" and timeToSleepBetweenBatches="0" options. The most interesting case!

Conclusions

  • NLog supports async logging and could be used for tasks were high throughput and low latency is important
  • Null target performance shows how fast NLog internals works
  • Formatting logging messages with ${message}, ${longdate}, ${threadid} placeholders slow down NLog performance
  • AsyncWrapper target performs 3 times faster than sync version of the File target
  • AsyncWrapper target throughput 10 times greater than sync version of the File target
  • AsyncWrapper target with overflowAction="Block", queueLimit="1000000", batchSize="1000" and timeToSleepBetweenBatches="0" options shows very good latency, time and throughput results!

Tuesday, January 28, 2014

.NET low latency logging. Part 1 - Testing environment, Sample application, Best performance!

Choose the solution for low latency logging with the best performance in .NET Framework

Well known that every complex software service requires logging component to store some significant or debug events during lifetime especially on production environment. If the service is close to real-time and handle dozens of request/response messages in a moment then wasting expensive time on log IO operation is not a rational. In this article I'll try to investigate which log components could be used for .NET Framework based solution, how they performs and how the logging can be improved to achieve the best results.

Testing environment

All tests are built and run on my local PC with the following configuration: Intel Core i5-2500 (3.3Ghz), 16GB RAM, HDD WD 160GB, Windows 7 x64

Sample application

There is a sample application under the spoiler which was used for all tests. The idea is to create several producers (default is 1) that will do logging in different threads. Count of items to be logged is also predefined (default is 10000000) and equally divided by all producers. We'll track logging latency using Stopwatch class with making sure that its Stopwatch.IsHighResolution filed is 'true'. The only place that is changed for different logging components is a line 55 which performs log operation in the current producer.

No logging? Best performance!

Lets start from checking the prime performance of generating 10000000 items by 1, 2, 4, 8, 16, 32 and 64 producers (threads) simultaneously without making any logging. We'll take this performance as an origin. For test results I took the best of 10 independent launches.

Conclusions

  • 90% items were produced with latency less than 31 nanoseconds
  • 10% items were produced with latency 300 nanoseconds
  • Producing time and throughput results are better for cases when the count of produce threads is equal to the count of CPU cores

Now we have ideal results to compare with future investigations!