Not every async is good

Recently I found code that looks like this:
 

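    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main(string[] args)
        {
            const int MaxLines = 10000;
            string line;
            List<string> lines = new List<string>();
            Stopwatch stopwatch = Stopwatch.StartNew();

            // args[0] is the path of the text file to read.
            using var reader = new StreamReader(args[0]);
            while ((line = await reader.ReadLineAsync()) != null)
            {
                // Keep at most MaxLines lines in memory: drop the batch once it is full.
                if (lines.Count >= MaxLines)
                {
                    lines.Clear();
                }

                lines.Add(line);
            }

            // Total time spent reading the file, in milliseconds.
            Console.WriteLine(stopwatch.ElapsedMilliseconds);
        }
    }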

As you can see, this code is written “by the book”. Everything is async: the Main function is async, reading from the file is async, and the code looks good. But if you ask people which version would be faster, this one or a non-async one, you will get mixed results. Some will say the async version is faster, some will say they are about the same, and some will say async is slightly slower because of async overhead.

So, let me do some science and run this code. I have a 1 GB text file on an HDD just for tests like this. Hmm, 96 seconds. Well, maybe the file was “cold”, so let me run it again. 92 seconds. Well, it must be that pesky, old and slow HDD. Let me move the file to one of the fastest SSDs on the market and run it again. 90 seconds. What? There is clearly something wrong here. Let me run the non-async version: 2.4 seconds from the HDD and 2.2 from the SSD. No way! Clearly something is wrong with my code.
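The non-async version is not shown above. A minimal sketch of what it presumably looks like, assuming the only change is ReadLine() in place of ReadLineAsync(). The SyncLineReader class and ReadAllLines method are hypothetical names, and the line counter is added only so the result can be checked:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SyncLineReader
{
    // Same loop as the async sample, but with the blocking ReadLine() call.
    // Returns the total number of lines read (an addition for checking the result).
    public static long ReadAllLines(string path)
    {
        const int MaxLines = 10000;
        string line;
        long total = 0;
        List<string> lines = new List<string>();

        using var reader = new StreamReader(path);
        while ((line = reader.ReadLine()) != null)
        {
            // Keep at most MaxLines lines in memory, as in the async sample.
            if (lines.Count >= MaxLines)
            {
                lines.Clear();
            }

            lines.Add(line);
            total++;
        }

        return total;
    }
}
```

To time it the same way as the async sample, start a Stopwatch before calling SyncLineReader.ReadAllLines(args[0]) and print ElapsedMilliseconds afterwards.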

Let me explain the result, what async is, and what it should be used for.

First, async by itself never improves performance. Async lets you utilize threads much better and do the same work with fewer threads. A thread is quite an expensive object for the operating system, and there is a hard limit on how many threads can exist. Most of the time these threads are doing nothing: they are sleeping on some delay, waiting for some event, and so on. Clearly it would be nice to utilize them better. This is especially important for servers. The most typical example is a web server. The most typical operations on a web server are receiving data, waiting for the database server to respond, and sending data. As a result, in 99.9% of cases a thread in non-async code is just doing nothing. And there is a limit on how many threads a web server will create to serve requests; after that, new requests wait in a queue to be processed. Using async in a web server provides enormous benefits and lets it serve far more requests on the same hardware, assuming there are no other bottlenecks. For example, the database server could be the next bottleneck.
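To make the thread argument concrete, here is a small sketch, not a real web server. It assumes Task.Delay can stand in for "waiting for the database"; AsyncWaitDemo, HandleRequestAsync and RunAsync are hypothetical names. It starts many simulated requests at once and measures how long it takes for all of them to finish:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncWaitDemo
{
    // One simulated request: it spends delayMs waiting on a slow dependency.
    // Task.Delay is a stand-in for a database call; no thread is blocked while it waits.
    private static async Task HandleRequestAsync(int delayMs)
    {
        await Task.Delay(delayMs);
    }

    // Starts `count` simulated requests at once and returns the elapsed milliseconds
    // until all of them have completed.
    public static async Task<long> RunAsync(int count, int delayMs)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        Task[] requests = Enumerable.Range(0, count)
            .Select(_ => HandleRequestAsync(delayMs))
            .ToArray();
        await Task.WhenAll(requests);
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Calling await AsyncWaitDemo.RunAsync(1000, 100) should take roughly the time of a single wait rather than 1000 times 100 milliseconds, and it does so without needing 1000 threads. Note that the total work is not done any faster; the waiting simply costs no threads.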

Anyway, back to our example. What happens here, and why is the async version so slow? Well, it is not really an async problem. The problem is in StreamReader. By default, it uses a “generous” 1024-byte buffer. As a result, as soon as the code has consumed 1 KB, it has to go back to the OS for the next 1 KB. Each trip creates a Task, creates a continuation, and does all the other “magic” async requires. The thread goes to sleep, and when the data arrives, the OS wakes the thread and schedules it to run in the next time quantum. As a result, there is a huge overhead for about every 1 KB of data.
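A back-of-the-envelope calculation shows the scale of the problem. This sketch assumes the file is exactly 1 GiB (1024 × 1024 × 1024 bytes) and that every buffer refill costs one async round trip; RefillMath and RefillCount are hypothetical names:

```csharp
using System;

public static class RefillMath
{
    // Number of times a buffer of bufferSize bytes has to be refilled
    // to read fileSize bytes (ceiling division).
    public static long RefillCount(long fileSize, long bufferSize)
    {
        if (bufferSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(bufferSize));
        }

        return (fileSize + bufferSize - 1) / bufferSize;
    }
}
```

Under these assumptions, RefillMath.RefillCount(1L << 30, 1024) gives 1,048,576 refills, while a 10 MB buffer, RefillMath.RefillCount(1L << 30, 10 * 1024 * 1024), needs only 103. If the roughly 88 extra seconds measured above were spread evenly over a million refills, each one would cost somewhere around 80 to 90 microseconds, which is a rough estimate rather than a measurement.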

Changing one line of code to this:

using var reader = new StreamReader(args[0], Encoding.UTF8, true, 1024 * 1024 * 10);

will improve the time to 3.8 seconds on the HDD and 3.6 seconds on the SSD. That is still more than 50% slower than the sync version, but a great improvement compared to the original version.
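If you want to try this on your own files, a small harness along these lines may help. It is only a sketch: BufferSizeBenchmark, CountLinesAsync and CompareAsync are hypothetical names, and the numbers you get will depend on your disk, your runtime version and whether the file is already in the OS cache.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class BufferSizeBenchmark
{
    // Reads every line asynchronously with the given StreamReader buffer size
    // and returns how many lines were read.
    public static async Task<long> CountLinesAsync(string path, int bufferSize)
    {
        long total = 0;
        using var reader = new StreamReader(path, Encoding.UTF8, true, bufferSize);
        while (await reader.ReadLineAsync() != null)
        {
            total++;
        }

        return total;
    }

    // Times CountLinesAsync once per buffer size and prints one result line per size.
    public static async Task CompareAsync(string path, params int[] bufferSizes)
    {
        foreach (int size in bufferSizes)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            long total = await CountLinesAsync(path, size);
            Console.WriteLine($"{size,10} bytes: {total} lines in {stopwatch.ElapsedMilliseconds} ms");
        }
    }
}
```

For example, await BufferSizeBenchmark.CompareAsync(args[0], 1024, 64 * 1024, 10 * 1024 * 1024) compares the default-sized buffer with two larger ones. Run it more than once, because the first pass also pays for loading a “cold” file.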

And I would like to stress that this is not an artificial example. I took this code from real projects, and I have seen this problem a few times already. The code looks perfect but runs quite slowly.

I’m not saying that I’m an expert in this area, but I would say that async is worth it where scalability is really important. For example, if you are expecting data from a database in 100 milliseconds, your code will be resumed at about 110 milliseconds, but during that time the thread can process other requests. But if your code is reading from a “hot” file and the read takes 1 millisecond, async will probably not do any good here. The overhead will be bigger than the benefits of async.

In desktop applications async is less necessary and often just adds overhead without any benefit, as in my example code. Please understand a technology and the details of how it works before using it everywhere in your code. Otherwise you will be unpleasantly surprised.