Benchmarking echo

2014-11-14 05:02:54 UTC

On a certain technology-oriented discussion board, some people were arguing about the performance and implementation of the echo command. Specifically, someone was claiming that printing one character at a time (such as when using putchar()) would be horribly inefficient, as it would call the write() syscall for every character.

It would probably have been enough to point out that stdio buffers writes to a FILE stream such as stdout by default, but why do that when one can actually benchmark it?

Start with a few implementations of echo (with an implicit -e flag, since the echo in the discussion included that feature):

In the functions using FILE, fflush() was used after every iteration.
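To make the two extremes concrete, here is a minimal sketch of what a putchar()-based variant and a write()-per-character variant might look like. This is not the benchmarked code: the names decode_escape, next_char, echo_putchar_sketch and echo_write_sketch are made up for illustration, and only a handful of escapes are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Decode the escape sequence starting at s (s[0] must be a backslash).
 * Stores the resulting character in *out and returns the number of input
 * characters consumed. Only \\, \n, \t and \b are handled in this sketch;
 * any other backslash is emitted literally. */
size_t decode_escape(const char *s, char *out)
{
    assert(s[0] == '\\');
    switch (s[1]) {
    case '\\': *out = '\\'; return 2;
    case 'n':  *out = '\n'; return 2;
    case 't':  *out = '\t'; return 2;
    case 'b':  *out = '\b'; return 2;
    default:   *out = '\\'; return 1; /* unknown escape or trailing backslash */
    }
}

/* Return the next output character of s and advance *pos past it. */
char next_char(const char *s, size_t *pos)
{
    char c = s[*pos];
    if (c == '\\')
        *pos += decode_escape(s + *pos, &c);
    else
        *pos += 1;
    return c;
}

/* One putchar() per character. The characters collect in the stream's
 * buffer and are written out in larger chunks (when the buffer fills, or
 * at the fflush() below). Returns 0 on success, -1 on error. */
int echo_putchar_sketch(int argc, char **argv)
{
    int i;
    for (i = 1; i < argc; i++) {
        size_t pos = 0;
        if (i > 1)
            putchar(' ');
        while (argv[i][pos] != '\0')
            putchar(next_char(argv[i], &pos));
    }
    putchar('\n');
    return fflush(stdout) == 0 ? 0 : -1;
}

/* One write() syscall per character, with no buffering at all.
 * Returns 0 on success, -1 on error. */
int echo_write_sketch(int argc, char **argv)
{
    int i;
    char c;
    for (i = 1; i < argc; i++) {
        size_t pos = 0;
        if (i > 1) {
            c = ' ';
            if (write(STDOUT_FILENO, &c, 1) != 1)
                return -1;
        }
        while (argv[i][pos] != '\0') {
            c = next_char(argv[i], &pos);
            if (write(STDOUT_FILENO, &c, 1) != 1)
                return -1;
        }
    }
    c = '\n';
    return write(STDOUT_FILENO, &c, 1) == 1 ? 0 : -1;
}
```

Calling either function from main() as echo_putchar_sketch(argc, argv) gives a bare-bones echo. The two differ only in how each character leaves the process, which is exactly the thing being measured.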

The results I got:

$ gcc -Wall -ansi -pedantic -Werror -pedantic-errors -g echo-bench.c -o echo-bench
$ ./echo-bench 'qwe\bas\\bd' '' 'zxc123'
echo_null     (1000000 runs): 61705us
echo_buf      (1000000 runs): 792972us
echo_inplace  (1000000 runs): 1634044us
echo_finplace (1000000 runs): 901472us
echo_putchar  (1000000 runs): 1072288us
echo_putcharu (1000000 runs): 801225us
echo_write    (1000000 runs): 8790156us

The deciding factor here was the number of write() calls per iteration. echo_buf was the fastest, since it calls write() only once and doesn’t have the locking/unlocking overhead of the FILE functions. echo_putcharu came second and stayed very close to echo_buf across several runs of the benchmark, since it doesn’t do locking either. echo_finplace was third, since it also ended up with only one write() call thanks to stdio’s internal buffering. echo_putchar was actually quite fast in this example, but still slower than those three because it locks the stream for every character. echo_inplace (1.6s) trailed all of them here, since it is affected by the number of arguments. Still, the difference between putchar (1.07s), fwrite (0.9s) and manual buffering or putchar_unlocked (0.8s) is small compared to actually using write() for each character (8.8s).
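The single-write() approach can be sketched roughly as follows. This is an illustration rather than the benchmarked echo_buf: join_args and write_all are hypothetical helper names, and escape processing is left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Join argv[1..argc-1] with single spaces and a trailing newline into a
 * freshly allocated buffer. Stores the byte count in *len and returns the
 * buffer (the caller frees it), or NULL if allocation fails.
 * Escape processing is omitted in this sketch. */
char *join_args(int argc, char **argv, size_t *len)
{
    size_t total = 1; /* trailing newline */
    size_t off = 0;
    char *buf;
    int i;

    assert(len != NULL);
    for (i = 1; i < argc; i++)
        total += strlen(argv[i]) + (i > 1 ? 1 : 0);

    buf = malloc(total);
    if (buf == NULL)
        return NULL;

    for (i = 1; i < argc; i++) {
        size_t n = strlen(argv[i]);
        if (i > 1)
            buf[off++] = ' ';
        memcpy(buf + off, argv[i], n);
        off += n;
    }
    buf[off++] = '\n';
    *len = off;
    return buf;
}

/* Write all len bytes to fd, continuing after short writes.
 * Returns 0 on success, -1 on any error (EINTR is not retried here). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In the common case write_all() issues exactly one syscall for the whole line, no matter how many arguments or characters went into it.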

Let’s try again with just a single long argument (and a lower iteration count, because it takes too long otherwise), so that the number of arguments doesn’t impact echo_inplace so hard:

$ gcc -Wall -ansi -pedantic -Werror -pedantic-errors -g echo-bench.c -o echo-bench -DNITER=2000
$ ./echo-bench "$(for i in {1..65536}; do echo -n \\\\; done)"
echo_null     (2000 runs): 17790us
echo_buf      (2000 runs): 765108us
echo_inplace  (2000 runs): 757790us
echo_finplace (2000 runs): 761154us
echo_putchar  (2000 runs): 1924856us
echo_putcharu (2000 runs): 1083687us
echo_write    (2000 runs): 33310249us

The results were similar this time. echo_inplace (0.75s) took first place, with echo_finplace (0.76s) a close second, despite BUFSIZ being only 8192. echo_putchar (1.9s) did a bit worse this time, taking more than twice as long as most of the others. echo_putcharu (1.1s) was also slower, but not so much that it’d be unviable. However, echo_write (33s) was horrible, being over 40 times as slow as the fastest variants and still over 16 times as slow as echo_putchar, which is a significantly worse ratio than in the previous example.
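A back-of-the-envelope sketch of why the gap widens with longer output. It assumes a fully buffered stream that flushes only when its buffer is full and once at the end; real stdio implementations may pick a different buffer size or bypass the buffer for large writes, so treat the numbers as rough. The function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Number of write() calls a fully buffered stream with a cap-byte buffer
 * would need to emit n bytes, assuming it flushes only when the buffer is
 * full and once more at the end for any remainder. */
size_t buffered_writes(size_t n, size_t cap)
{
    assert(cap > 0);
    return (n + cap - 1) / cap;
}

/* Number of write() calls when every byte gets its own syscall. */
size_t unbuffered_writes(size_t n)
{
    return n;
}
```

Under these assumptions, 65537 bytes of output (65536 characters plus a newline, an assumed figure) through an 8192-byte buffer come to buffered_writes(65537, 8192) == 9 syscalls, against 65537 for the unbuffered variant.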

So, while putchar() is still somewhat slower than writing larger chunks due to locking (putchar_unlocked is comparable to writing the whole string at once), it does buffering and will not syscall for every character. Otherwise, there’s no significant difference between manually buffering and using FILE’s builtin buffer, so just use whatever is simpler.
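If the per-character locking matters, one option on POSIX systems is to take the stream lock once and use the unlocked variants inside it. A minimal sketch, with put_string_unlocked being a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Write s to fp one character at a time, taking the stream lock once for
 * the whole string instead of once per character.
 * Returns the number of characters written, or -1 on error. */
long put_string_unlocked(const char *s, FILE *fp)
{
    long n = 0;

    assert(s != NULL && fp != NULL);
    flockfile(fp);
    while (s[n] != '\0') {
        if (putc_unlocked((unsigned char)s[n], fp) == EOF) {
            funlockfile(fp);
            return -1;
        }
        n++;
    }
    funlockfile(fp);
    return n;
}
```

This keeps the character-at-a-time style and stays safe with other threads using the same stream, while paying for the lock once per string.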