Meet Kourier
The Fastest Server Ever
12.1M RPS on an AMD Ryzen 5 1600, 52x Faster than Go, and a fully compliant parser
Why Has No One Done This Before?
Developing a server that delivers unprecedented performance is a complex and challenging task, and many different aspects must be addressed to achieve stellar performance. Let's explore some key features behind Kourier's never-before-seen performance.
State Of The Art HTTP Parser
The HTTP syntax rules are simple to enforce when the parser is a state machine that processes its input byte by byte. To write a faster parser, we have to use SIMD instructions, but enforcing HTTP syntax rules becomes considerably more complex with them. That's why many parsers deliberately loosen HTTP syntax rules to employ SIMD, as I show on the blog.

Kourier uses SIMD instructions extensively on its parser while maintaining strict adherence to HTTP syntax rules.
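
The following is a minimal sketch of how a SIMD scan can stay strict. It is not Kourier's actual parser. It checks an HTTP header field value 16 bytes at a time with SSE2 and stops at the first byte that RFC 9110 does not allow in a field value. The allowed bytes are HTAB, SP, VCHAR and obs-text, so the stopping byte is either the CR that ends the line or a forbidden control byte that the caller must reject. The function name scanFieldValue is hypothetical. The sketch assumes GCC or Clang, and it falls back to the byte-by-byte loop when SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Byte-by-byte rule (RFC 9110): HTAB, SP, VCHAR (0x21-0x7E) and obs-text (0x80-0xFF) are allowed.
static inline bool isFieldValueByte(unsigned char c) {
    return c == '\t' || (c >= 0x20 && c != 0x7F);
}

// Hypothetical helper: returns the index of the first byte that is not a valid
// field-value byte (the terminating CR or a forbidden control byte), or size if none.
std::size_t scanFieldValue(const char *data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i maxCtl = _mm_set1_epi8(0x1F);
    const __m128i htab = _mm_set1_epi8('\t');
    const __m128i del = _mm_set1_epi8(0x7F);
    for (; i + 16 <= size; i += 16) {
        const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(data + i));
        // Unsigned v <= 0x1F  <=>  min(v, 0x1F) == v (avoids signed-compare pitfalls with obs-text).
        const __m128i isCtl = _mm_cmpeq_epi8(_mm_min_epu8(v, maxCtl), v);
        const __m128i isHtab = _mm_cmpeq_epi8(v, htab);
        const __m128i isDel = _mm_cmpeq_epi8(v, del);
        // Invalid = (control byte AND NOT HTAB) OR DEL; _mm_andnot_si128(a, b) computes ~a & b.
        const __m128i bad = _mm_or_si128(_mm_andnot_si128(isHtab, isCtl), isDel);
        const int mask = _mm_movemask_epi8(bad);
        if (mask != 0)
            return i + static_cast<std::size_t>(__builtin_ctz(static_cast<unsigned>(mask)));
    }
#endif
    // Scalar tail (or the whole input when SSE2 is not available).
    for (; i < size; ++i)
        if (!isFieldValueByte(static_cast<unsigned char>(data[i])))
            return i;
    return size;
}
```

The caller then checks whether the returned position holds a CR. Any other stopping byte means the request is malformed and must be rejected, so the speedup does not come from loosening the grammar.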

Kourier's parser is a performance powerhouse, capable of processing 12.1 million unencrypted HTTP requests per second on an AMD Ryzen 5 1600, an 8-year-old mid-range processor, using only half of its cores (wrk uses the other half). It sets a new standard for HTTP parsing and leaves the highest-performing enterprise network appliances in the dust.

Kourier is not just the fastest server on Earth; it's also open source, ensuring that all claims about its performance are verifiable. Its source code is available on GitHub, allowing anyone to see how HTTP parsers should always have been written.
Unbeatable TLS Performance
Kourier uses OpenSSL for TLS encryption. Although OpenSSL is battle-tested, integrating it properly is challenging. Almost all OpenSSL users take the naive approach of employing file-descriptor-based BIOs, which are slow and notoriously consume vast amounts of memory.

Kourier excels at TLS performance because it provides OpenSSL with custom memory BIOs. These restrict OpenSSL to TLS computations while all other responsibilities, such as socket IO, stay under Kourier's control.
Cutting-Edge Signal-Slot Implementation
The awesome Qt Framework popularized signals and slots, one of its main features. Signals and slots promote loosely coupled designs in which events propagate through signals. They also unify frontend and backend programming into a single paradigm, as an event can represent either user interaction or incoming/outgoing network data.

Kourier implements a modern signals and slots mechanism built upon C++'s powerful template metaprogramming capabilities. It is an order of magnitude faster than Qt's and uses a quarter of the memory.
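
To illustrate the idea, here is a deliberately minimal signals and slots sketch in C++. It is not Kourier's API: the Signal and FakeSocket classes are hypothetical, and slots are type-erased with std::function for brevity. It therefore shows only the loose coupling, not the techniques behind Kourier's speed and memory figures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal signal: slots are stored in connection order and all
// receive the same arguments when the signal fires.
template <typename... Args>
class Signal {
public:
    template <typename Slot>
    void connect(Slot &&slot) { m_slots.emplace_back(std::forward<Slot>(slot)); }

    // Firing the signal calls every connected slot.
    void operator()(const Args &...args) const {
        for (const auto &slot : m_slots)
            slot(args...);
    }

    std::size_t slotCount() const { return m_slots.size(); }

private:
    std::vector<std::function<void(Args...)>> m_slots;
};

// Hypothetical sender: it announces incoming network data without knowing who consumes it.
class FakeSocket {
public:
    Signal<const char *, std::size_t> receivedData;
    void simulateRead(const char *data, std::size_t size) { receivedData(data, size); }
};
```

The same Signal type could just as well carry a button click, which is what lets one paradigm cover both frontend and backend events.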
Lightweight Timers
Writing a reliable server requires timers. Malicious users attack servers by intentionally sending data slowly over many connections, and timers help prevent that abuse. However, userspace timer implementations are generally not designed with high-performance servers in mind.

For example, every time a server starts processing a request, it resets a request timer, and whenever it responds to a request, it starts an idle timer. Typical timer implementations are not optimized for this pattern of frequent resets that rarely reach a timeout. That's why some frameworks use deadlines instead of real timeout-based timers to achieve better benchmark results.

However, deadlines can only stop slow senders. Because a deadline is typically checked only when new data arrives, it is useless against rogue senders that keep the connection open indefinitely after sending only part of a request.

Kourier provides a timer implementation that allows timers to be reset millions of times per second without incurring system calls or memory allocations. Its timers are real timeout-based timers, and the implementation can be viewed as an ultra-lightweight version of Qt's coarse timers.
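
The sketch below shows one allocation-free way to make resets cheap. It illustrates the general technique under stated assumptions and is not Kourier's code. All timers in a TimerList share one interval, so a freshly reset timer always belongs at the tail of an intrusive linked list that stays sorted by deadline. A reset is therefore an O(1) unlink and append with no system call or allocation. A single periodic tick then fires the expired timers at the head; in a real server, that tick could be driven by one coarse OS timer. Timer, TimerList, and the explicit millisecond clock are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical intrusive timer node: the links live inside the timer, so no allocation per reset.
struct Timer {
    Timer *prev = nullptr;
    Timer *next = nullptr;
    std::uint64_t deadlineMs = 0;
    bool active = false;
    std::function<void()> onTimeout;
};

// All timers in one list share the same (non-zero) interval, so appending keeps it sorted.
class TimerList {
public:
    explicit TimerList(std::uint64_t intervalMs) : m_intervalMs(intervalMs) {}

    // O(1), no allocation, no system call: unlink, then append at the tail.
    void reset(Timer &t, std::uint64_t nowMs) {
        stop(t);
        t.deadlineMs = nowMs + m_intervalMs;
        t.prev = m_tail;
        t.next = nullptr;
        if (m_tail) m_tail->next = &t; else m_head = &t;
        m_tail = &t;
        t.active = true;
    }

    void stop(Timer &t) {
        if (!t.active) return;
        if (t.prev) t.prev->next = t.next; else m_head = t.next;
        if (t.next) t.next->prev = t.prev; else m_tail = t.prev;
        t.prev = t.next = nullptr;
        t.active = false;
    }

    // Called by the periodic tick: fires every timer whose deadline has passed.
    int processExpired(std::uint64_t nowMs) {
        int fired = 0;
        while (m_head && m_head->deadlineMs <= nowMs) {
            Timer &t = *m_head;
            stop(t); // unlink before the callback, which may reset or destroy the timer
            ++fired;
            if (t.onTimeout) t.onTimeout();
        }
        return fired;
    }

private:
    std::uint64_t m_intervalMs;
    Timer *m_head = nullptr;
    Timer *m_tail = nullptr;
};
```

A request timer and an idle timer would each live in their own list, one list per interval. Unlike a deadline checked on data arrival, the periodic tick also catches a connection that has gone completely silent.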
Correct Use Of Epoll
Kourier also provides one of the best implementations for using epoll, a high-performance Linux IO event notification interface, to monitor file descriptors.

If you look at epoll's source code, you will learn that it only keeps O(1) computational complexity when used in edge-triggered mode. In level-triggered mode, every file descriptor that epoll_wait reports is put back on epoll's ready list and must be polled again on the next call, whether or not it still has pending events.
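
Here is a minimal Linux sketch of the edge-triggered pattern. The helper names are hypothetical, and this is not Kourier's socket code. The descriptor is registered with EPOLLET. Because epoll then reports it only when new data arrives, the handler must read until the kernel returns EAGAIN; otherwise the leftover bytes would never be reported again.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Edge-triggered mode requires non-blocking descriptors, since handlers read until EAGAIN.
bool setNonBlocking(int fd) {
    const int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: registers fd for edge-triggered read readiness.
bool watchEdgeTriggered(int epfd, int fd) {
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLRDHUP | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Hypothetical helper: reads until the kernel has nothing left, returning the byte count.
std::size_t drainSocket(int fd) {
    char buf[4096];
    std::size_t total = 0;
    for (;;) {
        const ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) { total += static_cast<std::size_t>(n); continue; }
        if (n < 0 && errno == EINTR) continue;
        break; // n == 0 (peer closed), EAGAIN/EWOULDBLOCK (drained), or a real error
    }
    return total;
}
```

In edge-triggered mode, a second epoll_wait returns nothing even while unread bytes remain. The descriptor stays off the ready list until new data arrives, and that is what keeps the per-call cost bounded.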

Implementing a server with unprecedented performance requires sharp attention to detail. How a server interacts with the low-level IO readiness model provided by the kernel is crucial to its performance and reliability.

Kourier integrates epoll into Qt's event system and implements socket classes that use the signals and slots mechanism to abstract epoll-based IO operations while providing all the niceties of having a Qt event loop running on worker threads.

Kourier exports TcpSocket and TlsSocket classes, which you can use instead of Qt's socket classes. Both are much faster and more lightweight than their Qt-based counterparts.
AGPL Only? My Business Is Not Compatible With It!
You can contact me if your business wants to use Kourier under an alternative license.

It is not a problem if your network appliances run on a BSD-derived OS. The IO readiness models provided by epoll and kqueue are similar, so making Kourier work with both would not take much effort.