
Multithreading: Thundering herd, Cache stampede and Lock convoy problems

Updated: Feb 20, 2021




First of all, let's establish some basics.


What is Concurrency?


Carrying out more than one task at a time in a pseudo-parallel fashion. Concurrency is when multiple sequences of operations run in overlapping periods of time.




What is Multithreading?


According to Wikipedia,

Multithreading is the ability of a CPU (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system.




Thundering herd problem


Consider a scenario where multiple threads/processes (with the same scheduling priority) are waiting for a particular lock or an I/O event.


Once the lock is released (on a shared object) or the I/O event completes, all the waiting (blocked) threads are notified and brought back to the runnable state.


Even though a large number of threads are awoken, only one of them will be able to handle the event or acquire the lock.


When the threads wake up, they each try to handle the event, but only one will win. All of them compete for resources, possibly freezing the machine, until the herd is calmed down again.


The thundering herd problem occurs when a large number of threads are awoken by a single lock release or I/O completion event. This hinders the performance of the system.
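
As a rough illustration, consider the following Python sketch (illustrative only): ten workers block on the same condition variable, notify_all() wakes the whole herd, but only one of them finds any work to do.

import threading
import time

cond = threading.Condition()
ready = False

def worker(i):
    global ready
    with cond:
        cond.wait()  # every worker blocks here on the same condition
        if ready:
            ready = False  # only one thread gets to handle the event
            print(f"worker {i} handled the event")
        else:
            print(f"worker {i} woke up for nothing")  # wasted wakeup

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()

time.sleep(0.2)  # crude: give every worker time to reach cond.wait()
with cond:
    ready = True
    cond.notify_all()  # wakes the whole herd; cond.notify() wakes just one
for t in threads:
    t.join()

Replacing notify_all() with notify() here is the condition-variable equivalent of the serialised wakeup described next.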

Most Unix/Linux kernels overcome the thundering herd problem by serialising the response to accept(); in other words, only one thread is woken up when more than one is blocking on accept() against a single open file descriptor or a lock.


In Linux-based kernels this is achieved with the EPOLLEXCLUSIVE flag.

The EPOLLEXCLUSIVE flag wakes up only one thread when more than one is blocking on accept(). The proposed EPOLLROUNDROBIN flag was intended to be used in conjunction with EPOLLEXCLUSIVE to distribute the wakeups evenly.




Adding to the above definition, the following scenarios can also be considered variants of the thundering herd problem:


  • In a client-server scenario, a burst of requests hits the server, overwhelming the available threads (or the overall computational power of the system).

  • In a proxy-server-to-legacy-backend scenario, the backend goes down and multiple worker threads in the proxy keep retrying to connect to it, overwhelming the computational resources of the system.

  • The cache stampede problem: consider a web server with response caching; once the cached page expires, multiple concurrent threads try to fetch the data and recompute the cache value, overwhelming the computational resources.



Mitigating the thundering herd problem


Depending on the nature of the thundering herd problem, several techniques can be used to mitigate it.


OS-level herding, where multiple threads are awoken by a single lock release or I/O completion event.

Mitigation technique:

  • Introduce specialised flags (such as EPOLLEXCLUSIVE, shown above) to serialise the response to lock releases and I/O completion events.


Client-server herding, where the number of requests overwhelms the server's processing power.

Mitigation techniques:

  • Introduce rate limiting, throttling and burst control mechanisms to prevent the thundering herd problem (a burst control sketch follows this list).

  • Introduce a cache layer to cache frequently served content in order to reduce the hits on the server.
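
As an illustration of burst control, here is a minimal token-bucket throttle sketch in Python (the rate and capacity values are arbitrary): requests are admitted as long as tokens remain, so a sudden burst beyond the bucket's capacity is rejected instead of stampeding the server.

import threading
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self):
        with self.lock:
            now = time.monotonic()
            # refill for the elapsed time, capped at the burst size
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False  # shed or queue the request instead

bucket = TokenBucket(rate=100, capacity=20)  # 100 req/s, bursts of 20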


Proxy-server-to-backend herding, where multiple threads keep retrying to connect to an unavailable backend.

Mitigation techniques:

  • Implement exponential backoff algorithms so that clients retry after progressively longer intervals (see the sketch after this list).

  • Introduce jitter to break the synchronisation across the clients and avoid retry collisions.

  • Introduce a proxy cache to cache frequently served content in order to reduce the hits on the backend, thus avoiding the herding.
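
The first two points combine naturally; here is a minimal Python sketch of exponential backoff with full jitter (the delay parameters are arbitrary):

import random
import time

def retry_with_backoff(operation, max_attempts=5,
                       base_delay=0.1, max_delay=10.0):
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # exponential backoff, capped at max_delay...
            delay = min(max_delay, base_delay * (2 ** attempt))
            # ...with full jitter, so the retrying workers desynchronise
            time.sleep(random.uniform(0, delay))

Without the jitter, all the workers that failed together would also retry together, simply re-herding on the backend at each backoff step.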


Cache stampede problem

Cache stampede, or cache missing storm, is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.


It is a variant of the thundering herd problem that affects caches in massively concurrent systems.


In the section above we discussed introducing a cache layer as an option to mitigate the thundering herd problem in client-server systems. But caches themselves are susceptible to herding under high concurrent load.


Consider a client-server scenario with a cache layer under very heavy load: when the cached version of a page expires, multiple threads of execution all attempt to render the content of that page simultaneously.


Multiple threads trying to re-render and re-cache the same page at the same time will exhaust shared resources, as none of the concurrent threads knows that the others are also trying to re-render the same page.


When the load is very high, the congestion for shared resources can even prevent the page from ever being completely re-rendered and re-cached, as every attempt to do so times out. This can turn into a cascading failure, resulting in a zero cache hit rate and an ever-congested system.




Cache stampede mitigation


There are three main categories of approaches proposed to mitigate cache stampedes:

  • Locking based on Cache key

  • External re-computation

  • Probabilistic early expiration

Locking based on Cache key


To prevent multiple simultaneous recomputations of the same value, upon a cache miss a process attempts to acquire the lock for that cache key and recomputes the value only if it acquires the lock.
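
A minimal in-process sketch of this idea in Python (a real deployment would typically hold the lock in a shared store such as Redis or memcached rather than in process memory):

import threading

cache = {}                      # key -> value
key_locks = {}                  # key -> lock guarding its recomputation
locks_guard = threading.Lock()  # protects the key_locks dict itself

def get(key, recompute):
    value = cache.get(key)
    if value is not None:
        return value
    # cache miss: look up (or create) the lock for this key
    with locks_guard:
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        # re-check: another thread may have recomputed while we waited
        value = cache.get(key)
        if value is None:
            value = recompute()
            cache[key] = value
        return value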


External re-computation


In this approach the computation of the cache value is moved to an independent external process, separate from the processes that need the cache value.


The recomputation by the external process can be triggered in different ways:

  • When the cache value approaches its expiration

  • Periodically (the sketch below takes this approach)

  • When a process needing the value encounters a cache miss
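
A sketch of the periodic variant, where a background daemon thread keeps the value fresh so foreground requests never see an expired entry (the key name, interval and render function here are illustrative):

import threading
import time

cache = {}

def refresh(key, recompute, interval):
    while True:
        cache[key] = recompute()  # recompute before anyone misses
        time.sleep(interval)

def render_homepage():  # stand-in for an expensive computation
    return "<html>...</html>"

threading.Thread(target=refresh,
                 args=("homepage", render_homepage, 60),
                 daemon=True).start()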


Probabilistic early expiration


In this approach each process may recompute the cache value before its expiration by making an independent probabilistic decision, where the probability of performing the early recomputation increases as the value approaches its expiration.


Since the probabilistic decision is made independently by each process, the effect of the stampede is mitigated, as fewer processes will recompute at the same time.
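
One well-known formulation of this idea is the XFetch algorithm (Vattani et al.); a rough Python sketch, where delta is the duration of the last recomputation and beta = 1 is the usual default (larger values favour earlier recomputation):

import math
import random
import time

cache = {}  # key -> (value, delta, expiry)

def fetch(key, recompute, ttl, beta=1.0):
    value, delta, expiry = cache.get(key, (None, 0.0, 0.0))
    # log(random()) is negative, so this adds a random head start that
    # becomes more likely to trigger as the expiry time approaches
    if value is None or time.time() - delta * beta * math.log(random.random()) >= expiry:
        start = time.time()
        value = recompute()
        delta = time.time() - start
        cache[key] = (value, delta, start + ttl)
    return value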


Apart from these three main approaches, having a multi-level cache (L1, L2 and L3) can also help mitigate cache stampedes, as it reduces the probability of cache misses.



Lock convoy problem


Similar to the thundering herd problem, the lock convoy problem is a performance problem that occurs when multiple threads of equal priority contend repeatedly for the same lock.


Lock convoys hurt performance through repeated context switching and under-utilisation of scheduling quanta. Each time a thread tries to acquire the lock and fails, it is pre-empted (under pre-emptive scheduling), moved to the waiting state, and the context is switched to another thread.


When multiple threads are repeatedly context-switched after failing to acquire the lock, this overhead accumulates. Unlike deadlock and livelock situations, however, the threads in a lock convoy do make progress.




Mitigating the lock convoy problem


The main cause of the lock convoy problem is locks used to serialise access to a commonly used resource, such as a memory heap or a thread pool.


There are a few techniques we can use to mitigate lock convoys:

  • Use non-preemptive scheduling (in contrast to pre-emptive scheduling). Non-preemptive scheduling prevents repeated context switching to another thread until the running thread completes its task.

  • Alter the relative priorities of the contending threads. Since threads at the same priority level contend over the same lock, altering their priorities can prevent the contention and, in turn, the lock convoy.

  • Implement the logic using non-locking alternatives such as lock-free algorithms (a related contention-avoidance sketch follows this list).
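
Related to the last point, contention can often be designed away entirely. Here is a Python sketch of a sharded counter, where each thread increments only its own slot, so there is no per-increment lock for a convoy to form on:

import threading

class ShardedCounter:
    def __init__(self):
        self._local = threading.local()    # per-thread slot handle
        self._slots = []
        self._register = threading.Lock()  # taken once per thread, not per increment

    def increment(self):
        slot = getattr(self._local, "slot", None)
        if slot is None:  # first increment from this thread
            slot = [0]
            with self._register:
                self._slots.append(slot)
            self._local.slot = slot
        slot[0] += 1  # no contention: only this thread writes this slot

    def value(self):
        return sum(s[0] for s in self._slots)  # may be slightly stale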


References:


https://en.wikipedia.org/wiki/Concurrency_(computer_science)

https://lwn.net/Articles/632590/

https://en.wikipedia.org/wiki/Thundering_herd_problem


Thank you for reading!

Cheers!!!
