The Four-Step Pattern
The defect lives in four sequential, non-atomic steps:
// DEFECT
Value value = cache.get(key);
if (value == null) {
    value = expensiveCompute(key);
    cache.put(key, value);
}
return value;
Step 1: check cache. Step 2: miss. Step 3: compute. Step 4: store. All four steps: non-atomic. Between step 1 & step 4, any number of threads can execute step 1 & all see null.
The Idempotency Trap
The reasoning that protects this pattern: 'it is OK if two threads compute & store the same value. The result is idempotent. No data corruption occurs.'
This reasoning: correct about correctness. Fatal about cost.
At 1,000 threads on a cache miss: 1,000 threads each execute expensiveCompute(key). If expensiveCompute queries a database, 1,000 database queries fire simultaneously. If it calls an external service, 1,000 HTTP requests fire simultaneously. The system producing correct results collapses under the cost of producing them.
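The N-fold cost is easy to reproduce. A minimal sketch (hypothetical names; a latch stands in for the race window by holding every thread between step 1 and step 4, so the interleaving is deterministic):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StampedeDemo {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger computeCalls = new AtomicInteger();

    static String expensiveCompute(String key) {
        computeCalls.incrementAndGet(); // count the redundant work
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        int threads = 100;
        CountDownLatch allChecked = new CountDownLatch(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                // Step 1: check cache. Every thread runs this before any thread stores.
                String value = cache.get("hot-key");
                allChecked.countDown();
                try {
                    allChecked.await(); // hold the race window open
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
                if (value == null) {
                    // Steps 3-4: every thread computes & stores
                    cache.put("hot-key", expensiveCompute("hot-key"));
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // The cache ends up correct. The cost was N-fold.
        System.out.println("compute calls: " + computeCalls.get()); // prints 100, not 1
    }
}
```

The result is correct (idempotent), yet expensiveCompute ran once per thread: exactly the trap.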
Three Triggers
A Thundering Herd fires when a cache key transitions from warm to cold simultaneously across many threads:
Cold start: service restarts with an empty cache. First request wave: every key misses. All compute simultaneously.
Service restart: rolling restart resets cache across instances. Traffic redistributes to cold instances.
TTL expiry: a high-traffic key expires. N threads all check, all miss, all compute before the first thread stores the result.
All three triggers: correlated with traffic spikes. The herd fires when traffic peaks & the cache is cold. Worst possible time.
The Elasticsearch EnrichCache Example
Elasticsearch EnrichCache: documented comment reads 'intentionally non-locking for simplicity...OK if we re-put the same key/value in a race condition.' At 10,000 documents per second with a cold enrich cache: all 10,000 requests hit the enrich index simultaneously. The enrich index, designed for occasional lookups, faces 10,000 concurrent queries. Cluster destabilizes.
The idempotency reasoning: correct in the code comment. Catastrophic at 10,000 documents per second.
Coupling to MOAD-0001
MOAD-0001 (Sedimentary Defect) creates an O(N²) bottleneck in high-throughput systems. Fixing MOAD-0001 (O(N²) to O(N)) removes that bottleneck. The faster throughput sends more requests downstream. Downstream caches, previously protected by the MOAD-0001 bottleneck, now receive correlated traffic spikes. MOAD-0005 fires in caches that never triggered it before. Fix one MOAD; stage the other.
What the Idempotency Trap Gets Wrong
The Elasticsearch comment represents careful engineering reasoning applied to the wrong question. Idempotency: a real property worth reasoning about. The trap: stopping the analysis at correctness without continuing to cost.
computeIfAbsent & singleflight
The fix: make the check-and-compute atomic. One thread computes. All other threads wait for that result.
Java: computeIfAbsent
// DEFECT: four non-atomic steps
Value value = cache.get(key);
if (value == null) {
    value = expensiveCompute(key);
    cache.put(key, value);
}
return value;
// FIX: atomic check-and-compute
return cache.computeIfAbsent(key, k -> expensiveCompute(k));
computeIfAbsent: if key absent, computes exactly once, stores, returns. All other threads calling computeIfAbsent for the same key wait for the first computation. No N-fold compute. No stampede.
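A sketch of the fix under concurrency, assuming the cache is a ConcurrentHashMap (hypothetical names; the sleep widens the race window to make the point, though ConcurrentHashMap guarantees the single computation regardless):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfAbsentDemo {
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger computeCalls = new AtomicInteger();

    static String expensiveCompute(String key) {
        computeCalls.incrementAndGet();
        try {
            Thread.sleep(50); // stand-in for the expensive work; widens the window
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        int threads = 100;
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                // Atomic check-and-compute: the first caller computes,
                // every other caller for this key blocks until it finishes.
                cache.computeIfAbsent("hot-key", ComputeIfAbsentDemo::expensiveCompute);
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        System.out.println("compute calls: " + computeCalls.get()); // prints 1
    }
}
```

One caveat from the ConcurrentHashMap contract: the mapping function runs while the bin is locked, so it should be short & must not touch the same map, and a null result stores nothing.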
Go: singleflight.Group
var g singleflight.Group // import "golang.org/x/sync/singleflight"

func getOrCompute(key string) (Value, error) {
    v, err, _ := g.Do(key, func() (interface{}, error) {
        return expensiveCompute(key)
    })
    return v.(Value), err
}
singleflight: if a computation for key already runs, all callers for the same key wait & share the single result. One compute, N waiters, one result shared. The 'flight' abstraction: deduplicate in-flight requests.
Lock vs singleflight
A naive per-key lock serializes: thread 1 computes, thread 2 waits, thread 3 waits. After thread 1 finishes, thread 2 enters & checks cache (hits). Thread 3 enters & checks cache (hits). N-1 lock acquisitions & cache reads.
singleflight deduplicates: thread 1 computes, threads 2 through N all wait on thread 1's result. No separate lock acquisitions. No separate cache reads. One computation, one result, distributed to N waiters. Fewer operations than a per-key lock.
Both prevent the stampede. singleflight prevents redundant work more completely.
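The deduplication mechanism can be sketched in Java as well (a hypothetical helper, not part of the JDK or the Go library): an in-flight map of futures, where the first caller per key computes & every concurrent caller joins that caller's future instead of acquiring a lock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical singleflight-style helper: deduplicates concurrent calls per key.
public class SingleFlight<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public V doCall(K key, Function<K, V> compute) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // A flight for this key is already up: wait for its result, compute nothing.
            return existing.join();
        }
        try {
            V value = compute.apply(key);
            mine.complete(value); // wake every waiter with the shared result
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // waiters see the same failure
            throw e;
        } finally {
            inFlight.remove(key); // the flight lands: later calls compute fresh
        }
    }
}
```

Unlike computeIfAbsent, this forgets the result once the flight lands; it deduplicates in-flight work only, so it pairs with a cache rather than replacing one.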
Rewrite the Pattern
Apply the fix to a concrete scenario.
// A user profile cache in a high-traffic Java service
public UserProfile getProfile(String userId) {
    UserProfile profile = profileCache.get(userId);
    if (profile == null) {
        profile = database.loadProfile(userId); // expensive: 50ms DB query
        profileCache.put(userId, profile);
    }
    return profile;
}
Service restarts every morning at 2 AM. At 8 AM, 10,000 users request their profiles simultaneously.
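One possible rewrite, assuming profileCache is (or can become) a ConcurrentHashMap. UserProfile & database.loadProfile are the scenario's names; the stub types here exist only to make the sketch self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ProfileService {
    // Stubs standing in for the scenario's types.
    record UserProfile(String userId) {}

    static class Database {
        UserProfile loadProfile(String userId) {
            // stands in for the expensive 50ms DB query
            return new UserProfile(userId);
        }
    }

    private final ConcurrentHashMap<String, UserProfile> profileCache = new ConcurrentHashMap<>();
    private final Database database = new Database();

    // FIX: atomic check-and-compute. At 8 AM against the cold cache,
    // the first request per userId runs the query; concurrent requests
    // for the same userId wait for that result instead of re-querying.
    public UserProfile getProfile(String userId) {
        return profileCache.computeIfAbsent(userId, database::loadProfile);
    }
}
```

With the four-step version, the 8 AM wave sends up to 10,000 queries to the database; with computeIfAbsent, the database sees one query per distinct userId.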