The Four-Step Pattern
The defect lives in four sequential, non-atomic steps:
// DEFECT
Value value = cache.get(key);
if (value == null) {
    value = expensiveCompute(key);
    cache.put(key, value);
}
return value;
Step 1: check cache. Step 2: miss. Step 3: compute. Step 4: store. All four steps: non-atomic. Between step 1 & step 4, any number of threads can execute step 1 & all see null.
The Idempotency Trap
The reasoning that protects this pattern: 'it is OK if two threads compute & store the same value. The result is idempotent. No data corruption occurs.'
This reasoning: correct about correctness. Fatal about cost.
At 1,000 threads on a cache miss: 1,000 threads each execute expensiveCompute(key). If expensiveCompute queries a database, 1,000 database queries fire simultaneously. If it calls an external service, 1,000 HTTP requests fire simultaneously. The system producing correct results collapses under the cost of producing them.
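The N-fold cost is easy to reproduce. A minimal sketch (hypothetical names; a latch stands in for the race window by holding every thread between step 1 and step 4, so the interleaving is deterministic):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StampedeDemo {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger computeCalls = new AtomicInteger();

    static String expensiveCompute(String key) {
        computeCalls.incrementAndGet(); // count the redundant work
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        int threads = 100;
        CountDownLatch allChecked = new CountDownLatch(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                // Step 1: check cache. Every thread runs this before any thread stores.
                String value = cache.get("hot-key");
                allChecked.countDown();
                try {
                    allChecked.await(); // hold the race window open
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
                if (value == null) {
                    // Steps 3-4: every thread computes & stores
                    cache.put("hot-key", expensiveCompute("hot-key"));
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // The cache ends up correct. The cost was N-fold.
        System.out.println("compute calls: " + computeCalls.get()); // prints 100, not 1
    }
}
```

The result is correct (idempotent), yet expensiveCompute ran once per thread: exactly the trap.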
Three Triggers
A Thundering Herd fires when a cache key transitions from warm to cold simultaneously across many threads:
Cold start: service restarts with an empty cache. First request wave: every key misses. All compute simultaneously.
Service restart: rolling restart resets cache across instances. Traffic redistributes to cold instances.
TTL expiry: a high-traffic key expires. N threads all check, all miss, all compute before the first thread stores the result.
All three triggers: correlated with traffic spikes. The herd fires when traffic peaks & the cache is cold. Worst possible time.
The Elasticsearch EnrichCache Example
Elasticsearch EnrichCache: documented comment reads 'intentionally non-locking for simplicity...OK if we re-put the same key/value in a race condition.' At 10,000 documents per second with a cold enrich cache: all 10,000 requests hit the enrich index simultaneously. The enrich index, designed for occasional lookups, faces 10,000 concurrent queries. Cluster destabilizes.
The idempotency reasoning: correct in the code comment. Catastrophic at 10,000 documents per second.
Coupling to MOAD-0001
MOAD-0001 (Sedimentary Defect) creates an O(N²) bottleneck in high-throughput systems. Fixing MOAD-0001 (O(N²) to O(N)) removes that bottleneck. The faster throughput sends more requests downstream. Downstream caches, previously protected by the MOAD-0001 bottleneck, now receive correlated traffic spikes. MOAD-0005 fires in caches that never triggered it before. Fix one MOAD; stage the other.
What the Idempotency Trap Gets Wrong
The Elasticsearch comment represents careful engineering reasoning applied to the wrong question. Idempotency: a real property worth reasoning about. The trap: stopping the analysis at correctness without continuing to cost.
computeIfAbsent & singleflight
The fix: make the check-and-compute atomic. One thread computes. All other threads wait for that result.
Java: computeIfAbsent
// DEFECT: four non-atomic steps
Value value = cache.get(key);
if (value == null) {
    value = expensiveCompute(key);
    cache.put(key, value);
}
return value;
// FIX: atomic check-and-compute
return cache.computeIfAbsent(key, k -> expensiveCompute(k));
computeIfAbsent: if key absent, computes exactly once, stores, returns. All other threads calling computeIfAbsent for the same key wait for the first computation. No N-fold compute. No stampede.
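A sketch of the fix under concurrency, assuming the cache is a ConcurrentHashMap (hypothetical names; the sleep widens the race window to make the point, though ConcurrentHashMap guarantees the single computation regardless):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfAbsentDemo {
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger computeCalls = new AtomicInteger();

    static String expensiveCompute(String key) {
        computeCalls.incrementAndGet();
        try {
            Thread.sleep(50); // stand-in for the expensive work; widens the window
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        int threads = 100;
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                // Atomic check-and-compute: the first caller computes,
                // every other caller for this key blocks until it finishes.
                cache.computeIfAbsent("hot-key", ComputeIfAbsentDemo::expensiveCompute);
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        System.out.println("compute calls: " + computeCalls.get()); // prints 1
    }
}
```

One caveat from the ConcurrentHashMap contract: the mapping function runs while the bin is locked, so it should be short & must not touch the same map, and a null result stores nothing.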
Go: singleflight.Group
var g singleflight.Group // import "golang.org/x/sync/singleflight"

func getOrCompute(key string) (Value, error) {
    v, err, _ := g.Do(key, func() (interface{}, error) {
        return expensiveCompute(key)
    })
    return v.(Value), err
}
singleflight: if a computation for key already runs, all callers for the same key wait & share the single result. One compute, N waiters, one result shared. The 'flight' abstraction: deduplicate in-flight requests.
Lock vs singleflight
A naive per-key lock serializes: thread 1 computes, thread 2 waits, thread 3 waits. After thread 1 finishes, thread 2 enters & checks cache (hits). Thread 3 enters & checks cache (hits). N-1 lock acquisitions & cache reads.
singleflight deduplicates: thread 1 computes, threads 2 through N all wait on thread 1's result. No separate lock acquisitions. No separate cache reads. One computation, one result, distributed to N waiters. Fewer operations than a per-key lock.
Both prevent the stampede. singleflight prevents redundant work more completely.
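The deduplication mechanism can be sketched in Java as well (a hypothetical helper, not part of the JDK or the Go library): an in-flight map of futures, where the first caller per key computes & every concurrent caller joins that caller's future instead of acquiring a lock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical singleflight-style helper: deduplicates concurrent calls per key.
public class SingleFlight<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public V doCall(K key, Function<K, V> compute) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // A flight for this key is already up: wait for its result, compute nothing.
            return existing.join();
        }
        try {
            V value = compute.apply(key);
            mine.complete(value); // wake every waiter with the shared result
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // waiters see the same failure
            throw e;
        } finally {
            inFlight.remove(key); // the flight lands: later calls compute fresh
        }
    }
}
```

Unlike computeIfAbsent, this forgets the result once the flight lands; it deduplicates in-flight work only, so it pairs with a cache rather than replacing one.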
Rewrite the Pattern
Apply the fix to a concrete scenario.
// A user profile cache in a high-traffic Java service
public UserProfile getProfile(String userId) {
    UserProfile profile = profileCache.get(userId);
    if (profile == null) {
        profile = database.loadProfile(userId); // expensive: 50ms DB query
        profileCache.put(userId, profile);
    }
    return profile;
}
Service restarts every morning at 2 AM. At 8 AM, 10,000 users request their profiles simultaneously.
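One possible rewrite, assuming profileCache is (or can become) a ConcurrentHashMap. UserProfile & database.loadProfile are the scenario's names; the stub types here exist only to make the sketch self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ProfileService {
    // Stubs standing in for the scenario's types.
    record UserProfile(String userId) {}

    static class Database {
        UserProfile loadProfile(String userId) {
            // stands in for the expensive 50ms DB query
            return new UserProfile(userId);
        }
    }

    private final ConcurrentHashMap<String, UserProfile> profileCache = new ConcurrentHashMap<>();
    private final Database database = new Database();

    // FIX: atomic check-and-compute. At 8 AM against the cold cache,
    // the first request per userId runs the query; concurrent requests
    // for the same userId wait for that result instead of re-querying.
    public UserProfile getProfile(String userId) {
        return profileCache.computeIfAbsent(userId, database::loadProfile);
    }
}
```

With the four-step version, the 8 AM wave sends up to 10,000 queries to the database; with computeIfAbsent, the database sees one query per distinct userId.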