Introduction
Arc<Mutex<T>> is a common pattern for shared mutable state in Rust, but it serializes all access through a single lock. Under concurrent load, threads spend most of their time waiting for the lock rather than doing useful work. This contention bottleneck is invisible in low-load testing but becomes the dominant performance issue in production, often reducing throughput to near-single-threaded levels despite having many worker threads.
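To make the serialization concrete, here is a minimal sketch (the `locked_count` helper and the thread/iteration counts are illustrative, not from any library) in which every increment from every worker thread funnels through one `std::sync::Mutex`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Every thread acquires the same lock for every increment, so the work is
// fully serialized: adding threads adds contention, not throughput.
fn locked_count(threads: usize, iters: usize) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    // Each increment takes the one shared lock
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

The result is correct, but with any real work inside the critical section the threads spend their time queueing on the lock rather than computing.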
Symptoms
- CPU usage is low but throughput does not increase with more threads
- `tokio-console` shows tasks blocked waiting for a `Mutex`
- `parking_lot::Mutex` profiling shows a high lock contention percentage
- Response time increases linearly with request count
- `perf lock` shows threads spending 80%+ of their time waiting
Profile contention:

```rust
use std::sync::Arc;
use std::time::Instant;
use parking_lot::Mutex;

let counter = Arc::new(Mutex::new(0u64));

// parking_lot does not ship built-in lock statistics, but contention is easy
// to estimate: time how long each acquisition waits. Sustained waits under
// load mean the lock is contended.
let start = Instant::now();
let _guard = counter.lock();
eprintln!("lock wait: {:?}", start.elapsed());
```
Common Causes
- Single `Arc<Mutex<T>>` shared across all worker threads
- Lock held for too long (expensive computation inside the critical section)
- Nested locks creating lock ordering issues
- Using `Mutex` when atomic operations would suffice
- No read/write distinction (all access is exclusive)
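The nested-locks cause is worth a concrete illustration. This sketch (the `Account` struct and `transfer` function are hypothetical names, not from the codebase above) shows the classic fix: acquire locks in a globally consistent order so two transfers can never deadlock by grabbing the same pair of locks in opposite orders:

```rust
use std::sync::Mutex;

// Hypothetical pair of accounts, each guarded by its own lock
struct Account {
    id: u64,
    balance: Mutex<i64>,
}

// DEADLOCK RISK: if one thread locks (a, b) while another locks (b, a),
// each can hold one lock and wait forever for the other.
// FIX: impose a global order - always lock the lower id first.
fn transfer(from: &Account, to: &Account, amount: i64) {
    let (first, second) = if from.id < to.id { (from, to) } else { (to, from) };
    let mut first_bal = first.balance.lock().unwrap();
    let mut second_bal = second.balance.lock().unwrap();
    // Mutate through the correct guard regardless of acquisition order
    if std::ptr::eq(first, from) {
        *first_bal -= amount;
        *second_bal += amount;
    } else {
        *second_bal -= amount;
        *first_bal += amount;
    }
}
```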
Step-by-Step Fix
1. Replace `Mutex` with atomic types for simple counters:

```rust
use std::sync::Arc;
use std::sync::Mutex;
use std::sync::atomic::{AtomicU64, Ordering};

// WRONG - Mutex for a simple counter
let counter: Arc<Mutex<u64>> = Arc::new(Mutex::new(0)); // every increment takes the lock

// CORRECT - atomic counter, no lock needed
let counter: Arc<AtomicU64> = Arc::new(AtomicU64::new(0));

// Lock-free increment
counter.fetch_add(1, Ordering::Relaxed);

// Lock-free read
let value = counter.load(Ordering::Relaxed);
```
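As a self-contained check of the atomic version (the `atomic_count` helper is illustrative), many threads can increment the counter with no lock at all and the final count is still exact:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// N threads hammer one shared atomic; fetch_add is a lock-free
// read-modify-write, so no increment is ever lost and no thread ever blocks.
fn atomic_count(threads: usize, iters: usize) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    c.fetch_add(1, Ordering::Relaxed); // lock-free increment
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

`Ordering::Relaxed` is enough here because only the counter value itself matters; use stronger orderings when the counter guards other memory.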
2. Use `RwLock` for read-heavy workloads:

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

struct AppConfig {
    max_connections: u32,
    timeout: std::time::Duration,
}

// When reads vastly outnumber writes
let config: Arc<RwLock<AppConfig>> = Arc::new(RwLock::new(config));

// Multiple readers can hold the lock simultaneously
let config_clone = config.clone();
let reader1 = tokio::spawn(async move {
    let r = config_clone.read().await;
    // Other readers can also access at the same time
    r.max_connections
});

let config_clone = config.clone();
let reader2 = tokio::spawn(async move {
    let r = config_clone.read().await;
    r.timeout
});

// A writer gets exclusive access
let mut w = config.write().await;
w.max_connections = 1000; // all readers are blocked during the write
```
3. Shard the `Mutex` to reduce contention:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;
use parking_lot::Mutex;

// WRONG - a single mutex guards the entire cache
let cache: Arc<Mutex<HashMap<String, String>>> =
    Arc::new(Mutex::new(HashMap::new()));

// CORRECT - shard into multiple mutexes
const NUM_SHARDS: usize = 16;

struct ShardedCache {
    shards: Vec<Mutex<HashMap<String, String>>>,
}

impl ShardedCache {
    fn new() -> Self {
        Self {
            shards: (0..NUM_SHARDS)
                .map(|_| Mutex::new(HashMap::new()))
                .collect(),
        }
    }

    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, String>> {
        // Hash the key to distribute it across shards
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() as usize) % NUM_SHARDS;
        &self.shards[idx]
    }

    fn get(&self, key: &str) -> Option<String> {
        self.shard_for(key).lock().get(key).cloned()
    }

    fn insert(&self, key: String, value: String) {
        self.shard_for(&key).lock().insert(key, value);
    }
}

// Roughly 16x less contention with 16 shards, assuming keys hash uniformly
let cache = Arc::new(ShardedCache::new());
```
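The same sharding idea applies beyond maps. A minimal std-only sketch (the `ShardedCounter` name is illustrative) splits one hot counter into per-shard atomics and sums them on read, trading an exact point-in-time snapshot for far lower write contention:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const SHARDS: usize = 16;

// Writers touch only "their" shard; a read sums all shards.
struct ShardedCounter {
    shards: Vec<AtomicU64>,
}

impl ShardedCounter {
    fn new() -> Self {
        Self {
            shards: (0..SHARDS).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    // shard_hint is typically a thread id or hashed key
    fn add(&self, shard_hint: usize, n: u64) {
        self.shards[shard_hint % SHARDS].fetch_add(n, Ordering::Relaxed);
    }

    fn sum(&self) -> u64 {
        self.shards.iter().map(|s| s.load(Ordering::Relaxed)).sum()
    }
}
```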
4. Minimize critical section duration:

```rust
use std::sync::Arc;
use parking_lot::Mutex;

// WRONG - expensive work inside the lock
fn process_data(cache: &Arc<Mutex<Cache>>) {
    let mut guard = cache.lock();
    let data = guard.get("key").unwrap();
    let result = expensive_computation(&data); // lock held during expensive work!
    guard.insert("result", result);
}

// CORRECT - copy out, compute, copy back
fn process_data_fixed(cache: &Arc<Mutex<Cache>>) {
    let data = {
        let guard = cache.lock();
        guard.get("key").unwrap().clone() // short lock, clone the data
    }; // lock released here

    let result = expensive_computation(&data); // no lock held

    {
        let mut guard = cache.lock();
        guard.insert("result", result); // short lock for the insert
    }
}
```
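The same copy-out/compute/copy-back pattern in fully runnable std-only form (the `process` function is illustrative, with `to_uppercase` standing in for the expensive computation):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Clone out under a short lock, compute with no lock held, write back briefly.
fn process(cache: &Mutex<HashMap<String, String>>) -> Option<String> {
    let data = {
        let guard = cache.lock().unwrap();
        guard.get("key")?.clone() // guard dropped at the end of this block
    };

    // "Expensive" work happens with no lock held
    let result = data.to_uppercase();

    // Brief exclusive access just for the write-back
    cache.lock().unwrap().insert("result".to_string(), result.clone());
    Some(result)
}
```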
5. Profile contention with `tokio-console`:

```toml
# Cargo.toml
[dependencies]
tokio = { version = "1", features = ["full", "tracing"] }
parking_lot = { version = "0.12", features = ["deadlock_detection"] }
console-subscriber = "0.2" # emits the instrumentation data tokio-console reads
```
```rust
use std::time::{Duration, Instant};
use parking_lot::Mutex;

// Check mutex contention programmatically by timing lock acquisitions
fn check_contention(shared_mutex: &Mutex<u64>) {
    let mut wait_times = Vec::new();

    for _ in 0..1000 {
        let before = Instant::now();
        let _guard = shared_mutex.lock();
        wait_times.push(before.elapsed());
    }

    let avg_wait = wait_times.iter().sum::<Duration>() / wait_times.len() as u32;
    eprintln!("Average mutex wait: {:?}", avg_wait);

    if avg_wait > Duration::from_micros(100) {
        eprintln!("WARNING: High mutex contention detected");
    }
}
```
Prevention
- Use atomics for simple counters and flags
- Use `RwLock` when reads outnumber writes
- Shard mutexes for high-contention shared maps
- Keep critical sections as short as possible
- Profile with `tokio-console` and `parking_lot` deadlock detection
- Prefer message passing (channels) over shared state for complex coordination
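The last point, message passing instead of shared state, can be sketched with a std `mpsc` channel (the `Msg` enum and `channel_counter` function are hypothetical names): one owner thread confines the state, so no lock exists to contend on.

```rust
use std::sync::mpsc;
use std::thread;

// Workers send messages; a single owner thread holds the state exclusively.
enum Msg {
    Add(u64),
    Get(mpsc::Sender<u64>), // reply channel for reading the total
}

fn channel_counter() -> u64 {
    let (tx, rx) = mpsc::channel::<Msg>();

    // The owner thread is the only place the counter lives - no Mutex needed
    let owner = thread::spawn(move || {
        let mut total = 0u64;
        while let Ok(msg) = rx.recv() {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    let _ = reply.send(total);
                    break;
                }
            }
        }
    });

    for i in 1..=10 {
        tx.send(Msg::Add(i)).unwrap();
    }

    // Request the final value via a reply channel
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Get(reply_tx)).unwrap();
    let total = reply_rx.recv().unwrap();
    owner.join().unwrap();
    total
}
```

Because a channel from a single sender delivers messages in order, the `Get` is processed only after every `Add`.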