Introduction

Arc<Mutex<T>> is a common pattern for shared mutable state in Rust, but it serializes all access through a single lock. Under concurrent load, threads spend most of their time waiting for the lock rather than doing useful work. This contention bottleneck is invisible in low-load testing but becomes the dominant performance issue in production, often reducing throughput to near-single-threaded levels despite having many worker threads.
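The serialization effect is easy to demonstrate. The sketch below (names hypothetical) has N threads hammering one Arc<Mutex<u64>> counter; because every increment funnels through the same lock, wall-clock time barely improves as the thread count grows.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative micro-benchmark: all threads contend on a single lock,
// so adding threads adds waiting, not throughput.
fn contended_count(threads: usize, iters: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    *counter.lock().unwrap() += 1; // every thread queues here
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // Try timing this with 1, 4, and 8 threads: the totals scale, the speed does not.
    let total = contended_count(8, 100_000);
    println!("total = {total}");
}
```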

Symptoms

  • CPU usage is low but throughput does not increase with more threads
  • tokio-console shows tasks blocked waiting for Mutex
  • parking_lot::Mutex profiling shows high lock contention percentage
  • Response time increases linearly with request count
  • perf lock shows threads spending 80%+ time waiting

Profile contention by timing how long lock acquisition takes:

```rust
use std::sync::Arc;
use std::time::Instant;
use parking_lot::Mutex;

let counter = Arc::new(Mutex::new(0u64));

// parking_lot has no built-in lock statistics; measure the wait directly.
// A consistently long acquisition time under load indicates contention.
let waited = Instant::now();
let guard = counter.lock();
let wait = waited.elapsed();
drop(guard);
eprintln!("waited {:?} to acquire the lock", wait);
```

Common Causes

  • Single Arc<Mutex<T>> shared across all worker threads
  • Lock held for too long (expensive computation inside critical section)
  • Nested locks creating lock ordering issues
  • Using Mutex when atomic operations would suffice
  • No read/write distinction (all access is exclusive)
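The nested-locks cause above deserves a concrete illustration. In this hypothetical two-account transfer, one thread locking A then B while another locks B then A can deadlock; acquiring locks in a fixed global order (here, by account id, assuming the two accounts are distinct) breaks the cycle.

```rust
use std::sync::Mutex;

struct Account {
    id: u64,
    balance: Mutex<i64>,
}

fn transfer(from: &Account, to: &Account, amount: i64) {
    // Always lock the lower id first, regardless of transfer direction,
    // so no two threads can hold the locks in opposite orders.
    let (first, second) = if from.id < to.id { (from, to) } else { (to, from) };
    let mut f = first.balance.lock().unwrap();
    let mut s = second.balance.lock().unwrap();
    // Apply the transfer through whichever guard maps to `from`/`to`
    if from.id == first.id {
        *f -= amount;
        *s += amount;
    } else {
        *s -= amount;
        *f += amount;
    }
}

fn main() {
    let a = Account { id: 1, balance: Mutex::new(100) };
    let b = Account { id: 2, balance: Mutex::new(0) };
    transfer(&a, &b, 30);
    transfer(&b, &a, 10);
    println!("a = {}, b = {}", a.balance.lock().unwrap(), b.balance.lock().unwrap());
}
```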

Step-by-Step Fix

  1. Replace Mutex with atomic types for simple counters:

```rust
use std::sync::{Arc, Mutex};
use std::sync::atomic::{AtomicU64, Ordering};

// WRONG - Mutex for a simple counter
let counter: Arc<Mutex<u64>> = Arc::new(Mutex::new(0)); // Every increment requires locking

// CORRECT - atomic counter, no lock needed
let counter: Arc<AtomicU64> = Arc::new(AtomicU64::new(0));

// Lock-free increment
counter.fetch_add(1, Ordering::Relaxed);

// Lock-free read
let value = counter.load(Ordering::Relaxed);
```

  2. Use RwLock for read-heavy workloads:

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

// When reads vastly outnumber writes
let config: Arc<RwLock<AppConfig>> = Arc::new(RwLock::new(config));

// Multiple readers can access simultaneously
let config_clone = config.clone();
let reader1 = tokio::spawn(async move {
    let r = config_clone.read().await;
    // Other readers can also access at the same time
    r.max_connections
});

let config_clone = config.clone();
let reader2 = tokio::spawn(async move {
    let r = config_clone.read().await;
    r.timeout
});

// Writer gets exclusive access
let mut w = config.write().await;
w.max_connections = 1000; // All readers blocked during write
```

  3. Shard the Mutex to reduce contention:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;
use parking_lot::Mutex;

// WRONG - single mutex for the entire cache
let cache: Arc<Mutex<HashMap<String, String>>> =
    Arc::new(Mutex::new(HashMap::new()));

// CORRECT - shard into multiple mutexes
const NUM_SHARDS: usize = 16;

struct ShardedCache {
    shards: Vec<Mutex<HashMap<String, String>>>,
}

impl ShardedCache {
    fn new() -> Self {
        Self {
            shards: (0..NUM_SHARDS)
                .map(|_| Mutex::new(HashMap::new()))
                .collect(),
        }
    }

    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, String>> {
        // Hash the key to distribute keys across shards
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() as usize) % NUM_SHARDS;
        &self.shards[idx]
    }

    fn get(&self, key: &str) -> Option<String> {
        self.shard_for(key).lock().get(key).cloned()
    }

    fn insert(&self, key: String, value: String) {
        self.shard_for(&key).lock().insert(key, value);
    }
}

// Roughly 16x less contention with 16 shards, assuming evenly distributed keys
let cache = Arc::new(ShardedCache::new());
```

  4. Minimize critical section duration:

```rust
use std::sync::Arc;
use parking_lot::Mutex;

// WRONG - expensive work inside lock
fn process_data(cache: &Arc<Mutex<Cache>>) {
    let mut guard = cache.lock();
    let data = guard.get("key").unwrap();
    let result = expensive_computation(&data); // Lock held during expensive work!
    guard.insert("result", result);
}

// CORRECT - copy out, compute, copy back
fn process_data_fixed(cache: &Arc<Mutex<Cache>>) {
    let data = {
        let guard = cache.lock();
        guard.get("key").unwrap().clone() // Short lock, clone data
    }; // Lock released here

    let result = expensive_computation(&data); // No lock held

    {
        let mut guard = cache.lock();
        guard.insert("result", result); // Short lock for insert
    }
}
```

  5. Profile contention with tokio-console:

```toml
# Cargo.toml
[dependencies]
tokio = { version = "1", features = ["full", "tracing"] }
parking_lot = { version = "0.12", features = ["deadlock_detection"] }
```

```rust
use std::time::{Duration, Instant};
use parking_lot::Mutex;

// Check mutex contention programmatically
fn check_contention(shared_mutex: &Mutex<u64>) {
    let mut wait_times = Vec::new();

    for _ in 0..1000 {
        let before = Instant::now();
        let _guard = shared_mutex.lock();
        wait_times.push(before.elapsed());
    }

    let avg_wait = wait_times.iter().sum::<Duration>() / wait_times.len() as u32;
    eprintln!("Average mutex wait: {:?}", avg_wait);

    if avg_wait > Duration::from_micros(100) {
        eprintln!("WARNING: High mutex contention detected");
    }
}
```

Prevention

  • Use atomics for simple counters and flags
  • Use RwLock when reads outnumber writes
  • Shard mutexes for high-contention shared maps
  • Keep critical sections as short as possible
  • Profile with tokio-console and parking_lot deadlock detection
  • Prefer message passing (channels) over shared state for complex coordination
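The last point can be sketched with a standard-library channel (names hypothetical): one owner thread holds the state outright, and workers send it commands instead of sharing a lock, so there is nothing to contend on.

```rust
use std::sync::mpsc;
use std::thread;

// Commands workers can send to the state owner
enum Cmd {
    Add(u64),
    Get(mpsc::Sender<u64>), // reply channel for reads
}

fn spawn_owner() -> mpsc::Sender<Cmd> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut total = 0u64; // owned by exactly one thread: no lock needed
        for cmd in rx {
            match cmd {
                Cmd::Add(n) => total += n,
                Cmd::Get(reply) => {
                    let _ = reply.send(total);
                }
            }
        }
    });
    tx
}

fn main() {
    let tx = spawn_owner();
    for _ in 0..10 {
        tx.send(Cmd::Add(1)).unwrap();
    }
    // Messages from one sender arrive in order, so Get sees all prior Adds
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Cmd::Get(reply_tx)).unwrap();
    println!("total = {}", reply_rx.recv().unwrap());
}
```

Cloning the `Sender` gives each worker its own handle, which is how this pattern replaces a shared `Arc<Mutex<T>>` across many threads.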