What's Actually Happening

ClickHouse terminates queries that exceed their configured memory limits, returning memory allocation errors instead of results.

The Error You'll See

```sql
SELECT * FROM large_table GROUP BY user_id;

Error: Memory limit (total) exceeded: would use 12.0 GiB (attempt to allocate chunk of 42.00 MiB), maximum: 10.0 GiB
```

Memory tracking error:

```sql
Error: Memory limit (for query) exceeded: would use 2.0 GiB, maximum: 1.0 GiB
```

OOM during aggregation:

```sql
Error: Allocator: Cannot mmap 128.00 MiB., errno: 12
```

Query killed:

```sql
Error: Query was cancelled
```

Why This Happens

  1. Large GROUP BY - many unique keys means a large in-memory hash table
  2. JOIN without optimization - the entire right-hand table is held in memory
  3. Sorting - ORDER BY over a large result set
  4. Window functions - memory-intensive by nature
  5. Subqueries - intermediate results are materialized in memory
  6. Memory limit too low - the configured limit is smaller than the workload needs
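To get a feel for cause 1, note that GROUP BY memory grows with the number of unique keys, since the server keeps one hash-table entry per key. A back-of-envelope sketch in Python (the ~100-byte per-entry overhead is an illustrative assumption, not a measured ClickHouse figure):

```python
def estimate_group_by_memory(unique_keys: int, bytes_per_key: int,
                             bytes_per_aggregate: int, overhead: int = 100) -> int:
    """Rough hash-table footprint: one entry per unique GROUP BY key."""
    return unique_keys * (bytes_per_key + bytes_per_aggregate + overhead)

# 500M unique user_ids (UInt64 key, one UInt64 counter) blows past a 10 GiB limit:
est = estimate_group_by_memory(500_000_000, 8, 8)
print(f"{est / 1024**3:.1f} GiB")  # 54.0 GiB
```

The exact per-entry overhead varies, but the scaling is the point: memory is driven by key cardinality, not by row count.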

Step 1: Check Current Memory Limits

```sql
-- Check server memory settings:
SELECT * FROM system.settings
WHERE name LIKE '%memory%' OR name LIKE '%max_bytes%';

-- Key settings:
-- max_memory_usage                 - per-query limit (default 10 GB)
-- max_memory_usage_for_all_queries - total limit across queries
-- max_bytes_before_external_group_by
-- max_bytes_before_external_sort

-- Check current memory usage:
SELECT metric, value FROM system.metrics WHERE metric LIKE '%Memory%';

-- Check query memory consumption:
SELECT query, memory_usage, read_rows, read_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY memory_usage DESC
LIMIT 10;

-- Check currently running queries:
SELECT query_id, query, memory_usage, elapsed
FROM system.processes
ORDER BY memory_usage DESC;
```

Step 2: Adjust Memory Limits

```sql
-- Increase the per-query memory limit (session-level):
SET max_memory_usage = 20000000000; -- 20 GB

-- Or set it for a single query:
SELECT * FROM large_table GROUP BY user_id
SETTINGS max_memory_usage = 20000000000;

-- Persist it in the default profile (/etc/clickhouse-server/users.xml):
-- <profiles>
--     <default>
--         <max_memory_usage>20000000000</max_memory_usage>
--         <max_memory_usage_for_all_queries>50000000000</max_memory_usage_for_all_queries>
--     </default>
-- </profiles>

-- Unlimited (dangerous -- a runaway query can OOM the server):
SET max_memory_usage = 0;

-- Check the setting took effect:
SELECT name, value FROM system.settings WHERE name = 'max_memory_usage';
```
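These settings take raw byte counts, which makes values like 20000000000 easy to mistype or misread. A small helper for generating them (hypothetical, not part of any ClickHouse client library):

```python
# Binary units; ClickHouse reports limits in GiB in its error messages.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

def to_bytes(value: float, unit: str) -> int:
    """Convert a human-readable size to the byte count a ClickHouse setting expects."""
    return int(value * UNITS[unit])

print(f"SET max_memory_usage = {to_bytes(20, 'GiB')};")
# SET max_memory_usage = 21474836480;
```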

Step 3: Enable External Group By

```sql
-- Spill GROUP BY state to disk once it exceeds a threshold:
SET max_bytes_before_external_group_by = 10000000000; -- 10 GB spill threshold
SET max_memory_usage = 15000000000;                   -- 15 GB, leaves headroom for merge buffers

-- Run the query with external GROUP BY:
SELECT user_id, COUNT(*) AS cnt
FROM large_table
GROUP BY user_id
SETTINGS max_bytes_before_external_group_by = 10000000000;

-- Related settings for memory-efficient aggregation:
SET distributed_aggregation_memory_efficient = 1;
SET aggregation_memory_efficient_merge_threads = 4;

-- Spill files go to the server's tmp_path, configured in config.xml
-- Default: /var/lib/clickhouse/tmp/ -- ensure it has enough free disk space
```
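The mechanism behind external GROUP BY can be sketched in miniature: aggregate until an in-memory budget is hit, spill partial aggregates to disk, then merge all spilled partials at the end. A toy Python version (the key-count budget and JSON spill files are simplifications of what ClickHouse actually does):

```python
import json
import os
import tempfile
from collections import Counter

def external_group_by(rows, max_keys_in_memory=2):
    """Count rows per key, spilling the partial hash table to disk whenever
    it exceeds the budget, then merging all spilled partials."""
    spills, partial = [], Counter()
    for key in rows:
        partial[key] += 1
        if len(partial) > max_keys_in_memory:
            f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".json")
            json.dump(partial, f)
            f.close()
            spills.append(f.name)
            partial = Counter()          # memory freed; start a fresh partial
    total = Counter(partial)
    for path in spills:                  # merge phase: combine spilled partials
        with open(path) as f:
            total.update(Counter(json.load(f)))
        os.unlink(path)
    return dict(total)

print(external_group_by(["a", "b", "a", "c", "b", "a"]))
```

Peak memory is bounded by the budget plus one merge buffer, at the cost of extra disk I/O, which is exactly the trade max_bytes_before_external_group_by makes.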

Step 4: Optimize GROUP BY

```sql
-- Reduce GROUP BY key cardinality:
-- BAD: three-column key explodes the unique-key count
SELECT user_id, session_id, event_id, COUNT(*)
FROM events
GROUP BY user_id, session_id, event_id;

-- GOOD: pre-aggregate, then roll up
SELECT user_id, COUNT(*) AS events
FROM (
    SELECT user_id, session_id, COUNT(*) AS session_events
    FROM events
    GROUP BY user_id, session_id
    SETTINGS max_bytes_before_external_group_by = 10000000000
)
GROUP BY user_id;

-- Use -State aggregate functions for incremental aggregation:
SELECT user_id, uniqState(event_id) AS events_state
FROM events
GROUP BY user_id;

-- Then merge the stored states:
SELECT user_id, uniqMerge(events_state)
FROM state_table
GROUP BY user_id;

-- Prefer approximate aggregation where exactness is not required:
SELECT
    user_id,
    uniqExact(event_id) AS exact_count,  -- exact, memory intensive
    uniq(event_id)      AS approx_count  -- approximate, far less memory
FROM events
GROUP BY user_id;
```

Step 5: Optimize JOINs

```sql
-- ClickHouse loads the right-hand table into memory,
-- so keep the right table small:

-- BAD: large table on the right
SELECT l.* FROM large_table l
JOIN large_table r ON l.id = r.id;

-- GOOD: small table on the right
SELECT l.* FROM large_table l
JOIN small_table r ON l.id = r.id;

-- Use ANY LEFT JOIN when a single match per row is enough:
SELECT l.* FROM large_table l
ANY LEFT JOIN small_table r ON l.id = r.id;

-- Switch to a disk-friendly join algorithm:
SET join_algorithm = 'grace_hash';
SET grace_hash_join_initial_buckets = 8;

-- Raise the join memory limit:
SET max_bytes_in_join = 5000000000; -- 5 GB

-- Use IN instead of JOIN when you only need to filter:
SELECT * FROM large_table
WHERE user_id IN (SELECT user_id FROM active_users);
```
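The reason table order matters: a hash join builds an in-memory hash table from the right table and streams the left table past it, so memory scales with the right side only. A minimal sketch of that shape:

```python
def hash_join(left, right, key):
    """Build a hash table on the RIGHT table (held fully in memory),
    then stream LEFT rows past it -- memory ~ size of the right side."""
    build = {}
    for row in right:                    # build phase: right table in memory
        build.setdefault(row[key], []).append(row)
    out = []
    for lrow in left:                    # probe phase: left table is streamed
        for rrow in build.get(lrow[key], []):
            out.append({**lrow, **rrow})
    return out

left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 1, "city": "NYC"}]
print(hash_join(left, right, "id"))
# [{'id': 1, 'name': 'a', 'city': 'NYC'}]
```

Swap the arguments and the build dict holds the large table instead, which is exactly the "large table on the right" failure mode above.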

Step 6: Optimize ORDER BY

```sql
-- Enable external (on-disk) sort:
SET max_bytes_before_external_sort = 10000000000; -- 10 GB

-- Better: combine ORDER BY with LIMIT:
SELECT * FROM large_table
ORDER BY created_at
LIMIT 1000; -- much less memory

-- Avoid sorting the full table:
-- BAD:
SELECT * FROM large_table ORDER BY created_at;

-- GOOD: rely on the sorting key or add a LIMIT
SELECT * FROM large_table ORDER BY created_at LIMIT 10000;

-- Narrow the candidate set in a subquery first:
SELECT * FROM (
    SELECT * FROM large_table ORDER BY user_id LIMIT 1000000
)
ORDER BY created_at
LIMIT 100;
```
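Why ORDER BY with LIMIT is so much cheaper: the server only has to track the current top-n rows while streaming, not hold and sort the whole result. Python's heapq shows the same trick, keeping memory proportional to the limit rather than the row count:

```python
import heapq

rows = [{"id": i, "created_at": (i * 37) % 1000} for i in range(100_000)]

# Full sort: holds and orders all 100k rows in memory.
full = sorted(rows, key=lambda r: r["created_at"])[:3]

# Top-N: maintains only a 3-element heap while streaming the rows.
top = heapq.nsmallest(3, rows, key=lambda r: r["created_at"])

print([r["created_at"] for r in top])  # [0, 0, 0]
```

Both produce the same answer; only the top-N version stays small as the input grows.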

Step 7: Use Projection and Materialized Views

```sql
-- Create a projection for common query shapes:
ALTER TABLE events ADD PROJECTION events_by_user (
    SELECT * ORDER BY user_id, event_time
);

ALTER TABLE events MATERIALIZE PROJECTION events_by_user;

-- Use a materialized view for pre-aggregation:
CREATE MATERIALIZED VIEW events_daily_mv
ENGINE = SummingMergeTree()
ORDER BY (event_date, user_id)
AS SELECT
    toDate(event_time) AS event_date,
    user_id,
    COUNT(*) AS event_count
FROM events
GROUP BY event_date, user_id;

-- Query the pre-aggregated data:
SELECT event_date, SUM(event_count) AS total
FROM events_daily_mv
GROUP BY event_date;

-- Far less memory than aggregating the raw table:
SELECT toDate(event_time) AS event_date, COUNT(*) AS total
FROM events
GROUP BY event_date;
```

Step 8: Optimize Table Structure

```sql
-- Check the table engine and sorting key:
SHOW CREATE TABLE large_table;

-- Choose an appropriate sorting key (ORDER BY / PRIMARY KEY):
-- data is stored sorted by it, so queries filtering on it read far less

-- Partition large tables:
CREATE TABLE events (
    event_time DateTime,
    user_id    UInt64,
    event_type String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id);

-- Queries that filter on the partition key only scan matching partitions:
SELECT * FROM events
WHERE event_time BETWEEN '2024-01-01' AND '2024-01-31';
-- only scans the January 2024 partition

-- Use the smallest data types that fit:
-- UInt32 instead of UInt64, LowCardinality(String) instead of String
```
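The data-type point is easy to quantify: halving an integer's width halves the memory every value of that column occupies. Python's struct module gives the per-value sizes, which match ClickHouse's fixed-width UInt32/UInt64 columns:

```python
import struct

n_rows = 1_000_000_000  # a billion-row column

for fmt, name in [("<I", "UInt32"), ("<Q", "UInt64")]:
    size = struct.calcsize(fmt)
    print(f"{name}: {size} B/value -> {size * n_rows / 1024**3:.1f} GiB per column")
# UInt32: 4 B/value -> 3.7 GiB per column
# UInt64: 8 B/value -> 7.5 GiB per column
```

On-disk compression narrows the gap, but values are decompressed to full width when a query processes them, so the narrower type saves query memory too.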

Step 9: Monitor and Debug Memory

```sql
-- Heaviest queries today:
SELECT
    query_id,
    substring(query, 1, 100) AS query_short,
    memory_usage,
    read_rows,
    read_bytes
FROM system.query_log
WHERE type = 'QueryFinish' AND event_date = today()
ORDER BY memory_usage DESC
LIMIT 10;

-- Memory by user:
SELECT user, sum(memory_usage) AS total_memory, count() AS query_count
FROM system.query_log
WHERE type = 'QueryFinish'
GROUP BY user
ORDER BY total_memory DESC;

-- Server-level memory metrics:
SELECT metric, value
FROM system.asynchronous_metrics
WHERE metric LIKE '%Memory%';

-- Currently running queries:
SELECT query_id, substring(query, 1, 50), memory_usage, elapsed
FROM system.processes
ORDER BY memory_usage DESC;

-- Kill a memory-heavy query:
KILL QUERY WHERE query_id = 'xxx';
```

Step 10: ClickHouse Memory Verification Script

```bash
# Create the verification script:
cat << 'EOF' > /usr/local/bin/check-clickhouse-memory.sh
#!/bin/bash

HOST=${1:-"localhost"}
PORT=${2:-"9000"}

echo "=== Memory Settings ==="
clickhouse-client -h "$HOST" --port "$PORT" -q "
    SELECT name, value FROM system.settings
    WHERE name LIKE '%memory%' OR name LIKE '%max_bytes%'
    ORDER BY name"

echo ""
echo "=== Memory Metrics ==="
clickhouse-client -h "$HOST" --port "$PORT" -q "
    SELECT metric, value FROM system.metrics WHERE metric LIKE '%Memory%'"

echo ""
echo "=== Top Memory Queries ==="
clickhouse-client -h "$HOST" --port "$PORT" -q "
    SELECT query_id, substring(query, 1, 60) AS query, memory_usage, read_rows
    FROM system.query_log
    WHERE type = 'QueryFinish' AND event_date = today()
    ORDER BY memory_usage DESC
    LIMIT 5"

echo ""
echo "=== Active Queries ==="
clickhouse-client -h "$HOST" --port "$PORT" -q "
    SELECT query_id, substring(query, 1, 50) AS query, memory_usage, elapsed
    FROM system.processes
    ORDER BY memory_usage DESC"

echo ""
echo "=== Disk Space ==="
df -h /var/lib/clickhouse

echo ""
echo "=== Recommendations ==="
echo "1. Increase max_memory_usage if needed"
echo "2. Enable external GROUP BY with max_bytes_before_external_group_by"
echo "3. Enable external sort with max_bytes_before_external_sort"
echo "4. Put the smaller table on the right side of JOINs"
echo "5. Add LIMIT to ORDER BY queries"
echo "6. Use materialized views for pre-aggregation"
echo "7. Check that partition pruning is working"
EOF

chmod +x /usr/local/bin/check-clickhouse-memory.sh

# Usage:
/usr/local/bin/check-clickhouse-memory.sh localhost 9000
```

ClickHouse Memory Checklist

| Check | Expected |
| --- | --- |
| Memory limit | Adequate for the workload's queries |
| External GROUP BY | Enabled for large aggregations |
| JOIN order | Smaller table on the right |
| ORDER BY | Has a LIMIT |
| Partitions | Queries hit specific partitions |
| Aggregates | uniq() instead of uniqExact() where approximate counts suffice |
| Projections | Created for common queries |

Verify the Fix

```sql
-- After fixing ClickHouse memory issues:

-- 1. Check the setting was applied (shows the increased value):
SELECT name, value FROM system.settings WHERE name = 'max_memory_usage';

-- 2. Re-run the previously failing query (should now complete):
SELECT user_id, COUNT(*) FROM large_table GROUP BY user_id;

-- 3. Check its memory usage stayed within limits:
SELECT query_id, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 1;

-- 4. Verify external sort/aggregation: the query log shows spilling to disk if it was needed

-- 5. Monitor total memory of currently running queries:
SELECT sum(memory_usage) FROM system.processes;

-- 6. Confirm queries complete faster:
SELECT query, elapsed
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 5;
```

  • [Fix PostgreSQL Connection Pool Exhausted](/articles/fix-postgresql-connection-pool-exhausted)
  • [Fix MySQL Slow Query](/articles/fix-mysql-slow-query)
  • [Fix Elasticsearch Query Timeout](/articles/fix-elasticsearch-query-timeout)