What's Actually Happening

FluentBit runs out of memory while buffering logs, typically when the output destination is unavailable or log volume spikes. The service then crashes or drops logs.

The Error You'll See

Memory exceeded:

```bash
$ journalctl -u fluent-bit | grep -i memory

[error] [storage] memory exceeded, cannot allocate buffer
[error] [input] cannot append data to storage, memory limit reached
```

Buffer full:

```bash
[warning] [storage] buffer is full, data will be dropped
[error] [output] cannot write to output, retry limit reached
```

Process killed:

```bash
$ dmesg | grep fluent-bit

Out of memory: Killed process 12345 (fluent-bit) total-vm:2048000kB
```

Why This Happens

  1. Output unavailable - Destination is down, so logs accumulate locally
  2. Buffer too small - Insufficient storage for a burst
  3. Memory limit too low - Process memory limit is too restrictive
  4. No filesystem storage - Using only in-memory buffers
  5. High log volume - Sudden spike in log throughput
  6. Slow output - Destination cannot keep up with ingest
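A quick way to tell a slow or dead output apart from a pure volume spike is to compare the input and output record counters from the metrics endpoint. A minimal sketch, parsing a captured metrics sample rather than curling the live endpoint (the counter values below are illustrative, not real output):

```shell
#!/bin/sh
# Compare ingested vs. delivered record counters. In practice the sample
# would come from: curl -s http://localhost:2020/api/v1/metrics/prometheus
sample='fluentbit_input_records_total{name="tail.0"} 50000
fluentbit_output_proc_records_total{name="es.0"} 30000
fluentbit_output_errors_total{name="es.0"} 120'

# Sum counters across all inputs and all outputs.
in_total=$(printf '%s\n' "$sample" | awk '/^fluentbit_input_records_total/ {s+=$2} END {print s}')
out_total=$(printf '%s\n' "$sample" | awk '/^fluentbit_output_proc_records_total/ {s+=$2} END {print s}')

# A growing gap means records are piling up in the buffer (or being dropped).
backlog=$((in_total - out_total))
echo "records ingested:  $in_total"
echo "records delivered: $out_total"
echo "backlog (buffered or dropped): $backlog"
```

If the backlog grows while input rates are steady, the bottleneck is the output, not the log volume.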

Step 1: Check FluentBit Status

```bash
# Check FluentBit is running:
systemctl status fluent-bit

# Check memory usage:
ps aux | grep fluent-bit

# Check buffer status:
curl http://localhost:2020/api/v1/storage

# Example output:
# { "chunks": { "total_chunks": 100, "mem_chunks": 50, "fs_chunks": 50 } }

# Check metrics:
curl http://localhost:2020/api/v1/metrics/prometheus

# Key metrics:
# fluentbit_input_records_total
# fluentbit_output_proc_records_total
# fluentbit_output_errors_total
# fluentbit_output_retries_total
```
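The chunk counts from the storage endpoint can serve as a rough health signal: when most chunks are memory-resident, the output is falling behind and RAM is at risk. A small sketch against the JSON shape shown above (the response is inlined as a sample here instead of fetched with curl):

```shell
#!/bin/sh
# Flag memory pressure from the /api/v1/storage chunk counts.
# Sample response matching the shape in the example output above:
resp='{ "chunks": { "total_chunks": 100, "mem_chunks": 50, "fs_chunks": 50 } }'

# Extract the counters with sed (avoids a jq dependency).
mem=$(printf '%s' "$resp" | sed -n 's/.*"mem_chunks": \([0-9]*\).*/\1/p')
total=$(printf '%s' "$resp" | sed -n 's/.*"total_chunks": \([0-9]*\).*/\1/p')

# If most chunks live in memory, the output is not draining fast enough.
pct=$((mem * 100 / total))
echo "mem chunks: $pct% of total"
if [ "$pct" -gt 80 ]; then
  echo "WARNING: buffer is mostly in memory"
fi
```

With the sample values this prints `mem chunks: 50% of total`; the 80% threshold is an arbitrary illustrative cutoff, not a FluentBit default.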

Step 2: Check Configuration

```bash
# Check current config:
cat /etc/fluent-bit/fluent-bit.conf

# Key buffer settings:
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf
    Plugins_File plugins.conf
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

    # Storage settings
    storage.metrics           on
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5MB

# Check input configuration:
[INPUT]
    Name            tail
    Path            /var/log/*.log
    storage.type    filesystem
    Mem_Buf_Limit   50MB
    Skip_Long_Lines On
```

Step 3: Configure Filesystem Storage

```bash
# Enable filesystem storage to offload memory.

# In fluent-bit.conf:
[SERVICE]
    # Enable filesystem storage
    storage.path              /var/log/flb-storage
    storage.sync              normal
    storage.checksum          off
    storage.max_chunks_up     128
    storage.backlog.mem_limit 50MB

# In each INPUT:
[INPUT]
    Name          tail
    Path          /var/log/*.log
    storage.type  filesystem   # Use filesystem, not memory
    Mem_Buf_Limit 50MB

# Create the storage directory:
mkdir -p /var/log/flb-storage
chown fluent-bit:fluent-bit /var/log/flb-storage

# Restart FluentBit:
systemctl restart fluent-bit

# Verify storage is used:
ls -la /var/log/flb-storage/
```

Step 4: Increase Memory Limits

```bash
# Increase the backlog memory limit in fluent-bit.conf:
[SERVICE]
    storage.backlog.mem_limit 100MB

# Or check the systemd memory limits:
cat /etc/systemd/system/fluent-bit.service

[Service]
MemoryLimit=2G
MemoryHigh=1.5G

# Increase the limits:
systemctl edit fluent-bit --full

[Service]
MemoryLimit=4G
MemoryHigh=3G

# Restart:
systemctl daemon-reload
systemctl restart fluent-bit

# For Kubernetes:
resources:
  limits:
    memory: 4Gi
  requests:
    memory: 2Gi
```

Step 5: Configure Mem_Buf_Limit

```bash
# Limit per-input memory:
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Mem_Buf_Limit     100MB   # Limit buffer per input
    storage.type      filesystem
    Skip_Long_Lines   On
    Refresh_Interval  10
    Rotate_Wait       30
    Buffer_Chunk_Size 512k
    Buffer_Max_Size   5MB

# When Mem_Buf_Limit is reached:
# - The input pauses and data may be dropped if storage.type is memory
# - Logs spill to the filesystem if storage.type is filesystem

# Adjust based on log volume:
# - High volume:   100-500MB
# - Normal volume: 50-100MB
# - Low volume:    10-50MB
```
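A rough way to size Mem_Buf_Limit beyond those bands is ingest rate multiplied by the outage window you want to survive. A back-of-envelope sketch with illustrative numbers (substitute your own measured rate):

```shell
#!/bin/sh
# Size the buffer for a target outage window. Both numbers below are
# made-up examples, not measurements from any real system.
log_rate_kb_per_sec=200   # measured ingest rate, e.g. derived from
                          # fluentbit_input_bytes_total over time
outage_window_sec=300     # downtime you want to absorb without loss

needed_kb=$((log_rate_kb_per_sec * outage_window_sec))
needed_mb=$((needed_kb / 1024))
echo "Mem_Buf_Limit should be at least ~${needed_mb}MB"
# prints: Mem_Buf_Limit should be at least ~58MB
```

With filesystem storage the limit only caps the in-memory portion, so it can be set closer to the low end while disk absorbs the rest.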

Step 6: Configure Output Retry

```bash
# Output configuration with retry:
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch
    Port  9200
    Index fluent-bit
    Type  _doc

    # Retry settings (pick one)
    Retry_Limit False   # Retry forever
    # Retry_Limit 10    # Or cap the number of retries

    # Buffer settings
    storage.total_limit_size 1G   # Max buffer size for this output

    # Timeouts
    net.connect_timeout        10
    net.keepalive              On
    net.keepalive_idle_timeout 10

# When the output fails:
# - Logs are buffered according to storage.type
# - Delivery is retried according to Retry_Limit
# - If the retry limit is reached and storage is full, logs are dropped
```

Step 7: Handle Output Failures

```bash
# Check output connectivity:
curl -I http://elasticsearch:9200

# If the output is down, logs buffer locally. Monitor buffer growth:
watch -n 5 'curl -s http://localhost:2020/api/v1/storage'

# When the output recovers, the buffer drains.

# Configure a fallback output alongside the primary:
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch
    Port  9200

[OUTPUT]
    Name  file
    Match *
    Path  /var/log/
    File  flb-fallback.log
    storage.total_limit_size 500M
```

Step 8: Monitor Buffer Health

```bash
# Create a monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-fluentbit.sh
#!/bin/bash

echo "=== FluentBit Storage ==="
curl -s http://localhost:2020/api/v1/storage | jq

echo ""
echo "=== Memory Usage ==="
ps aux | grep fluent-bit | grep -v grep | awk '{print $6/1024 " MB"}'

echo ""
echo "=== Disk Storage ==="
du -sh /var/log/flb-storage/

echo ""
echo "=== Output Status ==="
curl -s http://localhost:2020/api/v1/metrics/prometheus | grep -E "fluentbit_output_(proc|errors|retries)"

echo ""
echo "=== Buffer Chunks ==="
ls -la /var/log/flb-storage/ 2>/dev/null | wc -l
EOF

chmod +x /usr/local/bin/monitor-fluentbit.sh
```

Prometheus alerts:

```yaml
- alert: FluentBitMemoryHigh
  expr: fluentbit_process_resident_memory_bytes > 1073741824
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "FluentBit memory usage > 1GB"

- alert: FluentBitBufferFull
  expr: rate(fluentbit_output_errors_total[5m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "FluentBit output errors, buffer may be full"
```

Step 9: Optimize Log Parsing

```bash
# Reduce memory with efficient parsing.

# Use multiline parsing carefully (in parsers.conf):
[MULTILINE_PARSER]
    Name          java_multiline
    Type          regex
    Flush_Timeout 1000
    Rule          "start_state" "/^\d{4}-\d{2}-\d{2}/" "cont"

# In the input:
[INPUT]
    Name             tail
    Path             /var/log/java.log
    multiline.parser java_multiline
    Mem_Buf_Limit    50MB   # Multiline parsing can use more memory

# Filter to reduce data:
[FILTER]
    Name    grep
    Match   *
    Exclude log ERROR

# Throttle high-volume inputs:
[FILTER]
    Name     throttle
    Match    *
    Rate     1000
    Window   5
    Interval 1s
```
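The grep filter's Exclude rule drops records whose `log` field matches the regex, much like `grep -v` on raw lines. A quick local approximation with made-up sample lines, useful for sanity-checking a pattern before deploying it:

```shell
#!/bin/sh
# Approximate the [FILTER] grep "Exclude log ERROR" behavior with grep -v.
# The sample log lines are illustrative.
kept=$(printf '%s\n' \
  'INFO  request served' \
  'ERROR db timeout' \
  'DEBUG cache hit' | grep -v 'ERROR')

# Only the non-ERROR records survive the filter.
printf '%s\n' "$kept"
```

Filtering before the buffer is often cheaper than buffering more: every excluded record is memory the buffer never has to hold.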

Step 10: Scale FluentBit

```bash
# If a single instance cannot handle the load:

# Option 1: Increase resources
# More memory and CPU for the single instance

# Option 2: Run multiple instances, sharded by source
# Instance 1: /var/log/containers/app1-*.log
# Instance 2: /var/log/containers/app2-*.log

# Option 3: Use a DaemonSet in Kubernetes
# Each node runs FluentBit for its local logs

# Option 4: Forward to an aggregator
[OUTPUT]
    Name  forward
    Match *
    Host  fluentd-aggregator
    Port  24224

# The aggregator handles buffering and output;
# FluentBit just forwards, so it needs minimal buffer.

# Restart after changes:
systemctl restart fluent-bit
```

FluentBit Buffer Memory Checklist

| Check           | Command          | Expected    |
|-----------------|------------------|-------------|
| Memory usage    | `ps aux`         | Below limit |
| Storage enabled | config           | filesystem  |
| Buffer limit    | `Mem_Buf_Limit`  | Adequate    |
| Output health   | curl endpoint    | Connected   |
| Disk storage    | `du flb-storage` | Not full    |
| Retry config    | config           | Appropriate |
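The "Memory usage" row can be automated with a simple comparison of resident set size against the configured limit. A hypothetical helper with illustrative values (on a live host the RSS would come from `ps`, as noted in the comment):

```shell
#!/bin/sh
# Compare an RSS sample (in KB, as ps reports it) against the memory limit.
# Both values below are made-up examples.
rss_kb=524288                   # e.g. from: ps -o rss= -C fluent-bit
limit_kb=$((2 * 1024 * 1024))   # a 2G systemd MemoryLimit, in KB

if [ "$rss_kb" -lt "$limit_kb" ]; then
  echo "OK: RSS ${rss_kb}KB below limit ${limit_kb}KB"
else
  echo "ALERT: RSS at or above limit ${limit_kb}KB"
fi
```

Run from cron or a node exporter textfile collector, this catches creeping memory growth before the OOM killer does.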

Verify the Fix

```bash
# After configuring buffers:

# 1. Check memory usage
ps aux | grep fluent-bit   # Within limit

# 2. Confirm filesystem storage is working
ls /var/log/flb-storage/   # Chunks present

# 3. Simulate an output failure (stop Elasticsearch)
systemctl stop elasticsearch

# 4. Send logs
logger "test log message"

# 5. Check that the buffer grows
curl http://localhost:2020/api/v1/storage   # fs_chunks increasing

# 6. Restore the output and verify the buffer drains
systemctl start elasticsearch
```

Related Articles

  • [Fix Fluentd Buffer Overflow](/articles/fix-fluentd-buffer-overflow)
  • [Fix Loki Ingestion Rate Limit](/articles/fix-loki-ingestion-rate-limit)
  • [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)