What's Actually Happening

FluentBit runs out of memory while buffering logs, typically when the output destination is unavailable or log volume spikes. The service then crashes or drops logs.

The Error You'll See

Memory exceeded:

```bash
$ journalctl -u fluent-bit | grep -i memory

[error] [storage] memory exceeded, cannot allocate buffer
[error] [input] cannot append data to storage, memory limit reached
```

Buffer full:

```bash
[warning] [storage] buffer is full, data will be dropped
[error] [output] cannot write to output, retry limit reached
```

Process killed:

```bash
$ dmesg | grep fluent-bit

Out of memory: Killed process 12345 (fluent-bit) total-vm:2048000kB
```

Why This Happens

  1. Output unavailable - Destination is down, so logs accumulate locally
  2. Buffer too small - Insufficient storage for a burst
  3. Memory limit too low - Process memory limit is too restrictive
  4. No filesystem storage - Using only in-memory buffers
  5. High log volume - Sudden spike in log throughput
  6. Slow output - Destination cannot keep up with ingest
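A quick way to tell a slow or dead output apart from a pure volume spike is to compare the input and output record counters from the metrics endpoint. A minimal sketch, parsing a captured metrics sample rather than curling the live endpoint (the counter values below are illustrative, not real output):

```shell
#!/bin/sh
# Compare ingested vs. delivered record counters. In practice the sample
# would come from: curl -s http://localhost:2020/api/v1/metrics/prometheus
sample='fluentbit_input_records_total{name="tail.0"} 50000
fluentbit_output_proc_records_total{name="es.0"} 30000
fluentbit_output_errors_total{name="es.0"} 120'

# Sum counters across all inputs and all outputs.
in_total=$(printf '%s\n' "$sample" | awk '/^fluentbit_input_records_total/ {s+=$2} END {print s}')
out_total=$(printf '%s\n' "$sample" | awk '/^fluentbit_output_proc_records_total/ {s+=$2} END {print s}')

# A growing gap means records are piling up in the buffer (or being dropped).
backlog=$((in_total - out_total))
echo "records ingested:  $in_total"
echo "records delivered: $out_total"
echo "backlog (buffered or dropped): $backlog"
```

If the backlog grows while input rates are steady, the bottleneck is the output, not the log volume.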

Step 1: Check FluentBit Status

```bash
# Check FluentBit is running:
systemctl status fluent-bit

# Check memory usage:
ps aux | grep fluent-bit

# Check buffer status:
curl http://localhost:2020/api/v1/storage

# Example output:
# { "chunks": { "total_chunks": 100, "mem_chunks": 50, "fs_chunks": 50 } }

# Check metrics:
curl http://localhost:2020/api/v1/metrics/prometheus

# Key metrics:
# fluentbit_input_records_total
# fluentbit_output_proc_records_total
# fluentbit_output_errors_total
# fluentbit_output_retries_total
```
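The chunk counts from the storage endpoint can serve as a rough health signal: when most chunks are memory-resident, the output is falling behind and RAM is at risk. A small sketch against the JSON shape shown above (the response is inlined as a sample here instead of fetched with curl):

```shell
#!/bin/sh
# Flag memory pressure from the /api/v1/storage chunk counts.
# Sample response matching the shape in the example output above:
resp='{ "chunks": { "total_chunks": 100, "mem_chunks": 50, "fs_chunks": 50 } }'

# Extract the counters with sed (avoids a jq dependency).
mem=$(printf '%s' "$resp" | sed -n 's/.*"mem_chunks": \([0-9]*\).*/\1/p')
total=$(printf '%s' "$resp" | sed -n 's/.*"total_chunks": \([0-9]*\).*/\1/p')

# If most chunks live in memory, the output is not draining fast enough.
pct=$((mem * 100 / total))
echo "mem chunks: $pct% of total"
if [ "$pct" -gt 80 ]; then
  echo "WARNING: buffer is mostly in memory"
fi
```

With the sample values this prints `mem chunks: 50% of total`; the 80% threshold is an arbitrary illustrative cutoff, not a FluentBit default.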

Step 2: Check Configuration

```bash
# Check current config:
cat /etc/fluent-bit/fluent-bit.conf

# Key buffer settings:
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf
    Plugins_File plugins.conf
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

    # Storage settings
    storage.metrics           on
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5MB

# Check input configuration:
[INPUT]
    Name            tail
    Path            /var/log/*.log
    storage.type    filesystem
    Mem_Buf_Limit   50MB
    Skip_Long_Lines On
```

Step 3: Configure Filesystem Storage

```bash
# Enable filesystem storage to offload memory.

# In fluent-bit.conf:
[SERVICE]
    # Enable filesystem storage
    storage.path              /var/log/flb-storage
    storage.sync              normal
    storage.checksum          off
    storage.max_chunks_up     128
    storage.backlog.mem_limit 50MB

# In each INPUT:
[INPUT]
    Name          tail
    Path          /var/log/*.log
    storage.type  filesystem   # Use filesystem, not memory
    Mem_Buf_Limit 50MB

# Create the storage directory:
mkdir -p /var/log/flb-storage
chown fluent-bit:fluent-bit /var/log/flb-storage

# Restart FluentBit:
systemctl restart fluent-bit

# Verify storage is used:
ls -la /var/log/flb-storage/
```

Step 4: Increase Memory Limits

```bash
# Increase the backlog memory limit in fluent-bit.conf:
[SERVICE]
    storage.backlog.mem_limit 100MB

# Or check the systemd memory limits:
cat /etc/systemd/system/fluent-bit.service

[Service]
MemoryLimit=2G
MemoryHigh=1.5G

# Increase the limits:
systemctl edit fluent-bit --full

[Service]
MemoryLimit=4G
MemoryHigh=3G

# Restart:
systemctl daemon-reload
systemctl restart fluent-bit

# For Kubernetes:
resources:
  limits:
    memory: 4Gi
  requests:
    memory: 2Gi
```

Step 5: Configure Mem_Buf_Limit

```bash
# Limit per-input memory:
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Mem_Buf_Limit     100MB   # Limit buffer per input
    storage.type      filesystem
    Skip_Long_Lines   On
    Refresh_Interval  10
    Rotate_Wait       30
    Buffer_Chunk_Size 512k
    Buffer_Max_Size   5MB

# When Mem_Buf_Limit is reached:
# - The input pauses and data may be dropped if storage.type is memory
# - Logs spill to the filesystem if storage.type is filesystem

# Adjust based on log volume:
# - High volume:   100-500MB
# - Normal volume: 50-100MB
# - Low volume:    10-50MB
```
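A rough way to size Mem_Buf_Limit beyond those bands is ingest rate multiplied by the outage window you want to survive. A back-of-envelope sketch with illustrative numbers (substitute your own measured rate):

```shell
#!/bin/sh
# Size the buffer for a target outage window. Both numbers below are
# made-up examples, not measurements from any real system.
log_rate_kb_per_sec=200   # measured ingest rate, e.g. derived from
                          # fluentbit_input_bytes_total over time
outage_window_sec=300     # downtime you want to absorb without loss

needed_kb=$((log_rate_kb_per_sec * outage_window_sec))
needed_mb=$((needed_kb / 1024))
echo "Mem_Buf_Limit should be at least ~${needed_mb}MB"
# prints: Mem_Buf_Limit should be at least ~58MB
```

With filesystem storage the limit only caps the in-memory portion, so it can be set closer to the low end while disk absorbs the rest.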

Step 6: Configure Output Retry

```bash
# Output configuration with retry:
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch
    Port  9200
    Index fluent-bit
    Type  _doc

    # Retry settings (pick one)
    Retry_Limit False   # Retry forever
    # Retry_Limit 10    # Or cap the number of retries

    # Buffer settings
    storage.total_limit_size 1G   # Max buffer size for this output

    # Timeouts
    net.connect_timeout        10
    net.keepalive              On
    net.keepalive_idle_timeout 10

# When the output fails:
# - Logs are buffered according to storage.type
# - Delivery is retried according to Retry_Limit
# - If the retry limit is reached and storage is full, logs are dropped
```

Step 7: Handle Output Failures

```bash
# Check output connectivity:
curl -I http://elasticsearch:9200

# If the output is down, logs buffer locally. Monitor buffer growth:
watch -n 5 'curl -s http://localhost:2020/api/v1/storage'

# When the output recovers, the buffer drains.

# Configure a fallback output alongside the primary:
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch
    Port  9200

[OUTPUT]
    Name  file
    Match *
    Path  /var/log/
    File  flb-fallback.log
    storage.total_limit_size 500M
```

Step 8: Monitor Buffer Health

```bash
# Create a monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-fluentbit.sh
#!/bin/bash

echo "=== FluentBit Storage ==="
curl -s http://localhost:2020/api/v1/storage | jq

echo ""
echo "=== Memory Usage ==="
ps aux | grep fluent-bit | grep -v grep | awk '{print $6/1024 " MB"}'

echo ""
echo "=== Disk Storage ==="
du -sh /var/log/flb-storage/

echo ""
echo "=== Output Status ==="
curl -s http://localhost:2020/api/v1/metrics/prometheus | grep -E "fluentbit_output_(proc|errors|retries)"

echo ""
echo "=== Buffer Chunks ==="
ls -la /var/log/flb-storage/ 2>/dev/null | wc -l
EOF

chmod +x /usr/local/bin/monitor-fluentbit.sh
```

Prometheus alerts:

```yaml
- alert: FluentBitMemoryHigh
  expr: fluentbit_process_resident_memory_bytes > 1073741824
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "FluentBit memory usage > 1GB"

- alert: FluentBitBufferFull
  expr: rate(fluentbit_output_errors_total[5m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "FluentBit output errors, buffer may be full"
```

Step 9: Optimize Log Parsing

```bash
# Reduce memory with efficient parsing.

# Use multiline parsing carefully (in parsers.conf):
[MULTILINE_PARSER]
    Name          java_multiline
    Type          regex
    Flush_Timeout 1000
    Rule          "start_state" "/^\d{4}-\d{2}-\d{2}/" "cont"

# In the input:
[INPUT]
    Name             tail
    Path             /var/log/java.log
    multiline.parser java_multiline
    Mem_Buf_Limit    50MB   # Multiline parsing can use more memory

# Filter to reduce data:
[FILTER]
    Name    grep
    Match   *
    Exclude log ERROR

# Throttle high-volume inputs:
[FILTER]
    Name     throttle
    Match    *
    Rate     1000
    Window   5
    Interval 1s
```
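The grep filter's Exclude rule drops records whose `log` field matches the regex, much like `grep -v` on raw lines. A quick local approximation with made-up sample lines, useful for sanity-checking a pattern before deploying it:

```shell
#!/bin/sh
# Approximate the [FILTER] grep "Exclude log ERROR" behavior with grep -v.
# The sample log lines are illustrative.
kept=$(printf '%s\n' \
  'INFO  request served' \
  'ERROR db timeout' \
  'DEBUG cache hit' | grep -v 'ERROR')

# Only the non-ERROR records survive the filter.
printf '%s\n' "$kept"
```

Filtering before the buffer is often cheaper than buffering more: every excluded record is memory the buffer never has to hold.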

Step 10: Scale FluentBit

```bash
# If a single instance cannot handle the load:

# Option 1: Increase resources
# More memory and CPU for the single instance

# Option 2: Run multiple instances, sharded by source
# Instance 1: /var/log/containers/app1-*.log
# Instance 2: /var/log/containers/app2-*.log

# Option 3: Use a DaemonSet in Kubernetes
# Each node runs FluentBit for its local logs

# Option 4: Forward to an aggregator
[OUTPUT]
    Name  forward
    Match *
    Host  fluentd-aggregator
    Port  24224

# The aggregator handles buffering and output;
# FluentBit just forwards, so it needs minimal buffer.

# Restart after changes:
systemctl restart fluent-bit
```

FluentBit Buffer Memory Checklist

| Check           | Command          | Expected    |
|-----------------|------------------|-------------|
| Memory usage    | `ps aux`         | Below limit |
| Storage enabled | config           | filesystem  |
| Buffer limit    | `Mem_Buf_Limit`  | Adequate    |
| Output health   | curl endpoint    | Connected   |
| Disk storage    | `du flb-storage` | Not full    |
| Retry config    | config           | Appropriate |
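The "Memory usage" row can be automated with a simple comparison of resident set size against the configured limit. A hypothetical helper with illustrative values (on a live host the RSS would come from `ps`, as noted in the comment):

```shell
#!/bin/sh
# Compare an RSS sample (in KB, as ps reports it) against the memory limit.
# Both values below are made-up examples.
rss_kb=524288                   # e.g. from: ps -o rss= -C fluent-bit
limit_kb=$((2 * 1024 * 1024))   # a 2G systemd MemoryLimit, in KB

if [ "$rss_kb" -lt "$limit_kb" ]; then
  echo "OK: RSS ${rss_kb}KB below limit ${limit_kb}KB"
else
  echo "ALERT: RSS at or above limit ${limit_kb}KB"
fi
```

Run from cron or a node exporter textfile collector, this catches creeping memory growth before the OOM killer does.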

Verify the Fix

```bash
# After configuring buffers:

# 1. Check memory usage
ps aux | grep fluent-bit   # Within limit

# 2. Confirm filesystem storage is working
ls /var/log/flb-storage/   # Chunks present

# 3. Simulate an output failure (stop Elasticsearch)
systemctl stop elasticsearch

# 4. Send logs
logger "test log message"

# 5. Check that the buffer grows
curl http://localhost:2020/api/v1/storage   # fs_chunks increasing

# 6. Restore the output and verify the buffer drains
systemctl start elasticsearch
```

Related Articles

  • [Fix Fluentd Buffer Overflow](/articles/fix-fluentd-buffer-overflow)
  • [Fix Loki Ingestion Rate Limit](/articles/fix-loki-ingestion-rate-limit)
  • [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)