What's Actually Happening
FluentBit runs out of memory when it buffers logs in RAM while the output destination is unavailable or log volume spikes. The service gets OOM-killed or drops logs.
The Error You'll See
Memory exceeded:
```bash
$ journalctl -u fluent-bit | grep -i memory
[error] [storage] memory exceeded, cannot allocate buffer
[error] [input] cannot append data to storage, memory limit reached
```
Buffer full:

```bash
[warning] [storage] buffer is full, data will be dropped
[error] [output] cannot write to output, retry limit reached
```

Process killed:
```bash
$ dmesg | grep fluent-bit
Out of memory: Killed process 12345 (fluent-bit) total-vm:2048000kB
```
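A quick way to tell these three failure modes apart is to grep captured logs for the signatures above. A minimal sketch; `scan_flb_log` is a hypothetical local helper, not a Fluent Bit command:

```shell
# scan_flb_log FILE: classify a captured log file by the error
# signatures documented above. Pure grep, no Fluent Bit dependency.
scan_flb_log() {
  if grep -qi "memory exceeded" "$1"; then
    echo "memory-exceeded"
  elif grep -qi "buffer is full" "$1"; then
    echo "buffer-full"
  elif grep -qi "Out of memory" "$1"; then
    echo "oom-killed"
  else
    echo "no-known-signature"
  fi
}

# Example: capture the journal, then classify it
# journalctl -u fluent-bit > /tmp/flb.log
# scan_flb_log /tmp/flb.log
```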
Why This Happens
1. Output unavailable - destination is down, logs accumulate
2. Buffer too small - insufficient storage for a burst
3. Memory limit low - process memory limit too restrictive
4. No filesystem storage - using only in-memory buffers
5. High log volume - sudden spike in log throughput
6. Slow output - destination cannot keep up
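To confirm causes 5 and 6, it helps to measure how fast the watched files actually grow. A rough local sketch; `measure_rate` is an illustrative helper, not part of Fluent Bit:

```shell
# measure_rate GLOB SECONDS: print the approximate bytes/sec growth of
# the files matching GLOB over a sampling window. Compare the result
# against your Mem_Buf_Limit to see how long a buffer would last.
measure_rate() {
  local glob="$1" interval="${2:-10}" before after
  # shellcheck disable=SC2086  # glob expansion is intentional
  before=$(cat $glob 2>/dev/null | wc -c)
  sleep "$interval"
  # shellcheck disable=SC2086
  after=$(cat $glob 2>/dev/null | wc -c)
  echo $(( (after - before) / interval ))
}

# Example: point it at the same Path your tail input uses
# measure_rate '/var/log/containers/*.log' 10
```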
Step 1: Check FluentBit Status
```bash
# Check FluentBit is running:
systemctl status fluent-bit

# Check memory usage:
ps aux | grep fluent-bit

# Check buffer status:
curl http://localhost:2020/api/v1/storage
# Output:
# {"chunks": {"total_chunks": 100, "mem_chunks": 50, "fs_chunks": 50}}

# Check metrics:
curl http://localhost:2020/api/v1/metrics/prometheus

# Key metrics:
# fluentbit_input_bytes_total
# fluentbit_output_proc_records_total
# fluentbit_output_errors_total
# fluentbit_output_retries_total
# fluentbit_storage_memory_bytes
```
Step 2: Check Configuration
```bash
# Check current config:
cat /etc/fluent-bit/fluent-bit.conf

# Key buffer settings:
[SERVICE]
    Flush             5
    Daemon            Off
    Log_Level         info
    Parsers_File      parsers.conf
    Plugins_File      plugins.conf
    HTTP_Server       On
    HTTP_Listen       0.0.0.0
    HTTP_Port         2020

    # Storage settings
    storage.metrics           on
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5MB

# Check input configuration:
[INPUT]
    Name            tail
    Path            /var/log/*.log
    storage.type    filesystem
    Mem_Buf_Limit   50MB
    Skip_Long_Lines On
```
Step 3: Configure Filesystem Storage
```bash
# Enable filesystem storage to offload memory:

# In fluent-bit.conf:
[SERVICE]
    # Enable filesystem storage
    storage.path              /var/log/flb-storage
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 50MB
    storage.max_chunks_up     128

# In each INPUT:
[INPUT]
    Name          tail
    Path          /var/log/*.log
    # Use filesystem, not memory
    storage.type  filesystem
    Mem_Buf_Limit 50MB

# Create the storage directory:
mkdir -p /var/log/flb-storage
chown fluent-bit:fluent-bit /var/log/flb-storage

# Restart FluentBit:
systemctl restart fluent-bit

# Verify storage is used:
ls -la /var/log/flb-storage/
```
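To verify the spill-to-disk path without a jq dependency, you can pull `fs_chunks` out of the storage API response with sed. A sketch assuming the JSON shape shown in Step 1:

```shell
# fs_chunks: read the /api/v1/storage JSON on stdin and print the
# fs_chunks count. A nonzero value means chunks are on disk.
fs_chunks() {
  sed -n 's/.*"fs_chunks"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p'
}

# Usage:
# curl -s http://localhost:2020/api/v1/storage | fs_chunks
```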
Step 4: Increase Memory Limits
```bash
# Increase the backlog memory limit in fluent-bit.conf:
[SERVICE]
    storage.backlog.mem_limit 100MB

# Check the current systemd limits:
cat /etc/systemd/system/fluent-bit.service

[Service]
MemoryLimit=2G
MemoryHigh=1.5G

# Raise them (MemoryMax= replaces MemoryLimit= on cgroup v2 systems):
systemctl edit fluent-bit --full

[Service]
MemoryLimit=4G
MemoryHigh=3G

# Restart:
systemctl daemon-reload
systemctl restart fluent-bit

# For Kubernetes:
resources:
  limits:
    memory: 4Gi
  requests:
    memory: 2Gi
```
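To sanity-check a new limit against actual usage, `systemctl show` exposes `MemoryCurrent` alongside the configured maximum. A sketch; `mem_pct` is a hypothetical helper that just parses key=value output:

```shell
# mem_pct: read `systemctl show` key=value lines on stdin and print
# memory usage as an integer percentage of the configured limit, or
# "no-limit" when no limit is set.
mem_pct() {
  awk -F= '
    $1 == "MemoryCurrent" { cur = $2 }
    ($1 == "MemoryMax" || $1 == "MemoryLimit") && $2 != "infinity" { max = $2 }
    END { if (max > 0) printf "%d\n", (cur * 100) / max; else print "no-limit" }
  '
}

# Usage:
# systemctl show fluent-bit -p MemoryCurrent -p MemoryMax | mem_pct
```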
Step 5: Configure Mem_Buf_Limit
```bash
# Limit per-input memory:
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # Per-input buffer limit
    Mem_Buf_Limit     100MB
    storage.type      filesystem
    Skip_Long_Lines   On
    Refresh_Interval  10
    Rotate_Wait       30
    Buffer_Chunk_Size 512k
    Buffer_Max_Size   5MB

# When Mem_Buf_Limit is reached:
# - storage.type memory:     the input pauses and new logs may be dropped
# - storage.type filesystem: chunks spill to disk and ingestion continues

# Adjust based on log volume:
# - High volume:   100-500MB
# - Normal volume: 50-100MB
# - Low volume:    10-50MB
```
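The sizing bands above follow from ingest rate times the stall time you want to absorb in memory. A back-of-envelope sketch (the numbers are illustrative assumptions):

```shell
# suggest_limit_mb BYTES_PER_SEC HEADROOM_SEC: print a Mem_Buf_Limit
# (in MB) that covers the ingest rate for the given number of seconds.
suggest_limit_mb() {
  local bytes_per_sec="$1" headroom_sec="$2"
  echo $(( (bytes_per_sec * headroom_sec) / 1024 / 1024 ))
}

# Example: 2 MB/s ingest, tolerate a 60 s output stall in memory
suggest_limit_mb $((2 * 1024 * 1024)) 60   # prints 120
```

This lands in the "high volume" band above; if the result exceeds what you can afford in RAM, that is the signal to rely on filesystem storage instead.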
Step 6: Configure Output Retry
```bash
# Output configuration with retry (the Elasticsearch plugin is named "es"):
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch
    Port  9200
    Index fluent-bit
    Type  _doc

    # Retry settings (pick one):
    # Retry_Limit False  -> retry forever
    Retry_Limit 10

    # Max size this output may buffer on filesystem storage:
    storage.total_limit_size 1G

    # Network timeouts:
    net.connect_timeout        10
    net.keepalive              On
    net.keepalive_idle_timeout 10

# When the output fails:
# - Logs are buffered according to storage.type
# - Delivery is retried according to Retry_Limit
# - If the retry limit is reached or storage is full, logs are dropped
```
Step 7: Handle Output Failures
```bash
# Check output connectivity:
curl -I http://elasticsearch:9200

# If the output is down, logs buffer locally.
# Monitor buffer growth:
watch -n 5 'curl -s http://localhost:2020/api/v1/storage'

# When the output recovers, the buffer drains.

# FluentBit has no native failover between outputs: a second output
# with Match * receives a copy of every record. Use a file output as a
# secondary destination so data survives a long Elasticsearch outage:
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch
    Port  9200

[OUTPUT]
    Name  file
    Match *
    Path  /var/log/flb-fallback
    File  fallback.log
    storage.total_limit_size 500M
```
Step 8: Monitor Buffer Health
```bash
# Create monitoring script:
cat << 'EOF' > /usr/local/bin/monitor-fluentbit.sh
#!/bin/bash
echo "=== FluentBit Storage ==="
curl -s http://localhost:2020/api/v1/storage | jq

echo ""
echo "=== Memory Usage ==="
ps aux | grep fluent-bit | grep -v grep | awk '{print $6/1024 " MB"}'

echo ""
echo "=== Disk Storage ==="
du -sh /var/log/flb-storage/

echo ""
echo "=== Output Status ==="
curl -s http://localhost:2020/api/v1/metrics/prometheus | grep -E "fluentbit_output_(proc|errors|retries)"

echo ""
echo "=== Buffer Chunks ==="
ls -la /var/log/flb-storage/ 2>/dev/null | wc -l
EOF

chmod +x /usr/local/bin/monitor-fluentbit.sh
```

Prometheus alerts:

```yaml
- alert: FluentBitMemoryHigh
  expr: fluentbit_process_resident_memory_bytes > 1073741824
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "FluentBit memory usage > 1GB"

- alert: FluentBitBufferFull
  expr: rate(fluentbit_output_errors_total[5m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "FluentBit output errors, buffer may be full"
```
Step 9: Optimize Log Parsing
```bash
# Reduce memory with efficient parsing.

# Use multiline parsing carefully (in parsers.conf):
[MULTILINE_PARSER]
    Name          java_multiline
    Type          regex
    Flush_Timeout 1000
    # Lines starting with a date begin a new record; others continue it
    Rule "start_state" "/^\d{4}-\d{2}-\d{2}/" "cont"
    Rule "cont"        "/^\s+/"               "cont"

# In the input:
[INPUT]
    Name             tail
    Path             /var/log/java.log
    multiline.parser java_multiline
    # Multiline parsing can hold more data in memory
    Mem_Buf_Limit    50MB

# Filter to reduce data (drops records whose log field contains ERROR):
[FILTER]
    Name    grep
    Match   *
    Exclude log ERROR

# Throttle high-volume inputs:
[FILTER]
    Name     throttle
    Match    *
    Rate     1000
    Window   5
    Interval 1s
```
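To check the throttle filter is actually capping records, you can generate a controlled burst into a watched file and compare input vs. output counters. A sketch; the path and count are assumptions to match to your `[INPUT] Path`:

```shell
# burst FILE [COUNT]: append COUNT test lines to FILE so a tail input
# picks them up. Defaults to 5000 lines.
burst() {
  local file="$1" count="${2:-5000}" i
  for ((i = 1; i <= count; i++)); do
    echo "burst test line $i" >> "$file"
  done
}

# Example:
# burst /var/log/throttle-test.log 5000
# then compare record counters before/after:
# curl -s http://localhost:2020/api/v1/metrics | grep -i records
```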
Step 10: Scale FluentBit
```bash
# If a single instance cannot handle the load:

# Option 1: Increase resources
# More memory and CPU for a single instance

# Option 2: Run multiple instances, sharding logs by source:
# Instance 1: /var/log/containers/app1-*.log
# Instance 2: /var/log/containers/app2-*.log

# Option 3: Use a DaemonSet in Kubernetes
# Each node runs FluentBit for its local logs

# Option 4: Forward to an aggregator
[OUTPUT]
    Name  forward
    Match *
    Host  fluentd-aggregator
    Port  24224

# The aggregator handles buffering and output;
# FluentBit just forwards and needs minimal buffer.

# Restart after changes:
systemctl restart fluent-bit
```
FluentBit Buffer Memory Checklist
| Check | Command | Expected |
|---|---|---|
| Memory usage | `ps aux` | Below the configured limit |
| Storage enabled | check config | `storage.type filesystem` |
| Buffer limit | `Mem_Buf_Limit` | Sized for peak volume |
| Output health | `curl` metrics endpoint | Connected, no errors |
| Disk storage | `du /var/log/flb-storage` | Not full |
| Retry config | check config | Matches outage tolerance |
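The checklist can be scripted so it runs the same way every time. A sketch; the thresholds, paths, and the health endpoint (which requires `Health_Check On` in `[SERVICE]`) are assumptions to tune:

```shell
# check NAME COMMAND...: run COMMAND and print OK/FAIL by exit status.
check() {
  local name="$1"; shift
  if "$@" > /dev/null 2>&1; then echo "OK   $name"; else echo "FAIL $name"; fi
}

check "process running" pgrep -x fluent-bit
check "http api up"     curl -sf http://localhost:2020/api/v1/health
check "storage dir"     test -d /var/log/flb-storage
check "disk not full"   sh -c '[ "$(df --output=pcent /var/log | tail -1 | tr -dc 0-9)" -lt 90 ]'
```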
Verify the Fix
```bash
# After configuring buffers:

# 1. Check memory usage (should be within the limit)
ps aux | grep fluent-bit

# 2. Check filesystem storage is working (chunks present)
ls /var/log/flb-storage/

# 3. Simulate an output failure
systemctl stop elasticsearch

# 4. Send logs
logger "test log message"

# 5. Check the buffer grows (fs_chunks increasing)
curl http://localhost:2020/api/v1/storage

# 6. Restore the output and verify the buffer drains
systemctl start elasticsearch
```
Related Issues
- [Fix Fluentd Buffer Overflow](/articles/fix-fluentd-buffer-overflow)
- [Fix Loki Ingestion Rate Limit](/articles/fix-loki-ingestion-rate-limit)
- [Fix Prometheus Remote Write Queue Full](/articles/fix-prometheus-remote-write-queue-full)