Your Logstash pipeline stopped processing events, or you're seeing errors in the logs that indicate pipeline failures. Logstash is critical for data ingestion, so getting it back online quickly is essential.
Understanding Pipeline Errors
Logstash pipeline errors typically fall into these categories:
- Configuration syntax errors
- Plugin connection failures
- Queue/buffer overflow
- Filter processing exceptions
- Output delivery failures
Common error patterns:
```
[ERROR][logstash.agent] Failed to execute action {:action=>LogStash::PipelineAction::Create, :exception=>LogStash::ConfigurationError, :message=>"Expected one of #, {, }"}
[ERROR][logstash.outputs.elasticsearch] Failed to flush outgoing items {:message=>"Elasticsearch is unreachable"}
[ERROR][logstash.pipeline] A plugin had an unrecoverable error {:plugin=>"#<LogStash::FilterDelegatorInvoker>"}
```
Initial Diagnosis
Start by checking Logstash status and recent logs:
```bash
# Check Logstash service status
systemctl status logstash

# View recent errors (-E enables the | alternation)
journalctl -u logstash -n 100 --no-pager | grep -iE "error|failed|exception"

# Or check the log file
tail -n 100 /var/log/logstash/logstash-plain.log | grep -i "error"

# Check pipeline stats API (if Logstash is running)
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines'

# Check overall node stats
curl -s http://localhost:9600/_node/stats?pretty | jq '.process, .pipeline'

# Check currently loaded pipelines
curl -s http://localhost:9600/_node/pipelines?pretty
```
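The raw stats output is verbose; a short jq expression can reduce the events section to a one-line summary suitable for scripts. The payload below is a hand-written sample showing the shape of the `events` block (verify the field names against your Logstash version):

```shell
# Hand-written sample of the events section from _node/stats/pipelines
stats='{"pipelines":{"main":{"events":{"in":1000,"filtered":995,"out":990}}}}'

# Summarize in/out counts and the in-flight lag between them
echo "$stats" | jq -r '.pipelines.main.events | "in=\(.in) out=\(.out) lag=\(.in - .out)"'
# → in=1000 out=990 lag=10
```

In real use, replace the `echo` with the `curl -s http://localhost:9600/_node/stats/pipelines` call shown above.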
Common Cause 1: Configuration Syntax Errors
The most common cause is invalid pipeline configuration syntax.
Error pattern:
```
ConfigurationError: Expected one of #, {, }
```
Diagnosis:
```bash
# Test configuration file syntax
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/pipeline.conf

# Test all configuration files
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/

# Locate the section blocks (-E enables the | alternation)
grep -nE "input|output|filter" /etc/logstash/conf.d/pipeline.conf

# Validate with detailed output
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/pipeline.conf --config.test_and_exit --debug
```
Solution:
Fix the configuration syntax:
```ruby
# Common syntax issues and fixes

# ISSUE: missing closing brace
input {
  file {
    path => "/var/log/app.log"
  }
# WRONG - the input block is never closed

# FIX:
input {
  file {
    path => "/var/log/app.log"
  }
}

# ISSUE: incorrect match syntax
filter {
  grok {
    match => "message", "%{TIMESTAMP_ISO8601:timestamp}"  # WRONG - must be a hash
  }
}

# FIX:
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}" }
  }
}

# ISSUE: malformed conditional
filter {
  if [level] == "error" {
    mutate { add_field => { "severity" => "high" } }
  else  # WRONG - missing closing brace before else, and else needs its own braces
    mutate { add_field => { "severity" => "low" } }
  }
}

# FIX:
filter {
  if [level] == "error" {
    mutate { add_field => { "severity" => "high" } }
  } else {
    mutate { add_field => { "severity" => "low" } }
  }
}
```
After fixing, test and restart:
```bash
# Test the fixed configuration
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/pipeline.conf

# Restart Logstash
systemctl restart logstash

# Verify it started
systemctl status logstash
journalctl -u logstash -f
```
Common Cause 2: Elasticsearch Output Connection Failure
The Elasticsearch output plugin cannot reach the destination cluster.
Error pattern:
```
Elasticsearch is unreachable {:url=>"http://elasticsearch:9200"}
```
Diagnosis:
```bash
# Test Elasticsearch connectivity from the Logstash server
curl -v http://elasticsearch:9200/_cluster/health

# Check if Elasticsearch is running
curl -s http://elasticsearch:9200

# Test with the exact URL from the configuration
# (a 4xx response to a GET on _bulk still proves reachability)
curl -v http://elasticsearch:9200/_bulk

# Check DNS resolution
nslookup elasticsearch

# For Kubernetes environments, test from the Logstash pod
kubectl exec -it logstash-pod -- curl http://elasticsearch:9200/_cluster/health
```
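When the output configuration lists several hosts, it helps to probe them all the same way. A small loop does this; the hostnames below are placeholders for whatever your `hosts =>` array actually contains:

```shell
# Probe each configured Elasticsearch host. -f makes curl fail on HTTP
# errors, --max-time bounds DNS/connect hangs. Hostnames are placeholders.
for host in http://elasticsearch-1:9200 http://elasticsearch-2:9200; do
  if curl -sf --max-time 5 "$host/_cluster/health" > /dev/null; then
    echo "$host reachable"
  else
    echo "$host UNREACHABLE"
  fi
done
```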
Solution:
Fix the Elasticsearch output configuration:
```ruby
output {
  elasticsearch {
    # Ensure correct hosts format
    hosts => ["http://elasticsearch:9200"]

    # For authentication
    user => "elastic"
    password => "yourpassword"

    # For TLS
    ssl => true
    cacert => "/path/to/ca.crt"

    # Index pattern
    index => "logs-%{+YYYY.MM.dd}"

    # Template management
    manage_template => false
  }
}
```
For connection pooling and retries:
```ruby
output {
  elasticsearch {
    hosts => ["http://elasticsearch-1:9200", "http://elasticsearch-2:9200"]

    # Retry settings
    retry_max_interval => 30
    retry_on_conflict => 5

    # Connection settings
    pool_max => 100
    pool_max_per_route => 20

    # Timeout settings
    timeout => 60
  }
}
```
Common Cause 3: Queue Overflow and Backpressure
The persistent queue fills up, blocking pipeline execution.
Error pattern:
```
[WARN][logstash.pipeline] Pipeline has exceeded the in-flight events max
[ERROR][logstash.pipeline] Queue is full, dropping events
```
Diagnosis:
```bash
# Check queue stats
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.queue'

# Same data via the full node stats endpoint
curl -s http://localhost:9600/_node/stats?pretty | jq '.pipelines.main.queue'

# Check disk space for queue storage
df -h /var/lib/logstash/queue

# Check event throughput
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.events'
```
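To turn those numbers into an at-a-glance fill percentage, divide used bytes by capacity. The sample JSON below is hand-written to show the idea; the exact field names and nesting of the queue section vary across Logstash versions, so adjust the jq path to match your API output:

```shell
# Hand-written sample of a persisted-queue stats section (field names
# assumed; check your version's _node/stats/pipelines output)
stats='{"pipelines":{"main":{"queue":{"type":"persisted","queue_size_in_bytes":966367641,"max_queue_size_in_bytes":1073741824}}}}'

# Report how full the queue is as a percentage
echo "$stats" | jq -r '.pipelines.main.queue
  | (.queue_size_in_bytes / .max_queue_size_in_bytes * 100 | floor)
  | "queue \(.)% full"'
# → queue 89% full
```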
Solution:
Adjust queue settings in logstash.yml:
```yaml
# /etc/logstash/logstash.yml
pipeline:
  workers: 4
  batch:
    size: 125
    delay: 50

queue:
  type: persisted
  page_capacity: 250mb
  max_events: 10000
  max_bytes: 1gb

config:
  reload:
    automatic: true
    interval: 3s
```
Increase workers and batch size:
```bash
# Or override via command line
/usr/share/logstash/bin/logstash -w 8 -b 250 -f /etc/logstash/conf.d/pipeline.conf
```
Clear a stuck queue:
```bash
# Stop Logstash
systemctl stop logstash

# Clear the queue (WARNING: queued events are lost)
rm -rf /var/lib/logstash/queue/main/*

# Start Logstash fresh
systemctl start logstash
```
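If the queued events matter, deleting the queue directory is a last resort. The persisted queue also supports `queue.drain`, which makes shutdown block until the queue has been processed, at the cost of a potentially long stop. A minimal sketch of that setting:

```yaml
# /etc/logstash/logstash.yml
# With drain enabled, `systemctl stop logstash` waits until the
# persisted queue is empty instead of leaving events on disk.
queue:
  type: persisted
  drain: true
```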
Common Cause 4: Grok Filter Failures
Grok pattern matching failures can slow down or crash the pipeline.
Error pattern:
```
[WARN][logstash.filters.grok] Grok pattern %{CUSTOM_PATTERN} does not exist
[ERROR][logstash.filters.grok] Grok parsing failure {:field=>"message"}
```
Diagnosis:
```bash
# Check grok filter stats
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.plugins.filters[] | select(.name=="grok")'

# Test a grok pattern with a local Logstash instance
/usr/share/logstash/bin/logstash -e '
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
' --debug

# Watch the grok failure rate (-E enables the | alternation)
tail -f /var/log/logstash/logstash-plain.log | grep -iE "grok|failure"
```
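Grok patterns compile down to regular expressions, so a quick offline sanity check is to test a rough ERE translation of the pattern against a sample line with grep. This is only an approximation (grok's `TIMESTAMP_ISO8601` and `JAVACLASS` are more permissive than the sketch below), but it catches gross mismatches fast:

```shell
# Sample log line and a hand-translated ERE approximating
# "%{TIMESTAMP_ISO8601} %{LOGLEVEL} [%{DATA}] %{JAVACLASS} - %{GREEDYDATA}"
line='2024-05-01T12:00:00,123 ERROR [worker-1] com.example.App - boom'
regex='^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:,.]+ +(DEBUG|INFO|WARN|ERROR|FATAL|TRACE) +\[[^]]+\] +[A-Za-z0-9.]+ +- +.+'

echo "$line" | grep -Eq "$regex" && echo "pattern matches" || echo "no match"
# → pattern matches
```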
Solution:
Fix grok patterns and handle failures gracefully:
```ruby
filter {
  # Define custom patterns first
  grok {
    patterns_dir => "/etc/logstash/patterns"
    match => { "message" => "%{CUSTOM_LOG_PATTERN}" }
  }

  # Handle grok failures
  if "_grokparsefailure" in [tags] {
    # Fallback parsing
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:raw_message}" }
      overwrite => ["message"]
      remove_tag => ["_grokparsefailure"]
    }

    # Alternative for high-volume logs: drop unparseable events instead
    # of running a fallback (don't use both, or everything here is dropped)
    # drop { }
  }
}
```
Create custom patterns file:
```
# /etc/logstash/patterns/custom
CUSTOM_LOG_PATTERN %{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:level}\s+\[%{DATA:thread}\]\s+%{JAVACLASS:class}\s+-\s+%{GREEDYDATA:message}
LOGLEVEL (DEBUG|INFO|WARN|ERROR|FATAL|TRACE)
```
Common Cause 5: Java Heap Memory Issues
Insufficient heap memory causes performance degradation and crashes.
Error pattern:
```
Java heap space
OutOfMemoryError: Java heap space
```
Diagnosis:
```bash
# Check current JVM heap settings
ps aux | grep logstash | grep -oE "\-Xmx[0-9]+[gm]"

# Check heap usage via API
curl -s http://localhost:9600/_node/stats/jvm?pretty | jq '.jvm.mem'

# Check for GC overhead
curl -s http://localhost:9600/_node/stats/jvm?pretty | jq '.jvm.gc'

# Monitor memory during operation
watch -n 5 'curl -s http://localhost:9600/_node/stats/jvm | jq ".jvm.mem.heap_used_percent"'
```
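The same API lends itself to a simple threshold check for cron or a health script. The payload below is a hand-written sample of the `jvm.mem` section; in real use, replace the `echo` with a curl to `:9600/_node/stats/jvm`:

```shell
# Hand-written sample of the jvm section of _node/stats/jvm
stats='{"jvm":{"mem":{"heap_used_percent":87,"heap_used_in_bytes":3737868288}}}'

# Warn when heap usage crosses 80%
pct=$(echo "$stats" | jq '.jvm.mem.heap_used_percent')
if [ "$pct" -ge 80 ]; then
  echo "WARN: heap at ${pct}%"
else
  echo "OK: heap at ${pct}%"
fi
# → WARN: heap at 87%
```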
Solution:
Adjust JVM settings in jvm.options:
```bash
# /etc/logstash/jvm.options
-Xms4g
-Xmx4g

# For large installations, use these instead (keep Xms equal to Xmx):
# -Xms8g
# -Xmx8g

# GC settings
-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=75
-XX:MaxGCPauseMillis=200
```
Restart with new settings:
```bash
systemctl restart logstash
```
Common Cause 6: Plugin Installation Issues
Missing or incompatible plugins cause pipeline failures.
Error pattern:
```
[ERROR][logstash.pipeline] Couldn't find any filter plugin named 'custom_filter'
```
Diagnosis:
```bash
# List installed plugins
/usr/share/logstash/bin/logstash-plugin list

# Check for a specific plugin
/usr/share/logstash/bin/logstash-plugin list | grep grok

# Check plugin versions
/usr/share/logstash/bin/logstash-plugin list --verbose | grep elasticsearch
```
Solution:
Install missing plugins:
```bash
# Install an official plugin
/usr/share/logstash/bin/logstash-plugin install logstash-filter-json

# Update an existing plugin
/usr/share/logstash/bin/logstash-plugin update logstash-output-elasticsearch

# Update all plugins
/usr/share/logstash/bin/logstash-plugin update
```
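To check a whole config against the installed-plugin list, the plugin names can be crudely extracted: any lowercase word that opens a block, minus the three section keywords. This is a sketch, not a parser; nested option blocks can produce false positives:

```shell
# Write a small sample pipeline, then extract the plugin names it uses
cat > /tmp/sample-pipeline.conf <<'EOF'
input {
  file { path => "/var/log/app.log" }
}
filter {
  grok { match => { "message" => "%{GREEDYDATA:msg}" } }
}
output {
  elasticsearch { hosts => ["http://localhost:9200"] }
}
EOF

# Prints: elasticsearch, file, grok -- each should appear in
# `/usr/share/logstash/bin/logstash-plugin list`
grep -oE '[a-z_]+ *\{' /tmp/sample-pipeline.conf \
  | tr -d ' {' \
  | grep -vEx 'input|filter|output' \
  | sort -u
```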
Common Cause 7: Dead Letter Queue Issues
Failed events accumulate in the dead letter queue, consuming disk space.
Diagnosis:
```bash
# Check DLQ status
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.dead_letter_queue'

# Check DLQ disk usage
ls -lah /var/lib/logstash/dead_letter_queue/

# Count DLQ segment files (entries are binary, so wc -l on them is unreliable)
find /var/lib/logstash/dead_letter_queue/ -name "*.log" | wc -l
```
Solution:
Configure DLQ properly:
```yaml
# logstash.yml
dead_letter_queue.enable: true
dead_letter_queue.max_bytes: 1024mb   # Limit DLQ size
dead_letter_queue.flush_interval: 5000

# The storage path is a separate top-level setting:
path.dead_letter_queue: "/var/lib/logstash/dead_letter_queue"
```
Process DLQ events:
```ruby
# A separate pipeline that re-reads DLQ events
input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"
    commit_offsets => true
    sincedb_path => "/var/lib/logstash/data/dead_letter_queue.commit"
  }
}

filter {
  # Add metadata for analysis
  mutate {
    add_field => { "dlq_processed" => "true" }
    add_field => { "original_error" => "%{[@metadata][dead_letter_queue][reason]}" }
  }
}

output {
  file {
    path => "/var/log/logstash/dlq-events.log"
  }
}
```
Verification
After fixing pipeline issues, verify operation:
```bash
# Check pipeline status
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.events'

# Monitor event throughput
watch -n 5 'curl -s http://localhost:9600/_node/stats/pipelines | jq ".pipelines.main.events"'

# Check for recent errors
journalctl -u logstash --since "5 minutes ago" | grep -i error

# Verify data is flowing to outputs
curl -s "http://elasticsearch:9200/_cat/indices?v" | grep logs

# Test end-to-end data flow
echo "Test message from $(date)" >> /var/log/app.log
sleep 5
curl -s 'http://elasticsearch:9200/logs-*/_search?q=test%20message' | jq '.hits.total'
```
Prevention
Implement monitoring for Logstash:
```yaml
groups:
  - name: logstash_health
    rules:
      - alert: LogstashPipelineStopped
        expr: rate(logstash_pipeline_events_filtered_total[5m]) == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Logstash pipeline stopped processing events"

      - alert: LogstashHighHeapUsage
        expr: logstash_jvm_memory_heap_used_percent > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Logstash heap usage at {{ $value }}%"

      - alert: LogstashQueueFull
        expr: logstash_pipeline_queue_size > 0.9 * logstash_pipeline_queue_max_size
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Logstash queue nearly full"
```
Regular maintenance:
```bash
#!/bin/bash
# logstash-health-check.sh

# Check pipeline stats
curl -s http://localhost:9600/_node/stats/pipelines | jq '.pipelines.main.events'

# Count errors in the last hour
journalctl -u logstash --since "1 hour ago" | grep -c "ERROR"

# Check queue health
curl -s http://localhost:9600/_node/stats/pipelines | jq '.pipelines.main.queue'

# Check DLQ size
du -sh /var/lib/logstash/dead_letter_queue/
```
Pipeline errors usually stem from configuration issues or downstream problems. Test configuration syntax first, then check output connectivity and queue health for a systematic approach to resolution.