Your Logstash pipeline stopped processing events, or you're seeing errors in the logs that indicate pipeline failures. Logstash is critical for data ingestion, so getting it back online quickly is essential.

Understanding Pipeline Errors

Logstash pipeline errors typically fall into these categories:

  • Configuration syntax errors
  • Plugin connection failures
  • Queue/buffer overflow
  • Filter processing exceptions
  • Output delivery failures

Common error patterns:

```text
[ERROR][logstash.agent] Failed to execute action {:action=>LogStash::PipelineAction::Create, :exception=>LogStash::ConfigurationError, :message=>"Expected one of #, {, }"
[ERROR][logstash.outputs.elasticsearch] Failed to flush outgoing items {:message=>"Elasticsearch is unreachable"
[ERROR][logstash.pipeline] A plugin had an unrecoverable error {:plugin=>"#<LogStash::FilterDelegatorInvoker>"
```
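When triaging, it helps to tally which of these patterns dominates the log. A minimal sketch, using a mock log written to a hypothetical `/tmp` path (in practice, point the greps at `/var/log/logstash/logstash-plain.log`):

```shell
# Mock sample of logstash-plain.log lines (assumption: the real file
# lives at /var/log/logstash/logstash-plain.log)
cat > /tmp/logstash-sample.log <<'EOF'
[ERROR][logstash.agent] Failed to execute action ConfigurationError
[ERROR][logstash.outputs.elasticsearch] Failed to flush outgoing items
[WARN ][logstash.filters.grok] matched no patterns, adding _grokparsefailure
EOF

# Tally occurrences per error category to see which failure mode dominates
for pattern in ConfigurationError "outputs.elasticsearch" grokparsefailure; do
  printf '%-25s %s\n' "$pattern" "$(grep -c "$pattern" /tmp/logstash-sample.log)"
done
```

The category with the highest count tells you which of the sections below to start with.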

Initial Diagnosis

Start by checking Logstash status and recent logs:

```bash
# Check Logstash service status
systemctl status logstash

# View recent errors
journalctl -u logstash -n 100 --no-pager | grep -iE "error|failed|exception"

# Or check the log file
tail -n 100 /var/log/logstash/logstash-plain.log | grep -i "error"

# Check pipeline stats API (if Logstash is running)
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines'

# Check overall node stats
curl -s http://localhost:9600/_node/stats?pretty | jq '.process, .pipelines'

# Check currently loaded pipelines
curl -s http://localhost:9600/_node/pipelines?pretty
```

Common Cause 1: Configuration Syntax Errors

The most common cause is invalid pipeline configuration syntax.

Error pattern: `ConfigurationError: Expected one of #, {, }`

Diagnosis:

```bash
# Test configuration file syntax
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/pipeline.conf

# Test all configuration files
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/

# Check that the main sections are present
grep -nE "input|output|filter" /etc/logstash/conf.d/pipeline.conf

# Validate with detailed output
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/pipeline.conf --config.test_and_exit --debug
```

Solution:

Fix the configuration syntax:

```ruby
# Common syntax issues and fixes

# ISSUE: missing closing brace
input {
  file {
    path => "/var/log/app.log"
  }
# MISSING closing brace for the input block

# FIX:
input {
  file {
    path => "/var/log/app.log"
  }
}

# ISSUE: incorrect grok match syntax
filter {
  grok {
    match => "message", "%{TIMESTAMP_ISO8601:timestamp}"  # WRONG - should be a hash
  }
}

# FIX:
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}" }
  }
}

# ISSUE: invalid conditional
filter {
  if [level] == "error" {
    mutate { add_field => { "severity" => "high" } }
  else  # WRONG - missing closing brace and improper else syntax
    mutate { add_field => { "severity" => "low" } }
  }
}

# FIX:
filter {
  if [level] == "error" {
    mutate { add_field => { "severity" => "high" } }
  } else {
    mutate { add_field => { "severity" => "low" } }
  }
}
```

After fixing, test and restart:

```bash
# Test the fixed configuration
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/pipeline.conf

# Restart Logstash
systemctl restart logstash

# Verify it started
systemctl status logstash
journalctl -u logstash -f
```

Common Cause 2: Elasticsearch Output Connection Failure

The Elasticsearch output plugin cannot reach the destination cluster.

Error pattern: `Elasticsearch is unreachable {:url=>"http://elasticsearch:9200"}`

Diagnosis:

```bash
# Test Elasticsearch connectivity from the Logstash server
curl -v http://elasticsearch:9200/_cluster/health

# Check if Elasticsearch is responding
curl -s http://elasticsearch:9200

# Test with the exact URL from the configuration
# (an HTTP error response here still proves the host is reachable)
curl -v http://elasticsearch:9200/_bulk

# Check DNS resolution
nslookup elasticsearch

# For Kubernetes environments, test from the Logstash pod
kubectl exec -it logstash-pod -- curl http://elasticsearch:9200/_cluster/health
```

Solution:

Fix the Elasticsearch output configuration:

```ruby
output {
  elasticsearch {
    # Ensure correct hosts format
    hosts => ["http://elasticsearch:9200"]

    # For authentication
    user => "elastic"
    password => "yourpassword"

    # For TLS
    ssl => true
    cacert => "/path/to/ca.crt"

    # Index pattern
    index => "logs-%{+YYYY.MM.dd}"

    # Template management
    manage_template => false
  }
}
```

For connection pooling and retries:

```ruby
output {
  elasticsearch {
    hosts => ["http://elasticsearch-1:9200", "http://elasticsearch-2:9200"]

    # Retry settings
    retry_max_interval => 30
    retry_on_conflict => 5

    # Connection pool settings
    pool_max => 100
    pool_max_per_route => 20

    # Timeout settings
    timeout => 60
  }
}
```

Common Cause 3: Queue Overflow and Backpressure

The persistent queue fills up, blocking pipeline execution.

Error patterns:

```text
[WARN][logstash.pipeline] Pipeline has exceeded the in-flight events max
[ERROR][logstash.pipeline] Queue is full, dropping events
```

Diagnosis:

```bash
# Check queue stats
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.queue'

# Check queue capacity and usage
curl -s http://localhost:9600/_node/stats?pretty | jq '.pipelines.main.queue'

# Check disk space for queue storage
df -h /var/lib/logstash/queue

# Check event throughput
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.events'
```

Solution:

Adjust queue settings in logstash.yml:

```yaml
# /etc/logstash/logstash.yml
pipeline.workers: 4
pipeline.batch.size: 125
pipeline.batch.delay: 50

queue.type: persisted
queue.page_capacity: 250mb
queue.max_events: 10000
queue.max_bytes: 1gb

config.reload.automatic: true
config.reload.interval: 3s
```

Increase workers and batch size:

```bash
# Or override via command line
/usr/share/logstash/bin/logstash -w 8 -b 250 -f /etc/logstash/conf.d/pipeline.conf
```

Clear a stuck queue:

```bash
# Stop Logstash
systemctl stop logstash

# Clear the queue (WARNING: queued events are lost)
rm -rf /var/lib/logstash/queue/main/*

# Start Logstash fresh
systemctl start logstash
```
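Deleting the queue directory discards data. If the queued events matter, a gentler option is Logstash's `queue.drain` setting, which makes a shutdown block until the persistent queue has been fully processed. A sketch of the relevant logstash.yml fragment:

```yaml
# /etc/logstash/logstash.yml
# With queue.drain enabled, "systemctl stop logstash" waits for the
# persistent queue to empty instead of abandoning in-flight events.
queue.type: persisted
queue.drain: true
```

Note that draining can take a long time if the downstream output is the reason the queue filled up in the first place.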

Common Cause 4: Grok Filter Failures

Grok pattern matching failures can slow down or crash the pipeline.

Error patterns:

```text
[WARN][logstash.filters.grok] Grok pattern %{CUSTOM_PATTERN} does not exist
[ERROR][logstash.filters.grok] Grok parsing failure {:field=>"message"}
```

Diagnosis:

```bash
# Check grok failure counts per filter
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.plugins.filters[] | select(.name=="grok")'

# Test a grok pattern online or with a one-off Logstash instance
/usr/share/logstash/bin/logstash -e '
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
' --debug

# Watch the grok failure rate in the logs
tail -f /var/log/logstash/logstash-plain.log | grep -iE "grok|failure"
```

Solution:

Fix grok patterns and handle failures gracefully:

```ruby
filter {
  # Load custom patterns first
  grok {
    patterns_dir => "/etc/logstash/patterns"
    match => { "message" => "%{CUSTOM_LOG_PATTERN}" }
  }

  # Handle grok failures
  if "_grokparsefailure" in [tags] {
    # Fallback parsing
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:raw_message}" }
      overwrite => ["message"]
      remove_tag => ["_grokparsefailure"]
    }

    # Or, for high-volume logs, drop failures instead of reparsing:
    # drop { }
  }
}
```

Create custom patterns file:

```bash
# /etc/logstash/patterns/custom
CUSTOM_LOG_PATTERN %{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:level}\s+\[%{DATA:thread}\]\s+%{JAVACLASS:class}\s+-\s+%{GREEDYDATA:message}
LOGLEVEL (DEBUG|INFO|WARN|ERROR|FATAL|TRACE)
```
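Before restarting, it is worth sanity-checking the patterns file format: each non-empty line must be an uppercase pattern name, a space, then the pattern body. A sketch using a hypothetical `/tmp` copy of the file (substitute `/etc/logstash/patterns/custom` in practice):

```shell
# Stand-in copy of the patterns file under test
cat > /tmp/custom-patterns <<'EOF'
CUSTOM_LOG_PATTERN %{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:level}\s+%{GREEDYDATA:message}
EOF

# Flag any line that is not "NAME<space>pattern-body"
if grep -qvE '^[A-Z0-9_]+ .+$' /tmp/custom-patterns; then
  echo "malformed pattern line(s) found"
else
  echo "patterns file format OK"
fi
```

This catches the common mistakes (a tab instead of a space, a lowercase name, a name with no body) that make Logstash report the pattern as nonexistent.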

Common Cause 5: Java Heap Memory Issues

Insufficient heap memory causes performance degradation and crashes.

Error pattern:

```text
java.lang.OutOfMemoryError: Java heap space
```

Diagnosis:

```bash
# Check current JVM heap settings
ps aux | grep logstash | grep -oE "Xmx[0-9]+[gm]"

# Check heap usage via API
curl -s http://localhost:9600/_node/stats/jvm?pretty | jq '.jvm.mem'

# Check GC overhead
curl -s http://localhost:9600/_node/stats/jvm?pretty | jq '.jvm.gc'

# Monitor memory during operation
watch -n 5 'curl -s http://localhost:9600/_node/stats/jvm | jq ".jvm.mem.heap_used_percent"'
```

Solution:

Adjust JVM settings in jvm.options:

```bash
# /etc/logstash/jvm.options
-Xms4g
-Xmx4g

# For large installations, use instead:
# -Xms8g
# -Xmx8g

# GC settings
-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=75
-XX:MaxGCPauseMillis=200
```
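A common rule of thumb (an assumption, not an official Logstash recommendation) is to give the heap roughly half of system RAM, capped at a few gigabytes. A Linux-only sketch that computes matching `-Xms`/`-Xmx` lines:

```shell
# Read total RAM from /proc/meminfo (Linux-specific)
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)

# Half of RAM in whole GB, clamped to a 1g-8g range (assumed cap)
heap_gb=$(( total_kb / 1024 / 1024 / 2 ))
if [ "$heap_gb" -lt 1 ]; then heap_gb=1; fi
if [ "$heap_gb" -gt 8 ]; then heap_gb=8; fi

# Emit lines in jvm.options format; -Xms and -Xmx should match
printf -- '-Xms%dg\n-Xmx%dg\n' "$heap_gb" "$heap_gb"
```

Keeping `-Xms` equal to `-Xmx` avoids heap resizing pauses at runtime.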

Restart with new settings:

```bash
systemctl restart logstash
```

Common Cause 6: Plugin Installation Issues

Missing or incompatible plugins cause pipeline failures.

Error pattern: `[ERROR][logstash.pipeline] Couldn't find any filter plugin named 'custom_filter'`

Diagnosis:

```bash
# List installed plugins
/usr/share/logstash/bin/logstash-plugin list

# Check a specific plugin
/usr/share/logstash/bin/logstash-plugin list | grep grok

# Check plugin versions
/usr/share/logstash/bin/logstash-plugin list --verbose | grep elasticsearch
```

Solution:

Install missing plugins:

```bash
# Install an official plugin
/usr/share/logstash/bin/logstash-plugin install logstash-filter-json

# Update an existing plugin
/usr/share/logstash/bin/logstash-plugin update logstash-output-elasticsearch

# Update all plugins
/usr/share/logstash/bin/logstash-plugin update
```

Common Cause 7: Dead Letter Queue Issues

Failed events accumulate in the dead letter queue (DLQ), consuming disk space.

Diagnosis:

```bash
# Check DLQ status
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.dead_letter_queue'

# Check DLQ disk usage
ls -lah /var/lib/logstash/dead_letter_queue/

# Estimate events in the DLQ (pages are binary, so line counts are approximate)
find /var/lib/logstash/dead_letter_queue/ -name "*.log" -exec wc -l {} \;
```

Solution:

Configure DLQ properly:

```yaml
# logstash.yml
dead_letter_queue.enable: true
path.dead_letter_queue: "/var/lib/logstash/dead_letter_queue"
dead_letter_queue.max_bytes: 1024mb  # Limit DLQ size
dead_letter_queue.flush_interval: 5000
```

Process DLQ events:

```ruby
# Create a DLQ processing pipeline
input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"
    commit_offsets => true
  }
}

filter {
  # Add metadata for analysis
  mutate {
    add_field => { "dlq_processed" => "true" }
    add_field => { "original_error" => "%{[@metadata][dead_letter_queue][reason]}" }
  }
}

output {
  file {
    path => "/var/log/logstash/dlq-events.log"
  }
}
```

Verification

After fixing pipeline issues, verify operation:

```bash
# Check pipeline status
curl -s http://localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.events'

# Monitor event throughput
watch -n 5 'curl -s http://localhost:9600/_node/stats/pipelines | jq ".pipelines.main.events"'

# Check for recent errors
journalctl -u logstash --since "5 minutes ago" | grep -i error

# Verify data is flowing to outputs
curl -s http://elasticsearch:9200/_cat/indices?v | grep logs

# Test end-to-end data flow
echo "Test message from $(date)" >> /var/log/app.log
sleep 5
curl -s 'http://elasticsearch:9200/logs-*/_search?q=test%20message' | jq '.hits.total'
```

Prevention

Implement monitoring for Logstash:

```yaml
groups:
  - name: logstash_health
    rules:
      - alert: LogstashPipelineStopped
        expr: rate(logstash_pipeline_events_filtered_total[5m]) == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Logstash pipeline stopped processing events"

      - alert: LogstashHighHeapUsage
        expr: logstash_jvm_memory_heap_used_percent > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Logstash heap usage at {{ $value }}%"

      - alert: LogstashQueueFull
        expr: logstash_pipeline_queue_size > 0.9 * logstash_pipeline_queue_max_size
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Logstash queue nearly full"
```

Regular maintenance:

```bash
#!/bin/bash
# logstash-health-check.sh

# Check pipeline stats
curl -s http://localhost:9600/_node/stats/pipelines | jq '.pipelines.main.events'

# Count errors in the last hour
journalctl -u logstash --since "1 hour ago" | grep -c "ERROR"

# Check queue health
curl -s http://localhost:9600/_node/stats/pipelines | jq '.pipelines.main.queue'

# Check DLQ size
du -sh /var/lib/logstash/dead_letter_queue/
```

Pipeline errors usually stem from configuration issues or downstream problems. For a systematic approach to resolution, test the configuration syntax first, then check output connectivity and queue health.