What's Actually Happening

Logs are not being ingested into Loki. Promtail or other agents cannot send logs to Loki, or Loki is rejecting log entries.

The Error You'll See

```bash
$ kubectl logs -n monitoring deploy/promtail
Error: failed to push logs: 429 Too Many Requests
```

Ingestion error:

```bash
Error: entry out of order for stream
```

Storage error:

```bash
Error: failed to write chunk: storage write failed
```

Connection refused:

```bash
Error: Post "http://loki:3100/loki/api/v1/push": dial tcp: connection refused
```

Why This Happens

  1. Rate limiting - ingestion rate exceeds configured limits
  2. Out of order logs - timestamps within a stream are not in order
  3. Label cardinality - too many unique label combinations
  4. Storage issues - Loki cannot write to its storage backend
  5. Promtail misconfig - wrong Loki URL or client configuration
  6. Resource limits - Loki memory/CPU exhausted
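Several of these causes trace back to the Promtail `clients` block. A minimal sketch of that block, assuming an in-cluster service named `loki`; the backoff values are illustrative, not recommendations:

```yaml
# Hypothetical Promtail client config - the service name and
# backoff values here are assumptions for illustration.
clients:
  - url: http://loki:3100/loki/api/v1/push
    backoff_config:
      min_period: 500ms
      max_period: 5m
      max_retries: 10
```

If the URL, port, or path here does not match the Loki service, every later step will show connection or push failures.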

Step 1: Check Promtail Status

```bash
# Check Promtail pods:
kubectl get pods -n monitoring -l app=promtail

# View Promtail logs:
kubectl logs -n monitoring -l app=promtail

# Check recent errors:
kubectl logs -n monitoring -l app=promtail --tail=100 | grep -i error

# Describe Promtail pods:
kubectl describe pods -n monitoring -l app=promtail

# Check Promtail config:
kubectl get configmap -n monitoring promtail -o yaml

# Check Promtail resource usage:
kubectl top pods -n monitoring -l app=promtail

# Check Promtail metrics:
kubectl port-forward -n monitoring svc/promtail 9080:9080 &
curl http://localhost:9080/metrics | grep -E "promtail_|loki_"

# Check the client positions file
# (kubectl exec takes a resource name, not a label selector):
kubectl exec -n monitoring daemonset/promtail -- cat /tmp/positions.yaml
```
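When Promtail is being throttled, its dropped-entry counters are the quickest signal in the metrics dump above. A sketch of summing them with `awk`, using a fabricated sample file; the `promtail_dropped_entries_total` metric name and values are assumptions for illustration:

```shell
# Write a fabricated Prometheus-format metrics sample to sum over.
cat <<'METRICS' > /tmp/promtail-metrics.txt
promtail_dropped_entries_total{reason="rate_limited"} 120
promtail_dropped_entries_total{reason="out_of_order"} 7
METRICS

# Add up the sample values (second field) across all label sets:
awk '/^promtail_dropped_entries_total/ {sum += $2} END {print sum}' /tmp/promtail-metrics.txt
# prints 127
```

A nonzero, growing total tells you logs are being dropped before they ever reach Loki, and the `reason` label says why.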

Step 2: Verify Loki Connectivity

```bash
# Check Loki service:
kubectl get svc -n monitoring | grep loki

# Test Loki readiness:
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki:3100/ready

# Test the push endpoint (a GET returns 405 Method Not Allowed,
# which still proves connectivity):
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -v http://loki:3100/loki/api/v1/push

# Check the Loki URL configured in Promtail:
kubectl get configmap -n monitoring promtail -o yaml | grep url

# Verify the URL matches the Loki service:
# clients:
#   - url: http://loki:3100/loki/api/v1/push

# Test from a Promtail pod:
kubectl exec -n monitoring daemonset/promtail -- wget -qO- http://loki:3100/ready

# Check network policies:
kubectl get networkpolicy -n monitoring
```

Step 3: Check Loki Status

```bash
# Check Loki pods:
kubectl get pods -n monitoring -l app=loki

# View Loki logs:
kubectl logs -n monitoring -l app=loki

# Check recent errors and warnings:
kubectl logs -n monitoring -l app=loki --tail=100 | grep -iE "error|warn"

# Describe Loki pod:
kubectl describe pods -n monitoring -l app=loki

# Check Loki resource usage:
kubectl top pods -n monitoring -l app=loki

# Check Loki config:
kubectl get configmap -n monitoring loki -o yaml

# Check Loki build info:
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki:3100/loki/api/v1/status/buildinfo | jq .

# Check Loki runtime config (served as YAML at /runtime_config):
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki:3100/runtime_config
```

Step 4: Check Ingestion Limits

```bash
# Check configured limits:
kubectl get configmap -n monitoring loki -o yaml | grep -A30 limits_config

# Common limits:
# limits_config:
#   ingestion_rate_mb: 10
#   ingestion_burst_size_mb: 20
#   per_stream_rate_limit: 5MB
#   max_streams_per_user: 10000
#   max_entries_limit_per_query: 5000
#   reject_old_samples: true
#   reject_old_samples_max_age: 168h

# Check rate-limit errors in logs:
kubectl logs -n monitoring -l app=loki | grep -iE "rate limit|429"

# Increase limits in the Loki config:
# limits_config:
#   ingestion_rate_mb: 50
#   ingestion_burst_size_mb: 100
#   per_stream_rate_limit: 20MB
#   max_streams_per_user: 50000

# Check per-tenant overrides in the config:
kubectl get configmap -n monitoring loki -o yaml | grep -A10 overrides

# Check runtime overrides (YAML output, so grep rather than jq):
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki:3100/runtime_config | grep -A10 overrides
```
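Before bumping limits, it is worth sanity-checking whether the observed ingest rate actually exceeds `ingestion_rate_mb`. A sketch of the comparison with made-up numbers; both values are assumptions you would replace with readings from your own metrics:

```shell
# Observed ingest rate vs. configured limit (both values are assumptions).
rate_bytes_per_s=12582912   # e.g. 12 MiB/s, read from Promtail/Loki metrics
limit_mb=10                 # ingestion_rate_mb from limits_config

rate_mb=$((rate_bytes_per_s / 1048576))   # bytes/s -> MiB/s
if [ "$rate_mb" -gt "$limit_mb" ]; then
  echo "over limit: ${rate_mb}MB/s > ${limit_mb}MB/s"
else
  echo "within limit: ${rate_mb}MB/s <= ${limit_mb}MB/s"
fi
# prints: over limit: 12MB/s > 10MB/s
```

If the observed rate is comfortably under the limit, a 429 usually points at `per_stream_rate_limit` or burst sizing instead of the aggregate rate.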

Step 5: Fix Out of Order Logs

```bash
# Find out-of-order errors in Loki logs:
kubectl logs -n monitoring -l app=loki | grep "out of order"

# Loki requires entries within a stream to be ordered by timestamp,
# unless unordered writes are enabled (the default since Loki 2.4).

# Enable out-of-order support in the Loki config:
# limits_config:
#   unordered_writes: true
#   reject_old_samples: false
#   reject_old_samples_max_age: 336h

# Or fix in Promtail - ensure the timestamp is extracted correctly:
# scrape_configs:
#   - job_name: kubernetes-pods
#     pipeline_stages:
#       - docker: {}
#       - timestamp:
#           source: time
#           format: RFC3339Nano

# Check whether logs carry correct timestamps:
kubectl logs -n default my-pod --timestamps

# Adjust timestamp extraction (the format string is a Go reference time):
#   - regex:
#       expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'
#   - timestamp:
#       source: time
#       format: "2006-01-02 15:04:05"
```
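Loki stores entry timestamps as Unix nanoseconds, which is also what the push API expects. A quick conversion check, assuming GNU `date` (the `-d` flag is not portable to BSD/macOS `date`):

```shell
# Convert an RFC3339 timestamp to Unix nanoseconds (GNU date assumed).
TS_RFC3339="2024-05-01T12:00:00Z"
TS_NS=$(( $(date -u -d "$TS_RFC3339" +%s) * 1000000000 ))
echo "$TS_NS"
# prints 1714564800000000000
```

If the extracted timestamps come out in the wrong zone or with truncated precision, entries from different files can interleave out of order within one stream.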

Step 6: Check Label Cardinality

```bash
# List label names:
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki:3100/loki/api/v1/labels | jq .

# List the values of a specific label (e.g. app):
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki:3100/loki/api/v1/label/app/values | jq .

# Common problematic labels:
# - pod_id (unique per pod)
# - container_id (unique per container)
# - request_id (unique per request)
# - timestamp (changes every scrape)

# Limit labels in Promtail - only keep stable labels:
# pipeline_stages:
#   - labels:
#       level:
#       app:

# Drop high-cardinality labels:
#   - match:
#       selector: '{job="kubernetes-pods"}'
#       stages:
#         - labeldrop:
#             - pod_id
#             - container_id

# Check stream/series counts in logs:
kubectl logs -n monitoring -l app=loki | grep -iE "stream|series"
```
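Stream count is the product of the value counts of every label, so one unbounded label multiplies everything else. A rough estimate, with assumed counts:

```shell
# Stream count = product of label-value counts (all counts are assumptions).
apps=20; levels=4; pods=500

stable=$((apps * levels))              # labels: app + level only
with_pod_id=$((apps * levels * pods))  # same labels plus a per-pod label

echo "stable=$stable with_pod_id=$with_pod_id"
# prints: stable=80 with_pod_id=40000
```

Here a single per-pod label turns 80 streams into 40,000, which is why it alone can blow through `max_streams_per_user`.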

Step 7: Check Storage Backend

```bash
# Check storage config:
kubectl get configmap -n monitoring loki -o yaml | grep -A30 storage_config

# For S3 storage:
# storage_config:
#   aws:
#     s3: s3://loki-bucket
#     region: us-east-1

# Check S3 access:
kubectl run test-aws --image=amazon/aws-cli --rm -it --restart=Never \
  -- aws s3 ls s3://loki-bucket

# Check the credentials secret:
kubectl get secret -n monitoring aws-credentials

# For GCS storage:
# storage_config:
#   gcs:
#     bucket_name: loki-bucket

# For Azure storage:
# storage_config:
#   azure:
#     container_name: loki

# Check ingester metrics:
kubectl port-forward -n monitoring svc/loki 3100:3100 &
curl http://localhost:3100/metrics | grep loki_ingester_

# Check chunk writes:
kubectl logs -n monitoring -l app=loki | grep -iE "chunk|flush"
```

Step 8: Check Promtail Scraping

```bash
# Check Promtail scrape config:
kubectl get configmap -n monitoring promtail -o yaml | grep -A50 scrape_configs

# Check discovered targets:
kubectl port-forward -n monitoring svc/promtail 9080:9080 &
curl http://localhost:9080/service-discovery

# Check the targets page:
curl http://localhost:9080/targets

# Verify which files are being tailed
# (kubectl exec takes a resource name, not a label selector):
kubectl exec -n monitoring daemonset/promtail -- cat /tmp/positions.yaml

# Check the positions file for progress:
kubectl exec -n monitoring daemonset/promtail -- head -20 /tmp/positions.yaml

# Reset positions if stuck:
kubectl exec -n monitoring daemonset/promtail -- rm /tmp/positions.yaml
kubectl rollout restart daemonset/promtail -n monitoring

# Check volume mounts:
kubectl describe ds promtail -n monitoring | grep -A20 "Volumes:"
```

Step 9: Check Resource Limits

```bash
# Check Loki memory usage:
kubectl top pods -n monitoring -l app=loki

# Check Loki resource limits:
kubectl get deploy loki -n monitoring \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'

# Increase memory if needed:
kubectl set resources deploy/loki -n monitoring \
  --limits=memory=4Gi,cpu=2 \
  --requests=memory=2Gi,cpu=1

# Check Promtail resources:
kubectl top pods -n monitoring -l app=promtail

# Increase Promtail limits:
kubectl set resources daemonset/promtail -n monitoring \
  --limits=memory=512Mi,cpu=500m \
  --requests=memory=256Mi,cpu=100m

# Check for OOMKilled:
kubectl describe pods -n monitoring -l app=loki | grep -i oom

# Check memory pressure:
kubectl logs -n monitoring -l app=loki | grep -iE "memory|oom"

# Tune ingester chunk settings to reduce memory held in-flight:
# ingester:
#   chunk_idle_period: 5m
#   chunk_retain_period: 30s
```

Step 10: Loki Verification Script

```bash
# Create verification script:
cat << 'EOF' > /usr/local/bin/check-loki.sh
#!/bin/bash

NS=${1:-"monitoring"}

echo "=== Loki Pods ==="
kubectl get pods -n $NS -l app=loki

echo ""
echo "=== Promtail Pods ==="
kubectl get pods -n $NS -l app=promtail

echo ""
echo "=== Loki Health ==="
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki.$NS:3100/ready 2>/dev/null || echo "Loki not healthy"

echo ""
echo "=== Recent Loki Errors ==="
kubectl logs -n $NS -l app=loki --tail=20 | grep -iE "error|warn" | head -10

echo ""
echo "=== Promtail Errors ==="
kubectl logs -n $NS -l app=promtail --tail=20 | grep -iE "error|fail" | head -10

echo ""
echo "=== Ingestion Rate ==="
kubectl run test-curl --image=curlimages/curl --rm -it --restart=Never \
  -- curl -s http://loki.$NS:3100/metrics 2>/dev/null | grep loki_ingester_chunks_flushed

echo ""
echo "=== Storage Status ==="
kubectl get configmap -n $NS loki -o yaml | grep -A5 "storage_config:"

echo ""
echo "=== Rate Limits ==="
kubectl get configmap -n $NS loki -o yaml | grep -A10 "limits_config:" | head -15

echo ""
echo "=== Recommendations ==="
echo "1. Check Loki URL in Promtail configuration"
echo "2. Verify ingestion rate limits are adequate"
echo "3. Review labels for high cardinality"
echo "4. Check storage backend connectivity"
echo "5. Ensure timestamps are correctly extracted"
echo "6. Increase resource limits if OOMKilled"
echo "7. Check network policies allow traffic"
EOF

chmod +x /usr/local/bin/check-loki.sh

# Usage:
/usr/local/bin/check-loki.sh monitoring
```

Loki Ingestion Checklist

| Check | Expected |
| --- | --- |
| Loki running | Pods ready |
| Promtail connected | Can push to Loki |
| Rate limits | Not exceeded |
| Labels | Low cardinality |
| Timestamps | Correctly ordered |
| Storage | Writable backend |
| Resources | Within limits |

Verify the Fix

```bash
# After fixing Loki ingestion issues

# 1. Check Loki is ready:
kubectl get pods -n monitoring -l app=loki
# All pods Running

# 2. Test ingestion (expect 204 No Content):
curl -H "Content-Type: application/json" -X POST http://loki:3100/loki/api/v1/push \
  -d '{"streams":[{"stream":{"test":"yes"},"values":[["'$(date +%s)000000000'","test message"]]}]}'

# 3. Query logs (-G with --data-urlencode avoids shell/URL escaping issues):
curl -G http://loki:3100/loki/api/v1/query --data-urlencode 'query={app="myapp"}'
# Returns log entries

# 4. Check Promtail is sending:
kubectl logs -n monitoring -l app=promtail --tail=10
# No errors, logs pushed

# 5. Verify in Grafana:
# Explore -> Loki -> query logs

# 6. Check ingestion metrics:
curl http://loki:3100/metrics | grep loki_ingester
# Metrics show ingestion
```
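The push test above depends on a correctly formed payload: the timestamp must be a nanosecond-precision string. A sketch that builds the JSON body separately so it can be inspected before sending; the stream labels are arbitrary examples:

```shell
# Build a Loki push payload with a nanosecond timestamp (labels are illustrative).
TS="$(date +%s)000000000"   # seconds -> nanoseconds by appending nine zeros
PAYLOAD=$(printf '{"streams":[{"stream":{"job":"smoke-test"},"values":[["%s","test message"]]}]}' "$TS")
echo "$PAYLOAD"

# Then POST it and check only the status code:
# curl -s -o /dev/null -w "%{http_code}\n" \
#   -H "Content-Type: application/json" \
#   -X POST http://loki:3100/loki/api/v1/push -d "$PAYLOAD"
# a 204 response means the entry was accepted
```

A 400 response here usually means a malformed timestamp or body, distinguishing a payload problem from the rate-limit and connectivity failures covered earlier.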

  • [Fix Prometheus Target Down](/articles/fix-prometheus-target-down)
  • [Fix Grafana Datasource Error](/articles/fix-grafana-datasource-error)
  • [Fix Telegraf Output Plugin Buffer Full](/articles/fix-telegraf-output-plugin-buffer-full)