What's Actually Happening
Temporal workflow executions are stuck and not making progress. Workflows remain in the Running state without advancing.
The Error You'll See
```bash
$ temporal workflow list
WORKFLOW ID      RUN ID   STATUS
my-workflow-123  run-xxx  Running (stuck for 2h)
```
No activity execution:
```bash
# Workflow shows no activity heartbeats
# Activities not being picked up by workers
```
Task queue empty:
```bash
$ temporal task-queue describe --task-queue my-queue
No pollers active
```
Workflow timeout:
```bash
Error: workflow execution timeout exceeded
```
Why This Happens
1. Worker not running - no workers polling the task queue
2. Task queue mismatch - worker listening on the wrong queue
3. Activity timeout - activity exceeds its timeout without heartbeating
4. Workflow blocked - an activity or timer never completes
5. Resource limits - worker overwhelmed with tasks
6. Network partition - worker disconnected from the server
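Cause 2 is the most common and the easiest to miss: the server compares task queue names byte-for-byte, so a stray space or case difference means the worker never sees the workflow's tasks. A small sketch of a hypothetical `sameTaskQueue` helper that flags such near-miss mismatches:

```go
package main

import (
	"fmt"
	"strings"
)

// sameTaskQueue is a hypothetical helper (not an SDK API) that reports
// whether two task queue names match exactly, and whether a mismatch is a
// "near miss" - differing only by case or surrounding whitespace, the two
// most common configuration bugs.
func sameTaskQueue(workerQ, starterQ string) (exact bool, nearMiss bool) {
	exact = workerQ == starterQ
	nearMiss = !exact &&
		strings.EqualFold(strings.TrimSpace(workerQ), strings.TrimSpace(starterQ))
	return
}

func main() {
	fmt.Println(sameTaskQueue("my-task-queue", "my-task-queue"))  // exact match
	fmt.Println(sameTaskQueue("My-Task-Queue ", "my-task-queue")) // near miss
}
```

If the helper reports a near miss, fix the queue name in either the worker or the workflow starter so they are identical strings.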
Step 1: Check Workflow Status
```bash
# List workflows:
temporal workflow list

# Describe a specific workflow:
temporal workflow describe --workflow-id my-workflow-123

# Get workflow history:
temporal workflow show --workflow-id my-workflow-123

# View pending activities:
temporal workflow describe --workflow-id my-workflow-123 | grep -A10 "pendingActivities"

# Check workflow state:
temporal workflow describe --workflow-id my-workflow-123 --output json | jq '.workflowExecutionInfo.status'

# View workflow events:
temporal workflow show --workflow-id my-workflow-123 --output json | jq '.history.events'

# Check for errors (-E enables the | alternation):
temporal workflow describe --workflow-id my-workflow-123 | grep -iE "error|failure"
```
Step 2: Check Task Queue Status
```bash
# Describe task queue:
temporal task-queue describe --task-queue my-task-queue

# List pollers:
temporal task-queue describe --task-queue my-task-queue | grep -A10 "pollers"

# Check poller status:
# LastAccessTime shows when a poller was last seen
# If there are no pollers, workers are not started or use the wrong task queue

# Check task queue backlog:
temporal task-queue describe --task-queue my-task-queue | grep -iE "backlog|task"

# Task queue in a specific namespace:
temporal task-queue describe --task-queue my-task-queue --namespace production

# Check whether pollers are reaching the server:
# Worker logs should show poll attempts
```
Step 3: Check Worker Status
```bash
# Check worker processes:
ps aux | grep temporal

# Check worker logs (Docker):
docker logs temporal-worker

# Check worker logs (Kubernetes):
kubectl logs -l app=temporal-worker

# Worker startup logs should show:
# "Started Worker with task queue: my-task-queue"

# Worker configuration:
# Verify the task queue matches the one the workflow was started on

# Check worker connection:
# The worker must reach the Temporal frontend (gRPC, port 7233)
nc -zv temporal-server 7233

# Worker metrics:
curl http://worker-metrics:9090/metrics | grep temporal

# Restart workers:
kubectl rollout restart deploy/temporal-worker

# Scale workers:
kubectl scale deploy/temporal-worker --replicas=3
```
Step 4: Fix Activity Issues
```bash
# Check pending activities:
temporal workflow describe --workflow-id my-workflow-123

# Common causes of a stuck activity:

# 1. No heartbeat:
# Long-running activities must heartbeat (Go SDK):
# activity.RecordHeartbeat(ctx, progress)

# 2. Timeout exceeded - increase the activity timeouts:
# ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
#     StartToCloseTimeout: time.Hour,
#     HeartbeatTimeout:    time.Minute * 5,
# })

# 3. Activity not found - ensure the activity is registered:
# worker.RegisterActivity(MyActivity)

# 4. Worker crashed:
# Restart the worker; the activity will be retried

# 5. Network partition:
# Check worker connectivity

# Signal the workflow (if it handles a cancel signal):
temporal workflow signal --workflow-id my-workflow-123 --name cancel-activity

# Reset the workflow to retry from an earlier event:
temporal workflow reset --workflow-id my-workflow-123 --event-id 10 --reason "stuck activity"
```
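For cause 1 above (no heartbeat), the fix is to report progress from inside the activity's work loop. The sketch below keeps the heartbeat call injectable so it runs without a Temporal server; in a real Go SDK activity the `heartbeat` argument would be `func(done int) { activity.RecordHeartbeat(ctx, done) }`. The function name `processWithHeartbeat` is hypothetical.

```go
package main

import "fmt"

// processWithHeartbeat walks through a batch of items and reports progress
// after each one. Without these progress reports, a long-running Temporal
// activity with a HeartbeatTimeout is considered stuck and gets timed out.
func processWithHeartbeat(items []string, heartbeat func(done int)) int {
	for i := range items {
		// ... real per-item work goes here ...
		heartbeat(i + 1) // report how many items are done so far
	}
	return len(items)
}

func main() {
	lastBeat := 0
	n := processWithHeartbeat([]string{"a", "b", "c"}, func(done int) { lastBeat = done })
	fmt.Println(n, lastBeat) // 3 3
}
```

Heartbeat details are also returned to the activity on retry, which lets a restarted activity resume from the last reported item instead of starting over.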
Step 5: Handle Workflow Blocking
```bash
# A workflow can be blocked waiting for:

# 1. Activity completion:
# Check the activity status in the history

# 2. Timer:
# Check for pending timers; a timer can appear stuck if server time is wrong

# 3. Signal - unblock a workflow waiting for a signal:
temporal workflow signal --workflow-id my-workflow-123 --name my-signal

# 4. Child workflow:
# Check whether the child workflow is stuck

# 5. External event:
# Workflow blocked on an external condition

# View the blocking point:
temporal workflow show --workflow-id my-workflow-123 | grep -A5 "waiting"

# Skip a blocked operation by resetting past the problematic event:
temporal workflow reset --workflow-id my-workflow-123 --event-id 15 --reason "skip blocked activity"
```
Step 6: Check Temporal Server
```bash
# Check Temporal server pods:
kubectl get pods -l app=temporal

# Check frontend service:
kubectl get svc temporal-frontend

# Check server logs:
kubectl logs -l app=temporal-server

# Server health (the frontend serves gRPC on 7233, so use the CLI rather than curl):
temporal operator cluster health

# Check server metrics:
curl http://temporal-server:9090/metrics

# Check cluster members:
temporal operator cluster describe

# Server configuration:
kubectl get configmap temporal-config -o yaml

# Check the database (Temporal supports PostgreSQL, MySQL, or Cassandra):
kubectl exec -it postgres-pod -- psql -U temporal -c "SELECT 1"
Step 7: Check Visibility Store
```bash
# Check the visibility store used for queries:
temporal operator cluster describe | grep -i visibility

# Advanced visibility (Elasticsearch):
curl http://elasticsearch:9200/_cluster/health

# Check the ES index:
curl "http://elasticsearch:9200/temporal_visibility_v1/_search?size=1"

# List custom search attributes:
temporal operator search-attribute list

# If visibility is broken:
# Workflows may not show in lists but are still running

# Check visibility queries:
temporal workflow list --query "WorkflowId = 'my-workflow-123'"
```
Step 8: Fix Retry Configuration
```bash
# Check retry policy:
# Activities retry by default with exponential backoff

# Retry policy (Go SDK):
# retryPolicy := &temporal.RetryPolicy{
#     InitialInterval:    time.Second,
#     BackoffCoefficient: 2.0,
#     MaximumInterval:    time.Minute,
#     MaximumAttempts:    5,
# }

# Retry too aggressive:
# Increase the intervals if the worker is overloaded

# Check whether retries are exhausted:
temporal workflow describe --workflow-id my-workflow-123 | grep -iE "retry|attempt"

# Reset retry count:
temporal workflow reset --workflow-id my-workflow-123 --event-id 10 --reason "reset retry"

# Disable retries for debugging:
# set RetryPolicy with MaximumAttempts: 1 in the ActivityOptions

# Check retry state:
temporal workflow show --workflow-id my-workflow-123 | grep -i "retry"
```
Step 9: Check Resource Limits
```bash
# Check worker resources:
kubectl top pods -l app=temporal-worker

# Check server resources:
kubectl top pods -l app=temporal-server

# Increase worker resources:
kubectl set resources deploy/temporal-worker \
  --limits=cpu=4,memory=8Gi \
  --requests=cpu=2,memory=4Gi

# Worker max concurrent tasks (Go SDK):
# workerOptions := worker.Options{
#     MaxConcurrentWorkflowTaskExecutionSize: 1000,
#     MaxConcurrentActivityExecutionSize:     1000,
# }

# Rate limiting:
# workerOptions := worker.Options{
#     WorkerActivitiesPerSecond: 100,
# }

# Check for OOMKilled workers:
kubectl describe pods -l app=temporal-worker | grep -i oom

# Scale horizontally:
kubectl scale deploy/temporal-worker --replicas=5
```
Step 10: Temporal Verification Script
```bash
# Create verification script:
cat << 'EOF' > /usr/local/bin/check-temporal-workflow.sh
#!/bin/bash
WORKFLOW=${1:-"my-workflow-123"}

echo "=== Workflow Status ==="
temporal workflow describe --workflow-id "$WORKFLOW" 2>&1 | head -30

echo ""
echo "=== Workflow History ==="
temporal workflow show --workflow-id "$WORKFLOW" 2>&1 | tail -30

echo ""
echo "=== Pending Activities ==="
temporal workflow describe --workflow-id "$WORKFLOW" 2>&1 | grep -A10 "pendingActivities"

echo ""
echo "=== Task Queue Status ==="
temporal task-queue describe --task-queue my-task-queue 2>&1 | head -20

echo ""
echo "=== Active Pollers ==="
temporal task-queue describe --task-queue my-task-queue 2>&1 | grep -A20 "pollers"

echo ""
echo "=== Server Health ==="
temporal operator cluster health 2>&1 || echo "Cannot reach server"

echo ""
echo "=== Worker Logs ==="
kubectl logs -l app=temporal-worker --tail=20 2>/dev/null | grep -iE "error|start|poll"

echo ""
echo "=== Worker Pods ==="
kubectl get pods -l app=temporal-worker 2>/dev/null || echo "No worker pods found"

echo ""
echo "=== Recommendations ==="
echo "1. Ensure workers are running and polling the correct task queue"
echo "2. Check activity timeouts and heartbeats"
echo "3. Review workflow history for blocking operations"
echo "4. Verify the Temporal server is healthy"
echo "5. Check worker resource limits"
echo "6. Reset the workflow if it is stuck irrecoverably"
echo "7. Signal the workflow to unblock waiting operations"
EOF

chmod +x /usr/local/bin/check-temporal-workflow.sh

# Usage:
/usr/local/bin/check-temporal-workflow.sh my-workflow-123
```
Temporal Workflow Checklist
| Check | Expected |
|---|---|
| Worker running | Polling task queue |
| Task queue | Pollers active |
| Activities | Heartbeating |
| Server healthy | Cluster reachable |
| No timeouts | Within limits |
| Visibility | ES/SQL working |
| Resources | Within limits |
Verify the Fix
```bash
# After fixing Temporal workflow issues

# 1. Check workflow status
temporal workflow describe --workflow-id my-workflow-123
# Status: Running or Completed

# 2. Verify task queue
temporal task-queue describe --task-queue my-task-queue
# Pollers active

# 3. Check workflow progress
temporal workflow show --workflow-id my-workflow-123
# Events advancing

# 4. Worker logs
kubectl logs -l app=temporal-worker --tail=10
# Processing tasks

# 5. Test a new workflow
temporal workflow start --type MyWorkflow --task-queue my-task-queue --workflow-id test-123
# Completes successfully

# 6. Monitor metrics
curl http://worker-metrics:9090/metrics | grep temporal_worker_task_slots_available
# Slots available
```
Related Issues
- [Fix Kafka Consumer Lag High](/articles/fix-kafka-consumer-lag-high)
- [Fix RabbitMQ Consumer Not Receiving Messages](/articles/fix-rabbitmq-consumer-not-receiving-messages)
- [Fix Redis Queue Processing Delayed](/articles/fix-redis-queue-processing-delayed)