What's Actually Happening
A Kubernetes node reports the DiskPressure condition when available disk space falls below the kubelet's eviction threshold. Pods may be evicted, and new pods cannot be scheduled on the node.
The Error You'll See
Node condition:
```bash
$ kubectl describe node node-1
Conditions:
  Type           Status  Reason               Message
  ----           ------  ------               -------
  DiskPressure   True    NodeHasDiskPressure  kubelet has disk pressure
```
Pod evictions:
```bash
$ kubectl get events
LAST SEEN   TYPE      REASON                OBJECT        MESSAGE
14m         Normal    NodeHasDiskPressure   node/node-1   Node node-1 status is now: NodeHasDiskPressure
14m         Warning   Evicted               pod/app-pod   The node was low on resource: ephemeral-storage.
```
Scheduling disabled on the node:
```bash
$ kubectl get nodes
NAME     STATUS                     ROLES    AGE   VERSION
node-1   Ready,SchedulingDisabled   <none>   10d   v1.28.0
```
Why This Happens
1. Disk full - Node storage capacity exceeded
2. Large container logs - Unrotated logs filling the disk
3. Old images - Unused container images never cleaned up
4. Volume data - Persistent and ephemeral volume data consuming space
5. Eviction threshold too high - kubelet reports pressure while usable space remains
6. No cleanup configured - Automatic garbage collection not enabled
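As a rough model of the check the kubelet performs, here is a minimal sketch comparing a filesystem's available space against the default `nodefs.available` threshold of 10% (`check_pressure` is a hypothetical helper, not a kubelet API):

```bash
#!/bin/bash
# Hypothetical helper: does available space fall below the eviction threshold?
THRESHOLD=10   # default nodefs.available eviction threshold, in percent

check_pressure() {
  # $1 = available KB, $2 = total KB
  local avail_pct=$(( $1 * 100 / $2 ))
  if [ "$avail_pct" -lt "$THRESHOLD" ]; then
    echo "disk pressure: only ${avail_pct}% available"
  else
    echo "ok: ${avail_pct}% available"
  fi
}

check_pressure 5000000 100000000    # 5% free  -> disk pressure
check_pressure 40000000 100000000   # 40% free -> ok
```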
Step 1: Check Node Disk Usage
```bash
# Check node disk condition
kubectl describe node node-1 | grep -A 10 Conditions

# SSH into the node
ssh node-1

# Check disk usage
df -h

# Check specific paths
df -h /var/lib/docker
df -h /var/lib/kubelet
df -h /var/log

# Find large directories
du -sh /var/lib/docker/* | sort -h
du -sh /var/lib/kubelet/* | sort -h
du -sh /var/log/* | sort -h

# Check inode usage
df -i
```
Step 2: Clean Up Docker Resources
```bash
# Check Docker disk usage
docker system df

# Example output:
# Images:        50GB
# Containers:    10GB
# Local Volumes: 20GB
# Build Cache:   5GB

# Remove unused images
docker image prune -a

# Remove stopped containers
docker container prune

# Remove unused volumes
docker volume prune

# Remove build cache
docker builder prune

# Full cleanup (removes all unused images, containers, and volumes)
docker system prune -a --volumes

# Remove dangling images only
docker rmi $(docker images -f "dangling=true" -q)

# Remove unused images older than 24 hours
docker image prune -a --filter "until=24h"
```
Step 3: Clean Up Container Logs
```bash
# Check container log sizes
find /var/lib/docker/containers -name "*.log" -exec du -sh {} \; | sort -h

# Find large log files
find /var/lib/docker/containers -name "*.log" -size +100M

# Truncate large log files
truncate -s 0 /var/lib/docker/containers/*/*-json.log
```

Configure log rotation in /etc/docker/daemon.json:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```

```bash
# Restart Docker to apply (affects newly created containers)
systemctl restart docker

# Check and trim journal logs
journalctl --disk-usage
journalctl --vacuum-size=100M
```
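With the rotation settings above, the worst-case log footprint per container is bounded by max-size times max-file. A quick back-of-the-envelope check (the container count is a made-up example):

```bash
#!/bin/bash
# Worst-case json-file log disk per container: max-size (10m) x max-file (3)
max_size_mib=10
max_file=3
containers=50   # hypothetical number of containers on the node

per_container=$(( max_size_mib * max_file ))
node_total=$(( per_container * containers ))
echo "per container: ${per_container} MiB, node-wide cap: ${node_total} MiB"
# -> per container: 30 MiB, node-wide cap: 1500 MiB
```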
Step 4: Clean Up Kubelet Resources
```bash
# Check kubelet data directory
du -sh /var/lib/kubelet/*

# Remove pod logs older than 7 days
find /var/log/pods -name "*.log" -mtime +7 -delete

# Clean up empty pod directories
find /var/lib/kubelet/pods -type d -empty -delete

# Clear kubelet cache (if present)
rm -rf /var/lib/kubelet/cache

# Check for orphaned volumes (pod directories for pods no longer in the cluster)
ls -la /var/lib/kubelet/pods

# Flag orphaned pod directories (dry run; uncomment rm to delete)
for pod in /var/lib/kubelet/pods/*; do
  pod_uid=$(basename "$pod")
  if ! kubectl get pods -A -o jsonpath='{.items[*].metadata.uid}' | grep -q "$pod_uid"; then
    echo "Orphaned: $pod"
    # rm -rf "$pod"
  fi
done
```
Step 5: Clean Up Old Kubernetes Objects
```bash
# List completed jobs
kubectl get jobs -A --field-selector status.successful=1

# Delete completed jobs
kubectl delete jobs -A --field-selector status.successful=1

# Delete failed pods
kubectl delete pods -A --field-selector status.phase=Failed

# Delete evicted pods (status.reason is not a supported pod field selector,
# so match the Evicted status in the output instead)
kubectl get pods -A --no-headers | awk '$4 == "Evicted" {print $2" -n "$1}' | xargs -r -L1 kubectl delete pod

# Delete PVCs that are not Bound
kubectl get pvc -A --no-headers | awk '$3 != "Bound" {print $2" -n "$1}' | xargs -r -L1 kubectl delete pvc

# Clean up jobs completed more than 1 day ago
cutoff=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)
kubectl get jobs -A -o json | jq -r --arg cutoff "$cutoff" \
  '.items[] | select(.status.completionTime != null and .status.completionTime < $cutoff) |
    .metadata.name + " -n " + .metadata.namespace' | xargs -r -L1 kubectl delete job
```
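The age-based job cleanup relies on the fact that RFC 3339 timestamps in the same format compare correctly as plain strings. A self-contained illustration with hypothetical values:

```bash
#!/bin/bash
# RFC 3339 timestamps sort lexicographically, so a string compare suffices
cutoff="2024-06-01T00:00:00Z"        # hypothetical "1 day ago" cutoff
completion="2024-05-15T12:00:00Z"    # hypothetical job completionTime

if [[ "$completion" < "$cutoff" ]]; then
  verdict="older than cutoff: delete"
else
  verdict="recent: keep"
fi
echo "$verdict"
# -> older than cutoff: delete
```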
Step 6: Configure Eviction Thresholds
```bash
# Check current thresholds
grep -A 10 eviction /var/lib/kubelet/config.yaml
```

In the kubelet config (typically /var/lib/kubelet/config.yaml):

```yaml
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "10%"
evictionSoft:
  memory.available: "200Mi"
  nodefs.available: "15%"
  imagefs.available: "15%"
evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "1m30s"
  imagefs.available: "1m30s"
evictionMinimumReclaim:
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"
```

To trigger eviction earlier, before the disk is nearly full, raise the thresholds:

```yaml
evictionHard:
  nodefs.available: "15%"    # was 10%
  imagefs.available: "15%"
```

```bash
# Restart kubelet to apply
systemctl restart kubelet
```
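As a toy model of how hard and soft thresholds differ (assumed semantics, not kubelet source): a hard threshold evicts immediately, while a soft one waits out its grace period first.

```bash
#!/bin/bash
# Toy model: hard threshold evicts immediately; soft waits for a grace period
avail_pct=12   # hypothetical current free space on nodefs, in percent
hard=10        # evictionHard nodefs.available
soft=15        # evictionSoft nodefs.available

if [ "$avail_pct" -lt "$hard" ]; then
  action="evict immediately (hard threshold crossed)"
elif [ "$avail_pct" -lt "$soft" ]; then
  action="start 1m30s grace period (soft threshold crossed)"
else
  action="no pressure"
fi
echo "$action"
# -> start 1m30s grace period (soft threshold crossed)
```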
Step 7: Configure Image Garbage Collection
In the kubelet config:

```yaml
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
```

When image disk usage exceeds 85%, garbage collection runs until usage drops to 80%.

For more aggressive cleanup:

```yaml
imageGCHighThresholdPercent: 70
imageGCLowThresholdPercent: 60

# Allow exited containers to be garbage collected immediately
minimumContainerTTLDuration: "0s"
```

```bash
# Restart kubelet after changes
systemctl restart kubelet
```
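To get a feel for what these percentages mean in bytes, a quick sketch of the reclaim target (the disk size and usage are made-up numbers; this mirrors the thresholds, not kubelet's actual code):

```bash
#!/bin/bash
# How much image GC must reclaim once the high threshold is crossed
disk_gib=100
high=85    # imageGCHighThresholdPercent
low=80     # imageGCLowThresholdPercent
used=90    # hypothetical current usage percent

if [ "$used" -gt "$high" ]; then
  reclaim_gib=$(( disk_gib * (used - low) / 100 ))
  echo "GC triggered: reclaim ${reclaim_gib} GiB to return to ${low}% usage"
fi
# -> GC triggered: reclaim 10 GiB to return to 80% usage
```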
Step 8: Expand Node Storage
```bash
# For VMs with expandable disks:

# Check current disk layout
lsblk

# Expand the partition (example for /dev/sda, partition 1)
growpart /dev/sda 1

# Resize the filesystem
resize2fs /dev/sda1

# For LVM:
lvextend -L +50G /dev/mapper/vg-root
resize2fs /dev/mapper/vg-root

# For cloud instances:
# AWS:   modify the EBS volume, then expand the partition/filesystem in the OS
# GCP:   resize the persistent disk, then expand the partition
# Azure: expand the managed disk, then resize in the OS

# Verify the new size
df -h /
```
Step 9: Schedule Regular Cleanup
```bash
# Create a daily cleanup cron job
cat << 'EOF' > /etc/cron.daily/kubernetes-cleanup
#!/bin/bash

# Docker cleanup
docker system prune -a --volumes -f --filter "until=24h"

# Remove old logs
find /var/log/pods -name "*.log" -mtime +7 -delete
find /var/lib/docker/containers -name "*.log" -size +100M -exec truncate -s 0 {} \;

# Clean journal
journalctl --vacuum-size=500M

# Remove completed jobs
kubectl delete jobs -A --field-selector status.successful=1 2>/dev/null

# Remove failed pods
kubectl delete pods -A --field-selector status.phase=Failed 2>/dev/null

echo "$(date): Cleanup completed"
EOF

chmod +x /etc/cron.daily/kubernetes-cleanup
```

Or use a Kubernetes CronJob for cleanup:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: node-cleanup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup-sa
          containers:
          - name: cleanup
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              kubectl delete jobs -A --field-selector status.successful=1
              kubectl delete pods -A --field-selector status.phase=Failed
          restartPolicy: OnFailure
```
Step 10: Monitor Disk Usage
```bash
# Create a monitoring script
cat << 'EOF' > /usr/local/bin/monitor_disk.sh
#!/bin/bash
THRESHOLD=80

df -h | grep -E '^/dev' | while read -r line; do
  usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
  mount=$(echo "$line" | awk '{print $6}')
  if [ "$usage" -gt "$THRESHOLD" ]; then
    echo "WARNING: $mount at ${usage}%"
    # Send alert
  fi
done

echo "=== Docker Usage ==="
docker system df

echo "=== Large Log Files ==="
find /var/lib/docker/containers -name "*.log" -size +100M -exec ls -lh {} \;

echo "=== Image Count ==="
docker images | wc -l
EOF

chmod +x /usr/local/bin/monitor_disk.sh
```

Useful Prometheus metrics:
- node_filesystem_avail_bytes
- node_filesystem_size_bytes
- kubelet_volume_stats_available_bytes

Alert rule:

```yaml
- alert: NodeDiskPressure
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} * 100)
      / node_filesystem_size_bytes{mountpoint="/"} < 15
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} disk usage > 85%"
```
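Evaluating the alert expression by hand for one hypothetical node shows how the 15%-available cutoff maps to 85% usage:

```bash
#!/bin/bash
# Hand-evaluate the PromQL expression for a hypothetical node
avail_bytes=$(( 10 * 1024 * 1024 * 1024 ))    # 10 GiB free
size_bytes=$(( 100 * 1024 * 1024 * 1024 ))    # 100 GiB disk

avail_pct=$(( avail_bytes * 100 / size_bytes ))
if [ "$avail_pct" -lt 15 ]; then
  echo "alert fires: only ${avail_pct}% available (>85% used)"
fi
# -> alert fires: only 10% available (>85% used)
```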
Kubernetes Node Disk Pressure Checklist
| Check | Command | Expected |
|---|---|---|
| Disk usage | df -h | < 85% |
| Docker images | docker system df | No large unused images |
| Log files | find /var/lib/docker/containers -name "*.log" -size +100M | None |
| Eviction thresholds | grep -A 10 eviction /var/lib/kubelet/config.yaml | Appropriate for workload |
| Garbage collection | kubelet config (imageGC*ThresholdPercent) | Enabled |
| Cleanup jobs | ls /etc/cron.daily/ | Scheduled |
Verify the Fix
```bash
# After cleaning up disk space

# 1. Check disk usage
df -h /
# Usage < 85%

# 2. Check node condition
kubectl describe node node-1 | grep -A 5 Conditions
# DiskPressure: False

# 3. Check node status
kubectl get nodes
# STATUS: Ready

# 4. Verify pods running
kubectl get pods -A -o wide | grep node-1
# Pods running on the node

# 5. Check for evictions
kubectl get events --field-selector reason=Evicted
# No recent evictions

# 6. Monitor disk over time
watch -n 60 df -h
# Stable usage
```
Related Issues
- [Fix Kubernetes Node Not Ready](/articles/fix-kubernetes-node-not-ready)
- [Fix Kubernetes Pod Evicted](/articles/fix-kubernetes-pod-evicted)
- [Fix Kubernetes Node Memory Pressure](/articles/fix-kubernetes-node-memory-pressure)