What's Actually Happening

A Kubernetes node reports the DiskPressure condition when available disk space or inodes on the node fall below the kubelet's eviction threshold. The kubelet begins evicting pods to reclaim space, and new pods cannot be scheduled on the node.
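
When the condition is set, the kubelet also taints the node with node.kubernetes.io/disk-pressure:NoSchedule, which is what keeps new pods off it. A quick way to confirm:

```bash
# Confirm the disk-pressure taint on the affected node
kubectl describe node node-1 | grep -i taints
# Taints: node.kubernetes.io/disk-pressure:NoSchedule
```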

The Error You'll See

Node condition:

```bash
$ kubectl describe node node-1

Conditions:
  Type           Status  Reason               Message
  ----           ------  ------               -------
  DiskPressure   True    NodeHasDiskPressure  kubelet has disk pressure
```

Pod evictions:

```bash
$ kubectl get events

LAST SEEN  TYPE     REASON               OBJECT        MESSAGE
14m        Normal   NodeHasDiskPressure  node/node-1   Node node-1 status is now: NodeHasDiskPressure
14m        Warning  Evicted              pod/app-pod   The node was low on resource: ephemeral-storage.
```

Node marked unschedulable:

```bash
$ kubectl get nodes

NAME     STATUS                     ROLES    AGE   VERSION
node-1   Ready,SchedulingDisabled   <none>   10d   v1.28.0
```

Why This Happens

  1. Disk full - Node storage capacity exceeded
  2. Large container logs - Unrotated logs filling the disk
  3. Old images - Unused container images never cleaned up
  4. Volume data - Persistent volumes and pod scratch space consuming disk (see the ephemeral-storage sketch after this list)
  5. Eviction threshold set too aggressively - Pressure is reported while usable space remains
  6. No cleanup configured - Image and container garbage collection not enabled or tuned
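
One preventive measure worth pairing with the fixes below: give pods explicit ephemeral-storage requests and limits. The scheduler then accounts for scratch space, and the kubelet evicts only the single pod that exceeds its limit instead of letting the whole node drift into disk pressure. A minimal sketch (pod name, image, and sizes are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod        # placeholder name
spec:
  containers:
  - name: app
    image: nginx       # placeholder image
    resources:
      requests:
        ephemeral-storage: "1Gi"   # scheduler reserves this much node disk
      limits:
        ephemeral-storage: "2Gi"   # kubelet evicts the pod if it exceeds this
```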

Step 1: Check Node Disk Usage

```bash
# Check node disk condition
kubectl describe node node-1 | grep -A 10 Conditions

# SSH into the node
ssh node-1

# Check disk usage
df -h

# Check specific paths
df -h /var/lib/docker
df -h /var/lib/kubelet
df -h /var/log

# Find large directories
du -sh /var/lib/docker/* | sort -h
du -sh /var/lib/kubelet/* | sort -h
du -sh /var/log/* | sort -h

# Check inode usage (nodefs.inodesFree can trigger pressure too)
df -i
```

Step 2: Clean Up Docker Resources

```bash
# Check Docker disk usage
docker system df

# Example output:
# Images: 50GB
# Containers: 10GB
# Local Volumes: 20GB
# Build Cache: 5GB

# Remove unused images
docker image prune -a

# Remove stopped containers
docker container prune

# Remove unused volumes
docker volume prune

# Remove build cache
docker builder prune

# Full cleanup
docker system prune -a --volumes

# Remove dangling images
docker rmi $(docker images -f "dangling=true" -q)

# Remove images older than 24 hours
docker image prune -a --filter "until=24h"
```
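
If the node runs containerd instead of Docker (the default on most current clusters), the docker commands above won't apply; crictl covers the same ground, assuming a recent crictl pointed at the node's runtime socket:

```bash
# List images and all containers known to the runtime
crictl images
crictl ps -a

# Remove every image not referenced by a container
crictl rmi --prune

# See what the containerd data directory is consuming
du -sh /var/lib/containerd/*
```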

Step 3: Clean Up Container Logs

```bash
# Check container log sizes
find /var/lib/docker/containers -name "*.log" -exec du -sh {} \; | sort -h

# Find large log files
find /var/lib/docker/containers -name "*.log" -size +100M

# Truncate large log files
truncate -s 0 /var/lib/docker/containers/*/*-json.log

# Configure log rotation in /etc/docker/daemon.json:
# {
#   "log-driver": "json-file",
#   "log-opts": {
#     "max-size": "10m",
#     "max-file": "3"
#   }
# }

# Restart Docker (rotation settings apply to newly created containers)
systemctl restart docker

# Check journal logs
journalctl --disk-usage
journalctl --vacuum-size=100M
```
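
On containerd nodes the json-file driver settings above don't apply; the kubelet rotates container logs itself. A minimal sketch of the relevant KubeletConfiguration fields (values are illustrative, not recommendations):

```yaml
# Fragment of /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi"   # rotate each container log at 10 MiB
containerLogMaxFiles: 3       # keep at most 3 rotated files per container
```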

Step 4: Clean Up Kubelet Resources

```bash
# Check kubelet data directory
du -sh /var/lib/kubelet/*

# Remove old pod logs
find /var/log/pods -name "*.log" -mtime +7 -delete

# Clean up empty pod directories
find /var/lib/kubelet/pods -type d -empty -delete

# Clear kubelet cache
rm -rf /var/lib/kubelet/cache

# Check for orphaned volumes (pod directories no longer in the cluster)
ls -la /var/lib/kubelet/pods

# Clean up orphaned volumes
for pod in /var/lib/kubelet/pods/*; do
  pod_uid=$(basename "$pod")
  if ! kubectl get pods -A -o jsonpath='{.items[*].metadata.uid}' | grep -q "$pod_uid"; then
    echo "Orphaned: $pod"
    # rm -rf "$pod"
  fi
done
```

Step 5: Clean Up Old Kubernetes Objects

```bash
# List completed jobs
kubectl get jobs -A --field-selector status.successful=1

# Delete completed jobs
kubectl delete jobs -A --field-selector status.successful=1

# Delete failed pods
kubectl delete pods -A --field-selector status.phase=Failed

# Delete evicted pods (status.reason is not a supported pod field selector, so filter with jq)
kubectl get pods -A --field-selector status.phase=Failed -o json | \
  jq -r '.items[] | select(.status.reason == "Evicted") | .metadata.name + " -n " + .metadata.namespace' | \
  xargs -r -L1 kubectl delete pod

# Delete PVCs that are not Bound (review the list before running the delete)
kubectl get pvc -A --no-headers | grep -v Bound | \
  awk '{print $2 " -n " $1}' | xargs -r -L1 kubectl delete pvc

# Clean up completed jobs older than 1 day
kubectl get jobs -A -o json | jq -r \
  --arg cutoff "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" \
  '.items[] | select(.status.completionTime != null and .status.completionTime < $cutoff) |
   .metadata.name + " -n " + .metadata.namespace' | xargs -r -L1 kubectl delete job
```
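
Deleting finished Jobs by hand doesn't stay done; the TTL-after-finished controller can handle it automatically when Jobs set ttlSecondsAfterFinished. A minimal sketch (job name and image are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job              # placeholder name
spec:
  ttlSecondsAfterFinished: 86400 # delete the Job and its pods 1 day after it finishes
  template:
    spec:
      containers:
      - name: main
        image: busybox           # placeholder image
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never
```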

Step 6: Configure Eviction Thresholds

```bash
# Check current thresholds
grep -A 10 eviction /var/lib/kubelet/config.yaml

# In the kubelet config (/var/lib/kubelet/config.yaml):
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "10%"

evictionSoft:
  memory.available: "200Mi"
  nodefs.available: "15%"
  imagefs.available: "15%"

evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "1m30s"
  imagefs.available: "1m30s"

evictionMinimumReclaim:
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"

# To trigger eviction earlier, raise the available requirement:
evictionHard:
  nodefs.available: "15%"    # was 10%
  imagefs.available: "15%"

# Restart kubelet to apply
systemctl restart kubelet
```

Step 7: Configure Image Garbage Collection

```bash
# In the kubelet config:
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80

# When disk usage exceeds 85%, image garbage collection runs until usage drops to 80%

# For more aggressive cleanup:
imageGCHighThresholdPercent: 70
imageGCLowThresholdPercent: 60

# Remove dead containers as soon as possible
minimumContainerTTLDuration: "0s"

# Restart kubelet after changes
systemctl restart kubelet
```
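
For orientation, the eviction and garbage-collection settings from Steps 6 and 7 all live in the same KubeletConfiguration file. A sketch of how the pieces fit together, with illustrative values rather than recommendations:

```yaml
# /var/lib/kubelet/config.yaml (fragment; values are illustrative)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "15%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionMinimumReclaim:
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"
imageGCHighThresholdPercent: 80
imageGCLowThresholdPercent: 70
```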

Step 8: Expand Node Storage

```bash
# For VMs with expandable disks:

# Check current disk layout
lsblk

# Expand the partition (example for /dev/sda, partition 1)
growpart /dev/sda 1

# Resize the filesystem (ext4)
resize2fs /dev/sda1

# For LVM:
lvextend -L +50G /dev/mapper/vg-root
resize2fs /dev/mapper/vg-root

# For cloud instances:
# AWS:   modify the EBS volume, then expand the partition/filesystem
# GCP:   resize the disk, then expand the partition
# Azure: expand the disk, then resize inside the OS

# Verify the new size
df -h /
```
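
As a concrete example of the cloud path, on AWS the EBS volume is grown first and the partition and filesystem are then expanded on the node as shown above (volume ID and target size are placeholders):

```bash
# Grow the EBS volume (placeholder volume ID, target size in GiB)
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 200

# Wait for the modification to reach the optimizing/completed state
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0

# Then on the node:
growpart /dev/sda 1 && resize2fs /dev/sda1
```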

Step 9: Schedule Regular Cleanup

```bash
# Create a daily cleanup cron job
cat << 'EOF' > /etc/cron.daily/kubernetes-cleanup
#!/bin/bash

# Docker cleanup (the until filter cannot be combined with --volumes)
docker system prune -a -f --filter "until=24h"
docker volume prune -f

# Remove old logs
find /var/log/pods -name "*.log" -mtime +7 -delete
find /var/lib/docker/containers -name "*.log" -size +100M -exec truncate -s 0 {} \;

# Clean journal
journalctl --vacuum-size=500M

# Remove completed jobs
kubectl delete jobs -A --field-selector status.successful=1 2>/dev/null

# Remove failed pods
kubectl delete pods -A --field-selector status.phase=Failed 2>/dev/null

echo "$(date): Cleanup completed"
EOF

chmod +x /etc/cron.daily/kubernetes-cleanup
```

Or run the cleanup in-cluster with a Kubernetes CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: node-cleanup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup-sa
          containers:
          - name: cleanup
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              kubectl delete jobs -A --field-selector status.successful=1
              kubectl delete pods -A --field-selector status.phase=Failed
          restartPolicy: OnFailure
```
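
The CronJob above references a cleanup-sa service account, which needs permission to list and delete jobs and pods cluster-wide. RBAC along these lines would grant that (names are placeholders matching the manifest above):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleanup-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cleanup-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "delete"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cleanup-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cleanup-role
subjects:
- kind: ServiceAccount
  name: cleanup-sa
  namespace: default
```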

Step 10: Monitor Disk Usage

```bash
# Create a monitoring script
cat << 'EOF' > /usr/local/bin/monitor_disk.sh
#!/bin/bash
THRESHOLD=80

df -h | grep -E '^/dev' | while read -r line; do
  usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
  mount=$(echo "$line" | awk '{print $6}')
  if [ "$usage" -gt "$THRESHOLD" ]; then
    echo "WARNING: $mount at ${usage}%"
    # Send alert
  fi
done

echo "=== Docker Usage ==="
docker system df

echo "=== Large Log Files ==="
find /var/lib/docker/containers -name "*.log" -size +100M -exec ls -lh {} \;

echo "=== Image Count ==="
docker images | wc -l
EOF

chmod +x /usr/local/bin/monitor_disk.sh

# Prometheus metrics to watch:
# node_filesystem_avail_bytes
# node_filesystem_size_bytes
# kubelet_volume_stats_available_bytes
```

Prometheus alert rule:

```yaml
- alert: NodeDiskPressure
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} * 100)
      / node_filesystem_size_bytes{mountpoint="/"} < 15
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} disk usage > 85%"
```

Kubernetes Node Disk Pressure Checklist

| Check | Command | Expected |
| --- | --- | --- |
| Disk usage | `df -h` | < 85% |
| Docker images | `docker system df` | Reasonable |
| Log files | `find -size +100M` | None |
| Eviction threshold | kubelet config | Appropriate |
| Garbage collection | kubelet config | Enabled |
| Cleanup jobs | `crontab -l` | Scheduled |

Verify the Fix

```bash
# After cleaning up disk space

# 1. Check disk usage
df -h /
# Usage < 85%

# 2. Check node condition
kubectl describe node node-1 | grep -A 5 Conditions
# DiskPressure: False

# 3. Check node ready
kubectl get nodes
# STATUS: Ready

# 4. Verify pods running
kubectl get pods -A -o wide | grep node-1
# Pods running on the node

# 5. Check for evictions
kubectl get events --field-selector reason=Evicted
# No recent evictions

# 6. Monitor disk over time
watch -n 60 df -h
# Stable usage
```

  • [Fix Kubernetes Node Not Ready](/articles/fix-kubernetes-node-not-ready)
  • [Fix Kubernetes Pod Evicted](/articles/fix-kubernetes-pod-evicted)
  • [Fix Kubernetes Node Memory Pressure](/articles/fix-kubernetes-node-memory-pressure)