Introduction

A PostgreSQL table can become corrupted after a disk error or an unclean shutdown. This guide walks through diagnosing and resolving the problem step by step.

Symptoms

Typical error output:

bash
ERROR: invalid page header in block 1234 of relation "public.users"
HINT: Run REINDEX or VACUUM FULL to recover
File: /var/lib/postgresql/data/base/16384/12345
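
The block number, relation name, and file path in this output are usually enough to pin down what is damaged. The following is a minimal sketch, assuming log lines follow the sample format above and the data file sits in the standard `base/<database_oid>/<filenode>` layout. The function names `parse_page_errors` and `path_to_oids` are hypothetical helpers, not PostgreSQL tools.

```bash
# Hypothetical helpers for reading the error output shown above.

# Print "<block> <relation>" for each "invalid page header" line on stdin.
# Assumes the message wording matches the sample output.
parse_page_errors() {
  sed -n 's/.*invalid page header in block \([0-9][0-9]*\) of relation "\([^"]*\)".*/\1 \2/p'
}

# Print "<database_oid> <filenode>" for a data file path.
# Assumes the layout <data_dir>/base/<database_oid>/<filenode>[.<segment>].
path_to_oids() {
  ppe_file=${1##*/}          # last path component, e.g. 12345 or 12345.1
  ppe_dir=${1%/*}            # everything before the last component
  printf '%s %s\n' "${ppe_dir##*/}" "${ppe_file%%.*}"
}
```

For example, `path_to_oids /var/lib/postgresql/data/base/16384/12345` prints `16384 12345`. When connected to the database with OID 16384, `SELECT pg_filenode_relation(0, 12345);` should then map the filenode back to a relation name (0 stands for the database's default tablespace).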

Common Causes

  1. Lock conflict or deadlock between concurrent transactions
  2. Resource exhaustion (connections, disk, memory)
  3. Replication or failover configuration issue
  4. Data corruption or constraint violation

Step-by-Step Fix

Step 1: Check Current State

bash
# Check database status
systemctl status postgresql
# View recent error log entries (use tail -f instead to follow the log live)
tail -n 100 /var/log/postgresql/postgresql.log
# Check active connections
psql -c "SELECT count(*) FROM pg_stat_activity;"
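
A raw connection count is easier to judge against the configured limit. The following is a minimal sketch, assuming `psql` can connect with default credentials. `conn_usage_pct` and `check_connections` are hypothetical helper names.

```bash
# Integer percentage of connection slots in use: conn_usage_pct <used> <max>
conn_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Compare current sessions with max_connections (requires a reachable server).
# pg_stat_activity can also list background processes, so treat the figure
# as approximate.
check_connections() {
  cc_used=$(psql -Atc "SELECT count(*) FROM pg_stat_activity;") || return 1
  cc_max=$(psql -Atc "SHOW max_connections;") || return 1
  echo "connections: $cc_used/$cc_max ($(conn_usage_pct "$cc_used" "$cc_max")%)"
}
```

Calling `check_connections` prints a single summary line. A value close to 100% points at resource exhaustion rather than a problem with the table itself.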

Step 2: Identify Root Cause

sql
-- Run these in a psql session
-- Check active queries
SELECT * FROM pg_stat_activity WHERE state = 'active';
-- View locks
SELECT * FROM pg_locks;
-- Check replication status
SELECT * FROM pg_stat_replication;
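
The raw `pg_locks` output is hard to read on a busy server. The sketch below, assuming PostgreSQL 9.6 or later (where `pg_blocking_pids()` is available), lists only the sessions that are waiting on a lock, together with the PIDs blocking them. `blocked_sessions_sql` and `show_blocked_sessions` are hypothetical names.

```bash
# Print a query that lists waiting sessions and the PIDs that block them.
blocked_sessions_sql() {
  cat <<'SQL'
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
SQL
}

# Run the query through psql (requires a reachable server).
show_blocked_sessions() {
  blocked_sessions_sql | psql
}
```

An empty result from `show_blocked_sessions` suggests that lock contention is not the cause and that the log or replication status deserves a closer look.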

Step 3: Apply Primary Fix

```sql
-- Primary fix: find and resolve blocking queries
-- View active queries
SELECT * FROM pg_stat_activity WHERE state = 'active';

-- Terminate the blocking query if needed ('problem_query' is a placeholder)
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE query = 'problem_query';

-- Check for locks that have not been granted
SELECT * FROM pg_locks WHERE NOT granted;
```
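
Matching on the exact query text is brittle, and terminating a backend drops its whole session. A gentler sketch, assuming the blocking PID is already known from `pg_stat_activity`: cancel the running query first with `pg_cancel_backend()` and terminate only if that is not enough. `stop_backend_sql` is a hypothetical helper that only builds the statement.

```bash
# Build the SQL to stop a backend: stop_backend_sql <pid> [cancel|terminate]
# Rejects non-numeric PIDs so nothing unexpected is spliced into the SQL.
stop_backend_sql() {
  case $1 in
    ''|*[!0-9]*) echo "invalid pid: $1" >&2; return 1 ;;
  esac
  case ${2:-cancel} in
    cancel)    echo "SELECT pg_cancel_backend($1);" ;;
    terminate) echo "SELECT pg_terminate_backend($1);" ;;
    *)         echo "unknown mode: $2" >&2; return 1 ;;
  esac
}
```

Typical use would be `stop_backend_sql 4242 | psql`, followed by `stop_backend_sql 4242 terminate | psql` if the session is still blocking. The PID 4242 is only an illustration.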

Step 4: Apply Alternative Fix

```bash
# Alternative: check logs and restart
# View detailed error log
tail -n 100 /var/log/postgresql/postgresql.log

# Check database size ('mydb' is a placeholder database name)
psql -c "SELECT pg_size_pretty(pg_database_size('mydb'));"

# Restart database if needed
systemctl restart postgresql
```

Step 5: Verify the Fix

bash
psql -c "SELECT version();"
# Should return PostgreSQL version
psql -c "SELECT count(*) FROM pg_stat_activity;"
# Check connection count
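
A version check only proves that the server is up. To confirm that the affected table is readable again, one option is to force a full read of it. The following is a minimal sketch, assuming the table from the sample error (`public.users`) and a `psql` that connects with default settings. `table_readable` and the `PSQL` override variable are hypothetical.

```bash
# Command used to reach the server; override PSQL to substitute another client.
PSQL=${PSQL:-psql}

# table_readable <schema.table>
# COPY ... TO STDOUT reads every row of the table, so a page that is still
# damaged should make the command fail.
table_readable() {
  if "$PSQL" -v ON_ERROR_STOP=1 -c "COPY $1 TO STDOUT" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "unreadable: $1"
    return 1
  fi
}
```

Running `table_readable public.users` after the fix should print `ok: public.users`. If it reports the table as unreadable, the server log should show which block is still failing.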

Common Pitfalls

  • Not monitoring lock wait times
  • Using long-running transactions
  • Ignoring replication lag
  • Not backing up before schema changes

Best Practices

  • Monitor database metrics continuously
  • Set appropriate lock timeouts
  • Test backups and restores regularly
  • Keep statistics updated