What's Actually Happening
MySQL replication lag occurs when the replica (slave) cannot keep pace with the master's binary log updates. Seconds_Behind_Master increases, queries on replica return stale data, and read replicas become unreliable.
The Error You'll See
SHOW SLAVE STATUS:
```bash $ mysql -u root -p -e "SHOW SLAVE STATUS\G"
Seconds_Behind_Master: 3600 Slave_IO_Running: Yes Slave_SQL_Running: Yes Relay_Master_Log_File: mysql-bin.000123 Exec_Master_Log_Pos: 1000 Read_Master_Log_Pos: 50000 # Replica is 49,000 bytes behind, translating to 3600 seconds ```
Application impact:
Read replica query returns stale data:
SELECT * FROM orders WHERE id = 100;
# Returns old status because replica hasn't received updateMonitoring alert:
ALERT: MySQL replica-1 replication lag exceeds threshold
Current lag: 3600 seconds (1 hour)
Master position: mysql-bin.000123:50000
Replica position: mysql-bin.000123:1000Why This Happens
- 1.Slow queries on replica - Long-running queries blocking SQL thread
- 2.Network latency - Slow binary log transfer
- 3.Single-threaded replication - SQL thread processes events sequentially
- 4.Large transactions - Single transaction with many changes
- 5.Replica under load - Replica serving heavy read traffic
- 6.Binary log disk I/O - Slow disk writes on master or replica
Step 1: Check Replication Status
```bash # Check slave status details mysql -u root -p -e "SHOW SLAVE STATUS\G"
# Key metrics: # Seconds_Behind_Master - replication delay # Slave_IO_Running - binary log fetch thread # Slave_SQL_Running - event execution thread # Read_Master_Log_Pos - master position read # Exec_Master_Log_Pos - master position executed
# Calculate bytes behind mysql -u root -p -e " SELECT Read_Master_Log_Pos - Exec_Master_Log_Pos AS bytes_behind, Seconds_Behind_Master AS seconds_behind FROM (SHOW SLAVE STATUS) AS status; "
# Check master status mysql -u root -p -e "SHOW MASTER STATUS\G"
# Check processlist on replica mysql -u root -p -e " SHOW PROCESSLIST; " # Look for long-running queries blocking replication ```
Step 2: Identify Blocking Queries
```bash # Find queries running longer than replication lag mysql -u root -p -e " SELECT id, user, time, state, info FROM information_schema.processlist WHERE time > 60 AND command != 'Sleep' ORDER BY time DESC; "
# Check for queries blocking SQL thread mysql -u root -p -e " SELECT * FROM information_schema.processlist WHERE state LIKE 'Waiting for%' OR state LIKE 'Sending%'; "
# Check InnoDB lock waits mysql -u root -p -e " SELECT requesting_trx_id, blocking_trx_id, lock_type, lock_mode FROM information_schema.innodb_lock_waits; "
# If SQL thread blocked by long query: # mysql-bin log events wait for query to finish # Lag increases until query completes ```
Step 3: Kill Blocking Queries
```bash # Identify the blocking query ID mysql -u root -p -e "SHOW PROCESSLIST;" # Find the long-running query ID
# Kill the query mysql -u root -p -e "KILL QUERY 12345;"
# Or kill the connection entirely mysql -u root -p -e "KILL 12345;"
# Kill all queries running longer than threshold mysql -u root -p -e " SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE time > 300 AND command != 'Sleep' AND user != 'system user'; " | mysql -u root -p
# After killing query, replication should catch up mysql -u root -p -e "SHOW SLAVE STATUS\G" # Seconds_Behind_Master should decrease ```
Step 4: Enable Parallel Replication
```bash # MySQL 5.6+ supports multi-threaded replication # Configure parallel workers
# Stop slave temporarily mysql -u root -p -e "STOP SLAVE;"
# Configure parallel replication mysql -u root -p -e " SET GLOBAL slave_parallel_workers = 4; SET GLOBAL slave_parallel_type = 'DATABASE'; "
# Start slave mysql -u root -p -e "START SLAVE;"
# In my.cnf for permanent setting [mysqld] slave_parallel_workers = 4 slave_parallel_type = DATABASE
# MySQL 5.7+ LOGICAL_CLOCK parallelization (more efficient) mysql -u root -p -e " SET GLOBAL slave_parallel_workers = 8; SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; SET GLOBAL slave_preserve_commit_order = 1; "
# MySQL 8.0+ enhanced parallel replication [mysqld] slave_parallel_workers = 16 slave_parallel_type = LOGICAL_CLOCK binlog_transaction_dependency_tracking = WRITESET slave_preserve_commit_order = ON ```
Step 5: Optimize Binary Log Settings
```bash # Check current binary log settings mysql -u root -p -e " SHOW VARIABLES LIKE '%binlog%'; SHOW VARIABLES LIKE '%log_bin%'; "
# Enable binary log row format (most efficient) mysql -u root -p -e " SET GLOBAL binlog_format = 'ROW'; "
# In my.cnf [mysqld] log_bin = mysql-bin binlog_format = ROW binlog_row_image = MINIMAL # Log only changed columns sync_binlog = 1 # Sync after each write (safe, slower) # Or sync_binlog = 0 for performance (risk of data loss on crash)
# Optimize relay log settings [mysqld] relay_log = relay-bin relay_log_recovery = ON # Auto-recover relay log relay_log_info_repository = TABLE # Faster than file
# Restart MySQL to apply sudo systemctl restart mysql ```
Step 6: Optimize Network Transfer
```bash # Check network between master and replica
# Test network latency ping master-server # High latency causes slow binary log transfer
# Check bandwidth iperf3 -c master-server # Low bandwidth slows log transfer
# Increase master binlog cache mysql -u root -p -e " SET GLOBAL binlog_cache_size = 128M; "
# In my.cnf [mysqld] binlog_cache_size = 128M max_binlog_size = 512M
# Use compression (MySQL 8.0+) mysql -u root -p -e " SET GLOBAL binlog_transaction_compression = ON; SET GLOBAL binlog_transaction_compression_level_zstd = 3; "
# Compressed binary logs transfer faster ```
Step 7: Reduce Replica Read Load
```bash # Check current read load on replica mysql -u root -p -e " SHOW GLOBAL STATUS LIKE 'Threads_running'; SHOW GLOBAL STATUS LIKE 'Questions'; "
# Redirect some reads to another replica # Or use read/write splitting more carefully
# Add more replicas to distribute load # Configure round-robin for reads
# Throttle heavy queries on replica mysql -u root -p -e " SET GLOBAL max_execution_time = 60000; # 60 second limit "
# Schedule heavy queries during low traffic # Move analytics queries to dedicated replica
# Check if replica has enough resources mysql -u root -p -e " SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool%'; " # Buffer pool should be sized adequately ```
Step 8: Split Large Transactions
```bash # Large transactions cause replication lag # SQL thread must process entire transaction before committing
# WRONG: Single massive transaction BEGIN; INSERT INTO logs SELECT * FROM staging_logs; # Millions of rows UPDATE summary SET count = count + millions; # Long operation COMMIT; # Replica blocked until entire transaction applied
# CORRECT: Split into smaller transactions # Process in batches BEGIN; INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000; COMMIT;
BEGIN; INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000 OFFSET 10000; COMMIT;
# Or use auto-commit for batches INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000; INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000 OFFSET 10000; # Each INSERT is separate transaction
# In application code: for batch in data_batches: cursor.execute("INSERT INTO table VALUES (...)", batch) conn.commit() # Small transactions don't block replication ```
Step 9: Configure Semisynchronous Replication
```bash # Semisync ensures at least one replica receives transaction # Before master commits
# Install plugin (if not installed) mysql -u root -p -e " INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; "
# Enable on master mysql -u root -p -e " SET GLOBAL rpl_semi_sync_master_enabled = 1; SET GLOBAL rpl_semi_sync_master_timeout = 1000; # 1 second "
# Enable on replica mysql -u root -p -e " SET GLOBAL rpl_semi_sync_slave_enabled = 1; STOP SLAVE IO_THREAD; START SLAVE IO_THREAD; "
# In my.cnf [mysqld] # Master rpl_semi_sync_master_enabled = ON rpl_semi_sync_master_timeout = 1000
# Replica rpl_semi_sync_slave_enabled = ON
# Check status mysql -u root -p -e "SHOW STATUS LIKE 'Rpl_semi_sync%';" ```
Step 10: Monitor Replication Continuously
```bash # Create monitoring script cat << 'EOF' > /usr/local/bin/check_replication_lag.sh #!/bin/bash LAG=$(mysql -u root -p$MYSQL_PASS -N -e " SELECT Seconds_Behind_Master FROM (SHOW SLAVE STATUS) AS s; ")
if [ "$LAG" -gt 300 ]; then echo "ALERT: Replication lag is $LAG seconds"
# Get detailed status STATUS=$(mysql -u root -p$MYSQL_PASS -e "SHOW SLAVE STATUS\G")
mail -s "MySQL Replication Lag Alert" admin@company.com <<< \ "Replication lag: $LAG seconds
$STATUS" fi EOF
chmod +x /usr/local/bin/check_replication_lag.sh
# Add to cron echo "*/1 * * * * root /usr/local/bin/check_replication_lag.sh" > /etc/cron.d/mysql-replication
# Create Prometheus exporter # Use mysqld_exporter with replication metrics # Grafana dashboard for visualization
# MySQL monitoring query mysql -u root -p -e " SELECT Variable_name, Variable_value FROM performance_schema.global_status WHERE Variable_name LIKE 'Slave%'; " ```
Replication Lag Reduction Checklist
| Cause | Solution |
|---|---|
| Slow queries | Kill blocking queries |
| Single-threaded | Enable parallel workers |
| Large transactions | Split into batches |
| Network latency | Optimize bandwidth/compression |
| Heavy read load | Add replicas, throttle queries |
| Disk I/O | Tune binlog sync settings |
Verify the Fix
```bash # After applying optimizations
# 1. Check Seconds_Behind_Master decreasing mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master # Should be < 10 seconds
# 2. Verify both threads running mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running" # Both should be Yes
# 3. Check parallel workers active mysql -u root -p -e "SHOW VARIABLES LIKE 'slave_parallel_workers';"
# 4. Monitor for sustained low lag watch -n 5 'mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master'
# 5. Verify data consistency # Compare counts on master and replica mysql -h master -e "SELECT count(*) FROM orders;" mysql -h replica -e "SELECT count(*) FROM orders;" # Should match after lag catches up
# 6. Test application reads on replica # Queries should return current data ```
Related Issues
- [Fix MySQL Replication Broken](/articles/fix-mysql-replication-broken)
- [Fix MySQL Binary Log Corrupt](/articles/fix-mysql-binary-log-corrupt)
- [Fix MySQL Slave SQL Error](/articles/fix-mysql-slave-sql-error)