What's Actually Happening

MySQL replication lag occurs when the replica (slave) cannot keep pace with the master's binary log updates. Seconds_Behind_Master increases, queries on replica return stale data, and read replicas become unreliable.

The Error You'll See

SHOW SLAVE STATUS:

```bash $ mysql -u root -p -e "SHOW SLAVE STATUS\G"

Seconds_Behind_Master: 3600 Slave_IO_Running: Yes Slave_SQL_Running: Yes Relay_Master_Log_File: mysql-bin.000123 Exec_Master_Log_Pos: 1000 Read_Master_Log_Pos: 50000 # Replica is 49,000 bytes behind, translating to 3600 seconds ```

Application impact:

bash
Read replica query returns stale data:
SELECT * FROM orders WHERE id = 100;
# Returns old status because replica hasn't received update

Monitoring alert:

bash
ALERT: MySQL replica-1 replication lag exceeds threshold
Current lag: 3600 seconds (1 hour)
Master position: mysql-bin.000123:50000
Replica position: mysql-bin.000123:1000

Why This Happens

  1. 1.Slow queries on replica - Long-running queries blocking SQL thread
  2. 2.Network latency - Slow binary log transfer
  3. 3.Single-threaded replication - SQL thread processes events sequentially
  4. 4.Large transactions - Single transaction with many changes
  5. 5.Replica under load - Replica serving heavy read traffic
  6. 6.Binary log disk I/O - Slow disk writes on master or replica

Step 1: Check Replication Status

```bash # Check slave status details mysql -u root -p -e "SHOW SLAVE STATUS\G"

# Key metrics: # Seconds_Behind_Master - replication delay # Slave_IO_Running - binary log fetch thread # Slave_SQL_Running - event execution thread # Read_Master_Log_Pos - master position read # Exec_Master_Log_Pos - master position executed

# Calculate bytes behind mysql -u root -p -e " SELECT Read_Master_Log_Pos - Exec_Master_Log_Pos AS bytes_behind, Seconds_Behind_Master AS seconds_behind FROM (SHOW SLAVE STATUS) AS status; "

# Check master status mysql -u root -p -e "SHOW MASTER STATUS\G"

# Check processlist on replica mysql -u root -p -e " SHOW PROCESSLIST; " # Look for long-running queries blocking replication ```

Step 2: Identify Blocking Queries

```bash # Find queries running longer than replication lag mysql -u root -p -e " SELECT id, user, time, state, info FROM information_schema.processlist WHERE time > 60 AND command != 'Sleep' ORDER BY time DESC; "

# Check for queries blocking SQL thread mysql -u root -p -e " SELECT * FROM information_schema.processlist WHERE state LIKE 'Waiting for%' OR state LIKE 'Sending%'; "

# Check InnoDB lock waits mysql -u root -p -e " SELECT requesting_trx_id, blocking_trx_id, lock_type, lock_mode FROM information_schema.innodb_lock_waits; "

# If SQL thread blocked by long query: # mysql-bin log events wait for query to finish # Lag increases until query completes ```

Step 3: Kill Blocking Queries

```bash # Identify the blocking query ID mysql -u root -p -e "SHOW PROCESSLIST;" # Find the long-running query ID

# Kill the query mysql -u root -p -e "KILL QUERY 12345;"

# Or kill the connection entirely mysql -u root -p -e "KILL 12345;"

# Kill all queries running longer than threshold mysql -u root -p -e " SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE time > 300 AND command != 'Sleep' AND user != 'system user'; " | mysql -u root -p

# After killing query, replication should catch up mysql -u root -p -e "SHOW SLAVE STATUS\G" # Seconds_Behind_Master should decrease ```

Step 4: Enable Parallel Replication

```bash # MySQL 5.6+ supports multi-threaded replication # Configure parallel workers

# Stop slave temporarily mysql -u root -p -e "STOP SLAVE;"

# Configure parallel replication mysql -u root -p -e " SET GLOBAL slave_parallel_workers = 4; SET GLOBAL slave_parallel_type = 'DATABASE'; "

# Start slave mysql -u root -p -e "START SLAVE;"

# In my.cnf for permanent setting [mysqld] slave_parallel_workers = 4 slave_parallel_type = DATABASE

# MySQL 5.7+ LOGICAL_CLOCK parallelization (more efficient) mysql -u root -p -e " SET GLOBAL slave_parallel_workers = 8; SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; SET GLOBAL slave_preserve_commit_order = 1; "

# MySQL 8.0+ enhanced parallel replication [mysqld] slave_parallel_workers = 16 slave_parallel_type = LOGICAL_CLOCK binlog_transaction_dependency_tracking = WRITESET slave_preserve_commit_order = ON ```

Step 5: Optimize Binary Log Settings

```bash # Check current binary log settings mysql -u root -p -e " SHOW VARIABLES LIKE '%binlog%'; SHOW VARIABLES LIKE '%log_bin%'; "

# Enable binary log row format (most efficient) mysql -u root -p -e " SET GLOBAL binlog_format = 'ROW'; "

# In my.cnf [mysqld] log_bin = mysql-bin binlog_format = ROW binlog_row_image = MINIMAL # Log only changed columns sync_binlog = 1 # Sync after each write (safe, slower) # Or sync_binlog = 0 for performance (risk of data loss on crash)

# Optimize relay log settings [mysqld] relay_log = relay-bin relay_log_recovery = ON # Auto-recover relay log relay_log_info_repository = TABLE # Faster than file

# Restart MySQL to apply sudo systemctl restart mysql ```

Step 6: Optimize Network Transfer

```bash # Check network between master and replica

# Test network latency ping master-server # High latency causes slow binary log transfer

# Check bandwidth iperf3 -c master-server # Low bandwidth slows log transfer

# Increase master binlog cache mysql -u root -p -e " SET GLOBAL binlog_cache_size = 128M; "

# In my.cnf [mysqld] binlog_cache_size = 128M max_binlog_size = 512M

# Use compression (MySQL 8.0+) mysql -u root -p -e " SET GLOBAL binlog_transaction_compression = ON; SET GLOBAL binlog_transaction_compression_level_zstd = 3; "

# Compressed binary logs transfer faster ```

Step 7: Reduce Replica Read Load

```bash # Check current read load on replica mysql -u root -p -e " SHOW GLOBAL STATUS LIKE 'Threads_running'; SHOW GLOBAL STATUS LIKE 'Questions'; "

# Redirect some reads to another replica # Or use read/write splitting more carefully

# Add more replicas to distribute load # Configure round-robin for reads

# Throttle heavy queries on replica mysql -u root -p -e " SET GLOBAL max_execution_time = 60000; # 60 second limit "

# Schedule heavy queries during low traffic # Move analytics queries to dedicated replica

# Check if replica has enough resources mysql -u root -p -e " SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool%'; " # Buffer pool should be sized adequately ```

Step 8: Split Large Transactions

```bash # Large transactions cause replication lag # SQL thread must process entire transaction before committing

# WRONG: Single massive transaction BEGIN; INSERT INTO logs SELECT * FROM staging_logs; # Millions of rows UPDATE summary SET count = count + millions; # Long operation COMMIT; # Replica blocked until entire transaction applied

# CORRECT: Split into smaller transactions # Process in batches BEGIN; INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000; COMMIT;

BEGIN; INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000 OFFSET 10000; COMMIT;

# Or use auto-commit for batches INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000; INSERT INTO logs SELECT * FROM staging_logs LIMIT 10000 OFFSET 10000; # Each INSERT is separate transaction

# In application code: for batch in data_batches: cursor.execute("INSERT INTO table VALUES (...)", batch) conn.commit() # Small transactions don't block replication ```

Step 9: Configure Semisynchronous Replication

```bash # Semisync ensures at least one replica receives transaction # Before master commits

# Install plugin (if not installed) mysql -u root -p -e " INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; "

# Enable on master mysql -u root -p -e " SET GLOBAL rpl_semi_sync_master_enabled = 1; SET GLOBAL rpl_semi_sync_master_timeout = 1000; # 1 second "

# Enable on replica mysql -u root -p -e " SET GLOBAL rpl_semi_sync_slave_enabled = 1; STOP SLAVE IO_THREAD; START SLAVE IO_THREAD; "

# In my.cnf [mysqld] # Master rpl_semi_sync_master_enabled = ON rpl_semi_sync_master_timeout = 1000

# Replica rpl_semi_sync_slave_enabled = ON

# Check status mysql -u root -p -e "SHOW STATUS LIKE 'Rpl_semi_sync%';" ```

Step 10: Monitor Replication Continuously

```bash # Create monitoring script cat << 'EOF' > /usr/local/bin/check_replication_lag.sh #!/bin/bash LAG=$(mysql -u root -p$MYSQL_PASS -N -e " SELECT Seconds_Behind_Master FROM (SHOW SLAVE STATUS) AS s; ")

if [ "$LAG" -gt 300 ]; then echo "ALERT: Replication lag is $LAG seconds"

# Get detailed status STATUS=$(mysql -u root -p$MYSQL_PASS -e "SHOW SLAVE STATUS\G")

mail -s "MySQL Replication Lag Alert" admin@company.com <<< \ "Replication lag: $LAG seconds

$STATUS" fi EOF

chmod +x /usr/local/bin/check_replication_lag.sh

# Add to cron echo "*/1 * * * * root /usr/local/bin/check_replication_lag.sh" > /etc/cron.d/mysql-replication

# Create Prometheus exporter # Use mysqld_exporter with replication metrics # Grafana dashboard for visualization

# MySQL monitoring query mysql -u root -p -e " SELECT Variable_name, Variable_value FROM performance_schema.global_status WHERE Variable_name LIKE 'Slave%'; " ```

Replication Lag Reduction Checklist

CauseSolution
Slow queriesKill blocking queries
Single-threadedEnable parallel workers
Large transactionsSplit into batches
Network latencyOptimize bandwidth/compression
Heavy read loadAdd replicas, throttle queries
Disk I/OTune binlog sync settings

Verify the Fix

```bash # After applying optimizations

# 1. Check Seconds_Behind_Master decreasing mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master # Should be < 10 seconds

# 2. Verify both threads running mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running" # Both should be Yes

# 3. Check parallel workers active mysql -u root -p -e "SHOW VARIABLES LIKE 'slave_parallel_workers';"

# 4. Monitor for sustained low lag watch -n 5 'mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master'

# 5. Verify data consistency # Compare counts on master and replica mysql -h master -e "SELECT count(*) FROM orders;" mysql -h replica -e "SELECT count(*) FROM orders;" # Should match after lag catches up

# 6. Test application reads on replica # Queries should return current data ```

  • [Fix MySQL Replication Broken](/articles/fix-mysql-replication-broken)
  • [Fix MySQL Binary Log Corrupt](/articles/fix-mysql-binary-log-corrupt)
  • [Fix MySQL Slave SQL Error](/articles/fix-mysql-slave-sql-error)