# MySQL Replication Lag

## Symptoms

  • Seconds_Behind_Master growing continuously
  • Read replicas returning stale data
  • Application reads that miss recent writes
  • Replication alerts triggering frequently

## Root Causes

  1. Long-running transactions on master - A transaction reaches the binary log only at commit, so a large write arrives all at once and delays everything queued behind it on the replica
  2. Network latency - Slow network between master and replica
  3. Insufficient replica resources - CPU, memory, or disk I/O bottlenecks
  4. Single-threaded replication - The only mode before MySQL 5.6 and still the default until MySQL 8.0.27; MySQL 5.6 parallelizes only across databases
  5. Large binary logs - Slow log transfer and application
  6. Hot tables/rows - Contention on specific data

## Diagnosis Steps

### Step 1: Check Replication Status

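```sql
-- On replica (MySQL 8.0.22+ also accepts SHOW REPLICA STATUS\G)
SHOW SLAVE STATUS\G

-- Key fields to monitor:
-- Seconds_Behind_Master: Lag in seconds (0 = in sync, NULL = replication not running)
-- Slave_IO_Running: Should be Yes
-- Slave_SQL_Running: Should be Yes
-- Last_Error: Any recent errors
-- Relay_Master_Log_File: Master log file containing the last applied event
-- Exec_Master_Log_Pos: Position in that file up to which events have been applied
```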
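
When these fields are checked often, it can help to pull single values out of the `\G` output rather than reading the whole listing. The following is a minimal sketch; `status_field` is a hypothetical helper name, and it assumes the usual `Field: value` layout of `SHOW SLAVE STATUS\G`.

```bash
#!/usr/bin/env bash
# Sketch: extract one field from `SHOW SLAVE STATUS\G` text read on stdin.
# status_field is a hypothetical helper, not part of any MySQL tool.

# Usage: status_field FIELD_NAME < status_output
# Prints the value of FIELD_NAME, or nothing if the field is absent.
status_field() {
    local field="$1"
    awk -v f="$field" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)           # strip the left padding
            prefix = f ":"
            # Match only lines that start with "Field:" so that
            # Slave_SQL_Running does not match Slave_SQL_Running_State.
            if (index(line, prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)
                print value
                exit
            }
        }'
}

# Example against a live replica (assumes credentials in ~/.my.cnf):
#   mysql -e 'SHOW SLAVE STATUS\G' | status_field Seconds_Behind_Master
```

Matching on the full `Field:` prefix is the main design choice here: a plain `grep Last_Error` would also return `Last_IO_Error` and `Last_SQL_Error`.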

### Step 2: Identify Replication Delay Source

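```sql
-- SHOW SLAVE STATUS cannot be used as a subquery, so compare the fields by hand.

-- On master: current binary log coordinates (File, Position)
SHOW MASTER STATUS;

-- On replica
SHOW SLAVE STATUS\G
-- Master_Log_File / Read_Master_Log_Pos: how far the IO thread has read
-- Relay_Master_Log_File / Exec_Master_Log_Pos: how far the SQL thread has applied
```

If the master's File/Position is well ahead of Master_Log_File/Read_Master_Log_Pos, the IO thread is behind:

  • Network issue - IO thread not receiving data fast enough
  • Disk issue - Replica can't write relay logs fast enough

If the IO thread is caught up but Exec_Master_Log_Pos trails Read_Master_Log_Pos and Seconds_Behind_Master is high, the SQL thread is behind:

  • SQL thread issue - Replica can't apply changes fast enough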
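
The comparison can be sketched as a small shell function. It assumes the six coordinates have already been collected: File/Position from `SHOW MASTER STATUS` on the master, and Master_Log_File/Read_Master_Log_Pos plus Relay_Master_Log_File/Exec_Master_Log_Pos from `SHOW SLAVE STATUS` on the replica. The name `lag_source` and the optional byte tolerance are illustrative assumptions, not MySQL features.

```bash
#!/usr/bin/env bash
# Sketch: decide which replication thread is behind from binlog coordinates.
# lag_source is a hypothetical helper; the tolerance argument is an assumption
# added so that a few bytes of in-flight traffic do not count as lag.

# Usage: lag_source MASTER_FILE MASTER_POS READ_FILE READ_POS EXEC_FILE EXEC_POS [TOLERANCE_BYTES]
# Prints io_thread, sql_thread, or caught_up.
# If both threads are behind, only io_thread is reported.
lag_source() {
    local master_file="$1" master_pos="$2"
    local read_file="$3"   read_pos="$4"
    local exec_file="$5"   exec_pos="$6"
    local tolerance="${7:-0}"

    # IO thread: what the master has written vs. what the replica has fetched.
    # Positions are only comparable within the same log file.
    if [ "$master_file" != "$read_file" ] ||
       [ $((master_pos - read_pos)) -gt "$tolerance" ]; then
        echo "io_thread"
    # SQL thread: what has been fetched vs. what has been applied.
    elif [ "$read_file" != "$exec_file" ] ||
         [ $((read_pos - exec_pos)) -gt "$tolerance" ]; then
        echo "sql_thread"
    else
        echo "caught_up"
    fi
}
```

On a busy master the fetched position almost always trails by a little, so a non-zero tolerance is likely needed in practice; a sensible value depends on the write rate.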

### Step 3: Check for Long-Running Transactions

```sql
-- On master, find long transactions
SELECT
    trx_id,
    trx_started,
    TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS duration_seconds,
    trx_rows_modified,
    trx_mysql_thread_id
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC;
```
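
To narrow the result to transactions that have been open longer than some threshold, the query output can be filtered in the shell. This is a rough sketch: `long_trx` is a hypothetical helper, the 60-second default is an arbitrary assumption, and the input is expected to be the tab-separated rows that `mysql --batch --skip-column-names` produces for the query above.

```bash
#!/usr/bin/env bash
# Sketch: keep only transactions open for at least THRESHOLD seconds.
# long_trx is a hypothetical helper; it reads tab-separated rows with the
# columns trx_id, trx_started, duration_seconds, trx_rows_modified,
# trx_mysql_thread_id (the column order of the INNODB_TRX query).

# Usage: long_trx [THRESHOLD_SECONDS] < rows
# Prints: thread_id <TAB> duration_seconds <TAB> rows_modified
long_trx() {
    local threshold="${1:-60}"
    awk -F'\t' -v t="$threshold" '$3 + 0 >= t { print $5 "\t" $3 "\t" $4 }'
}

# Example (assumes credentials in ~/.my.cnf):
#   mysql --batch --skip-column-names -e "
#       SELECT trx_id, trx_started,
#              TIMESTAMPDIFF(SECOND, trx_started, NOW()),
#              trx_rows_modified, trx_mysql_thread_id
#       FROM information_schema.INNODB_TRX" | long_trx 120
```

The thread id in the first output column is the one shown in `SHOW PROCESSLIST`, which makes it easy to look up what the session is doing.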

### Step 4: Monitor System Resources

```bash
# Check CPU usage
top -bn1 | head -20

# Check disk I/O
iostat -x 5 5

# Check MySQL process list on replica
mysql -e "SHOW PROCESSLIST" | grep -E "system user|executing"

# Check network throughput
iftop -i eth0
```

## Solutions

### Solution 1: Enable Multi-Threaded Replication

For MySQL 5.7+:

```sql
-- Stop replication
STOP SLAVE;

-- Enable parallel replication
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4; -- Adjust based on CPU cores

-- Start replication
START SLAVE;
```

For MySQL 8.0+:

```sql
-- On master: record write-set dependencies so replicas can apply more in parallel
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';

-- On replica
STOP SLAVE;

-- Enhanced parallel replication
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 8;

START SLAVE;
```

### Solution 2: Optimize Long Transactions

```sql
-- Break up large transactions
-- Instead of:
DELETE FROM logs WHERE created_at < '2023-01-01'; -- Millions of rows

-- Do:
DELETE FROM logs WHERE created_at < '2023-01-01' LIMIT 10000;
-- Repeat in batches with pauses
```

```bash
# Or use pt-archiver for safe large deletions
pt-archiver --source h=localhost,D=app,t=logs \
  --where "created_at < '2023-01-01'" \
  --purge --limit 1000 --commit-each
```
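
The "repeat in batches with pauses" step can be scripted. The sketch below is one possible shape, not a tested production tool: `batch_delete` and `run_sql` are hypothetical names, the database name `app` is taken from the pt-archiver example, and the mysql client is assumed to log in without prompting (for example through `~/.my.cnf`).

```bash
#!/usr/bin/env bash
# Sketch: run a LIMITed DELETE repeatedly until nothing is left to delete,
# sleeping between batches so the replica can keep up.
# batch_delete and run_sql are hypothetical helper names.

# run_sql executes one statement and prints the number of rows it affected.
# ROW_COUNT() is read in the same connection as the DELETE.
run_sql() {
    mysql --batch --skip-column-names app -e "$1; SELECT ROW_COUNT();"
}

# Usage: batch_delete "DELETE ... LIMIT n" [PAUSE_SECONDS] [MAX_BATCHES]
# Pass the statement without a trailing semicolon.
batch_delete() {
    local statement="$1" pause="${2:-1}" max_batches="${3:-1000}"
    local total=0 batches=0 affected

    while [ "$batches" -lt "$max_batches" ]; do
        affected=$(run_sql "$statement") || return 1
        # Stop when a batch deletes nothing (or the count is not a number).
        [ "$affected" -gt 0 ] 2>/dev/null || break
        total=$((total + affected))
        batches=$((batches + 1))
        sleep "$pause"    # give the replica time to apply the batch
    done
    echo "deleted $total rows in $batches batches"
}

# Example:
#   batch_delete "DELETE FROM logs WHERE created_at < '2023-01-01' LIMIT 10000" 2
```

Each batch commits on its own, so the replica receives many small transactions instead of one very large one. The `MAX_BATCHES` cap is only a safety net against a loop that never drains.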

### Solution 3: Tune Replication Parameters

```ini
# /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
# On master
binlog_format = ROW
binlog_row_image = MINIMAL
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

# On replica
relay_log_recovery = ON
relay_log_info_repository = TABLE
sync_relay_log = 0
sync_relay_log_info = 0
slave_net_timeout = 60
```

### Solution 4: Optimize Network and Disk

```ini
# Increase binary log size (fewer, larger files)
[mysqld]
max_binlog_size = 500M
binlog_cache_size = 128K

# Use SSD for binary logs and relay logs
# Separate disk for binary logs:
log_bin = /ssd/mysql-bin/mysql-bin
relay_log = /ssd/relay-log/relay-log
```

### Solution 5: Use GTID for Easier Recovery

```ini
# Enable GTID (both master and replica)
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
```

After enabling GTID, set up replication:

```sql
-- On replica
CHANGE MASTER TO
    MASTER_HOST = 'master_ip',
    MASTER_USER = 'replica_user',
    MASTER_PASSWORD = 'password',
    MASTER_AUTO_POSITION = 1;

START SLAVE;
```

### Solution 6: Monitor and Alert

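```sql
-- Create monitoring table
CREATE TABLE replication_monitor (
    id INT AUTO_INCREMENT PRIMARY KEY,
    check_time DATETIME,
    seconds_behind INT,        -- NULL when replication is not running
    io_running VARCHAR(10),    -- Yes, No, or Connecting
    sql_running VARCHAR(3),
    INDEX (check_time)
);

-- Stored procedure for monitoring.
-- SHOW SLAVE STATUS cannot be queried with SELECT, so the caller passes in
-- the values it read from SHOW SLAVE STATUS.
DELIMITER //
CREATE PROCEDURE check_replication(
    IN p_seconds_behind INT,
    IN p_io_running VARCHAR(10),
    IN p_sql_running VARCHAR(3)
)
BEGIN
    INSERT INTO replication_monitor (check_time, seconds_behind, io_running, sql_running)
    VALUES (NOW(), p_seconds_behind, p_io_running, p_sql_running);
END//
DELIMITER ;
```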

Monitoring script:

```bash
#!/bin/bash
# Check replication lag

SECONDS_BEHIND=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print $2}')

# NULL or empty means replication is stopped or not configured
if ! [[ "$SECONDS_BEHIND" =~ ^[0-9]+$ ]]; then
    echo "CRITICAL: Replication is not running (Seconds_Behind_Master=${SECONDS_BEHIND:-unknown})"
    exit 2
elif [ "$SECONDS_BEHIND" -gt 60 ]; then
    echo "CRITICAL: Replication lag is ${SECONDS_BEHIND} seconds"
    # Send alert
    exit 2
elif [ "$SECONDS_BEHIND" -gt 30 ]; then
    echo "WARNING: Replication lag is ${SECONDS_BEHIND} seconds"
    exit 1
else
    echo "OK: Replication lag is ${SECONDS_BEHIND} seconds"
    exit 0
fi
```

### Solution 7: Handle Large Data Loads

```sql
-- Temporarily disable binary logging for bulk operations (master)
-- Statements run this way are not replicated: run the same load on every replica
SET SESSION sql_log_bin = 0;

-- Perform bulk operation
LOAD DATA INFILE '/path/to/large_file.csv' INTO TABLE large_table;

-- Re-enable binary logging
SET SESSION sql_log_bin = 1;

-- Alternatively, use mysqldump with --skip-lock-tables
-- and import directly on replica
```

## Recovery from Replication Break

### Skip Single Replication Error

```sql
-- Only use when you understand the consequences
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
```

### Rebuild Replica from Backup

```bash
# On master
mysqldump --single-transaction --master-data=2 --flush-logs --all-databases > backup.sql

# Transfer to replica
scp backup.sql replica:/backup/

# On replica
mysql < /backup/backup.sql

# Get master position from the backup.sql header
head -50 /backup/backup.sql | grep "CHANGE MASTER TO"

# Set position and start (use the file and position reported above)
mysql -e "CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000123', MASTER_LOG_POS = 456789; START SLAVE;"
```
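
Copying the log file name and position by hand is easy to get wrong, so the lookup can be scripted. This is a sketch under the assumption that the dump was taken with `--master-data=2`, which writes the coordinates as a commented `CHANGE MASTER TO` line near the top of the file; `dump_coordinates` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read the binlog file name and position from a mysqldump header.
# dump_coordinates is a hypothetical helper. It assumes a header line like:
#   -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=456789;

# Usage: dump_coordinates DUMP_FILE
# Prints "FILE POSITION", or nothing if no such line is in the first 50 lines.
dump_coordinates() {
    head -n 50 "$1" | sed -n \
        "s/.*CHANGE MASTER TO MASTER_LOG_FILE *= *'\([^']*\)' *, *MASTER_LOG_POS *= *\([0-9][0-9]*\).*/\1 \2/p" |
        head -n 1
}

# Example:
#   read -r log_file log_pos < <(dump_coordinates /backup/backup.sql)
#   mysql -e "CHANGE MASTER TO MASTER_LOG_FILE = '$log_file', MASTER_LOG_POS = $log_pos; START SLAVE;"
```

An empty result should be treated as a failure rather than passed on to `CHANGE MASTER TO`, since starting from the wrong position can silently skip or repeat events.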

## Prevention

### 1. Regular Monitoring

  • Set up alerts for Seconds_Behind_Master > 30
  • Monitor disk I/O on both master and replica
  • Track replication queue depth (for example, Relay_Log_Space in SHOW SLAVE STATUS)

### 2. Capacity Planning

  • Ensure replica has equal or greater resources than master
  • Use SSD for binary logs and relay logs
  • Plan for network capacity growth

### 3. Configuration Best Practices

  • Use ROW-based replication for better parallelization
  • Enable multi-threaded replication
  • Use GTID for easier recovery
  • Configure appropriate timeouts (for example, slave_net_timeout)