MySQL Replication Lag

# MySQL Replication Lag

Symptoms

Seconds_Behind_Master growing continuously
Read replicas returning stale data
Application reading inconsistent data
Replication alerts triggering frequently
Read queries hitting old data

Root Causes

1.Long-running transactions on master - Large writes blocking replication
2.Network latency - Slow network between master and replica
3.Insufficient replica resources - CPU, memory, or disk I/O bottlenecks
4.Single-threaded replication - MySQL 5.6 and earlier limitation
5.Large binary logs - Slow log transfer and application
6.Hot tables/rows - Contention on specific data

Diagnosis Steps

Step 1: Check Replication Status

```sql -- On replica SHOW SLAVE STATUS\G

-- Key fields to monitor: -- Seconds_Behind_Master: Lag in seconds (0 = in sync) -- Slave_IO_Running: Should be Yes -- Slave_SQL_Running: Should be Yes -- Last_Error: Any recent errors -- Relay_Master_Log_File: Current log file -- Exec_Master_Log_Pos: Current position ```

Step 2: Identify Replication Delay Source

sql

-- Check if IO thread is caught up
SELECT 
    MASTER_LOG_FILE as master_log,
    RELAY_MASTER_LOG_FILE as replica_received_log,
    MASTER_LOG_POS - EXEC_MASTER_LOG_POS as bytes_behind
FROM (SHOW SLAVE STATUS);

If bytes_behind is large: - Network issue - IO thread not receiving data fast enough - Disk issue - Replica can't write relay logs fast enough

If bytes_behind is small but Seconds_Behind_Master is high: - SQL thread issue - Replica can't apply changes fast enough

Step 3: Check for Long-Running Transactions

sql

-- On master, find long transactions
SELECT 
    trx_id,
    trx_started,
    TIMESTAMPDIFF(SECOND, trx_started, NOW()) as duration_seconds,
    trx_rows_modified,
    trx_mysql_thread_id
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC;

Step 4: Monitor System Resources

```bash # Check CPU usage top -bn1 | head -20

# Check disk I/O iostat -x 5 5

# Check MySQL process list on replica mysql -e "SHOW PROCESSLIST" | grep -E "system user|executing"

# Check network throughput iftop -i eth0 ```

Solutions

Solution 1: Enable Multi-Threaded Replication

For MySQL 5.7+:

```sql -- Stop replication STOP SLAVE;

-- Enable parallel replication SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; SET GLOBAL slave_parallel_workers = 4; -- Adjust based on CPU cores

-- Start replication START SLAVE; ```

For MySQL 8.0+:

```sql STOP SLAVE;

-- Enhanced parallel replication SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; SET GLOBAL slave_parallel_workers = 8; SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';

START SLAVE; ```

Solution 2: Optimize Long Transactions

```sql -- Break up large transactions -- Instead of: DELETE FROM logs WHERE created_at < '2023-01-01'; -- Millions of rows

-- Do: DELETE FROM logs WHERE created_at < '2023-01-01' LIMIT 10000; -- Repeat in batches with pauses

-- Or use pt-archiver for safe large deletions pt-archiver --source h=localhost,D=app,t=logs \ --where "created_at < '2023-01-01'" \ --purge --limit 1000 --commit-each ```

Solution 3: Tune Replication Parameters

```ini # /etc/mysql/mysql.conf.d/mysqld.cnf [mysqld] # On master binlog_format = ROW binlog_row_image = MINIMAL sync_binlog = 1 innodb_flush_log_at_trx_commit = 1

# On replica relay_log_recovery = ON relay_log_info_repository = TABLE sync_relay_log = 0 sync_relay_log_info = 0 slave_net_timeout = 60 ```

Solution 4: Optimize Network and Disk

```ini # Increase binary log size (fewer, larger files) [mysqld] max_binlog_size = 500M binlog_cache_size = 128K

# Use SSD for binary logs and relay logs # Separate disk for binary logs: log_bin = /ssd/mysql-bin/mysql-bin relay_log = /ssd/relay-log/relay-log ```

Solution 5: Use GTID for Easier Recovery

ini

# Enable GTID (both master and replica)
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON

After enabling, replication setup:

```sql -- On replica CHANGE MASTER TO MASTER_HOST = 'master_ip', MASTER_USER = 'replica_user', MASTER_PASSWORD = 'password', MASTER_AUTO_POSITION = 1;

START SLAVE; ```

Solution 6: Monitor and Alert

```sql -- Create monitoring table CREATE TABLE replication_monitor ( id INT AUTO_INCREMENT PRIMARY KEY, check_time DATETIME, seconds_behind INT, io_running VARCHAR(3), sql_running VARCHAR(3), INDEX (check_time) );

-- Stored procedure for monitoring DELIMITER // CREATE PROCEDURE check_replication() BEGIN INSERT INTO replication_monitor SELECT NULL, NOW(), Seconds_Behind_Master, Slave_IO_Running, Slave_SQL_Running FROM (SHOW SLAVE STATUS) AS status; END// DELIMITER ; ```

Monitoring script:

```bash #!/bin/bash # Check replication lag

SECONDS_BEHIND=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print $2}')

if [ "$SECONDS_BEHIND" -gt 60 ]; then echo "CRITICAL: Replication lag is ${SECONDS_BEHIND} seconds" # Send alert exit 2 elif [ "$SECONDS_BEHIND" -gt 30 ]; then echo "WARNING: Replication lag is ${SECONDS_BEHIND} seconds" exit 1 else echo "OK: Replication lag is ${SECONDS_BEHIND} seconds" exit 0 fi ```

Solution 7: Handle Large Data Loads

```sql -- Temporarily disable binary logging for bulk operations (master) SET SESSION sql_log_bin = 0;

-- Perform bulk operation LOAD DATA INFILE '/path/to/large_file.csv' INTO TABLE large_table;

-- Re-enable binary logging SET SESSION sql_log_bin = 1;

-- Alternatively, use mysqldump with --skip-lock-tables -- and import directly on replica ```

Recovery from Replication Break

Skip Single Replication Error

sql

-- Only use when you understand the consequences
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

Rebuild Replica from Backup

```bash # On master mysqldump --single-transaction --master-data=2 --flush-logs --all-databases > backup.sql

# Transfer to replica scp backup.sql replica:/backup/

# On replica mysql < backup.sql

# Get master position from backup.sql head head -50 backup.sql | grep "CHANGE MASTER TO"

# Set position and start CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000123', MASTER_LOG_POS = 456789; START SLAVE; ```

Prevention

1. Regular Monitoring

Set up alerts for Seconds_Behind_Master > 30
Monitor disk I/O on both master and replica
Track replication queue depth

2. Capacity Planning

Ensure replica has equal or greater resources than master
Use SSD for binary logs and relay logs
Plan for network capacity growth

3. Configuration Best Practices

Use ROW-based replication for better parallelization
Enable multi-threaded replication
Use GTID for easier recovery
Configure appropriate timeouts

[MySQL Connection Refused](./fix-mysql-connection-refused)
[MySQL Slow Query Performance](./fix-mysql-slow-query)