# MySQL Replication Lag
## Symptoms
- `Seconds_Behind_Master` growing continuously
- Read replicas returning stale data
- Application reading inconsistent data
- Replication alerts triggering frequently
- Read queries hitting old data
## Root Causes
1. Long-running transactions on master - Large writes blocking replication
2. Network latency - Slow network between master and replica
3. Insufficient replica resources - CPU, memory, or disk I/O bottlenecks
4. Single-threaded replication - The default before MySQL 8.0.27; MySQL 5.6 can only parallelize across databases
5. Large binary logs - Slow log transfer and application
6. Hot tables/rows - Contention on specific data
## Diagnosis Steps
### Step 1: Check Replication Status
```sql
-- On replica (MySQL 8.0.22+ also accepts SHOW REPLICA STATUS)
SHOW SLAVE STATUS\G

-- Key fields to monitor:
-- Seconds_Behind_Master: Lag in seconds (0 = in sync, NULL = replication not running)
-- Slave_IO_Running: Should be Yes
-- Slave_SQL_Running: Should be Yes
-- Last_Error: Any recent errors
-- Relay_Master_Log_File: Current log file
-- Exec_Master_Log_Pos: Current position
```
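For scripted checks, the fields listed above can be pulled out of the `\G` output with a small helper. This is a minimal sketch, assuming the vertical `Field: value` layout that `SHOW SLAVE STATUS\G` prints; the function name `slave_status_field` is hypothetical and not part of MySQL.

```bash
#!/usr/bin/env bash
# Print the value of one field from `SHOW SLAVE STATUS\G` output read on stdin.
# slave_status_field is a hypothetical helper name, not a MySQL tool.
slave_status_field() {
  local field="$1"
  awk -v f="$field" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # drop the left padding
      prefix = f ":"
      if (index(line, prefix) == 1) {     # exact field name at the start of the line
        value = substr(line, length(prefix) + 1)
        sub(/^[ \t]+/, "", value)
        print value
        exit
      }
    }'
}
```

Typical use would be `mysql -e 'SHOW SLAVE STATUS\G' | slave_status_field Seconds_Behind_Master`. Matching on the full name plus the colon keeps `Master_Log_File` from also matching `Relay_Master_Log_File`.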
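### Step 2: Identify Replication Delay Source
```sql
-- SHOW SLAVE STATUS cannot be used as a subquery, so compare the fields by hand.

-- On master: current binary log coordinates (File, Position)
SHOW MASTER STATUS;

-- On replica: what has been received vs. what has been applied
SHOW SLAVE STATUS\G
-- Master_Log_File / Read_Master_Log_Pos:       read by the IO thread
-- Relay_Master_Log_File / Exec_Master_Log_Pos: applied by the SQL thread
```
If `Read_Master_Log_Pos` on the replica trails the master's `Position` by a large margin:
- Network issue - IO thread not receiving data fast enough
- Disk issue - Replica can't write relay logs fast enough

If the IO thread is caught up but `Exec_Master_Log_Pos` trails `Read_Master_Log_Pos` and `Seconds_Behind_Master` is high:
- SQL thread issue - Replica can't apply changes fast enough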
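The IO-thread versus SQL-thread distinction above can be automated by comparing binary log coordinates. The following is a sketch, assuming the master's `File` and `Position` (from `SHOW MASTER STATUS`) and the replica's `Master_Log_File`, `Read_Master_Log_Pos`, `Relay_Master_Log_File`, and `Exec_Master_Log_Pos` have already been collected. The function names are hypothetical, and binary log names are assumed to end in a numeric sequence such as `mysql-bin.000123`.

```bash
#!/usr/bin/env bash
# Compare two binary log coordinates given as: file_a pos_a file_b pos_b.
# Prints -1, 0 or 1 when the first is behind, equal to, or ahead of the second.
compare_binlog_coords() {
  # 10# forces base 10 so a suffix like 000008 is not parsed as octal
  local seq_a=$((10#${1##*.})) pos_a=$2
  local seq_b=$((10#${3##*.})) pos_b=$4
  if   (( seq_a < seq_b )); then echo -1
  elif (( seq_a > seq_b )); then echo 1
  elif (( pos_a < pos_b )); then echo -1
  elif (( pos_a > pos_b )); then echo 1
  else echo 0
  fi
}

# Classify where the replica is behind.
# Args: master File, master Position, Master_Log_File, Read_Master_Log_Pos,
#       Relay_Master_Log_File, Exec_Master_Log_Pos
lag_source() {
  if [ "$(compare_binlog_coords "$3" "$4" "$1" "$2")" -lt 0 ]; then
    echo "io_thread"    # not everything received yet: network or relay log disk
  elif [ "$(compare_binlog_coords "$5" "$6" "$3" "$4")" -lt 0 ]; then
    echo "sql_thread"   # received but not yet applied
  else
    echo "caught_up"
  fi
}
```

On a busy master the IO thread is almost always a few bytes behind, so a single `io_thread` result is only a snapshot. Sampling several times and watching whether the gap grows gives a more reliable picture.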
### Step 3: Check for Long-Running Transactions
```sql
-- On master, find long transactions
SELECT
    trx_id,
    trx_started,
    TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS duration_seconds,
    trx_rows_modified,
    trx_mysql_thread_id
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC;
```
### Step 4: Monitor System Resources
```bash
# Check CPU usage
top -bn1 | head -20

# Check disk I/O
iostat -x 5 5

# Check MySQL process list on replica
mysql -e "SHOW PROCESSLIST" | grep -E "system user|executing"

# Check network throughput (replace eth0 with the replica's interface)
iftop -i eth0
```
## Solutions
### Solution 1: Enable Multi-Threaded Replication
For MySQL 5.7+:
```sql
-- Stop replication
STOP SLAVE;

-- Enable parallel replication
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4;  -- Adjust based on CPU cores

-- Start replication
START SLAVE;
```
For MySQL 8.0+:
```sql
-- On master: track dependencies by write set so replicas can apply
-- more transactions in parallel
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';

-- On replica: enhanced parallel replication
STOP SLAVE;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 8;
START SLAVE;
```
### Solution 2: Optimize Long Transactions
```sql
-- Break up large transactions
-- Instead of:
DELETE FROM logs WHERE created_at < '2023-01-01';  -- Millions of rows

-- Do:
DELETE FROM logs WHERE created_at < '2023-01-01' LIMIT 10000;
-- Repeat in batches with pauses
```
```bash
# Or use pt-archiver for safe large deletions
pt-archiver --source h=localhost,D=app,t=logs \
  --where "created_at < '2023-01-01'" \
  --purge --limit 1000 --commit-each
```
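The "repeat in batches with pauses" step can be scripted as a loop that stops once a batch deletes nothing. This is a minimal sketch, assuming a `mysql` client that can log in non-interactively and the `app.logs` table from the example; `run_delete_batch` and `delete_in_batches` are hypothetical names.

```bash
#!/usr/bin/env bash
# Run one batch and print the number of rows it removed.
# Hypothetical hook: the database (app), table (logs) and cutoff date are
# taken from the example and should be adjusted to the real schema.
run_delete_batch() {
  mysql -N -B app -e "DELETE FROM logs WHERE created_at < '2023-01-01' LIMIT 10000; SELECT ROW_COUNT();"
}

# Delete in batches until none are left, pausing between batches.
# Arg 1: pause in seconds (default 1). Prints the total number of rows deleted.
delete_in_batches() {
  local pause_seconds="${1:-1}" total=0 deleted
  while true; do
    deleted="$(run_delete_batch)" || return 1
    [ "${deleted:-0}" -gt 0 ] || break     # nothing left to delete
    total=$((total + deleted))
    sleep "$pause_seconds"                 # give the replica time to catch up
  done
  echo "$total"
}
```

Each batch commits as its own small transaction, so the replica applies many short events instead of one very large one. The pause length is a tuning knob: longer pauses trade total runtime for lower lag.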
### Solution 3: Tune Replication Parameters
```ini
# /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
# On master
binlog_format = ROW
binlog_row_image = MINIMAL
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

# On replica
relay_log_recovery = ON
relay_log_info_repository = TABLE
sync_relay_log = 0
sync_relay_log_info = 0
slave_net_timeout = 60
```
### Solution 4: Optimize Network and Disk
```ini
# Increase binary log size (fewer, larger files)
[mysqld]
max_binlog_size = 500M
binlog_cache_size = 128K

# Use SSD for binary logs and relay logs
# Separate disk for binary logs:
log_bin = /ssd/mysql-bin/mysql-bin
relay_log = /ssd/relay-log/relay-log
```
### Solution 5: Use GTID for Easier Recovery
```ini
# Enable GTID (both master and replica)
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
```
After enabling, replication setup:
```sql
-- On replica
CHANGE MASTER TO
    MASTER_HOST = 'master_ip',
    MASTER_USER = 'replica_user',
    MASTER_PASSWORD = 'password',
    MASTER_AUTO_POSITION = 1;

START SLAVE;
```
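### Solution 6: Monitor and Alert
```sql
-- Create monitoring table
CREATE TABLE replication_monitor (
    id INT AUTO_INCREMENT PRIMARY KEY,
    check_time DATETIME,
    seconds_behind INT,
    io_running VARCHAR(10),   -- Slave_IO_Running can also be 'Connecting'
    sql_running VARCHAR(10),
    INDEX (check_time)
);

-- Stored procedure for monitoring.
-- SHOW SLAVE STATUS cannot be used as a subquery, so the caller passes in
-- the values it read from the SHOW SLAVE STATUS output.
DELIMITER //
CREATE PROCEDURE check_replication(
    IN p_seconds_behind INT,
    IN p_io_running VARCHAR(10),
    IN p_sql_running VARCHAR(10)
)
BEGIN
    INSERT INTO replication_monitor (check_time, seconds_behind, io_running, sql_running)
    VALUES (NOW(), p_seconds_behind, p_io_running, p_sql_running);
END//
DELIMITER ;
```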
Monitoring script:
```bash
#!/bin/bash
# Check replication lag

SECONDS_BEHIND=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print $2}')

# Seconds_Behind_Master is NULL (or missing) when replication is not running
if ! [[ "$SECONDS_BEHIND" =~ ^[0-9]+$ ]]; then
    echo "CRITICAL: Replication is not running (Seconds_Behind_Master=${SECONDS_BEHIND:-unknown})"
    exit 2
fi

if [ "$SECONDS_BEHIND" -gt 60 ]; then
    echo "CRITICAL: Replication lag is ${SECONDS_BEHIND} seconds"
    # Send alert
    exit 2
elif [ "$SECONDS_BEHIND" -gt 30 ]; then
    echo "WARNING: Replication lag is ${SECONDS_BEHIND} seconds"
    exit 1
else
    echo "OK: Replication lag is ${SECONDS_BEHIND} seconds"
    exit 0
fi
```
### Solution 7: Handle Large Data Loads
```sql
-- Temporarily disable binary logging for bulk operations (master).
-- Statements run with sql_log_bin = 0 are NOT replicated, so the same load
-- must also be run on every replica or the data will diverge.
SET SESSION sql_log_bin = 0;

-- Perform bulk operation
LOAD DATA INFILE '/path/to/large_file.csv' INTO TABLE large_table;

-- Re-enable binary logging
SET SESSION sql_log_bin = 1;

-- Alternatively, use mysqldump with --skip-lock-tables
-- and import directly on replica
```
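## Recovery from Replication Break
### Skip Single Replication Error
```sql
-- Only use when you understand the consequences: the skipped event is
-- never applied, so the replica may diverge from the master
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
```
### Rebuild Replica from Backup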
```bash
# On master
mysqldump --single-transaction --master-data=2 --flush-logs --all-databases > backup.sql

# Transfer to replica
scp backup.sql replica:/backup/

# On replica
mysql < /backup/backup.sql

# Get master position from the head of backup.sql
head -50 /backup/backup.sql | grep "CHANGE MASTER TO"

# Set position (use the file and position printed above) and start
mysql -e "CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000123', MASTER_LOG_POS = 456789; START SLAVE;"
```
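Copying the coordinates by hand is error-prone, so the lookup can be scripted. This is a sketch, assuming the dump header contains a commented line of the form `-- CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...;`, which is how `--master-data=2` is expected to record the position; `dump_binlog_coords` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print "<binlog file> <position>" taken from the CHANGE MASTER TO comment
# near the top of a mysqldump file. Prints nothing if the line is not found.
dump_binlog_coords() {
  head -n 50 "$1" |
    sed -n "s/.*CHANGE MASTER TO MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\);.*/\1 \2/p" |
    head -n 1
}
```

The two values can then be read into variables, for example `read -r log_file log_pos < <(dump_binlog_coords /backup/backup.sql)`, and substituted into the `CHANGE MASTER TO` statement. If the function prints nothing, the dump was probably taken without `--master-data`.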
## Prevention
### 1. Regular Monitoring
- Set up alerts for `Seconds_Behind_Master > 30`
- Monitor disk I/O on both master and replica
- Track replication queue depth
### 2. Capacity Planning
- Ensure replica has equal or greater resources than master
- Use SSD for binary logs and relay logs
- Plan for network capacity growth
### 3. Configuration Best Practices
- Use ROW-based replication for better parallelization
- Enable multi-threaded replication
- Use GTID for easier recovery
- Configure appropriate timeouts
## Related Errors
- [MySQL Connection Refused](./fix-mysql-connection-refused)
- [MySQL Slow Query Performance](./fix-mysql-slow-query)