# MySQL Replication Broken: Slave SQL Error
You check your MySQL replication status and find the slave SQL thread has stopped:
SHOW SLAVE STATUS\GSlave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Error: Error 'Duplicate entry '1234' for key 'PRIMARY'' on query. Default database: 'mydb'. Query: 'INSERT INTO users (id, name) VALUES (1234, 'John')'
Last_SQL_Error_Number: 1062Replication has broken, and your slave is no longer receiving updates from the master. Here's how to diagnose and fix replication errors.
Understanding Replication Architecture
- 1.MySQL replication uses two threads on the slave:
- 2.IO Thread: Connects to master, reads binary log, writes to relay log
- 3.SQL Thread: Reads relay log, executes queries on slave
When replication breaks, typically the SQL thread has encountered an error while executing a query.
Step 1: Diagnose the Problem
Check slave status:
SHOW SLAVE STATUS\GKey fields to examine:
Slave_IO_Running: Yes/No
Slave_SQL_Running: Yes/No
Last_IO_Error: [IO error details]
Last_SQL_Error: [SQL error details]
Last_SQL_Error_Number: [Error code]
Exec_Master_Log_Pos: [Position where SQL thread stopped]
Relay_Master_Log_File: [Master log file being processed]
Retrieved_Gtid_Set: [GTIDs retrieved from master]
Executed_Gtid_Set: [GTIDs executed on slave]
Seconds_Behind_Master: [Replication lag in seconds]Common error codes:
| Code | Error | Cause |
|---|---|---|
| 1062 | Duplicate entry | Primary key conflict |
| 1032 | Can't find record | Row doesn't exist for UPDATE/DELETE |
| 1213 | Deadlock found | Transaction deadlock |
| 1205 | Lock wait timeout | Lock wait exceeded |
| 1236 | Invalid binlog | Log file/position mismatch |
Check for specific errors:
```sql -- Check if tables exist on both sides SHOW TABLES FROM mydb;
-- Compare table structures SHOW CREATE TABLE mydb.users\G ```
Step 2: Common Fixes
Fix 1: Skip a Single Transaction
For non-critical errors (like a single duplicate key), skip the problematic transaction:
For GTID replication:
```sql -- Get the failed GTID SHOW SLAVE STATUS\G -- Look for: Retrieved_Gtid_Set
-- Skip the specific transaction SET GTID_NEXT = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1234'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC';
-- Start the SQL thread START SLAVE SQL_THREAD; ```
For binary log position replication:
```sql -- Stop slave STOP SLAVE;
-- Skip one transaction SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
-- Start slave START SLAVE;
-- Check status SHOW SLAVE STATUS\G ```
Fix 2: Skip Specific Errors Automatically
Configure MySQL to skip certain error codes permanently:
[mysqld]
slave_skip_errors = 1062,1032Or at runtime:
SET GLOBAL slave_skip_errors = '1062,1032';
STOP SLAVE;
START SLAVE;Warning: This masks real data problems. Use with caution.
Fix 3: Fix Duplicate Entry Error (1062)
When a row already exists on the slave:
```sql -- Identify the conflicting record SELECT * FROM users WHERE id = 1234;
-- Option A: Delete the duplicate on slave DELETE FROM users WHERE id = 1234;
-- Option B: Update if needed UPDATE users SET name = 'John' WHERE id = 1234;
-- Restart SQL thread START SLAVE SQL_THREAD; ```
Fix 4: Fix Can't Find Record Error (1032)
When UPDATE or DELETE can't find the row:
```sql -- Find what's missing -- Check if record exists on master -- On master: SELECT * FROM orders WHERE id = 5678;
-- On slave: SELECT * FROM orders WHERE id = 5678;
-- If missing on slave, insert it INSERT INTO orders (id, customer_id, total) VALUES (5678, 100, 99.99);
-- Or skip if data inconsistency is acceptable STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; ```
Fix 5: Resolve Data Inconsistencies
Use pt-table-checksum and pt-table-sync from Percona Toolkit:
```bash # Check for differences pt-table-checksum --host=master --user=admin --password=xxx \ --databases=mydb --tables=users
# Sync the slave with master pt-table-sync --execute --sync-to-master \ --host=slave --user=admin --password=xxx mydb.users ```
Step 3: Fix IO Thread Issues
If Slave_IO_Running: No:
Check connection to master:
```bash # Test network connectivity ping master-hostname
# Test MySQL connection mysql -h master-hostname -u replication_user -p ```
Check replication user privileges:
-- On master
SHOW GRANTS FOR 'replication_user'@'%';Ensure user has REPLICATION SLAVE privilege:
GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'%';
FLUSH PRIVILEGES;Check master binary log position:
-- On master
SHOW MASTER STATUS;Verify the slave is referencing a valid position:
-- On slave
SHOW SLAVE STATUS\G
-- Compare Master_Log_File and Read_Master_Log_PosRestart IO thread:
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SHOW SLAVE STATUS\GStep 4: Rebuild Replication from Scratch
If corruption is extensive, rebuild the slave:
On master, create backup with coordinates:
mysqldump --master-data=2 --single-transaction --all-databases \
--user=admin --password > master_backup.sqlOr with GTID:
mysqldump --master-data=2 --single-transaction --set-gtid-purged=ON \
--all-databases --user=admin --password > master_backup.sqlOn slave:
STOP SLAVE;
RESET SLAVE ALL;# Restore backup
mysql --user=admin --password < master_backup.sqlGet coordinates from backup file:
head -50 master_backup.sql | grep "CHANGE MASTER"Configure replication:
For binary log position:
CHANGE MASTER TO
MASTER_HOST = 'master-hostname',
MASTER_USER = 'replication_user',
MASTER_PASSWORD = 'password',
MASTER_LOG_FILE = 'mysql-bin.000123',
MASTER_LOG_POS = 45678;For GTID:
CHANGE MASTER TO
MASTER_HOST = 'master-hostname',
MASTER_USER = 'replication_user',
MASTER_PASSWORD = 'password',
MASTER_AUTO_POSITION = 1;Start replication:
START SLAVE;
SHOW SLAVE STATUS\GStep 5: Monitor Replication Health
Set up monitoring queries:
```sql -- Check replication status SELECT CASE WHEN Slave_IO_Running = 'Yes' AND Slave_SQL_Running = 'Yes' THEN 'OK' ELSE 'ERROR' END AS status, Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master, Last_Error FROM (SHOW SLAVE STATUS) AS s;
-- Check for lag SELECT CASE WHEN Seconds_Behind_Master > 60 THEN 'HIGH LAG' WHEN Seconds_Behind_Master > 10 THEN 'MODERATE LAG' ELSE 'OK' END AS lag_status, Seconds_Behind_Master FROM (SHOW SLAVE STATUS) AS s; ```
Set up alerts:
```bash #!/bin/bash # check_replication.sh
STATUS=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Slave_SQL_Running:" | awk '{print $2}') LAG=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master:" | awk '{print $2}')
if [ "$STATUS" != "Yes" ] || [ "$LAG" -gt 60 ]; then echo "Replication issue: Status=$STATUS, Lag=$LAG seconds" | \ mail -s "MySQL Replication Alert" admin@example.com fi ```
Step 6: Handle Replication Filters
If using replication filters, check them:
SHOW SLAVE STATUS\G
-- Look for Replicate_Do_DB, Replicate_Ignore_DB, etc.Or:
SHOW VARIABLES LIKE 'replicate%';Filters can cause errors if queries reference databases not being replicated.
Step 7: Use GTID for Easier Recovery
GTID (Global Transaction Identifier) mode simplifies replication management.
Check if GTID is enabled:
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';Enable GTID (requires restart):
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ONWith GTID, you can easily skip transactions:
```sql -- Find the problematic GTID SHOW SLAVE STATUS\G -- Look for Retrieved_Gtid_Set and Executed_Gtid_Set
-- Skip by injecting an empty transaction SET GTID_NEXT = 'source-id:transaction-number'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC';
START SLAVE SQL_THREAD; ```
Verification
After fixing replication:
SHOW SLAVE STATUS\GVerify:
- Slave_IO_Running: Yes
- Slave_SQL_Running: Yes
- Seconds_Behind_Master is decreasing toward 0
- Last_IO_Error: and Last_SQL_Error: are empty
Test replication:
```sql -- On master CREATE DATABASE replication_test; CREATE TABLE replication_test.test_table (id INT PRIMARY KEY); INSERT INTO replication_test.test_table VALUES (1);
-- On slave, verify data arrived SELECT * FROM replication_test.test_table;
-- Cleanup DROP DATABASE replication_test; ```
Troubleshooting Checklist
- [ ] Check both IO and SQL thread status
- [ ] Identify the specific error code
- [ ] Examine the failed query
- [ ] Determine if data can be skipped or needs fix
- [ ] Apply appropriate fix method
- [ ] Verify replication resumes
- [ ] Monitor for ongoing issues
- [ ] Consider rebuilding if corruption is extensive
Quick Reference Commands
```sql -- Check status SHOW SLAVE STATUS\G
-- Stop/start replication STOP SLAVE; START SLAVE;
-- Stop/start individual threads STOP SLAVE IO_THREAD; STOP SLAVE SQL_THREAD; START SLAVE IO_THREAD; START SLAVE SQL_THREAD;
-- Skip one transaction (non-GTID) STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;
-- Skip transaction (GTID) SET GTID_NEXT = 'gtid-value'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC'; START SLAVE SQL_THREAD;
-- Reset slave STOP SLAVE; RESET SLAVE ALL;
-- Reconfigure CHANGE MASTER TO MASTER_HOST='host', MASTER_USER='user', MASTER_PASSWORD='pass', MASTER_AUTO_POSITION=1; ```
Replication issues require careful diagnosis. Always understand why an error occurred before skipping transactions, as this can lead to data inconsistency between master and slave.