# MySQL Replication Broken: Slave SQL Error

You check your MySQL replication status and find the slave SQL thread has stopped:

sql
SHOW SLAVE STATUS\G
bash
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Error: Error 'Duplicate entry '1234' for key 'PRIMARY'' on query. Default database: 'mydb'. Query: 'INSERT INTO users (id, name) VALUES (1234, 'John')'
Last_SQL_Error_Number: 1062

Replication has broken, and your slave is no longer receiving updates from the master. Here's how to diagnose and fix replication errors.

Understanding Replication Architecture

  1. 1.MySQL replication uses two threads on the slave:
  2. 2.IO Thread: Connects to master, reads binary log, writes to relay log
  3. 3.SQL Thread: Reads relay log, executes queries on slave

When replication breaks, typically the SQL thread has encountered an error while executing a query.

Step 1: Diagnose the Problem

Check slave status:

sql
SHOW SLAVE STATUS\G

Key fields to examine:

bash
Slave_IO_Running: Yes/No
Slave_SQL_Running: Yes/No
Last_IO_Error: [IO error details]
Last_SQL_Error: [SQL error details]
Last_SQL_Error_Number: [Error code]
Exec_Master_Log_Pos: [Position where SQL thread stopped]
Relay_Master_Log_File: [Master log file being processed]
Retrieved_Gtid_Set: [GTIDs retrieved from master]
Executed_Gtid_Set: [GTIDs executed on slave]
Seconds_Behind_Master: [Replication lag in seconds]

Common error codes:

CodeErrorCause
1062Duplicate entryPrimary key conflict
1032Can't find recordRow doesn't exist for UPDATE/DELETE
1213Deadlock foundTransaction deadlock
1205Lock wait timeoutLock wait exceeded
1236Invalid binlogLog file/position mismatch

Check for specific errors:

```sql -- Check if tables exist on both sides SHOW TABLES FROM mydb;

-- Compare table structures SHOW CREATE TABLE mydb.users\G ```

Step 2: Common Fixes

Fix 1: Skip a Single Transaction

For non-critical errors (like a single duplicate key), skip the problematic transaction:

For GTID replication:

```sql -- Get the failed GTID SHOW SLAVE STATUS\G -- Look for: Retrieved_Gtid_Set

-- Skip the specific transaction SET GTID_NEXT = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1234'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC';

-- Start the SQL thread START SLAVE SQL_THREAD; ```

For binary log position replication:

```sql -- Stop slave STOP SLAVE;

-- Skip one transaction SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

-- Start slave START SLAVE;

-- Check status SHOW SLAVE STATUS\G ```

Fix 2: Skip Specific Errors Automatically

Configure MySQL to skip certain error codes permanently:

ini
[mysqld]
slave_skip_errors = 1062,1032

Or at runtime:

sql
SET GLOBAL slave_skip_errors = '1062,1032';
STOP SLAVE;
START SLAVE;

Warning: This masks real data problems. Use with caution.

Fix 3: Fix Duplicate Entry Error (1062)

When a row already exists on the slave:

```sql -- Identify the conflicting record SELECT * FROM users WHERE id = 1234;

-- Option A: Delete the duplicate on slave DELETE FROM users WHERE id = 1234;

-- Option B: Update if needed UPDATE users SET name = 'John' WHERE id = 1234;

-- Restart SQL thread START SLAVE SQL_THREAD; ```

Fix 4: Fix Can't Find Record Error (1032)

When UPDATE or DELETE can't find the row:

```sql -- Find what's missing -- Check if record exists on master -- On master: SELECT * FROM orders WHERE id = 5678;

-- On slave: SELECT * FROM orders WHERE id = 5678;

-- If missing on slave, insert it INSERT INTO orders (id, customer_id, total) VALUES (5678, 100, 99.99);

-- Or skip if data inconsistency is acceptable STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; ```

Fix 5: Resolve Data Inconsistencies

Use pt-table-checksum and pt-table-sync from Percona Toolkit:

```bash # Check for differences pt-table-checksum --host=master --user=admin --password=xxx \ --databases=mydb --tables=users

# Sync the slave with master pt-table-sync --execute --sync-to-master \ --host=slave --user=admin --password=xxx mydb.users ```

Step 3: Fix IO Thread Issues

If Slave_IO_Running: No:

Check connection to master:

```bash # Test network connectivity ping master-hostname

# Test MySQL connection mysql -h master-hostname -u replication_user -p ```

Check replication user privileges:

sql
-- On master
SHOW GRANTS FOR 'replication_user'@'%';

Ensure user has REPLICATION SLAVE privilege:

sql
GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'%';
FLUSH PRIVILEGES;

Check master binary log position:

sql
-- On master
SHOW MASTER STATUS;

Verify the slave is referencing a valid position:

sql
-- On slave
SHOW SLAVE STATUS\G
-- Compare Master_Log_File and Read_Master_Log_Pos

Restart IO thread:

sql
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SHOW SLAVE STATUS\G

Step 4: Rebuild Replication from Scratch

If corruption is extensive, rebuild the slave:

On master, create backup with coordinates:

bash
mysqldump --master-data=2 --single-transaction --all-databases \
  --user=admin --password > master_backup.sql

Or with GTID:

bash
mysqldump --master-data=2 --single-transaction --set-gtid-purged=ON \
  --all-databases --user=admin --password > master_backup.sql

On slave:

sql
STOP SLAVE;
RESET SLAVE ALL;
bash
# Restore backup
mysql --user=admin --password < master_backup.sql

Get coordinates from backup file:

bash
head -50 master_backup.sql | grep "CHANGE MASTER"

Configure replication:

For binary log position:

sql
CHANGE MASTER TO
  MASTER_HOST = 'master-hostname',
  MASTER_USER = 'replication_user',
  MASTER_PASSWORD = 'password',
  MASTER_LOG_FILE = 'mysql-bin.000123',
  MASTER_LOG_POS = 45678;

For GTID:

sql
CHANGE MASTER TO
  MASTER_HOST = 'master-hostname',
  MASTER_USER = 'replication_user',
  MASTER_PASSWORD = 'password',
  MASTER_AUTO_POSITION = 1;

Start replication:

sql
START SLAVE;
SHOW SLAVE STATUS\G

Step 5: Monitor Replication Health

Set up monitoring queries:

```sql -- Check replication status SELECT CASE WHEN Slave_IO_Running = 'Yes' AND Slave_SQL_Running = 'Yes' THEN 'OK' ELSE 'ERROR' END AS status, Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master, Last_Error FROM (SHOW SLAVE STATUS) AS s;

-- Check for lag SELECT CASE WHEN Seconds_Behind_Master > 60 THEN 'HIGH LAG' WHEN Seconds_Behind_Master > 10 THEN 'MODERATE LAG' ELSE 'OK' END AS lag_status, Seconds_Behind_Master FROM (SHOW SLAVE STATUS) AS s; ```

Set up alerts:

```bash #!/bin/bash # check_replication.sh

STATUS=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Slave_SQL_Running:" | awk '{print $2}') LAG=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master:" | awk '{print $2}')

if [ "$STATUS" != "Yes" ] || [ "$LAG" -gt 60 ]; then echo "Replication issue: Status=$STATUS, Lag=$LAG seconds" | \ mail -s "MySQL Replication Alert" admin@example.com fi ```

Step 6: Handle Replication Filters

If using replication filters, check them:

sql
SHOW SLAVE STATUS\G
-- Look for Replicate_Do_DB, Replicate_Ignore_DB, etc.

Or:

sql
SHOW VARIABLES LIKE 'replicate%';

Filters can cause errors if queries reference databases not being replicated.

Step 7: Use GTID for Easier Recovery

GTID (Global Transaction Identifier) mode simplifies replication management.

Check if GTID is enabled:

sql
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';

Enable GTID (requires restart):

ini
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON

With GTID, you can easily skip transactions:

```sql -- Find the problematic GTID SHOW SLAVE STATUS\G -- Look for Retrieved_Gtid_Set and Executed_Gtid_Set

-- Skip by injecting an empty transaction SET GTID_NEXT = 'source-id:transaction-number'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC';

START SLAVE SQL_THREAD; ```

Verification

After fixing replication:

sql
SHOW SLAVE STATUS\G

Verify: - Slave_IO_Running: Yes - Slave_SQL_Running: Yes - Seconds_Behind_Master is decreasing toward 0 - Last_IO_Error: and Last_SQL_Error: are empty

Test replication:

```sql -- On master CREATE DATABASE replication_test; CREATE TABLE replication_test.test_table (id INT PRIMARY KEY); INSERT INTO replication_test.test_table VALUES (1);

-- On slave, verify data arrived SELECT * FROM replication_test.test_table;

-- Cleanup DROP DATABASE replication_test; ```

Troubleshooting Checklist

  • [ ] Check both IO and SQL thread status
  • [ ] Identify the specific error code
  • [ ] Examine the failed query
  • [ ] Determine if data can be skipped or needs fix
  • [ ] Apply appropriate fix method
  • [ ] Verify replication resumes
  • [ ] Monitor for ongoing issues
  • [ ] Consider rebuilding if corruption is extensive

Quick Reference Commands

```sql -- Check status SHOW SLAVE STATUS\G

-- Stop/start replication STOP SLAVE; START SLAVE;

-- Stop/start individual threads STOP SLAVE IO_THREAD; STOP SLAVE SQL_THREAD; START SLAVE IO_THREAD; START SLAVE SQL_THREAD;

-- Skip one transaction (non-GTID) STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;

-- Skip transaction (GTID) SET GTID_NEXT = 'gtid-value'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC'; START SLAVE SQL_THREAD;

-- Reset slave STOP SLAVE; RESET SLAVE ALL;

-- Reconfigure CHANGE MASTER TO MASTER_HOST='host', MASTER_USER='user', MASTER_PASSWORD='pass', MASTER_AUTO_POSITION=1; ```

Replication issues require careful diagnosis. Always understand why an error occurred before skipping transactions, as this can lead to data inconsistency between master and slave.