Fix MySQL Slave SQL Thread Errors - Replication Troubleshooting

# MySQL Slave SQL Thread Error: Replication SQL Errors and Solutions

Your MySQL replication slave shows SQL thread stopped:

sql

SHOW SLAVE STATUS\G

bash

Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Error: Error 'Duplicate entry '1001' for key 'PRIMARY'' on query.
Last_SQL_Error_Number: 1062

The IO thread is receiving events, but the SQL thread fails to execute them. This guide covers diagnosing and fixing SQL thread errors.

Understanding SQL Thread Errors

The SQL thread reads events from relay logs and executes them on the slave. Common failure reasons:

1.Data conflict: Row already exists or doesn't exist
2.Schema mismatch: Table structure differs from master
3.Constraint violation: Foreign key, unique key, or check constraint
4.Permission issue: Slave user lacks privileges
5.Query error: Invalid SQL in replicated event

Step 1: Identify the Error

Check slave status:

sql

SHOW SLAVE STATUS\G

Focus on:

bash

Last_SQL_Error: [Full error message]
Last_SQL_Error_Number: [MySQL error code]
Last_SQL_Error_Timestamp: [When error occurred]
Relay_Master_Log_File: [Master log file]
Exec_Master_Log_Pos: [Position where stopped]

Common error codes:

Code	Error	Description
1062	Duplicate entry	Primary key or unique key conflict
1032	Can't find record	Row missing for UPDATE/DELETE
1213	Deadlock found	Transaction deadlock during execution
1205	Lock wait timeout	Waited too long for lock
1048	Column cannot be null	NULL violation on NOT NULL column
1451	Cannot delete parent row	Foreign key constraint
1452	Cannot add child row	Foreign key constraint
1146	Table doesn't exist	Missing table on slave

Step 2: Examine the Failed Query

The error message shows the failing query. For more detail:

bash

# Find the relay log file and position from SHOW SLAVE STATUS
mysqlbinlog --start-position=Relay_Log_Pos --stop-position=Relay_Log_Pos+1000 \
    /var/lib/mysql/relay-bin.000123

For example, error shows:

bash

Last_SQL_Error: Error 'Duplicate entry '1001' for key 'PRIMARY'' 
on query. Default database: 'mydb'. 
Query: 'INSERT INTO users (id, name) VALUES (1001, 'John')'

Step 3: Fix Duplicate Entry Error (1062)

This happens when a row already exists on slave:

```sql -- Check if row exists SELECT * FROM users WHERE id = 1001;

-- Compare with master data (if accessible) -- On master: SELECT * FROM users WHERE id = 1001;

-- Solution 1: Delete duplicate on slave DELETE FROM users WHERE id = 1001;

-- Restart SQL thread START SLAVE SQL_THREAD; ```

Or skip the transaction:

sql

-- For non-GTID replication
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;

For GTID replication:

```sql -- Get the GTID from error or relay log SHOW SLAVE STATUS\G -- Look at Retrieved_Gtid_Set

-- Inject empty transaction to skip SET GTID_NEXT = 'source_id:transaction_number'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC';

START SLAVE SQL_THREAD; ```

Step 4: Fix Can't Find Record Error (1032)

Row missing on slave for UPDATE or DELETE:

```sql -- Error example: Can't find record in 'orders' -- Query: 'UPDATE orders SET status = 'completed' WHERE id = 5001'

-- Check if row exists SELECT * FROM orders WHERE id = 5001;

-- Solution 1: Insert missing row from master -- Get row data from master or backup INSERT INTO orders (id, customer_id, status, ...) VALUES (5001, ...);

-- Restart SQL thread START SLAVE SQL_THREAD;

-- Solution 2: Skip if data isn't critical STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; ```

Step 5: Fix Foreign Key Errors (1451, 1452)

Constraint violation:

```sql -- Error 1452: Cannot add child row - parent missing -- Query: 'INSERT INTO orders (id, customer_id) VALUES (100, 999)' -- Customer 999 doesn't exist

-- Check parent table SELECT * FROM customers WHERE id = 999;

-- Add missing parent (from master data) INSERT INTO customers (id, name) VALUES (999, 'Company');

-- Restart SQL thread START SLAVE SQL_THREAD; ```

For reverse (deleting parent with children):

```sql -- Error 1451: Cannot delete parent row - children exist -- Query: 'DELETE FROM customers WHERE id = 999'

-- Temporarily disable foreign key checks SET FOREIGN_KEY_CHECKS = 0;

-- Apply the change manually DELETE FROM customers WHERE id = 999;

-- Re-enable SET FOREIGN_KEY_CHECKS = 1;

-- Skip the replicated statement STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; ```

Step 6: Fix Table Doesn't Exist Error (1146)

Missing table on slave:

```sql -- Error: Table 'mydb.orders' doesn't exist

-- Check tables on slave SHOW TABLES FROM mydb;

-- Check on master (if accessible) SHOW TABLES FROM mydb;

-- Create missing table with same structure -- Get CREATE TABLE from master: SHOW CREATE TABLE mydb.orders;

-- Execute on slave: CREATE TABLE orders ( id INT PRIMARY KEY, customer_id INT, ... );

-- Restart SQL thread START SLAVE SQL_THREAD; ```

Step 7: Fix Schema Mismatch

Table exists but structure differs:

```sql -- Compare table structures -- On master: SHOW CREATE TABLE mydb.users;

-- On slave: SHOW CREATE TABLE mydb.users;

-- Find differences (columns, indexes, constraints)

-- Add missing column ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Or recreate table DROP TABLE users; CREATE TABLE users (...); -- Use master's CREATE statement

-- Restart SQL thread START SLAVE SQL_THREAD; ```

Step 8: Configure Automatic Error Skipping

For recurring minor errors:

ini

[mysqld]
slave_skip_errors = 1062,1032

sql

SET GLOBAL slave_skip_errors = '1062,1032';
STOP SLAVE;
START SLAVE;

Warning: This masks real data issues. Use carefully and monitor for data divergence.

Step 9: Use pt-table-sync for Data Drift

Percona Toolkit can fix data differences:

```bash # Check for differences pt-table-checksum --host=master --user=admin --password=xxx \ --databases=mydb --replicate=percona.checksums

# View results on slave SELECT * FROM percona.checksums WHERE this_cnt <> master_cnt;

# Sync slave to master pt-table-sync --execute --sync-to-master \ --host=slave --user=admin --password=xxx mydb.users

# Or sync specific tables pt-table-sync --execute --sync-to-master \ --host=slave --user=admin --password=xxx \ --tables=mydb.users,mydb.orders ```

Step 10: Fix Lock Wait Timeout (1205)

Slave waiting too long for locks:

```sql -- Check for long-running queries blocking the thread SHOW PROCESSLIST;

-- Find blocking transactions SELECT * FROM information_schema.INNODB_LOCK_WAITS;

-- Kill blocking query if safe KILL <blocking_process_id>;

-- Restart SQL thread START SLAVE SQL_THREAD; ```

Reduce lock contention:

ini

[mysqld]
innodb_lock_wait_timeout = 60
slave_parallel_workers = 4  # For parallel replication

Step 11: Fix Deadlock During Replication (1213)

Replication hit a deadlock:

```sql -- Deadlocks auto-resolve, but SQL thread may stop -- Check Last_SQL_Error for "Deadlock found"

-- Usually safe to retry STOP SLAVE; START SLAVE;

-- If persistent, check what's causing deadlock SHOW ENGINE INNODB STATUS\G -- Look at LATEST DETECTED DEADLOCK section ```

Step 12: Handle Large Transaction Failures

If a large transaction causes failure:

```sql -- Check transaction size in relay log mysqlbinlog --base64-output=decode-rows -v \ /var/lib/mysql/relay-bin.000123 | grep -A 1000 "BEGIN"

-- May need to skip entire transaction STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N; -- N = number of events START SLAVE; ```

For GTID, find and skip the transaction:

```sql -- Get failed GTID from relay log or error mysqlbinlog /var/lib/mysql/relay-bin.000123 | grep GTID_NEXT

-- Skip it SET GTID_NEXT = 'uuid:transaction_number'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC'; START SLAVE; ```

Step 13: Rebuild Replication for Severe Issues

If errors persist or data is severely diverged:

```bash # On master, take backup with position mysqldump --master-data=2 --single-transaction --all-databases \ --routines --triggers --events > master_backup.sql

# On slave mysql -e "STOP SLAVE; RESET SLAVE ALL;" mysql < master_backup.sql

# Extract position from backup file head -30 master_backup.sql | grep CHANGE MASTER

# Configure and start replication mysql -e "CHANGE MASTER TO MASTER_HOST='master', MASTER_USER='repl', MASTER_PASSWORD='pass', MASTER_LOG_FILE='mysql-bin.xxx', MASTER_LOG_POS=xxx; START SLAVE;" ```

Verification After Fix

sql

SHOW SLAVE STATUS\G

Success indicators: - Slave_SQL_Running: Yes - Seconds_Behind_Master decreasing - Empty Last_SQL_Error

Test replication:

```sql -- On master INSERT INTO test_table VALUES (999, 'replication test');

-- On slave (after short wait) SELECT * FROM test_table WHERE id = 999;

-- Cleanup DELETE FROM test_table WHERE id = 999; ```

Monitoring SQL Thread

sql

-- Create monitoring query
SELECT 
    CASE 
        WHEN Slave_SQL_Running = 'Yes' AND Seconds_Behind_Master < 60 THEN 'OK'
        WHEN Slave_SQL_Running = 'Yes' AND Seconds_Behind_Master >= 60 THEN 'LAGGING'
        ELSE 'STOPPED'
    END AS status,
    Slave_SQL_Running,
    Seconds_Behind_Master,
    Last_SQL_Error_Number,
    Last_SQL_Error
FROM (SELECT * FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'SLAVE_STATUS') AS s;

```bash #!/bin/bash # check_slave_sql.sh

SQL_RUNNING=$(mysql -N -e "SHOW SLAVE STATUS\G" | grep "Slave_SQL_Running:" | awk '{print $2}') SQL_ERROR=$(mysql -N -e "SHOW SLAVE STATUS\G" | grep "Last_SQL_Error:" | cut -d: -f2-)

if [ "$SQL_RUNNING" != "Yes" ]; then echo "Slave SQL thread stopped: $SQL_ERROR" | mail -s "Replication Alert" admin@example.com fi ```

Best Practices

```ini [mysqld] # Crash-safe slave relay_log_recovery = 1 relay_log_info_repository = TABLE master_info_repository = TABLE

# For parallel replication (MySQL 5.7+) slave_parallel_type = LOGICAL_CLOCK slave_parallel_workers = 4

# Error handling slave_skip_errors = OFF # Prefer manual handling

# GTID for easier recovery gtid_mode = ON enforce_gtid_consistency = ON ```

Quick Reference Commands

```sql -- Check SQL thread status SHOW SLAVE STATUS\G

-- Start/stop SQL thread only STOP SLAVE SQL_THREAD; START SLAVE SQL_THREAD;

-- Skip one event (non-GTID) STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;

-- Skip GTID transaction SET GTID_NEXT = 'uuid:tx_num'; BEGIN; COMMIT; SET GTID_NEXT = 'AUTOMATIC'; START SLAVE SQL_THREAD;

-- Skip multiple events SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 10;

-- View failed event mysqlbinlog --start-position=XXX relay-bin.000123 | head -50 ```

Troubleshooting Checklist

[ ] Identify error code from SHOW SLAVE STATUS
[ ] Examine the failing query in error message
[ ] Check for data conflict on slave
[ ] Apply manual fix or skip transaction
[ ] Restart SQL thread after fix
[ ] Verify replication continues
[ ] Test with sample data
[ ] Set up monitoring for future errors
[ ] Use pt-table-sync for data drift detection

MySQL Slave SQL Thread Error: Replication SQL Errors and Solutions

Understanding SQL Thread Errors

Step 1: Identify the Error

Step 2: Examine the Failed Query

Step 3: Fix Duplicate Entry Error (1062)

Step 4: Fix Can't Find Record Error (1032)

Step 5: Fix Foreign Key Errors (1451, 1452)

Step 6: Fix Table Doesn't Exist Error (1146)

Step 7: Fix Schema Mismatch

Step 8: Configure Automatic Error Skipping

Step 9: Use pt-table-sync for Data Drift

Step 10: Fix Lock Wait Timeout (1205)

Step 11: Fix Deadlock During Replication (1213)

Step 12: Handle Large Transaction Failures

Step 13: Rebuild Replication for Severe Issues

Verification After Fix

Monitoring SQL Thread

Best Practices

Quick Reference Commands

Troubleshooting Checklist

Share this guide

More Database Troubleshooting Guides

Database Query Timeout

SQL Server TempDB Contention

Database Connection Pool Exhausted

SQL Server AlwaysOn Failover Failed

Database Sharding Rebalance Failed

SQL Server Log Reader Agent Stalled