# MySQL Character Set Issues
## Symptoms
- Question marks (???) appearing instead of characters
- Garbled or corrupted text in database
- Emojis not displaying correctly
- Accented characters showing as strange symbols
- "Incorrect string value" errors
- Data truncation warnings
## Error Messages

```
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x80' for column 'content'
```

Or:

```
Warning: #1300 Invalid utf8 character string
```

Or:

```
ERROR 1267 (HY000): Illegal mix of collations
```

## Root Causes
1. Wrong character set - using `utf8` instead of `utf8mb4`
2. Character set mismatch - database, table, and columns use different character sets
3. Connection encoding - the client does not set the proper charset
4. Collation mismatch - queries mix different collations
5. Data import issues - data imported with the wrong encoding
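
To make the first and third causes concrete, here is a minimal Python sketch that reproduces the two most common symptoms without a database. It only mimics what a lossy conversion and a mis-declared connection do to text; the sample strings are illustrative, and Python's `latin-1` codec stands in for MySQL's `latin1`.

```python
# Illustration only: pure-Python stand-ins for what happens between client and server.
original = "café"

# A target character set that cannot represent a character:
# a lossy conversion substitutes '?' for it.
lossy = "naïve 😀".encode("latin-1", errors="replace").decode("latin-1")

# The client sends UTF-8 bytes, but they are interpreted as latin1:
# each byte is read as a separate character.
mojibake = original.encode("utf-8").decode("latin-1")

print(lossy)     # naïve ?
print(mojibake)  # cafÃ©
```

Question marks usually mean information was lost during conversion, whereas text like `cafÃ©` usually means the bytes are intact but were interpreted with the wrong character set.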
## Diagnosis Steps

### Step 1: Check Current Character Sets

```sql
-- Database character set
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'your_database';

-- Table character sets
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';

-- Column character sets
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_database'
  AND CHARACTER_SET_NAME IS NOT NULL
ORDER BY TABLE_NAME, COLUMN_NAME;
```
### Step 2: Check Server and Client Settings

```sql
-- Check server variables
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
```

Key variables to check:
- character_set_server - Server default charset
- character_set_database - Current database charset
- character_set_client - Client input charset
- character_set_results - Output charset
- character_set_connection - Connection charset
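
If you check these variables often, a small helper can flag the ones that deviate. The sketch below assumes the rows from `SHOW VARIABLES LIKE 'character_set%'` have already been loaded into a dict; `find_mismatches` and the `sample` values are hypothetical and not part of any MySQL client library.

```python
# Hypothetical helper: `variables` is assumed to be a dict built from the rows
# returned by SHOW VARIABLES LIKE 'character_set%'.
KEY_VARIABLES = (
    "character_set_server",
    "character_set_database",
    "character_set_client",
    "character_set_results",
    "character_set_connection",
)


def find_mismatches(variables, expected="utf8mb4"):
    """Return {name: value} for key variables that are missing or not `expected`."""
    return {
        name: variables.get(name)
        for name in KEY_VARIABLES
        if variables.get(name) != expected
    }


# Made-up sample: the client side still uses latin1.
sample = {
    "character_set_server": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_client": "latin1",
    "character_set_results": "latin1",
    "character_set_connection": "utf8mb4",
    "character_set_filesystem": "binary",  # not a key variable, ignored
}
print(find_mismatches(sample))
```

Only the five variables listed above are checked, so entries such as `character_set_filesystem` do not produce false alarms.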
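### Step 3: Identify Corrupted Data

```sql
-- Find records with encoding issues: literal question marks or non-ASCII characters.
-- The backslashes are doubled so they survive MySQL string-literal escaping;
-- the \x escapes assume the ICU regex engine used by MySQL 8.0+.
SELECT id, content, HEX(content) AS hex_content
FROM posts
WHERE content LIKE '%?%'
   OR content REGEXP '[^\\x00-\\x7F]';

-- Find rows containing multibyte characters (byte length differs from character length)
SELECT id, content
FROM posts
WHERE LENGTH(content) != CHAR_LENGTH(content);
```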
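
The `HEX(content)` output is easier to interpret with a little tooling. The following is a rough heuristic sketch, not a definitive detector: `classify_hex` is a hypothetical helper, and it uses Python's `latin-1` codec as an approximation of MySQL's `latin1`.

```python
def classify_hex(hex_content):
    """Roughly classify the bytes shown by HEX(content)."""
    raw = bytes.fromhex(hex_content)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8 (possibly latin1 bytes)"
    if text.isascii():
        return "plain ASCII"
    try:
        # If the decoded characters map back to bytes that are again valid
        # UTF-8, the value was probably encoded twice.
        text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "valid UTF-8"
    return "probably double-encoded UTF-8"


print(classify_hex("636166C3A9"))      # 'café' stored correctly
print(classify_hex("636166C383C2A9"))  # 'café' encoded twice
print(classify_hex("636166E9"))        # 'café' as raw latin1 bytes
```

A "probably double-encoded" result suggests the repair approach in Solution 6, while "not valid UTF-8" in a column declared as `utf8mb4` points to data written over a mis-declared connection.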
## Solutions

### Solution 1: Convert to utf8mb4 (Recommended)

`utf8mb4` is MySQL's complete UTF-8 implementation. It supports every Unicode character, including emojis.

```sql
-- Change the database default (affects only tables created afterwards)
ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert table (including default for new columns)
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert specific column
ALTER TABLE table_name MODIFY COLUMN content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
### Solution 2: Convert Multiple Tables at Once

Generate conversion statements:

```sql
-- Generate ALTER TABLE statements for all tables
SELECT CONCAT(
    'ALTER TABLE ', TABLE_NAME,
    ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
) AS sql_statement
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_TYPE = 'BASE TABLE';
```

Run the generated statements:

```sql
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- ... etc
```

### Solution 3: Fix Server Configuration
```ini
# /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'
skip-character-set-client-handshake

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4
```

Restart MySQL:

```bash
sudo systemctl restart mysql
```

### Solution 4: Fix Connection Encoding in Application
PHP PDO:

```php
<?php
$dsn = 'mysql:host=localhost;dbname=database;charset=utf8mb4';
$options = [
    PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8mb4"
];
$pdo = new PDO($dsn, $username, $password, $options);
```

PHP mysqli:

```php
<?php
$mysqli = new mysqli('localhost', 'user', 'password', 'database');
$mysqli->set_charset('utf8mb4');
```

Node.js:
```javascript
const mysql = require('mysql2');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'password',
  database: 'database',
  charset: 'utf8mb4'
});
```
Python:
```python
import pymysql

connection = pymysql.connect(
    host='localhost',
    user='user',
    password='password',
    database='database',
    charset='utf8mb4'  # must be exactly 'utf8mb4'
)
```
### Solution 5: Fix Collation Mismatch Errors

```sql
-- Check collations involved in error
SHOW CREATE TABLE table1;
SHOW CREATE TABLE table2;

-- Make collations consistent
ALTER TABLE table1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE table2 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Or use COLLATE clause in query
SELECT *
FROM table1 t1
JOIN table2 t2 ON t1.name COLLATE utf8mb4_unicode_ci = t2.name;
```
### Solution 6: Fix Corrupted Data

For data that was already stored with the wrong encoding (back up the table before trying either approach):

```sql
-- Check current encoding
SELECT content, HEX(content) FROM posts WHERE id = 1;

-- If UTF-8 bytes were stored in a latin1 column, go through BLOB so the
-- bytes are reinterpreted as utf8mb4 rather than converted:
ALTER TABLE posts MODIFY content BLOB;
ALTER TABLE posts MODIFY content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Or, if the column is already utf8mb4 but holds double-encoded text,
-- repair the values in place:
UPDATE posts
SET content = CONVERT(CAST(CONVERT(content USING latin1) AS BINARY) USING utf8mb4);
```
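
Before running the `UPDATE`, it can help to preview which values it would change. The sketch below mirrors the same round trip in Python on a few sample strings. `repair_double_encoding` is a hypothetical helper, and Python's `latin-1` codec is only an approximation of MySQL's `latin1` (which is closer to cp1252 for bytes 0x80-0x9F), so treat the result as a preview rather than a guarantee.

```python
def repair_double_encoding(text):
    """Undo one round of double encoding; return text unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as latin1, or the bytes are not valid UTF-8:
        # the value does not look double-encoded, so leave it alone.
        return text


# Sample values standing in for rows of posts.content
rows = ["cafÃ©", "café", "plain text"]
for before in rows:
    after = repair_double_encoding(before)
    status = "would change" if after != before else "unchanged"
    print(f"{before!r} -> {after!r} ({status})")
```

Values that are already correct fail the round trip and are left untouched, which is why the helper falls back to the original text instead of raising.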
### Solution 7: Handle Import/Export with Correct Encoding

Export with correct encoding:

```bash
# Export with utf8mb4
mysqldump -u user -p --default-character-set=utf8mb4 database > backup.sql
```

Import with correct encoding:

```bash
# Import with utf8mb4
mysql -u user -p --default-character-set=utf8mb4 database < backup.sql
```

Before importing, verify the SQL file's encoding:

```bash
# Check file encoding
file -i backup.sql

# Convert if needed
iconv -f ISO-8859-1 -t UTF-8 input.sql > output.sql
```
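
`file -i` only guesses the encoding from a sample of the file. For a stricter check, a dump can be validated byte for byte. This is a minimal sketch using only the Python standard library; `first_invalid_utf8_offset` is a hypothetical helper name, and it reads the whole input into memory, which may not suit very large dumps.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# In-memory demonstration
print(first_invalid_utf8_offset("café".encode("utf-8")))    # None
print(first_invalid_utf8_offset("café".encode("latin-1")))  # 3

# Usage sketch against a real dump:
# with open("backup.sql", "rb") as f:
#     print(first_invalid_utf8_offset(f.read()))
```

A non-`None` offset tells you where to look in the file before deciding whether an `iconv` conversion is appropriate.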
## Verification

### Verify Character Set Configuration

```sql
-- Should show utf8mb4, except character_set_filesystem (binary)
-- and character_set_system (always utf8/utf8mb3)
SHOW VARIABLES LIKE 'character_set%';

-- Test emoji support
INSERT INTO test_table (content) VALUES ('Hello 😊 World 🌍');

-- Verify storage
SELECT content, HEX(content) FROM test_table;
```
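
To know what the `HEX(content)` output should look like, you can compute the expected bytes on the client side. This sketch assumes the test string from the `INSERT` above. The second value shows what a lossy conversion typically leaves behind, namely `3F` (`?`) in place of each emoji.

```python
text = "Hello 😊 World 🌍"

# Bytes a utf8mb4 column should hold, formatted like MySQL's HEX() output
expected_hex = text.encode("utf-8").hex().upper()

# What the same text looks like if every emoji was replaced by '?'
lossy_hex = text.encode("ascii", errors="replace").hex().upper()

print(expected_hex)
print(lossy_hex)
```

If the stored value matches `expected_hex`, the emoji survived the round trip. The 4-byte sequences starting with `F09F` are the emoji themselves.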
### Verify All Tables Are utf8mb4

```sql
SELECT
    TABLE_NAME,
    TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_COLLATION NOT LIKE 'utf8mb4%';
```

## Prevention
### 1. Set Default Character Set

Always specify the character set when creating databases and tables:

```sql
CREATE DATABASE myapp CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

CREATE TABLE posts (
    id INT AUTO_INCREMENT PRIMARY KEY,
    content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
    created_at DATETIME
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
### 2. Configure Application Properly

Always set the connection encoding at application startup:

```php
// PHP - Set in connection
$pdo->exec("SET NAMES utf8mb4");

// Or in PDO options
$options[PDO::MYSQL_ATTR_INIT_COMMAND] = "SET NAMES utf8mb4";
```
### 3. Document Character Set Requirements

Include character set requirements in:

- Database schema documentation
- API documentation
- Developer onboarding materials
### 4. Regular Audits

```sql
-- Scheduled check for non-utf8mb4 tables
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_COLLATION NOT LIKE 'utf8mb4%';
```

## utf8 vs utf8mb4
Important: MySQL's `utf8` (an alias for `utf8mb3`) is NOT full UTF-8:

- `utf8` - stores at most 3 bytes per character (Basic Multilingual Plane only)
- `utf8mb4` - stores up to 4 bytes per character (full UTF-8, including emojis)
Always use utf8mb4 for proper Unicode support.
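
The 3-byte limit is easy to check on the client side. The sketch below, using only the Python standard library, shows the UTF-8 byte length of a few sample characters and decodes the bytes quoted in the ERROR 1366 message above. `needs_utf8mb4` is a hypothetical helper name.

```python
def needs_utf8mb4(text):
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


for ch in ("A", "é", "€", "😀"):
    print(ch, len(ch.encode("utf-8")), needs_utf8mb4(ch))

# The bytes from the "Incorrect string value" error are one 4-byte character
rejected = bytes.fromhex("F09F9880").decode("utf-8")
print(rejected, needs_utf8mb4(rejected))
```

Characters above U+FFFF take 4 bytes in UTF-8, which is why a `utf8` column rejects them while a `utf8mb4` column stores them.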
## Related Errors
- [MySQL Connection Refused](./fix-mysql-connection-refused)
- [MySQL Foreign Key Constraint Fails](./fix-mysql-foreign-key-constraint)