# MySQL Character Set Issues

Symptoms

  • Question marks (???) appearing instead of characters
  • Garbled or corrupted text in database
  • Emojis not displaying correctly
  • Accented characters showing as strange symbols
  • "Incorrect string value" errors
  • Data truncation warnings

Error Messages

bash
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x80' for column 'content'

Or:

bash
Warning: #1300 Invalid utf8 character string

Or:

bash
ERROR 1267 (HY000): Illegal mix of collations

Root Causes

  1. 1.Wrong character set - Using utf8 instead of utf8mb4
  2. 2.Character set mismatch - Database, table, column using different sets
  3. 3.Connection encoding - Client not setting proper charset
  4. 4.Collation mismatch - Queries mixing different collations
  5. 5.Data import issues - Importing with wrong encoding

Diagnosis Steps

Step 1: Check Current Character Sets

```sql -- Database character set SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = 'your_database';

-- Table character sets SELECT TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'your_database';

-- Column character sets SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = 'your_database' AND CHARACTER_SET_NAME IS NOT NULL ORDER BY TABLE_NAME, COLUMN_NAME; ```

Step 2: Check Server and Client Settings

sql
-- Check server variables
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

Key variables to check: - character_set_server - Server default charset - character_set_database - Current database charset - character_set_client - Client input charset - character_set_results - Output charset - character_set_connection - Connection charset

Step 3: Identify Corrupted Data

```sql -- Find records with encoding issues SELECT id, content, HEX(content) as hex_content FROM posts WHERE content LIKE '%?%' OR content REGEXP '[^\x00-\x7F]';

-- Check for emoji support SELECT id, content FROM posts WHERE LENGTH(content) != CHAR_LENGTH(content); ```

Solutions

Solution 1: Convert to utf8mb4 (Recommended)

utf8mb4 is the proper UTF-8 implementation that supports all characters including emojis.

```sql -- Convert database ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert table (including default for new columns) ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert specific column ALTER TABLE table_name MODIFY COLUMN content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ```

Solution 2: Convert Multiple Tables at Once

Generate conversion statements:

sql
-- Generate ALTER TABLE statements for all tables
SELECT CONCAT(
    'ALTER TABLE ', TABLE_NAME, 
    ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
) as sql_statement
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
AND TABLE_TYPE = 'BASE TABLE';

Run the generated statements:

sql
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- ... etc

Solution 3: Fix Server Configuration

```ini # /etc/mysql/mysql.conf.d/mysqld.cnf [mysqld] character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci init_connect = 'SET NAMES utf8mb4' skip-character-set-client-handshake

[client] default-character-set = utf8mb4

[mysql] default-character-set = utf8mb4 ```

Restart MySQL:

bash
sudo systemctl restart mysql

Solution 4: Fix Connection Encoding in Application

PHP PDO:

php
<?php
$dsn = 'mysql:host=localhost;dbname=database;charset=utf8mb4';
$options = [
    PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8mb4"
];
$pdo = new PDO($dsn, $username, $password, $options);

PHP mysqli:

php
<?php
$mysqli = new mysqli('localhost', 'user', 'password', 'database');
$mysqli->set_charset('utf8mb4');

Node.js:

```javascript const mysql = require('mysql2');

const connection = mysql.createConnection({ host: 'localhost', user: 'user', password: 'password', database: 'database', charset: 'utf8mb4' }); ```

Python:

```python import pymysql

connection = pymysql.connect( host='localhost', user='user', password='password', database='database', charset='utf8mb4mb4' ) ```

Solution 5: Fix Collation Mismatch Errors

```sql -- Check collations involved in error SHOW CREATE TABLE table1; SHOW CREATE TABLE table2;

-- Make collations consistent ALTER TABLE table1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ALTER TABLE table2 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Or use COLLATE clause in query SELECT * FROM table1 t1 JOIN table2 t2 ON t1.name COLLATE utf8mb4_unicode_ci = t2.name; ```

Solution 6: Fix Corrupted Data

For data already corrupted with wrong encoding:

```sql -- Check current encoding SELECT content, HEX(content) FROM posts WHERE id = 1;

-- If data was stored as latin1 but should be utf8mb4 -- Convert binary: ALTER TABLE posts MODIFY content BLOB; ALTER TABLE posts MODIFY content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Or for specific columns UPDATE posts SET content = CONVERT(CAST(CONVERT(content USING latin1) AS BINARY) USING utf8mb4); ```

Solution 7: Handle Import/Export with Correct Encoding

Export with correct encoding:

bash
# Export with utf8mb4
mysqldump -u user -p --default-character-set=utf8mb4 database > backup.sql

Import with correct encoding:

bash
# Import with utf8mb4
mysql -u user -p --default-character-set=utf8mb4 database < backup.sql

During import, verify SQL file encoding:

```bash # Check file encoding file -i backup.sql

# Convert if needed iconv -f ISO-8859-1 -t UTF-8 input.sql > output.sql ```

Verification

Verify Character Set Configuration

```sql -- Should all show utf8mb4 SHOW VARIABLES LIKE 'character_set%';

-- Test emoji support INSERT INTO test_table (content) VALUES ('Hello 😊 World 🌍');

-- Verify storage SELECT content, HEX(content) FROM test_table; ```

Verify All Tables Are utf8mb4

sql
SELECT 
    TABLE_NAME,
    TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
AND TABLE_COLLATION NOT LIKE 'utf8mb4%';

Prevention

1. Set Default Character Set

Always specify character set when creating databases and tables:

```sql CREATE DATABASE myapp CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

CREATE TABLE posts ( id INT AUTO_INCREMENT PRIMARY KEY, content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci, created_at DATETIME ) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ```

2. Configure Application Properly

Always set connection encoding at application startup:

```php // PHP - Set in connection $pdo->exec("SET NAMES utf8mb4");

// Or in PDO options $options[PDO::MYSQL_ATTR_INIT_COMMAND] = "SET NAMES utf8mb4"; ```

3. Document Character Set Requirements

Include character set requirements in: - Database schema documentation - API documentation - Developer onboarding materials

4. Regular Audits

sql
-- Scheduled check for non-utf8mb4 tables
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
AND TABLE_COLLATION NOT LIKE 'utf8mb4%';

utf8 vs utf8mb4

Important: MySQL's utf8 is NOT true UTF-8: - utf8 - Supports only 3-byte characters (BMP only) - utf8mb4 - Supports 4-byte characters (full UTF-8 including emojis)

Always use utf8mb4 for proper Unicode support.

  • [MySQL Connection Refused](./fix-mysql-connection-refused)
  • [MySQL Foreign Key Constraint Fails](./fix-mysql-foreign-key-constraint)