Migrating from utf8 to utf8mb4 in MySQL
If you ever get this error message:
Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'content' at row
You probably have experienced at one point, the horrors that is mysql’s utf8 4-byte encoding.
If you’re using MySQL 5.4 and below, there’s no way to solve this without a proper upgrade to at least MySQL 5.5.
My steps:
- Ensure I’m using the latest version (MySQL > 5.5)
- Setup your /etc/mysql/my.cnf
# in /etc/mysql/my.cnf add the following in the correct sections
[client]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
[mysql]
default-character-set = utf8mb4
- If you haven’t setup your database/tables yet, you can skip this step. If you have existing databases that you’d like to migrate over. Follow the steps below:
IMPORTANT: Please ensure you backup your database before performing the following.
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
- Now all you need to do is to ensure that your application is sending the text in the proper encoding.
In Rails:
# database.yml
development:
adapter: mysql2
database: db
username:
password:
encoding: utf8mb4
collation: utf8mb4_unicode_ci
- Thats all! You can now safely play with utf8 and utf8mb4.
Here’s a script that you could use to to quickly update your database and tables to the new encoding. As usual, remember to BACKUP first!
require 'rubygems'
require 'sequel'
require 'mysql2'
database = 'your_database_name'
username = 'your_user_name'
password = 'password'
host = 'localhost'
DB = Sequel.connect("mysql2://#{username}:#{password}@#{host}/#{database}")
sw_tables = DB["show tables"].all.map{|x| x.first[1]}
puts DB["ALTER DATABASE #{database} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci"].all.inspect
sw_tables.each do |t|
puts DB["ALTER TABLE #{t} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"].all.inspect
end
Update If you’re working with an existing database and have converted your system to a utf8mb4, there may be instances where you meet with key length issues due to the increased number of bytes that comes with utf8mb4.
To fix this, you might need to add an initializer:
# config/initializer/mysqlpls.rb
require 'active_record/connection_adapters/abstract_mysql_adapter'
module ActiveRecord
module ConnectionAdapters
class AbstractMysqlAdapter
NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
end
end
end
References