David - Musings of an SRE

Migrating from utf8 to utf8mb4 in MySQL

If you ever get this error message:

  Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'content' at row

You probably have experienced at one point, the horrors that is mysql’s utf8 4-byte encoding.

If you’re using MySQL 5.4 and below, there’s no way to solve this without a proper upgrade to at least MySQL 5.5.

My steps:

  1. Ensure I’m using the latest version (MySQL > 5.5)
  2. Setup your /etc/mysql/my.cnf
  # in /etc/mysql/my.cnf add the following in the correct sections
  [client]
  default-character-set = utf8mb4

  [mysqld]
  character-set-client-handshake = FALSE
  character-set-server = utf8mb4
  collation-server = utf8mb4_unicode_ci

  [mysql]
  default-character-set = utf8mb4
  1. If you haven’t setup your database/tables yet, you can skip this step. If you have existing databases that you’d like to migrate over. Follow the steps below:

IMPORTANT: Please ensure you backup your database before performing the following.

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  1. Now all you need to do is to ensure that your application is sending the text in the proper encoding.

In Rails:

  # database.yml
  development:
    adapter: mysql2
    database: db
    username:
    password:
    encoding: utf8mb4
    collation: utf8mb4_unicode_ci
  1. Thats all! You can now safely play with utf8 and utf8mb4.

Here’s a script that you could use to to quickly update your database and tables to the new encoding. As usual, remember to BACKUP first!

require 'rubygems'
require 'sequel'
require 'mysql2'
database = 'your_database_name'
username = 'your_user_name'
password = 'password'
host = 'localhost'

DB = Sequel.connect("mysql2://#{username}:#{password}@#{host}/#{database}")
sw_tables = DB["show tables"].all.map{|x| x.first[1]}
puts DB["ALTER DATABASE #{database} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci"].all.inspect
sw_tables.each do |t|
  puts DB["ALTER TABLE #{t} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"].all.inspect
end
  

Update If you’re working with an existing database and have converted your system to a utf8mb4, there may be instances where you meet with key length issues due to the increased number of bytes that comes with utf8mb4.

To fix this, you might need to add an initializer:

# config/initializer/mysqlpls.rb

require 'active_record/connection_adapters/abstract_mysql_adapter'

module ActiveRecord
  module ConnectionAdapters
    class AbstractMysqlAdapter
      NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
    end
  end
end

References

Emojis and MySQL

How to support full Unicode in MySQL databases