Fixing MySQL Database After Moving to Ruby 1.9

This is a short post describing how I revived a database full of content in a national character set after moving to Ruby 1.9.2.

Ruby 1.9 is known for many things (a reduced memory footprint, better performance, friendlier syntax), but it also introduced the long-awaited concept of encodings, built on encoding-aware strings.

Historically, I stored data in a MySQL database whose tables used latin1 as their character set. Technically, the data was written as utf8 bytes, but as long as you read the data the same way you wrote it, you should be golden. After switching to Ruby 1.9.2, my Rails application started to die with the message:

incompatible character encodings: UTF-8 and ASCII-8BIT

It means Ruby is trying to combine strings in two incompatible encodings and doesn't know how to handle this. My bet was that the content from the database came in as latin1 even though I had explicitly specified encoding: utf8 in database.yml.
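The error is easy to reproduce outside Rails. The snippet below is a minimal sketch, not code from my app: it assumes the database driver handed back raw bytes tagged as ASCII-8BIT, and the variable names are made up for illustration.

```ruby
# A UTF-8 string, as it would come from a template ("Cafe menu: " with e-acute).
template = "Caf\u00e9 menu: "

# Hypothetical value from the driver: valid UTF-8 bytes, but tagged as binary.
from_db = "cr\u00e8me".dup.force_encoding("ASCII-8BIT")

error = nil
begin
  # Both sides contain non-ASCII bytes, so Ruby refuses to join them.
  template + from_db
rescue Encoding::CompatibilityError => e
  error = e
end

# Retagging the bytes (no conversion happens) makes the strings compatible.
fixed = template + from_db.dup.force_encoding("UTF-8")
```

Retagging like this only hides the symptom in one place, which is why I went after the database itself.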

After several hours of playing with the configuration of both the Rails app and MySQL, I found a way:

  • First, I dumped the database to an SQL file, explicitly telling mysqldump to use latin1 (otherwise it would use utf8):

      mysqldump --default-character-set=latin1 database > database.sql
  • Opened it in a text editor and verified that the national characters were readable

  • Replaced every mention of latin1 with utf8 in the SQL file
  • Loaded the database back from the SQL file, specifying latin1 again:

      mysql --default-character-set=latin1 database < database.sql

This gave me the same database, but now correctly encoded in utf8.
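Why the dump trick works can be sketched in Ruby at the byte level. This is an illustration under assumptions, not something I ran against the database: it uses ISO-8859-1 as a stand-in for MySQL's latin1 (which is actually closer to Windows-1252), and retag_dump is a hypothetical helper standing in for the search-and-replace I did in the editor.

```ruby
# What the app wrote: "cafe" with e-acute, stored as UTF-8 bytes 0xC3 0xA9.
original = "caf\u00e9"

# What MySQL believed it stored: the same bytes read as latin1 characters.
# Assumption: ISO-8859-1 stands in for MySQL's latin1 here.
as_latin1 = original.dup.force_encoding("ISO-8859-1")

# Dumping over a utf8 connection converts those "latin1" characters to UTF-8,
# which double-encodes the text.
utf8_dump = as_latin1.encode("UTF-8") # => "caf\u00c3\u00a9", i.e. mojibake

# Dumping over a latin1 connection does no conversion: the original bytes
# come out untouched, so the file really contains UTF-8 text.
latin1_dump = as_latin1.dup.force_encoding("UTF-8")

# Hypothetical helper for the editing step: relabel the dump as utf8.
# It rewrites every occurrence, including any that appear inside row data.
def retag_dump(sql)
  sql.gsub("latin1", "utf8")
end

retagged = retag_dump("CREATE TABLE posts (title varchar(255)) DEFAULT CHARSET=latin1;")
```

The latin1 dump is the one worth keeping: its bytes are already correct, and only the labels in the SQL need to change.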