For some unknown reason, the default character set in Debian 7 (“Wheezy”) is different to Ubuntu 12.04 (“Precise Pangolin”). The former uses latin1, while the later utf8. (This could also be down to other local configuration I’m not aware of). The difference between the two is rather subtle, and might go unnoticed. One visible effect is when using letters with German umlaut or Scandinavian specific letters in ordered text. For example, the Norwegian letter Å (aring in HTML) is the last of the alphabet, but when using latin1, it is interpreted as an A (or possibly double a: aa) and ordered first. See this bug report and discussion for more details and examples.

The character set is specified in multiple places in the MySQL DBMS: On the database, table and for the client, server, connection, result set and underlying system. To view the current settings, these two commands give an overview (see here for details):

SELECT * FROM information_schema.SCHEMATA;

The output will be something like:

| def          | information_schema | utf8                       | utf8_general_ci        | NULL     |
| def          | test               | latin1                     | latin1_swedish_ci      | NULL     |
| def          | myTESTdb           | latin1                     | latin1_swedish_ci      | NULL     |
| Variable_name            | Value                      |
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |

The above is from the Ubuntu 12.04 system where there were no problems with the character set of text values in the tables. Notice that are some latin1 settings there, but that seems to be fine. The difference on the Debian 7.0 system was that these variable were set to latin1: character_set_client, character_set_connection, character_set_results. One way to change them is through the SET commands in MySQL console:

SET character_set_connection = utf8;

However, that will not persist the settings. Furthermore, it will not be enough to fix the tables. Instead, the config file /etc/mysql/my.cnf has to be changed with the following lines. (from this discussion).

collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

Finally, after that is done, the MySQL server restarted, and the data with the special UTF8 characters have to be re-inserted. That is a bit of a pain, and I did not dig too far into how to resolve a problem with existing that. There is a discussion on how to alter the DB and tables here, but I suspect the data will still contain the wrong characters, so it will not be enough. Fundamentally, the byte representation in lation1 and utf8 for these letters are different, so that has the be replaced somehow, as far as I understand.

An alternative to all this is to get it right from the get-go. The MySQL documentation outlines how a new DB can be created with a specific character set. Again I’m not sure if that is enough, as the settings in the my.cnf file deal with the encoding of the client, connection, and result. There are probably several ways to solve the issue, though.