How to change character encoding in postgres database?

I have a database that has been configured with the default character set SQL_ASCII. I want to switch it to UNICODE. Is there an easy way to do this?

+62
postgresql unicode
Feb 23 '11 at 12:22
5 answers

To change the encoding of your database:

  • Dump your database.
  • Drop your database.
  • Create a new database with a different encoding.
  • Reload the data.

Make sure client encoding is set correctly during all of this.
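The four steps above can be sketched as a small shell script. This is only an illustration under stated assumptions, not the answerer's own script: the database name `thedb`, the dump path `/tmp/thedb.sql`, the `run` helper and the `DRY_RUN` switch are hypothetical. It also assumes the cluster's locale is compatible with the target encoding; if it is not, `createdb` would additionally need a matching locale option.

```shell
# Hypothetical placeholders; override them from the environment as needed.
OLD_DB="${OLD_DB:-thedb}"
DUMP_FILE="${DUMP_FILE:-/tmp/thedb.sql}"
NEW_ENCODING="${NEW_ENCODING:-UTF8}"

# run CMD...: print the command when DRY_RUN=1 (the default here),
# execute it otherwise. Printing first is useful because step 2 is destructive.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recreate_with_encoding() {
    # 1. Dump. -E sets the encoding the dump is written in, and the dump
    #    records that value as its client encoding.
    run pg_dump -E "$NEW_ENCODING" -f "$DUMP_FILE" "$OLD_DB" || return 1
    # 2. Drop the old database.
    run dropdb "$OLD_DB" || return 1
    # 3. Create it again from template0 with the new encoding.
    run createdb -E "$NEW_ENCODING" -T template0 "$OLD_DB" || return 1
    # 4. Reload the data.
    run psql -d "$OLD_DB" -f "$DUMP_FILE"
}
```

Calling `recreate_with_encoding` with the default `DRY_RUN=1` only prints the four commands for review; setting `DRY_RUN=0` executes them. Keep the dump file until the reloaded database has been checked.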

Source: http://archives.postgresql.org/pgsql-novice/2006-03/msg00210.php

+47
Feb 23 '11 at 12:28

First of all, Daniel's answer is the correct, safe option.

For the specific case of moving from SQL_ASCII to something else, you can cheat and simply update the pg_database catalog to reassign the database encoding. This assumes that any non-ASCII characters already stored are in the expected encoding (or that you simply have not used any non-ASCII characters).

Then you can do:

update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb' 

This will not change the collation of the database, only how the encoded bytes are converted into characters (so length('£123') will now return 4 instead of 5). If the database uses the "C" collation, the ordering of ASCII strings should not change. You will probably need to rebuild any indexes containing non-ASCII characters, though.
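The 5-versus-4 difference in the pound-sign example is the difference between counting bytes and counting characters. A minimal shell sketch of that arithmetic, independent of PostgreSQL (the `sample` helper is hypothetical, and the string is written with octal escapes so the result does not depend on how this page is encoded):

```shell
# '£123' encoded as UTF-8: the pound sign is the two bytes 0xC2 0xA3
# (octal 302 243), followed by the ASCII digits 1, 2, 3.
sample() { printf '\302\243123'; }

# Byte count: every byte is counted separately.
byte_len=$(sample | wc -c | tr -d ' ')

# Character count: each UTF-8 character contains exactly one byte that is
# not a continuation byte (0x80-0xBF), so drop those and count what is left.
char_len=$(sample | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' ')

echo "bytes=$byte_len chars=$char_len"   # prints: bytes=5 chars=4
```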

Caveat emptor. Dumping and reloading gives you a way to check that the content of your database is actually in the encoding you expect; this shortcut does not. If it turns out that the database contained wrongly encoded data, rescuing it will be difficult. So if you possibly can, dump and reinitialize.
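One way to make that check before committing to anything is to test whether a plain-text dump decodes cleanly as UTF-8. The sketch below assumes `iconv` is installed; the function name `is_valid_utf8` and the file path in the usage comment are hypothetical.

```shell
# is_valid_utf8 FILE: succeeds only if FILE decodes cleanly as UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage sketch (not executed here):
#   pg_dump thedb > /tmp/thedb.sql
#   if is_valid_utf8 /tmp/thedb.sql; then
#       echo "dump is valid UTF-8"
#   else
#       echo "dump contains bytes that are not valid UTF-8"
#   fi
```

A failed check means at least some stored text is in another encoding and would need converting before it can be loaded into a UTF8 database.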

+79
Feb 23 '11 at 12:45

Dumping a database that has one encoding and trying to restore it into a database with a different encoding can lead to data corruption. The data encoding must be set before any data is inserted into the database.

Check this: When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data.

And this: Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 22.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.




I would rather rebuild everything from the start with the correct locale and encoding on your Debian OS, as described here:

 su root 

Reconfigure your locale settings:

 dpkg-reconfigure locales 

Choose your locale (for example, for French in Switzerland: fr_CH.UTF8)

Remove and purge PostgreSQL completely (this deletes all existing databases and configuration):

 apt-get --purge remove postgresql\*
 rm -r /etc/postgresql/
 rm -r /etc/postgresql-common/
 rm -r /var/lib/postgresql/
 userdel -r postgres
 groupdel postgres

Reinstall postgresql:

 aptitude install postgresql-9.1 postgresql-contrib-9.1 postgresql-doc-9.1 

Now any new database will automatically be created with the correct encoding, LC_CTYPE (character classification) and LC_COLLATE (string sort order).

+8
Dec 31 '14 at 12:30

Daniel Kutik's answer is correct, but it can be made even safer by renaming the database.

So, a really safe way:

  • Create a new database with a different encoding and a different name.
  • Dump your database.
  • Restore the dump into the new database.
  • Verify that your application works correctly with the new DB.
  • Rename the old DB to something meaningful.
  • Rename the new DB to the original name.
  • Test the application again.
  • Drop the old database.

In case of emergency, just rename the DBs back.

+5
Feb 12 '17 at 2:01

 # dump into file
 pg_dump myDB > /tmp/myDB.sql
 # create an empty db with the right encoding (on older versions the escaped single quotes are needed!)
 psql -c 'CREATE DATABASE "tempDB" WITH OWNER = "myself" LC_COLLATE = '\''de_DE.utf8'\'' TEMPLATE template0;'
 # import in the new DB
 psql -d tempDB -1 -f /tmp/myDB.sql
 # rename databases
 psql -c 'ALTER DATABASE "myDB" RENAME TO "myDB-wrong-encoding";'
 psql -c 'ALTER DATABASE "tempDB" RENAME TO "myDB";'
 # see the result
 psql myDB -c "SHOW LC_COLLATE"
0
Jul 09 '18 at 23:34


