Replace Unicode characters in PostgreSQL

Is it possible to replace all occurrences of a given character (expressed in Unicode) with another character (expressed in Unicode) in a varchar field in PostgreSQL?

I tried something like this:

UPDATE mytable SET myfield = regexp_replace(myfield, '\u0050', '\u0060', 'g') 

But it seems to write the literal string '\u0060' into the field, not the character corresponding to this code.

2 answers

According to the PostgreSQL documentation on lexical structure, you should use the U& syntax:

 UPDATE mytable SET myfield = regexp_replace(myfield, U&'\0050', U&'\0060', 'g') 

You can also use the PostgreSQL-specific escape string form, E'\u0050'. It works on older versions than the Unicode escape form does, but the Unicode escape form is preferred on newer versions. This should show what happens:

 regress=> SELECT '\u0050', E'\u0050', U&'\0050';
  ?column? | ?column? | ?column?
 ----------+----------+----------
  \u0050   | P        | P
 (1 row)
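The following sketch checks what each literal actually contains. It uses length(), ascii() and chr(), which are standard PostgreSQL string functions but do not appear in the answer above. It assumes a UTF8 database with standard_conforming_strings = on, which has been the default since PostgreSQL 9.1. In a UTF8 database, ascii() and chr() work with Unicode code points, so 80 corresponds to U+0050.

```sql
-- Assumes a UTF8 database and standard_conforming_strings = on.
SELECT length('\u0050')    AS plain_len,    -- 6: a backslash, 'u' and four digits
       ascii(E'\u0050')    AS e_escape,     -- 80, i.e. U+0050 ('P')
       ascii(U&'\0050')    AS u_escape,     -- 80, i.e. U+0050 ('P')
       chr(80) = U&'\0050' AS chr_matches;  -- t: chr() builds the same character
```

The plain literal stays six characters long, which matches what the question observed. The E'' form, the U& form and chr() all produce the single character P.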

It should work with the "character corresponding to this code" unless the client or some other layer in the chain mangles your code!

Alternatively, use translate() or replace() for this simple job; both are much faster than regexp_replace(). translate() is also good for several single-character replacements at once.
And avoid empty updates with a WHERE clause. That is much faster and avoids table bloat and the extra cost of VACUUM.

 UPDATE mytable
 SET    myfield = translate(myfield, 'P', '`')  -- actual characters
 WHERE  myfield <> translate(myfield, 'P', '`');
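Here is a short sketch of how replace() and translate() differ, using made-up sample strings. replace() substitutes whole substrings. translate() maps characters one-to-one by position, so a single call can perform several replacements, and it deletes any character that has no counterpart.

```sql
-- replace(): every occurrence of the substring 'P' becomes '`'.
SELECT replace('PoPsql', 'P', '`');                        -- `o`sql
-- translate(): P (U+0050) -> ` (U+0060) and Q (U+0051) -> a (U+0061) in one pass.
SELECT translate('PQPQ', U&'\0050\0051', U&'\0060\0061');  -- `a`a
-- Characters in the second argument without a counterpart in the third are removed.
SELECT translate('PQR', 'PQ', 'x');                        -- xR
```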

If you keep running into problems, use the encoding @mvp provided:

 UPDATE mytable
 SET    myfield = translate(myfield, U&'\0050', U&'\0060')
 WHERE  myfield <> translate(myfield, U&'\0050', U&'\0060');
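As an end-to-end sketch, the WHERE-guarded update can be run against a hypothetical temporary table. The names mytable and myfield mirror the question, and the rows are invented. Only the row that actually contains U+0050 gets rewritten. The NULL row is skipped as well, because NULL <> NULL yields NULL rather than true.

```sql
-- Hypothetical demo table; names mirror the question, data is made up.
CREATE TEMP TABLE mytable (id int PRIMARY KEY, myfield varchar);
INSERT INTO mytable VALUES (1, 'PostgreSQL'), (2, 'no match here'), (3, NULL);

-- Only row 1 contains U+0050 ('P'), so psql reports "UPDATE 1".
UPDATE mytable
SET    myfield = translate(myfield, U&'\0050', U&'\0060')
WHERE  myfield <> translate(myfield, U&'\0050', U&'\0060');

SELECT id, myfield FROM mytable ORDER BY id;
-- 1 | `ostgreSQL
-- 2 | no match here
-- 3 | NULL (untouched)
```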
