Perl: Decoding Unicode "Distorted" Strings

Question


I am working on a CGI script that is called from a piece of software (which I cannot change). The variables the software passes in are causing me problems: whenever they contain non-ASCII characters, they look like this:

ÿFFFFDEetta er texti meÿFFFFF0 ÿFFFFEDslenskum stÿFFFFF6fum

instead of

Þetta er texti með íslenskum stöfum.

I tried messing around with Encode::decode, but nothing came of it; all it did was change how the text is presented.

So yes, I'm a bit stumped. What should I do to change every ÿFFFFDE to Þ, and so on, without resorting to replacing each non-ASCII character separately (which is not a solution, because it has to work for languages that I don't even speak)?


perl unicode

Swooper Sep 28 '11 at 12:13


1 answer

daxim · Accepted Answer · 2011-09-28T13:22:30+0000

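    use Encode qw(decode);
    use Encode::Escape qw();    # registers the 'unicode-escape' encoding

    $_ = 'ÿFFFFDEetta er texti meÿFFFFF0 ÿFFFFEDslenskum stÿFFFFF6fum';

    # Turn each "ÿFFFF" prefix into a literal "\x", e.g. "ÿFFFFDE" becomes "\xDE"
    s/ÿFFFF/\\x/g;

    # Expand the \xHH escapes, then interpret the result as ISO-8859-1
    decode('iso-8859-1', decode('unicode-escape', $_));
    # returns 'Þetta er texti með íslenskum stöfum'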
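The code in the answer relies on Encode::Escape, which is a CPAN module rather than part of core Perl. If installing it is not an option, the same idea can probably be sketched with a single substitution. This is only a sketch under some assumptions: `fix_mangled` is a hypothetical helper name, every escape is assumed to be exactly `ÿFFFF` followed by two hex digits encoding one ISO-8859-1 byte (as in the sample string), and the input is assumed to already be a character string in which ÿ is the single character U+00FF.

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal non-ASCII characters
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper, not part of the original answer.
# Assumes each escape is "ÿFFFF" followed by exactly two hex digits
# that encode a single ISO-8859-1 byte.
sub fix_mangled {
    my ($text) = @_;
    # ISO-8859-1 bytes 0x00-0xFF map to the same Unicode code points,
    # so chr(hex(...)) yields the intended character directly.
    $text =~ s/ÿFFFF([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $mangled = 'ÿFFFFDEetta er texti meÿFFFFF0 ÿFFFFEDslenskum stÿFFFFF6fum';
print fix_mangled($mangled), "\n";
```

If the CGI parameter arrives as UTF-8 encoded bytes instead, it would need to be passed through `decode('UTF-8', ...)` first so that the pattern can match ÿ as one character.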
