Iconv unicode unknown input format

Question

I have a file that is described on Unix as:

 $ file xxx.csv
 xxx.csv: UTF-8 Unicode text, with very long lines

Viewing it in less or vi renders some special characters (ß, Ä, °, ...) unreadable (├╝). Windows does not display them correctly either, and importing the file directly into a database just turns the special characters into other special characters (+ ä, + ñ, ...).

I would now like to convert it to a “standard readable” encoding using iconv. When I try

 $ iconv -f UTF-8 -t ISO-8859-1 xxx.csv > yyy.csv
 iconv: illegal input sequence at position 1234

the conversion fails. Using UNICODE as input and UTF-8 as output returns the same message.

I assume the file is actually encoded in some other format that I don’t know. How can I find out which format it is, so that I can convert it to something “universally” readable?

+7

unix encoding utf-8

RRZ Europe Oct 7 '11 at 14:12

3 answers

Converting from UTF-8 to ISO-8859-1 only works if your UTF-8 text contains only characters that can be represented in ISO-8859-1. If that is not the case, you have to say what should happen to the other characters: ignore them (//IGNORE) or approximate them (//TRANSLIT). Try one of the following:

 iconv -f UTF-8 -t ISO-8859-1//IGNORE --output=outfile.csv inputfile.csv
 iconv -f UTF-8 -t ISO-8859-1//TRANSLIT --output=outfile.csv inputfile.csv

In most cases I think approximation is the better choice, for example mapping accented characters to their unaccented counterparts, the euro sign to EUR, etc.
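To see the difference between the two suffixes on a small input, a sketch like the following may help. The sample text and the `sample` helper are made up for illustration, and `od -c` is only there to show the output bytes without depending on the terminal's encoding. What //TRANSLIT substitutes for a given character depends on the iconv implementation and the locale, so treat the output as something to inspect rather than rely on.

```shell
#!/bin/sh
# Hypothetical sample: "café €5" as UTF-8 bytes (octal escapes keep it portable).
sample() { printf 'caf\303\251 \342\202\2545\n'; }

# //TRANSLIT: characters missing from ISO-8859-1 (here the euro sign) are
# replaced by an approximation chosen by the iconv implementation.
sample | iconv -f UTF-8 -t ISO-8859-1//TRANSLIT | od -An -c

# //IGNORE: such characters are dropped. Some iconv versions still exit
# non-zero after dropping characters, hence the "|| true".
sample | { iconv -f UTF-8 -t ISO-8859-1//IGNORE || true; } | od -An -c
```

In both cases the é survives as the single ISO-8859-1 byte 0xE9 (shown by `od -c` as octal 351); only the handling of the euro sign differs.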

+8

niefpaarschoenen Sep 23 '13 at 12:44

If you are not sure what type of file you are dealing with, you can find out as follows:

 file file_name

The above command will give you the file format. Then you can use iconv. For example, if the file format is UTF-16 and you want to convert it to UTF-8, you can use the following:

 iconv -f UTF-16 -t UTF-8 file_name >output_file_name

Hope this provides some further insight into what you are looking for.
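Since file only makes a guess from the content, it can be worth double-checking that guess. One possible check, sketched below, is to convert the file from UTF-8 to UTF-8 and look at the exit status: iconv fails if it meets a byte sequence that is not valid UTF-8. The helper name `is_utf8` is made up for this example, and the file to check is assumed to be passed as the first argument.

```shell
#!/bin/sh
# is_utf8 is a hypothetical helper: iconv exits non-zero when the input
# contains a byte sequence that is not valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage: pass the file to check as the first argument.
if [ -n "${1:-}" ]; then
    if is_utf8 "$1"; then
        echo "$1: valid UTF-8"
    else
        echo "$1: not valid UTF-8; try another -f value (see: iconv -l)"
    fi
fi
```

`iconv -l` lists the encoding names that the installed iconv accepts for -f and -t.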

+1

Mari Aug 29 '13 at 9:51

RRZ Europe · Accepted Answer · 2011-10-12T07:56:03+0000

The problem was that Windows could not interpret the file as UTF-8 on its own. It reads it as a single-byte “ASCII” encoding, so the two UTF-8 bytes of ä (195 164) show up as the two characters Ã¤.
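This effect can be reproduced on the command line. The sketch below assumes that the single-byte encoding Windows falls back to is WINDOWS-1252 and that the terminal displays UTF-8: it takes the two UTF-8 bytes of ä, deliberately decodes them as WINDOWS-1252, and prints the result.

```shell
#!/bin/sh
# The two UTF-8 bytes of "ä" are 0xC3 0xA4 (decimal 195 164, octal 303 244).
# Decoding them as WINDOWS-1252 yields two separate characters, which are
# then re-encoded as UTF-8 so that a UTF-8 terminal can display them.
printf '\303\244\n' | iconv -f WINDOWS-1252 -t UTF-8
# prints: Ã¤
```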

While trying to convert it, I found a solution that works for me:

 iconv -f UTF-8 -t WINDOWS-1252//TRANSLIT --output=outfile.csv inputfile.csv

Now I can view the special characters correctly in editors.

For importing into SQL Server, converting UTF-8 to UTF-16 works even better; only the file size grows a bit.
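A rough sketch of the UTF-16 variant, with made-up file names and sample content, shows the size difference. It uses UTF-16LE, which fixes the byte order and writes no byte order mark; plain UTF-16 leaves the byte order to iconv and typically adds a mark. Which of the two the importing tool expects is an assumption to verify for your setup.

```shell
#!/bin/sh
# Work in a throwaway directory; file names and content are hypothetical.
dir=$(mktemp -d)
printf 'Stra\303\237e;\303\204rger\n' > "$dir/in.csv"

# Convert UTF-8 to UTF-16LE (fixed byte order, no byte order mark).
iconv -f UTF-8 -t UTF-16LE "$dir/in.csv" > "$dir/out.csv"

# Compare the sizes in bytes before cleaning up.
in_size=$(wc -c < "$dir/in.csv")
out_size=$(wc -c < "$dir/out.csv")
echo "UTF-8: $in_size bytes, UTF-16LE: $out_size bytes"
rm -rf "$dir"
```

For text that is mostly ASCII the file roughly doubles in size, because every such character takes two bytes in UTF-16 instead of one.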
