What encoding encodes 'ç' as '?º' (0x3f 0xba)?

Today I received a file from a client that I need to read, but it contains strange characters. Using well-known names, I can guess the meaning of some characters.

For instance:

Real name | Encoded as | Character | Hex
----------|------------|-----------|------
Françios  | Fran?ºios  | ç         | 3f ba
André     | Andr??     | é         | 3f 3f
Hélène    | H??l?¿ne   | è         | 3f bf

etc.

  • I tried importing the file with every encoding known to .NET and checked whether the result contained words I know, but no code page gave a satisfactory result.
  • Notepad++ reports the file as ANSI and also displays the unwanted characters (although its hex-editor plugin is useful).
  • Other files (from the same user and the same zip file) are encoded in UTF-8.

I cannot count on help from the person who sent me the files. He made it clear (via Google Translate) that creating the files was very difficult for him, and he used software (SAP, I think) that I do not have access to.

Is there any other way to find the encoding of the files he just sent me?

encoding globalization codepages
2 answers

I can reproduce these results if I take UTF-8 encoded text, pretend it is CP850, and then convert it to Latin-1, Windows-1252 or a similar encoding. The "?" comes from the fact that the CP850 character at 0xc3 is "├", which does not exist in Latin-1 or its derived encodings, so the conversion replaces it with "?".
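The round trip described above can be sketched in a few lines of Python. This is only an illustration of the mechanism, not the tool that produced the file; the helper name `garble` is hypothetical, and `errors="replace"` is assumed to stand in for whatever substitution the real conversion applied.

```python
def garble(text: str, wrong_codec: str = "cp850") -> bytes:
    """Encode as UTF-8, misread the bytes as `wrong_codec`, then convert to Latin-1."""
    utf8_bytes = text.encode("utf-8")          # 'ç' becomes the two bytes C3 A7
    misread = utf8_bytes.decode(wrong_codec)   # each byte is taken as one character
    # Characters that Latin-1 cannot represent are replaced with '?'.
    return misread.encode("latin-1", errors="replace")

# Code points that the UTF-8 bytes of 'ç' turn into when read as CP850.
misread = "ç".encode("utf-8").decode("cp850")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+251C', 'U+00BA']

print(garble("ç").hex(" "))  # 3f ba
```

U+251C is the box-drawing character "├", which has no Latin-1 equivalent and becomes 0x3f, while U+00BA ("º") survives as 0xba.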


Edit: I expanded the search a bit with iconv, and CP437, CP862 or CP865 fit better than CP850. As you asked, here is the one-liner I used this time:

 for enc in `iconv -l`; do echo -n "$enc: "; echo -n "ç é è" | iconv -s -f $enc -t "LATIN1//TRANSLIT" 2>/dev/null; echo; done 
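For readers without iconv, the same kind of search can be approximated in Python by comparing each candidate against the three byte pairs from the question's table. This is a rough sketch: the candidate list is an arbitrary assumption rather than the full output of `iconv -l`, the names `OBSERVED`, `CANDIDATES` and `matches` are hypothetical, and `errors="replace"` is used instead of iconv's `//TRANSLIT`, so results may differ for characters that iconv would transliterate.

```python
# Samples from the question: original character -> bytes found in the file.
OBSERVED = {"ç": b"\x3f\xba", "é": b"\x3f\x3f", "è": b"\x3f\xbf"}

# Candidate code pages to test (an assumed short list; extend as needed).
CANDIDATES = ["cp437", "cp850", "cp852", "cp862", "cp865", "cp1252", "latin-1"]

def matches(codec: str) -> bool:
    """True if misreading UTF-8 as `codec` and converting to Latin-1 gives every sample."""
    for char, expected in OBSERVED.items():
        try:
            misread = char.encode("utf-8").decode(codec)
        except UnicodeDecodeError:
            return False  # the codec has no character for one of the bytes
        if misread.encode("latin-1", errors="replace") != expected:
            return False
    return True

matching = [codec for codec in CANDIDATES if matches(codec)]
print(matching)
```

With this candidate list the check should keep cp437, cp862 and cp865 and reject cp850, because CP850 maps 0xa9 to "®", which Latin-1 can represent, so "é" would not become "??".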

It must be UTF-8 or UTF-16; they cover almost all regular characters. It looks like you have a decoding/encoding problem.

Notepad++ can be misled because your files do not have a byte order mark.

How do you process the files?

Try reading them as binary and then try different encodings to get the string. If you do not read them as binary, the default encoding may be applied.

"?" is a sign of this.

Maybe that helps.
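The suggestion to read the file as binary and try several encodings could look roughly like this in Python. It is a sketch only: the function name `try_decodings` and the default encoding list are assumptions, and the sample bytes are taken from the "Fran?ºios" row of the question rather than from a real file.

```python
def try_decodings(raw: bytes, encodings=("utf-8", "utf-16", "cp1252", "cp437")):
    """Decode the same raw bytes with each encoding; None marks a failed decode."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeError:
            results[name] = None
    return results

# For a real file, read the bytes without any implicit decoding, for example:
#     with open(path, "rb") as handle:   # `path` is a placeholder
#         raw = handle.read()
raw = b"Fran\x3f\xbaios"

results = try_decodings(raw)
for name, text in results.items():
    # ascii() keeps the output printable on any console.
    print(name, ascii(text))
```

A decode that succeeds is not proof of the right encoding; single-byte code pages accept almost any input, so the decoded text still has to be checked against known words.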
