What is narrow-string encoding in Windows?

Question


The Subversion API has a number of functions for converting natively encoded strings to UTF-8 strings. My question is: what is this native encoding on Windows? Does it depend on the language?


c++ c string winapi character-encoding

Daniel Trebbien Jan 10 '11 at 17:09


4 answers

Windows-1252. Jukka Korpela has an excellent page on character encodings, with an extensive discussion of the Windows character set.
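To make "converting a natively encoded string to UTF-8" concrete for the Windows-1252 case, here is a minimal hand-written sketch in C. It is not Subversion's implementation, and the function names are hypothetical; it assumes only the published Windows-1252 table, in which 0x80..0x9F hold extra punctuation and letters (five positions are undefined) and 0xA0..0xFF coincide with Latin-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 bytes 0x80..0x9F; 0 marks the five undefined positions. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Map one Windows-1252 byte to a Unicode code point (U+FFFD if undefined). */
static uint32_t cp1252_to_codepoint(unsigned char b)
{
    if (b < 0x80 || b >= 0xA0)
        return b;                       /* ASCII and Latin-1 ranges map 1:1 */
    uint16_t cp = cp1252_high[b - 0x80];
    return cp ? cp : 0xFFFD;
}

/* Write the UTF-8 encoding of a BMP code point; returns bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert len bytes of Windows-1252 text to NUL-terminated UTF-8.
   out must have room for 3 * len + 1 bytes. Returns the output length. */
static size_t cp1252_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    assert(out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += put_utf8(cp1252_to_codepoint(in[i]), out + n);
    out[n] = '\0';
    return n;
}
```

Every Windows-1252 character lies in the Basic Multilingual Plane, so no byte needs more than 3 UTF-8 bytes, hence the 3 * len + 1 buffer requirement.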


Emeryberry Jan 10 '11 at 17:22


From the svn_string.h header, you can see that svn_string_t is just a plain old const char * plus a length field; nothing in it records an encoding.

I would guess that svn's natively encoded strings are interpreted according to your system locale (I do not know this for sure, but it is the convention). In Windows 7, you can check your system locale via Start → Control Panel → Region and Language → Administrative → Change system locale, where any English locale probably implies the Windows-1252 character encoding. A different system locale, for example Hebrew (Israel), implies a different character encoding (Windows-1255 for Hebrew).
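The upshot is that a byte buffer carries no encoding with it. The sketch below uses a hypothetical byte_string struct shaped like svn_string_t (a data pointer plus a length) and two deliberately partial decoders, assuming only well-known parts of the Windows-1252 and Windows-1255 tables, to show the same byte 0xE0 reading as "à" or as Hebrew alef depending on which code page the reader assumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for svn_string_t: bytes plus a length.
   Nothing here records which code page the bytes are in. */
typedef struct {
    const char *data;
    size_t len;
} byte_string;

/* Partial Windows-1252 decoder: ASCII plus 0xA0..0xFF, which coincide
   with Latin-1. Bytes 0x80..0x9F are not covered here (U+FFFD). */
static uint32_t decode_cp1252_partial(unsigned char b)
{
    if (b < 0x80 || b >= 0xA0)
        return b;
    return 0xFFFD;
}

/* Partial Windows-1255 decoder: ASCII plus the Hebrew letters
   0xE0..0xFA (U+05D0 ALEF .. U+05EA TAV). Other bytes give U+FFFD. */
static uint32_t decode_cp1255_partial(unsigned char b)
{
    if (b < 0x80)
        return b;
    if (b >= 0xE0 && b <= 0xFA)
        return 0x05D0 + (uint32_t)(b - 0xE0);
    return 0xFFFD;
}

/* Decode byte i of s using whichever code page decoder the caller assumes. */
static uint32_t decode_at(byte_string s, size_t i, uint32_t (*decode)(unsigned char))
{
    assert(i < s.len);
    return decode((unsigned char)s.data[i]);
}
```

Nothing in the buffer tells you which decoder is right; only the system locale (or an out-of-band agreement) does.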


hillel Jan 10 '11 at 17:38


Unfortunately, the MSVC C runtime library does not support UTF-8 and uses only legacy code pages, but Cygwin provides a UTF-8 locale as part of its emulation layer. If your svn is built on Cygwin, you should be able to use UTF-8 just fine.
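If you would rather ask the system than guess, a sketch like the following reports the narrow-string encoding a program is running under. It assumes GetACP() on native Windows (this returns the ANSI code page used by the narrow *A Win32 functions; the MSVC CRT's own setlocale() locale can differ from it) and nl_langinfo(CODESET) on POSIX layers such as Cygwin. native_encoding_name is a hypothetical helper name.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

/* Write a name for the current narrow-string encoding into buf.
   Native Windows: the ANSI code page, e.g. "CP1252".
   POSIX (including Cygwin): the codeset of the environment's locale, e.g. "UTF-8". */
static void native_encoding_name(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#ifdef _WIN32
    snprintf(buf, size, "CP%u", GetACP());
#else
    setlocale(LC_CTYPE, "");          /* adopt LC_ALL / LC_CTYPE / LANG */
    snprintf(buf, size, "%s", nl_langinfo(CODESET));
#endif
}
```

Under Cygwin with a UTF-8 LANG this would typically print "UTF-8"; a native MSVC build on an English system would typically report CP1252.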


R .. Jan 10 '11 at 18:09


user257111 · Accepted Answer · 2011-01-10T17:20:09+0000

"Natively encoded" strings are strings written in whatever code page the user is using. That is, they are sequences of numbers that are mapped to the corresponding glyphs according to the active code page. This assumes the file was saved that way and not as a UTF-8 file.
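A common heuristic for telling those two cases apart is to check whether the bytes form well-formed UTF-8. The sketch below is such a check (is_valid_utf8 is a hypothetical name, and nothing here claims svn does this). Treat the result as a hint only: pure ASCII is valid under both interpretations, and some code-page text can happen to form valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the len bytes at s are well-formed UTF-8.
   Rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t need;                  /* continuation bytes expected */
        unsigned long cp;
        if (b < 0x80) { i++; continue; }
        else if (b >= 0xC2 && b <= 0xDF) { need = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { need = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { need = 3; cp = b & 0x07; }
        else return false;            /* 0x80..0xC1, 0xF5..0xFF cannot lead */
        if (len - i <= need)
            return false;             /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;         /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000))
            return false;             /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;             /* surrogate code point */
        if (cp > 0x10FFFF)
            return false;
        i += need + 1;
    }
    return true;
}
```

For example, "café" saved as UTF-8 passes, while the same word saved in Windows-1252 (a lone 0xE9 byte) fails.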

This question is a good candidate for Joel's Unicode article.

In particular:

Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages. So for example in Israel DOS used a code page called 862, while Greek users used 737. They were the same below 128 but different from 128 up, where all the funny letters resided. The national versions of MS-DOS had dozens of these code pages, handling everything from English to Icelandic, and they even had a few "multilingual" code pages that could do Esperanto and Galician on the same computer! Wow! But getting, say, Hebrew and Greek on the same computer was a complete impossibility unless you wrote your own custom program that displayed everything using bitmapped graphics, because Hebrew and Greek required different code pages with different interpretations of the high numbers.
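The claim that these code pages agree below 128 and diverge above it can be checked directly. This sketch uses partial decoders for DOS code pages 862 and 737 that cover only ASCII and the letter ranges (Hebrew letters at 0x80..0x9A in 862, Greek capitals at 0x80..0x97 in 737); the function names are hypothetical and the remaining high bytes are deliberately left unmapped.

```c
#include <assert.h>
#include <stdint.h>

/* Partial DOS code page 862 (Hebrew): 0x80..0x9A are the letters
   U+05D0 ALEF .. U+05EA TAV. Other high bytes give U+FFFD here. */
static uint32_t decode_cp862_partial(unsigned char b)
{
    if (b < 0x80)
        return b;
    if (b <= 0x9A)
        return 0x05D0 + (uint32_t)(b - 0x80);
    return 0xFFFD;
}

/* Partial DOS code page 737 (Greek): 0x80..0x97 are the capitals
   U+0391 ALPHA .. U+03A9 OMEGA. Other high bytes give U+FFFD here. */
static uint32_t decode_cp737_partial(unsigned char b)
{
    if (b < 0x80)
        return b;
    if (b <= 0x97) {
        uint32_t cp = 0x0391 + (uint32_t)(b - 0x80);
        return cp >= 0x03A2 ? cp + 1 : cp;   /* skip unassigned U+03A2 */
    }
    return 0xFFFD;
}
```

Byte 0x80 is alef on an Israeli DOS machine and Alpha on a Greek one, which is exactly why one code page could not show both scripts at once.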

