Unicode file names on FAT-32?

As far as I understand, NTFS supports Unicode file names (UTF-16, as Microsoft claims?).

However, the official MSDN documentation is very vague as to which codepage is used to store the file names (file paths) on the FAT-32.

It says that the OEM code page (CP437 I assume) is used to store file names: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748.aspx

But here it turns out that there may be OEM code pages other than CP437: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317752.aspx

And we all know that utilities such as mount support many more code pages for FAT than just the installed OEM code pages.

So what is the actual code page for FAT-32 file names? Does it depend on the system code page at the time the FAT was created? Can FAT support true double-byte character encodings like UTF-16? Or are multibyte character encodings like UTF-8 the limit?

And a more specific question: What happens when I use the CreateFileW function (which, according to MSDN, uses UTF-16 as the code page for the file name) to create a file on the FAT-32 volume?

+13
windows winapi unicode codepages fat32
Oct 21 '13 at 20:05
2 answers

You may have to experiment here. This is a great question, and I'm not 100% sure, but:

So what is the actual code page for FAT-32 file names? Does it depend on the system code page at the time the FAT was created?

"OEM code page", whatever that means to the system.

Can FAT support true double-byte character encodings like UTF-16? Or are multibyte character encodings like UTF-8 the limit?

No, I do not think that FAT is directly capable of either UTF-16 or UTF-8. However, Microsoft stores a Unicode file name out-of-band. Thus, a file has two file names. (This is also how you can have longer file names alongside 8.3 names.)

And a more specific question: what happens when I use the CreateFileW function (which, according to MSDN, uses UTF-16 as the code page for the file name) to create a file on the FAT-32 volume?

The Unicode file name passed to CreateFileW is stored directly in the out-of-band file name. It is also encoded into the OEM code page (whatever that happens to be on the system) and placed in the 8.3 short name. If it cannot be converted to the OEM code page, or it does not fit the 8.3 format, Windows will name the file something like FILENA~1.TXT.

Some references to back up these answers:

Firstly, this page tells us that the OEM code page != the Windows code page:

Non-Unicode applications that create FAT files sometimes need to use the standard C runtime library conversion functions to translate between the character set of the Windows code page and the character set of the OEM code page. When implementing Unicode file system functions, there is no need to perform such translations.

On a typical American system, the OEM code page is CP437, but the Windows code page is Windows-1252. (The FooA calls, I believe, use the Windows code page, which is usually Windows-1252 on an American machine but depends on the locale.)
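To make the OEM versus Windows code page distinction concrete, here is a minimal Python sketch. It assumes Python's built-in cp437 and cp1252 codecs are close enough to the Windows tables for illustration, and it does not call any Windows API.

```python
# The same byte maps to different characters in the two code pages.
raw = b"\xe4"

as_oem = raw.decode("cp437")    # OEM code page on a typical US system
as_ansi = raw.decode("cp1252")  # Windows code page on a typical US system

print(f"0x{raw.hex()} as CP437: {as_oem!r}, as Windows-1252: {as_ansi!r}")
```

This is why a short name written under one code page can show up with the wrong characters when it is read back under another.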

If you have a FAT volume available, you can see this in action. The Σ symbol (U+03A3) is missing from Windows-1252, but it is present in CP437. You can see both the short and long file names with dir /X. With a file named asdfΣ.txt you will see:

 ASDFΣ.TXT asdfΣ.txt 

However, with a file named "asdfΛ.txt" (Λ is not available in CP437 or Windows-1252), you will see:

 ASDF~1.TXT asdf?.txt 

(You will probably see ? because the cmd.exe font cannot display Λ.)
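The two dir /X results above can be approximated with a rough sketch. The helper names representable and needs_short_alias are hypothetical, the OEM code page is assumed to be CP437, and only the 8.3 shape and OEM encodability are checked, not the full set of rules Windows applies.

```python
def representable(name: str, codepage: str) -> bool:
    """Return True if every character of `name` exists in `codepage`."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def needs_short_alias(name: str, oem: str = "cp437") -> bool:
    """Rough guess at whether a FILENA~1 style alias would be generated.

    Hypothetical helper: ignores case, spaces, and reserved characters.
    """
    base, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        base, ext = name, ""
    fits_8_3 = 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in base
    return not (fits_8_3 and representable(name, oem))


for name in ("asdf\u03a3.txt", "asdf\u039b.txt"):  # Sigma, then Lambda
    print(name, "-> alias needed:", needs_short_alias(name))
```

Under these assumptions the Σ name fits the short name as-is, while the Λ name does not, which matches the ASDF~1.TXT result.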

For information on long file names, see the Wikipedia article on long file names.

Also, interestingly, if you name the file "asdf©.txt", you may get:

 ASDFC.TXT asdfc.txt 

... I'm not 100% sure here, but I think that Windows cleverly decided to substitute "c" for ©, and did the same when displaying it. If you change the font to something that is not a raster font, such as Consolas, you will see:

 ASDFC.TXT asdf©.txt 

This is why you should use the FooW functions.

+7
Oct 22 '13 at 1:37

FAT or FAT32 directory entries only support short names (the old DOS 8.3 format) in the current OEM code page. However, VFAT (FAT with support for long file names), which is what Windows uses, can store an additional, so-called long file name for each file, in UTF-16.
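As a rough illustration of the long-name storage, the sketch below counts the UTF-16 code units in a name and estimates how many extra directory entries it would occupy. The figure of 13 code units per long-name entry is an assumption taken from the commonly documented VFAT layout, not from this thread, and the function names are hypothetical.

```python
import math

# Assumption: each VFAT long-name entry holds 13 UTF-16 code units (5 + 6 + 2).
LFN_UNITS_PER_ENTRY = 13


def utf16_units(name: str) -> int:
    """Number of 16-bit code units needed to store `name` as UTF-16."""
    return len(name.encode("utf-16-le")) // 2


def lfn_entries(name: str) -> int:
    """Estimated number of long-name directory entries for `name`."""
    return math.ceil(utf16_units(name) / LFN_UNITS_PER_ENTRY)


name = "asdf\u039b.txt"  # Lambda: not in CP437, but fine in UTF-16
print(name, utf16_units(name), "code units,", lfn_entries(name), "entry")
```

Because the long name is stored as 16-bit units rather than OEM bytes, a character such as Λ costs one unit there even though it has no representation in the short name.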

+2
Oct 22 '13 at 10:50


