OS X - how to calculate a normalized file name

I need to create a mapping between the file names generated by Windows and OS X. I know that OS X "converts all file names to Unicode decomposed"; however, "most volume formats do not meet the exact specification for these normal forms".

Thus, I cannot simply convert the Windows name to NFD using the standard UTF-8 APIs and be sure that I have the correct OS X name. Is there a way to determine what the actual OS X file name will be without creating the file in the file system and then scanning the directory to see what was actually created?

+5
unicode utf-8 unicode-normalization macos hfs+
3 answers

You are probably looking for the method -[NSString fileSystemRepresentation].

Please note that there is no general solution to this problem. What counts as a valid file name depends on the file system of the volume you are saving to. For example, not all file names that are valid on HFS+ are valid on FAT32.

For the Mac's "standard" file system (currently HFS+), fileSystemRepresentation should provide what you need; there is no general way for other file systems. Think, for example, of file systems that do not exist yet but will be introduced in the future :)

+3

I think the answer is in Technical Note TN1150, "HFS Plus Volume Format":

Note: The Mac OS Text Encoding Converter provides several constants that let you convert to and from the canonical, decomposed form stored on HFS Plus volumes. When using CreateTextEncoding to create a text encoding, you should set the TextEncodingBase to kTextEncodingUnicodeV2_0, set the TextEncodingVariant to kUnicodeCanonicalDecompVariant, and set the TextEncodingFormat to kUnicode16BitFormat. Using these values ensures that the Unicode will be in the same form as on an HFS Plus volume, even as the Unicode standard evolves.

+2

According to your link, file system drivers (mostly) behave in one of two ways:
* Return all names in NFD, converting names accordingly.
* Perform no conversion at all.

In both cases, if you create a file on OS X with a name in NFD, reading it back on OS X should give you the name in NFD.

OTOH, if your file name comes from Windows -> NFS -> Mac and you want to do some kind of synchronization, you're out of luck. This is not an easy task, because the underlying problem is somewhat philosophical: should file names be byte strings or Unicode strings? I believe Unix has traditionally done the former, and at least on Linux, NFC UTF-8 names are just a convention.
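To make the byte-string versus Unicode-string point concrete, here is a minimal Python sketch. The names `comparison_key`, `windows_name` and `mac_name` are made up for illustration, and it assumes both names are valid Unicode and that plain NFD is an acceptable common form, which is not guaranteed for every volume format.

```python
import unicodedata

def comparison_key(name: str) -> str:
    """Normalize a file name to NFD so NFC and NFD spellings compare equal."""
    return unicodedata.normalize("NFD", name)

windows_name = "caf\u00e9.txt"   # NFC: precomposed U+00E9
mac_name = "cafe\u0301.txt"      # NFD: 'e' followed by combining U+0301

# Same text to a human, but different code points and different UTF-8 bytes.
print(windows_name == mac_name)
print(windows_name.encode("utf-8"))
print(mac_name.encode("utf-8"))

# Equal once both sides are normalized the same way.
print(comparison_key(windows_name) == comparison_key(mac_name))
```

Comparing on a normalized key like this only helps with matching names from the two sides; it does not tell you which exact byte sequence a given file system will store.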

(It gets worse: IIRC HFS+ is specified to use Unicode 3.something, so a naive conversion to NFD may not be correct for characters added or changed since then, because the API you are using may not guarantee a specific Unicode version.)
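As a rough illustration of why naive NFD can differ from what HFS+ stores, here is a Python sketch. It rests on two assumptions that are not from this thread: that HFS+ leaves the ranges U+2000-U+2FFF, U+F900-U+FAFF and U+2F800-U+2FAFF undecomposed (as I recall from Apple's documentation), and that normalizing the remaining runs separately is close enough. The helper names `is_excluded` and `hfs_like_nfd` are hypothetical. Treat it as an approximation, not as the actual file system algorithm.

```python
import unicodedata

# Assumption: code point ranges that HFS+ reportedly does not decompose.
HFS_EXCLUDED_RANGES = ((0x2000, 0x2FFF), (0xF900, 0xFAFF), (0x2F800, 0x2FAFF))

def is_excluded(ch: str) -> bool:
    """True if the character falls in one of the assumed excluded ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HFS_EXCLUDED_RANGES)

def hfs_like_nfd(name: str, ucd=unicodedata) -> str:
    """Apply NFD to runs of non-excluded characters; copy excluded ones as-is.

    `ucd` may be `unicodedata.ucd_3_2_0` to use Python's bundled
    Unicode 3.2 tables instead of the current ones.
    """
    out = []
    run = []  # pending characters outside the excluded ranges
    for ch in name:
        if is_excluded(ch):
            out.append(ucd.normalize("NFD", "".join(run)))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.append(ucd.normalize("NFD", "".join(run)))
    return "".join(out)

print(ascii(hfs_like_nfd("caf\u00e9")))               # decomposed like plain NFD
print(ascii(hfs_like_nfd("\u212b")))                  # ANGSTROM SIGN left alone
print(ascii(unicodedata.normalize("NFD", "\u212b")))  # plain NFD decomposes it
```

Passing `ucd=unicodedata.ucd_3_2_0` pins the decomposition tables to Unicode 3.2, which addresses the version concern only if that is in fact the version your target system uses.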

0
