Sane(r) way to determine the encoding of command line arguments on Mac OS X?

I wrote a CLI tool for Mac OS X (10.5+) that has to deal with command line arguments, which are quite likely to contain non-ASCII characters.

For further processing, I convert these arguments using +[NSString stringWithCString:encoding:].

My problem is that I could not find good information on how to determine the character encoding used by the shell the CLI tool is running in.
What I came up with is the following:

    NSDictionary *environment = [[NSProcessInfo processInfo] environment];
    // $LANG looks like "de_DE.UTF-8"; pathExtension yields the part after the dot
    NSString *ianaName = [[environment objectForKey:@"LANG"] pathExtension];
    // Map the IANA charset name to a CFStringEncoding, then to an NSStringEncoding
    NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(
        CFStringConvertIANACharSetNameToEncoding((CFStringRef)ianaName));
    NSString *someArgument = [NSString stringWithCString:argv[someIndex]
                                                encoding:encoding];

It feels a little crude, which makes me think I have missed something obvious... but what?

Is there a saner or cleaner way to achieve essentially the same thing?

Thanks in advance

D

3 answers

Well, it looks like there isn't one!

As Yuji pointed out, the underlying encoding of file names is UTF-8, no matter what. That leaves two scenarios to handle:

  • Arguments the user types in character by character.
  • Arguments produced by tab completion or by the output of commands such as ls, since these do not convert any characters.

The second case is covered simply by assuming UTF-8.

The first case, however, is problematic:

  • On Mac OS X 10.6, $LANG contains the IANA name of the encoding, in the form de_DE.IANA_NAME.
  • Prior to Snow Leopard, this is not the case for encodings other than UTF-8!

I did not test every encoding I could think of, but none of the European ones were included. Instead, $LANG contained only the language and region (de_DE in my case)!

Since the result of calling +[NSString stringWithCString:encoding:] with the wrong encoding is undefined, you cannot safely assume that it will return nil in that case* (if the string is ASCII-only, for example, it may work just fine!).

What adds to the general mess is that $LANG is not guaranteed to be set at all: Terminal.app has a checkbox in its settings that lets the user turn off setting $LANG altogether (not to mention X11.app, which does not seem to handle any non-ASCII input at all...).

So what's left:

  1. Check whether $LANG is set. If it is not, go to 4!
  2. Check whether $LANG contains encoding information. If it does not, go to 4!
  3. Check whether that encoding is UTF-8. If it is, go to 6; otherwise...
  4. If argc is greater than 2 and [[NSString stringWithCString:argv[1] encoding:NSUTF8StringEncoding] isEqualToString:yourForceUTFArgumentFlag], print that you are now forcing UTF-8 and go to 6. If not:
  5. Assume you know nothing: print a warning that the user should set the terminal encoding to UTF-8 and might consider passing yourForceUTFArgumentFlag as the first argument, then exit().
  6. Assume UTF-8 and do whatever you need...

Sounds awful? That's because it is, but I cannot think of a saner way to do this.


*One more note: if you use UTF-8 as the encoding, stringWithCString:encoding: returns nil whenever the C string contains non-ASCII bytes that do not form valid UTF-8.


The answer depends on where the non-ASCII characters come from.

  • In OS X, the LANG environment variable does not reflect the language chosen in the graphical interface. Very few people set LANG on the command line.
  • The choice of "system encoding" in the GUI is stored in ~/.CFUserTextEncoding and can be obtained with CFStringGetSystemEncoding; see this Apple doc.
  • However, this “system encoding” is rarely used, except by very old, non-Unicode software. Any normal Cocoa program uses only Unicode.
  • In particular, file paths at the Cocoa level are always encoded in (a variant of) UTF-8. So, to get an NSString from a C string, use

      NSString *string = [NSString stringWithCString:cString encoding:NSUTF8StringEncoding];

    and to get the C string for a file path from an NSString, use

      const char *path = [string fileSystemRepresentation];

    Don't just use [string UTF8String], because of a subtlety; see this Apple doc.

  • So I recommend that you not worry about the encoding and just assume UTF-8.

  • However, there may be a few people who do set LANG on the command line, and you may want to take care of them. In that case, what you did is the only thing I can come up with.

Can't you use [[NSProcessInfo processInfo] arguments]?


Source: https://habr.com/ru/post/1313204/

