Copying copyright / registered characters does not work

I've developed an iOS application that sends emojis from iOS to a web portal and vice versa. All emojis sent from iOS to the web portal look fine except for © and ®.

Here is the snippet that encodes the emojis:

 NSData *data = [messageBody dataUsingEncoding:NSNonLossyASCIIStringEncoding];
 NSString *encodedString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

This code returns \251 and \256 for the copyright and registered symbols. These are not standard Unicode escapes, so the symbols do not appear on the web portal.

So what should I do to convert them to standard Unicode escapes?

Test:

 messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®"; 

The encoded string that I get from the above encoding:

 Copy right symbol : \\251 AND Registered Mark symbol : \\256 

Whereas it should look like this (in standard Unicode):

 Copy right symbol : \\u00A9 AND Registered Mark symbol : \\u00AE 
ios unicode
3 answers

First I will try to provide a solution. Then I will try to explain why.

Escaping non-ASCII characters

To escape Unicode characters in a string, you should not rely on NSNonLossyASCIIStringEncoding. Below is the code that I use to escape Unicode and other non-ASCII characters in a string:

 // NSMutableString category
 - (void)appendChar:(unichar)charToAppend {
     [self appendFormat:@"%C", charToAppend];
 }

 // NSString category
 - (NSString *)UEscapedString {
     char const hexChar[] = "0123456789ABCDEF";
     NSMutableString *outputString = [NSMutableString string];
     for (NSInteger i = 0; i < self.length; i++) {
         unichar character = [self characterAtIndex:i];
         if ((character >> 7) > 0) {
             [outputString appendString:@"\\u"];
             // append the hex character for the left-most 4 bits
             [outputString appendChar:(hexChar[(character >> 12) & 0xF])];
             // hex for the second group of 4 bits from the left
             [outputString appendChar:(hexChar[(character >> 8) & 0xF])];
             // hex for the third group
             [outputString appendChar:(hexChar[(character >> 4) & 0xF])];
             // hex for the last group, i.e. the right-most 4 bits
             [outputString appendChar:(hexChar[character & 0xF])];
         } else {
             [outputString appendChar:character];
         }
     }
     return [outputString copy];
 }

(NOTE: I think Jon Rose's method does the same, but I do not want to recommend a method that I have not tested.)

Now you have the following string: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE

The Unicode escaping is done. Now convert it back to display the emojis.

Conversion back

This is confusing at first, but here is how it is done:

 NSData *data = [escapedString dataUsingEncoding:NSUTF8StringEncoding];
 NSString *converted = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];

Now you have your emojis (and other non-ASCII).

What's happening?

Problem

In your case, you are trying to create a common language between the server and your application. However, NSNonLossyASCIIStringEncoding is a pretty poor choice for this purpose, because it is a black box created by Apple and we do not really know what it does inside. As we can see, it writes some characters as \uXXXX (hexadecimal) and others, such as © and ®, as \XXX (three octal digits: \251 is octal for 0xA9 and \256 is octal for 0xAE). That is why you should not rely on it to build a multi-platform system. There is no equivalent on backend platforms or on Android.

It is also rather cryptic that NSNonLossyASCIIStringEncoding can convert \u00AE back to ®, even though it converts ® to \256 in the first place. I am sure other platforms have tools to convert \uXXXX escapes to Unicode characters, so that direction should not be a problem for you.
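To make the relationship between the two escape forms concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function name `normalize_octal_escapes` is hypothetical and not part of any Apple API. It assumes that the encoder writes characters in the range 0x80 to 0xFF as a backslash followed by exactly three octal digits, as the \251 and \256 output in the question suggests, and that a literal backslash is written as a doubled backslash; that second point is an assumption you should verify against real output.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not an Apple API): rewrites three-digit octal escapes
   such as \251 into the \u00A9 form. Everything else, including existing
   \uXXXX escapes, is copied unchanged.
   Returns 0 on success, -1 if `out` is too small. */
static int normalize_octal_escapes(const char *in, char *out, size_t out_size)
{
    size_t o = 0;
    if (out_size == 0) return -1;

    for (size_t i = 0; in[i] != '\0'; ) {
        /* Assumption: a doubled backslash is an escaped literal backslash,
           so copy the pair as-is and do not inspect what follows it. */
        if (in[i] == '\\' && in[i + 1] == '\\') {
            if (o + 2 >= out_size) return -1;
            out[o++] = in[i++];
            out[o++] = in[i++];
            continue;
        }

        /* \ooo with a first digit of 0..3 covers values 0 to 255. */
        int is_octal = in[i] == '\\'
            && in[i + 1] >= '0' && in[i + 1] <= '3'
            && in[i + 2] >= '0' && in[i + 2] <= '7'
            && in[i + 3] >= '0' && in[i + 3] <= '7';

        if (is_octal) {
            unsigned value = (unsigned)(in[i + 1] - '0') * 64
                           + (unsigned)(in[i + 2] - '0') * 8
                           + (unsigned)(in[i + 3] - '0');
            if (o + 6 >= out_size) return -1;   /* 6 chars plus terminator */
            snprintf(out + o, out_size - o, "\\u%04X", value);
            o += 6;
            i += 4;
        } else {
            if (o + 1 >= out_size) return -1;   /* 1 char plus terminator */
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return 0;
}
```

Such a post-processing step only papers over the encoder's output format, so escaping the string yourself remains the more predictable option.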
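For a platform without a ready-made tool, the reverse direction can be sketched by hand. The following plain C code (also valid inside an Objective-C file) is an illustrative sketch, not a library API: `decode_u_escapes` and its helpers are hypothetical names. It assumes the input contains four-digit \uXXXX escapes holding UTF-16 code units, with emoji outside the Basic Multilingual Plane sent as surrogate pairs, and it writes UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of a hex digit, or -1 if `c` is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses "\uXXXX" at `s`. Returns the 16-bit code unit,
   or -1 if `s` does not start with such an escape. */
static long parse_u_escape(const char *s)
{
    if (s[0] != '\\' || s[1] != 'u') return -1;
    long unit = 0;
    for (int k = 2; k < 6; k++) {
        int h = hex_value(s[k]);
        if (h < 0) return -1;   /* also stops at the string terminator */
        unit = unit * 16 + h;
    }
    return unit;
}

/* Writes code point `cp` as UTF-8 and returns the byte count (1 to 4). */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decodes \uXXXX escapes in `in` to UTF-8 in `out`.
   Returns 0 on success, -1 if `out` may be too small (the check is
   conservative: it always reserves 4 bytes plus the terminator). */
static int decode_u_escapes(const char *in, char *out, size_t out_size)
{
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; ) {
        if (o + 5 > out_size) return -1;
        long unit = parse_u_escape(in + i);
        if (unit < 0) { out[o++] = in[i++]; continue; }

        uint32_t cp = (uint32_t)unit;
        i += 6;
        /* A high surrogate followed by a low surrogate forms one code point.
           An unpaired surrogate is passed through unchecked in this sketch;
           a production decoder should reject it. */
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            long low = parse_u_escape(in + i);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + ((uint32_t)low - 0xDC00);
                i += 6;
            }
        }
        o += put_utf8(cp, out + o);
    }
    if (o >= out_size) return -1;
    out[o] = '\0';
    return 0;
}
```

With this sketch, the escaped string from the question decodes to the bytes C2 A9 for © and C2 AE for ®, which is their UTF-8 form.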


messageBody is already a string, so there is no need to convert it to data just to convert it back to a string. Change the code to

 NSString *encodedString = messageBody; 

If messageBody is incorrect, then the way to fix it is to change the way it is created. The server sends data, not strings. The data sent by the server is encoded in some agreed-upon way, usually UTF-8. If you know the encoding, you can convert the data to a string; if you do not, the data is gibberish that cannot be read. If messageBody is incorrect, the problem occurred when it was converted from the data sent by the server. It seems likely that you are parsing it with the wrong encoding.

The code you posted is simply incorrect. It converts a string to data using one encoding (ASCII) and reads the data back with a different encoding (UTF-8). This is like translating a book into Spanish and then having a Portuguese translator translate it back: it may work for some words, but it is still wrong.

If you still have problems, you should share the code where messageBody is created.

If the server expects an ASCII string with all Unicode characters changed to \u00xx, then you must first yell at your server guy because he is an idiot. But if that doesn't work, you can use the following code:

 NSString *messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
 NSData *utf32Data = [messageBody dataUsingEncoding:NSUTF32StringEncoding];
 const uint32_t *bytes = (const uint32_t *)[utf32Data bytes];
 NSMutableString *escapedString = [[NSMutableString alloc] init];
 // Start at 1 because the first four bytes are the byte-order mark
 // (the bound is the data length in 32-bit units, not the empty output string)
 for (NSUInteger index = 1; index < utf32Data.length / 4; index++) {
     uint32_t charValue = bytes[index];
     if (charValue <= 127) {
         [escapedString appendFormat:@"%C", (unichar)charValue];
     } else {
         [escapedString appendFormat:@"\\\\u%04X", charValue];
     }
 }

I really don't understand your problem.

You can simply convert any character to NSData and convert it back to a string. You can pass a UTF-8 string, including both emoji and other characters, in a POST request.

 NSString *newStr = [[NSString alloc] initWithData:theData encoding:NSUTF8StringEncoding];
 NSData *data = [newStr dataUsingEncoding:NSUTF8StringEncoding];

It should work both on the server side and on the client side.

Of course, you have another problem: some fonts do not support all Unicode characters. That is why, for example, you may not see some of them in the terminal. But that is outside the scope of this question.

NSNonLossyASCIIStringEncoding is used only when you really want to convert a character into a sequence of ASCII characters. But that is not necessary here.
