Decoding quoted-printable messages in Swift

I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert this to "The cost would be £1,000"?

I'm just converting the text manually at the moment, and this doesn't cover all cases. I'm sure there is a single line of code that will help with this.

Here is my code:

    func decodeUTF8(message: String) -> String {
        var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
        return newMessage
    }

Thanks

+4
swift utf-8 macos quoted-printable
5 answers

An easy way is to use the (NS)String method stringByRemovingPercentEncoding for this purpose. This was observed in a thread on decoding quoted-printables, so the first solution is mainly a translation of the answers in that thread to Swift.

The idea is to replace the quoted-printable "=NN" encoding with the percent encoding "%NN", and then use the existing method to remove the percent encoding.

Continuation lines (soft line breaks) are handled separately. In addition, percent characters in the input string must be encoded first, otherwise they would be treated as the leading character of a percent encoding.

    func decodeQuotedPrintable(message : String) -> String? {
        return message
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .stringByReplacingOccurrencesOfString("%", withString: "%25")
            .stringByReplacingOccurrencesOfString("=", withString: "%")
            .stringByRemovingPercentEncoding
    }

The function returns an optional string, which is nil for invalid input. Invalid input can be:

  • A "=" character that is not followed by two hexadecimal digits, for example "=XX".
  • A "=NN" sequence that does not decode to a valid UTF-8 sequence, for example "=E2=64".

Examples:

    if let decoded = decodeQuotedPrintable("=C2=A31,000") {
        print(decoded) // £1,000
    }

    if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
        print(decoded) // “Hello … world!”
    }
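
For newer Swift versions, the same "=NN" to "%NN" idea can be sketched as follows. This is only a minimal sketch, assuming Swift 5 with Foundation; the function name decodeQuotedPrintableViaPercent is made up for illustration, and like the code above it assumes that the escaped bytes are UTF-8.

```swift
import Foundation

/// Sketch (hypothetical name): decode UTF-8 quoted-printable text by
/// rewriting "=NN" as "%NN" and removing the percent encoding.
/// Returns nil if the rewritten string is not valid percent-encoded UTF-8.
func decodeQuotedPrintableViaPercent(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "")  // soft line break (CRLF)
        .replacingOccurrences(of: "=\n", with: "")    // soft line break (LF)
        .replacingOccurrences(of: "%", with: "%25")   // protect literal "%"
        .replacingOccurrences(of: "=", with: "%")     // "=NN" becomes "%NN"
        .removingPercentEncoding
}

// Valid input decodes to a string, invalid input yields nil.
if let decoded = decodeQuotedPrintableViaPercent("=C2=A31,000") {
    print(decoded)
}
if decodeQuotedPrintableViaPercent("=XX") == nil {
    print("invalid input")
}
```

The order of the replacements matters: literal "%" characters have to be escaped before "=" is turned into "%", otherwise they would be misread as the start of an escape sequence.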

Update 1: The code above assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding for "£", E2 80 A6 is the UTF-8 encoding for "…".

If the input is "Rub=E9n", then the message uses the Windows-1252 encoding. To decode it correctly, you need to replace

 .stringByRemovingPercentEncoding 

with

 .stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding) 

There are also ways to detect the encoding from a "Content-Type" header field; see for example https://stackoverflow.com/a/167958/.
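
As a rough illustration of the header-based approach, the charset parameter can be read from a Content-Type value and mapped to a String.Encoding. This is a sketch under assumptions: the helper name encoding(fromContentType:) is hypothetical, and the lookup table is hand-written and covers only a few charsets.

```swift
import Foundation

/// Hypothetical helper: pick a String.Encoding from a Content-Type value
/// such as "text/plain; charset=UTF-8". Returns nil if there is no charset
/// parameter or the charset is not in the (deliberately small) table.
func encoding(fromContentType value: String) -> String.Encoding? {
    // Assumption: only these charsets are of interest.
    let known: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252
    ]
    // Skip the media type itself, then look at each "name=value" parameter.
    for parameter in value.split(separator: ";").dropFirst() {
        let parts = parameter.split(separator: "=", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "charset"
        else { continue }
        // Strip surrounding spaces and optional double quotes from the value.
        let name = parts[1]
            .trimmingCharacters(in: CharacterSet(charactersIn: "\" "))
            .lowercased()
        return known[name]
    }
    return nil
}
```

The result could then be passed as the encoding when decoding the escaped bytes.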


Update 2: The stringByReplacingPercentEscapesUsingEncoding method is marked as deprecated, so the code above will always generate a compiler warning. Unfortunately, Apple doesn't seem to have provided an alternative method.

So here is a new, completely self-contained decoding method that does not cause any compiler warnings. This time I wrote it as an extension method for String. Explanatory comments are in the code.

    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding: A string encoding. The default is UTF-8.
        /// - returns: The decoded string, or `nil` for invalid input.
        func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
                .stringByReplacingOccurrencesOfString("=\n", withString: "")
                .decodeQuotedPrintableSequences(enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.
        private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = rangeOfString("=", range: position ..< endIndex) {
                result.appendContentsOf(self[position ..< range.startIndex])
                position = range.startIndex

                // Decode one or more successive "=HH" sequences to a byte array:
                let bytes = NSMutableData()
                repeat {
                    let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                    if hexCode.characters.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard var byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.appendBytes(&byte, length: 1)
                    position = position.advancedBy(3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.appendContentsOf(dec)
            }

            // Copy remaining characters to the result:
            result.appendContentsOf(self[position ..< endIndex])

            return result
        }
    }

Usage example:

    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }

    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }

    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
        print(decoded) // Rubén
    }

Update for Swift 4 (and later):

    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding: A string encoding. The default is UTF-8.
        /// - returns: The decoded string, or `nil` for invalid input.
        func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .replacingOccurrences(of: "=\r\n", with: "")
                .replacingOccurrences(of: "=\n", with: "")
                .decodeQuotedPrintableSequences(encoding: enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.
        private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = range(of: "=", range: position..<endIndex) {
                result.append(contentsOf: self[position ..< range.lowerBound])
                position = range.lowerBound

                // Decode one or more successive "=HH" sequences to a byte array:
                var bytes = Data()
                repeat {
                    let hexCode = self[position...].dropFirst().prefix(2)
                    if hexCode.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard let byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.append(byte)
                    position = index(position, offsetBy: 3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.append(contentsOf: dec)
            }

            // Copy remaining characters to the result:
            result.append(contentsOf: self[position ..< endIndex])

            return result
        }
    }

Usage example:

    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }

    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }

    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
        print(decoded) // Rubén
    }
+4

Unfortunately, I'm a bit late with my answer, but it may still be helpful to others.

    var string = "The cost would be =C2=A31,000"
    var finalString: String? = nil

    if let regEx = try? NSRegularExpression(pattern: "={1}?([a-f0-9]{2}?)", options: NSRegularExpressionOptions.CaseInsensitive) {
        // NSRange counts UTF-16 code units, so use the UTF-16 length of the string.
        let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(string, options: NSMatchingOptions.WithTransparentBounds, range: NSMakeRange(0, string.utf16.count), withTemplate: "%$1")
        print(intermediatePercentEscapedString)
        finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
        print(finalString)
    }
+1

This encoding is called "quoted-printable". What you need to do is convert the string to NSData using ASCII encoding, then iterate over the data, replacing every three-character sequence such as "=A3" with the single byte 0xA3, and finally convert the resulting data to a string using NSUTF8StringEncoding.
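
The byte-wise approach described above could look roughly like this in current Swift. It is a sketch only: the names decodeQuotedPrintableBytes and hexValue are invented for illustration, Data and String.Encoding stand in for NSData and NSUTF8StringEncoding, and the handling of soft line breaks is an addition that the description does not mention.

```swift
import Foundation

/// Hypothetical helper: value of one ASCII hex digit, or nil if it is not one.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    default: return nil
    }
}

/// Sketch: decode quoted-printable text by working on its ASCII bytes.
/// Returns nil for non-ASCII input, malformed "=HH" sequences, or bytes
/// that are not valid in the given encoding.
func decodeQuotedPrintableBytes(_ input: String,
                                encoding: String.Encoding = .utf8) -> String? {
    // Quoted-printable text is pure ASCII; anything else is rejected here.
    guard let data = input.data(using: .ascii) else { return nil }
    let source = [UInt8](data)
    var output = [UInt8]()
    var i = 0
    while i < source.count {
        let byte = source[i]
        guard byte == UInt8(ascii: "=") else {
            output.append(byte)          // ordinary character, copy as-is
            i += 1
            continue
        }
        // Soft line breaks: "=" directly followed by LF or CRLF is dropped.
        if i + 1 < source.count, source[i + 1] == UInt8(ascii: "\n") {
            i += 2
            continue
        }
        if i + 2 < source.count,
           source[i + 1] == UInt8(ascii: "\r"),
           source[i + 2] == UInt8(ascii: "\n") {
            i += 3
            continue
        }
        // Otherwise "=" must be followed by exactly two hex digits.
        guard i + 2 < source.count,
              let high = hexValue(source[i + 1]),
              let low = hexValue(source[i + 2]) else { return nil }
        output.append((high << 4) | low)
        i += 3
    }
    return String(bytes: output, encoding: encoding)
}
```

Because the bytes are collected first and converted to a string only at the end, multi-byte UTF-8 sequences such as "=C2=A3" are reassembled correctly, and a different encoding such as .windowsCP1252 can be passed for input like "Rub=E9n".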

0

To provide an applicable solution, some more information is required. So, I will make some assumptions.

In an HTML or mail message, one or more encodings can be applied to some source data. For example, you can encode a binary file, such as a PNG file, with base64 and then zip it. The order is important.

In your example, as you say, the source data is a string that has been encoded as UTF-8.

In an HTTP message, your Content-Type is thus text/plain; charset=UTF-8. In your example there also seems to be an additional encoding, a "Content-Transfer-Encoding": possibly the Content-Transfer-Encoding is quoted-printable or base64 (I'm not sure about that, though).

To get it back, you need to apply the appropriate decoding in the reverse order.
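
To illustrate the reverse order, here is a minimal sketch for a body that was first encoded as text in some charset and then transfer-encoded. The function decodeBody and its parameters are hypothetical, and only base64 and the identity encodings are handled; a quoted-printable decoder would slot in as a further case.

```swift
import Foundation

/// Hypothetical helper: undo the transfer encoding first (the outer layer),
/// then interpret the resulting bytes using the charset (the inner layer).
func decodeBody(_ body: String,
                transferEncoding: String,
                charset: String.Encoding) -> String? {
    switch transferEncoding.lowercased() {
    case "base64":
        // Step 1: base64 text back to raw bytes.
        guard let data = Data(base64Encoded: body,
                              options: .ignoreUnknownCharacters) else {
            return nil
        }
        // Step 2: raw bytes back to a string in the declared charset.
        return String(data: data, encoding: charset)
    case "7bit", "8bit":
        // Assumption: the body is already plain text, nothing to undo.
        return body
    default:
        // Other transfer encodings are not covered by this sketch.
        return nil
    }
}

if let text = decodeBody("wqMxLDAwMA==", transferEncoding: "base64", charset: .utf8) {
    print(text)
}
```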

Hint:

You can view the headers (Content-Type and Content-Transfer-Encoding) of a mail message by viewing the raw source of the mail.

0

You can also take a look at this working solution: https://github.com/dunkelstern/QuotedPrintable

 let result = QuotedPrintable.decode(string: quoted) 
0