An easy way is to use the (NS)String method stringByRemovingPercentEncoding for this purpose. This was observed in the thread on decoding quoted-printables, so the first solution is mainly a translation of the answers in that thread to Swift.
The idea is to replace the quoted-printable encoding "=NN" with the percent encoding "%NN", and then use the existing method to remove the percent encoding.
Soft line breaks (continuation lines) are handled separately. In addition, percent characters in the input string must be encoded first; otherwise they would be treated as the leading character of a percent escape.
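    import Foundation

    func decodeQuotedPrintable(message : String) -> String? {
        return message
            // Remove soft line breaks (CRLF and bare LF variants):
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            // Escape literal percent characters before introducing new ones:
            .stringByReplacingOccurrencesOfString("%", withString: "%25")
            // Turn "=NN" into "%NN" and let Foundation decode it:
            .stringByReplacingOccurrencesOfString("=", withString: "%")
            .stringByRemovingPercentEncoding
    }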
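The function above uses the Swift 2 method names. As a minimal sketch, the same idea might look like this with the renamed Foundation methods in Swift 3 and later; the function name decodeQuotedPrintableViaPercentEncoding is only illustrative, and UTF-8 is assumed as the message encoding:

```swift
import Foundation

/// Illustrative name, not part of any library.
/// Decodes quoted-printable text by translating it to percent encoding.
/// Assumes the encoded bytes are UTF-8.
func decodeQuotedPrintableViaPercentEncoding(_ message: String) -> String? {
    return message
        // Remove soft line breaks (CRLF and bare LF variants).
        .replacingOccurrences(of: "=\r\n", with: "")
        .replacingOccurrences(of: "=\n", with: "")
        // Escape literal "%" first so it is not mistaken for an escape.
        .replacingOccurrences(of: "%", with: "%25")
        // Translate "=NN" to "%NN", then let Foundation decode it.
        .replacingOccurrences(of: "=", with: "%")
        .removingPercentEncoding
}

if let decoded = decodeQuotedPrintableViaPercentEncoding("=C2=A31,000") {
    print(decoded) // £1,000
}
if let decoded = decodeQuotedPrintableViaPercentEncoding("100%=\r\n sure") {
    print(decoded) // 100% sure
}
```

The order of the replacements matters: if "=" were translated before "%" is escaped, the literal percent sign in the second input would be indistinguishable from the start of an escape.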
The function returns an optional string, which is nil for invalid input. Invalid input can be:
- A "=" character that is not followed by two hexadecimal digits, for example "=XX".
- A "=NN" sequence that does not decode to a valid UTF-8 sequence, for example "=E2=64".
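The two failure cases can be checked directly against the percent-decoding step. This is a small sketch using the Swift 3+ property removingPercentEncoding; the inputs are the two examples from the list after the "=" to "%" translation has been applied:

```swift
import Foundation

// "=XX" becomes "%XX": "XX" is not a pair of hexadecimal digits.
let notHex = "%XX".removingPercentEncoding

// "=E2=64" becomes "%E2%64": 0xE2 starts a three-byte UTF-8 sequence,
// but 0x64 is not a valid continuation byte.
let notUTF8 = "%E2%64".removingPercentEncoding

// "=C2=A3" becomes "%C2%A3": a valid two-byte UTF-8 sequence.
let valid = "%C2%A3".removingPercentEncoding

print(notHex == nil)   // true
print(notUTF8 == nil)  // true
print(valid ?? "nil")  // £
```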
Examples:
    if let decoded = decodeQuotedPrintable("=C2=A31,000") {
        print(decoded) // £1,000
    }

    if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
        print(decoded) // “Hello … world!”
    }
Update 1: The code above assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding of "£", and E2 80 A6 is the UTF-8 encoding of "…".
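The byte values can be verified from Swift itself. This is a small sketch; hexBytes is a hypothetical helper name introduced only for this check:

```swift
/// Hypothetical helper: the UTF-8 bytes of a string as uppercase hex pairs.
func hexBytes(_ s: String) -> String {
    return s.utf8.map { byte -> String in
        let hex = String(byte, radix: 16, uppercase: true)
        // Pad single-digit values so every byte prints as two digits.
        return hex.count < 2 ? "0" + hex : hex
    }.joined(separator: " ")
}

print(hexBytes("£")) // C2 A3
print(hexBytes("…")) // E2 80 A6
```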
If the input is "Rub=E9n", then the message uses the Windows-1252 encoding. To decode it correctly, you have to replace

    .stringByRemovingPercentEncoding

by

    .stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)
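To see why the encoding matters, the following sketch interprets the decoded bytes of "Rub=E9n" under both encodings, using the Swift 3+ initializer String(data:encoding:) rather than the Swift 2 method shown above:

```swift
import Foundation

// The bytes of "Rub=E9n" after quoted-printable decoding: R, u, b, 0xE9, n.
let bytes = Data([0x52, 0x75, 0x62, 0xE9, 0x6E])

// 0xE9 would start a three-byte UTF-8 sequence, but 0x6E cannot continue it.
let asUTF8 = String(data: bytes, encoding: .utf8)

// In Windows-1252, 0xE9 is the single-byte code for "é".
let asCP1252 = String(data: bytes, encoding: .windowsCP1252)

print(asUTF8 ?? "nil")   // nil
print(asCP1252 ?? "nil") // Rubén
```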
There are also ways to detect the encoding from the "Content-Type" header field, for example, https://stackoverflow.com/a/167958/ .
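As a rough sketch of that idea (not the method from the linked answer), the charset parameter of a "Content-Type" header value could be mapped to a String.Encoding as follows. The function name stringEncoding(fromContentType:) and the small lookup table are assumptions made for illustration; a real mail parser would need to handle many more charset names:

```swift
import Foundation

/// Hypothetical helper: maps the charset parameter of a Content-Type
/// header value to a String.Encoding. Returns nil if there is no charset
/// parameter or if the name is not in the (deliberately small) table.
func stringEncoding(fromContentType header: String) -> String.Encoding? {
    // Illustrative table only, far from exhaustive.
    let known: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252
    ]
    // Trim surrounding spaces and optional double quotes from the value.
    let trimSet = CharacterSet.whitespaces.union(CharacterSet(charactersIn: "\""))

    // Parameters follow the media type, separated by ";".
    for parameter in header.split(separator: ";").dropFirst() {
        let parts = parameter.split(separator: "=", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "charset"
        else { continue }
        let name = parts[1].trimmingCharacters(in: trimSet).lowercased()
        return known[name]
    }
    return nil
}

if let enc = stringEncoding(fromContentType: "text/plain; charset=windows-1252") {
    print(enc == .windowsCP1252) // true
}
```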
Update 2: The stringByReplacingPercentEscapesUsingEncoding method is marked as deprecated, so the code above will now always generate a compiler warning. Unfortunately, Apple does not seem to have provided an alternative method.
So, here is a new, completely standalone decoding method that does not cause any compiler warnings. This time I wrote it as an extension method for String . Explanatory comments are in the code.
    import Foundation

    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding: A string encoding. The default is UTF-8.
        /// - returns: The decoded string, or `nil` for invalid input.
        func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
                .stringByReplacingOccurrencesOfString("=\n", withString: "")
                .decodeQuotedPrintableSequences(enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.
        private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = rangeOfString("=", range: position ..< endIndex) {
                result.appendContentsOf(self[position ..< range.startIndex])
                position = range.startIndex

                // Decode one or more successive "=HH" sequences to a byte array:
                let bytes = NSMutableData()
                repeat {
                    let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                    if hexCode.characters.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard var byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.appendBytes(&byte, length: 1)
                    position = position.advancedBy(3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.appendContentsOf(dec)
            }

            // Copy remaining characters to the result:
            result.appendContentsOf(self[position ..< endIndex])

            return result
        }
    }
Usage example:
    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }

    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }

    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
        print(decoded) // Rubén
    }
Update for Swift 4 (and later):
    import Foundation

    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding: A string encoding. The default is UTF-8.
        /// - returns: The decoded string, or `nil` for invalid input.
        func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .replacingOccurrences(of: "=\r\n", with: "")
                .replacingOccurrences(of: "=\n", with: "")
                .decodeQuotedPrintableSequences(encoding: enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.
        private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = range(of: "=", range: position ..< endIndex) {
                result.append(contentsOf: self[position ..< range.lowerBound])
                position = range.lowerBound

                // Decode one or more successive "=HH" sequences to a byte array:
                var bytes = Data()
                repeat {
                    let hexCode = self[position...].dropFirst().prefix(2)
                    if hexCode.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard let byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.append(byte)
                    position = index(position, offsetBy: 3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.append(contentsOf: dec)
            }

            // Copy remaining characters to the result:
            result.append(contentsOf: self[position ..< endIndex])

            return result
        }
    }
Usage example:
    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }

    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }

    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
        print(decoded) // Rubén
    }