How to check if a string contains Chinese characters in Swift?

How can I check whether a string contains Chinese characters in Swift?

For example, I want to check whether there is any Chinese inside this string:

var myString = "Hi! 大家好!It contains Chinese!"

Thanks!

5 answers

This answer to "How to determine if a character is a Chinese character" can also easily be translated from Ruby to Swift (now updated for Swift 3):

    import Foundation

    extension String {
        var containsChineseCharacters: Bool {
            return self.range(of: "\\p{Han}", options: .regularExpression) != nil
        }
    }

    if myString.containsChineseCharacters {
        print("Contains Chinese")
    }

In the regular expression, "\p{Han}" matches every character with the Unicode Han script property, which, as I understand it, covers the ideographs used in the CJK languages (so Japanese kanji match as well, not only Chinese).
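If you also want to see which parts of the string matched, the same \p{Han} class can be used with NSRegularExpression to pull out each run of Han characters. This is a minimal sketch assuming Foundation is available; hanRuns() is a hypothetical helper name, not a Foundation API. The second call illustrates that kanji in Japanese text match too, while kana do not:

```swift
import Foundation

extension String {
    // Hypothetical helper: returns each maximal run of characters that have
    // the Unicode Han script property (\p{Han}), in order of appearance.
    func hanRuns() -> [String] {
        guard let regex = try? NSRegularExpression(pattern: "\\p{Han}+", options: []) else {
            return []
        }
        // NSRegularExpression works on UTF-16 based NSRanges, so convert both ways.
        let fullRange = NSRange(startIndex..<endIndex, in: self)
        return regex.matches(in: self, options: [], range: fullRange).compactMap { match in
            Range(match.range, in: self).map { String(self[$0]) }
        }
    }
}

let myString = "Hi! 大家好!It contains Chinese!"
print(myString.hanRuns())          // ["大家好"]
print("日本語のテキスト".hanRuns())   // ["日本語"]
```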


Looking at questions about how to do this in other languages (for example, this accepted answer for Ruby), a common technique seems to be checking whether the string's characters fall into the CJK ranges. The Ruby answer can be adapted to Swift strings as an extension using the following code:

    extension String {
        var containsChineseCharacters: Bool {
            return self.unicodeScalars.contains { scalar in
                let cjkRanges: [ClosedInterval<UInt32>] = [
                    0x4E00...0x9FFF,   // main block
                    0x3400...0x4DBF,   // extended block A
                    0x20000...0x2A6DF, // extended block B
                    0x2A700...0x2B73F, // extended block C
                ]
                return cjkRanges.contains { $0.contains(scalar.value) }
            }
        }
    }

    // true:
    "Hi! 大家好!It contains Chinese!".containsChineseCharacters
    // false:
    "Hello, world!".containsChineseCharacters

These ranges may already be defined somewhere in Foundation, which would be better than hard-coding them by hand.
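On Swift 5 and later, the standard library exposes Unicode character properties through Unicode.Scalar.Properties, so no range list is needed at all. This is a minimal sketch assuming Swift 5.0+; it swaps in a different technique, testing the Unicode Unified_Ideograph property, which covers the CJK Unified Ideographs blocks known to the runtime's Unicode version but is not exactly the same set as the Han script matched by \p{Han}. containsUnifiedIdeograph is a hypothetical name:

```swift
// Requires Swift 5.0+ (Unicode.Scalar.Properties); Foundation is not needed.
extension String {
    // Hypothetical name. True if any Unicode scalar has the Unified_Ideograph
    // property; this overlaps heavily with, but is not identical to, Script=Han.
    var containsUnifiedIdeograph: Bool {
        return unicodeScalars.contains { $0.properties.isUnifiedIdeograph }
    }
}

print("Hi! 大家好!It contains Chinese!".containsUnifiedIdeograph) // true
print("Hello, world!".containsUnifiedIdeograph)                   // false
```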

The range-based extension above is for Swift 2.0. For earlier versions you need to use the free contains function instead of the protocol extension method (in both places):

    extension String {
        var containsChineseCharacters: Bool {
            return contains(self.unicodeScalars) {
                // older version of compiler seems to need extra help with type inference
                (scalar: UnicodeScalar) -> Bool in
                let cjkRanges: [ClosedInterval<UInt32>] = [
                    0x4E00...0x9FFF,   // main block
                    0x3400...0x4DBF,   // extended block A
                    0x20000...0x2A6DF, // extended block B
                    0x2A700...0x2B73F, // extended block C
                ]
                return contains(cjkRanges) { $0.contains(scalar.value) }
            }
        }
    }
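ClosedInterval was removed in Swift 3, so neither range-based version above compiles with Swift 3 or later. A minimal port, assuming you want exactly the same four ranges as the original (the newer CJK extension blocks, D onward, are not included); the range list is hoisted into a static property so it is not rebuilt for every scalar:

```swift
// Swift 3+ port: ClosedInterval<UInt32> becomes ClosedRange<UInt32>.
extension String {
    // Same four code-point ranges as the original answer; newer CJK
    // extension blocks (D onward) are deliberately not added here.
    private static let cjkRanges: [ClosedRange<UInt32>] = [
        0x4E00...0x9FFF,   // main block
        0x3400...0x4DBF,   // extended block A
        0x20000...0x2A6DF, // extended block B
        0x2A700...0x2B73F, // extended block C
    ]

    var containsChineseCharacters: Bool {
        // True as soon as any scalar falls inside one of the ranges.
        return unicodeScalars.contains { scalar in
            String.cjkRanges.contains { $0.contains(scalar.value) }
        }
    }
}

print("Hi! 大家好!It contains Chinese!".containsChineseCharacters) // true
print("Hello, world!".containsChineseCharacters)                   // false
```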

Try this in Swift 2:

    import Foundation

    var myString = "Hi! 大家好!It contains Chinese!"
    var a = false
    for c in myString.characters {
        let cs = String(c)
        // A character counts as Chinese if transliterating it from Mandarin to Latin changes it.
        a = a || (cs != cs.stringByApplyingTransform(NSStringTransformMandarinToLatin, reverse: false))
    }
    print("\(myString) contains Chinese characters = \(a)")

The accepted answer only tells you whether a string contains at least one Chinese character, so I wrote a custom version for my own case that also tells you whether the string is entirely Chinese:

    import Foundation

    enum ChineseRange {
        case notFound, contain, all
    }

    extension String {
        var findChineseCharacters: ChineseRange {
            // Find the first run of one or more Han characters.
            guard let range = self.range(of: "\\p{Han}+", options: .regularExpression) else {
                return .notFound
            }
            // If that run covers the whole string, the string is entirely Chinese.
            return range == self.startIndex..<self.endIndex ? .all : .contain
        }
    }

    if "你好".findChineseCharacters == .all {
        print("All Chinese")
    }
    if "Chinese".findChineseCharacters == .notFound {
        print("No Chinese found")
    }
    if "Chinese你好".findChineseCharacters == .contain {
        print("Contains Chinese")
    }

gist here: https://gist.github.com/williamhqs/6899691b5a26272550578601bee17f1a


I created a Swift 3 String extension that counts the Chinese characters in a String. It is similar to Airspeed Velocity's code but more comprehensive: it checks several Unicode ranges to decide whether a character is Chinese. The ranges are listed in the tables in section 18.1 of the Unicode Standard: http://www.unicode.org/versions/Unicode9.0.0/ch18.pdf

The String extension can be found on GitHub: https://github.com/niklasberglund/String-chinese.swift

Usage example:

    let myString = "Hi! 大家好!It contains Chinese!"
    let chinesePercentage = myString.chinesePercentage()
    let chineseCharacterCount = myString.chineseCharactersCount()
    print("String contains \(chinesePercentage) percent Chinese. That is \(chineseCharacterCount) characters.")
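The library's own implementation is not reproduced here. As a rough illustration of the idea only, this minimal sketch (assuming Swift 5.0+) counts the Characters that contain a scalar with the Unicode Unified_Ideograph property and derives a percentage from that count. hanCharacterCount and hanPercentage are hypothetical names, not the String-chinese.swift API, and the property check stands in for the library's range tables:

```swift
// Requires Swift 5.0+ (Unicode.Scalar.Properties).
extension String {
    // Hypothetical name: number of Characters (grapheme clusters) that contain
    // at least one scalar with the Unicode Unified_Ideograph property.
    var hanCharacterCount: Int {
        return filter { character in
            character.unicodeScalars.contains { $0.properties.isUnifiedIdeograph }
        }.count
    }

    // Hypothetical name: hanCharacterCount as a percentage (0...100) of all
    // Characters; an empty string yields 0.
    var hanPercentage: Double {
        guard !isEmpty else { return 0 }
        return Double(hanCharacterCount) / Double(count) * 100
    }
}

let sample = "Hi! 大家好!It contains Chinese!"
print(sample.hanCharacterCount) // 3
print(sample.hanPercentage)
```

For the example string this counts 3 of its 28 Characters, roughly 10.7 percent.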
