Foreign-language characters in a regular expression in C#

In C# code, I am trying to validate a string that contains Chinese characters: "中文ABC123".

When I use the usual alphanumeric pattern "^[a-zA-Z0-9\s]+$", "中文ABC123" does not match, so validation fails.

What do I need to add to the expression in C#?

c# regex non-english
2 answers

To match any letter character from any language, use:

 \p{L} 

If you also want to match digits:

 [\p{L}\p{Nd}]+ 
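A minimal C# sketch of the class above, assuming .NET's System.Text.RegularExpressions; the class name LetterDigitRuns and the method Extract are hypothetical. It pulls every run of letters (any script) or decimal digits out of a mixed string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LetterDigitRuns
{
    // [\p{L}\p{Nd}]+ : one or more letters from any script or decimal digits
    private static readonly Regex Runs = new Regex(@"[\p{L}\p{Nd}]+");

    // Returns the text of every match, in order.
    public static string[] Extract(string input) =>
        Runs.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        // Writes each run on its own line; punctuation and spaces are skipped.
        foreach (var run in Extract("中文ABC123, hello!"))
            Console.WriteLine(run);
    }
}
```

For "中文ABC123, hello!" this yields the two runs "中文ABC123" and "hello".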

\p{L} ... matches any character in the Unicode "Letter" category.
It is equivalent to [\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}]:
\p{Ll} ... matches lowercase letters. (abc)
\p{Lu} ... matches uppercase letters. (ABC)
\p{Lt} ... matches titlecase letters. (ǅ)
\p{Lm} ... matches modifier letters. (ʰ)
\p{Lo} ... matches other letters that have no case. (中文)

\p{Nd} ... matches a decimal digit from any script (Unicode category "Number, decimal digit").
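To see which of these classes a given character falls into, here is a small C# sketch (hypothetical class LetterCategories and method Classify, assuming .NET's Regex) that tries each subcategory in turn and prints the framework's own CharUnicodeInfo.GetUnicodeCategory result next to it for comparison.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class LetterCategories
{
    // The five letter subcategories that make up \p{L}, plus decimal digits.
    private static readonly string[] Categories = { "Lu", "Ll", "Lt", "Lm", "Lo", "Nd" };

    // Returns the first category whose \p{..} class matches c, or null if none does.
    public static string Classify(char c)
    {
        foreach (var cat in Categories)
        {
            if (Regex.IsMatch(c.ToString(), @"^\p{" + cat + "}$"))
                return cat;
        }
        return null;
    }

    static void Main()
    {
        // \u01C5 is the titlecase letter ǅ, \u02B0 is the modifier letter ʰ.
        foreach (char c in "Aa\u01C5\u02B0中1!")
        {
            Console.WriteLine($"U+{(int)c:X4} regex={Classify(c) ?? "(none)"} " +
                              $"framework={CharUnicodeInfo.GetUnicodeCategory(c)}");
        }
    }
}
```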

So in your case, just replace ^[a-zA-Z0-9\s]+$ with ^[\p{L}0-9\s]+$
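A before/after comparison in C#, as a sketch: the two patterns are the ones from the question and this answer, while the class name AlnumValidation and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlnumValidation
{
    // Original pattern: ASCII letters, ASCII digits and whitespace only.
    private static readonly Regex AsciiOnly = new Regex(@"^[a-zA-Z0-9\s]+$");

    // Suggested pattern: \p{L} accepts letters from any script.
    private static readonly Regex AnyLetter = new Regex(@"^[\p{L}0-9\s]+$");

    public static bool IsValidAscii(string s) => AsciiOnly.IsMatch(s);
    public static bool IsValidUnicode(string s) => AnyLetter.IsMatch(s);

    static void Main()
    {
        const string input = "中文ABC123";
        Console.WriteLine(IsValidAscii(input));   // False: 中 and 文 are not in a-zA-Z
        Console.WriteLine(IsValidUnicode(input)); // True
    }
}
```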


Thanks @Andie2302 for pointing out the right way to do this.

In addition, many of the world's languages use combining marks: characters that must attach to a base letter. For example, the Thai word "เก็บ" contains the mark "็", which is not a letter. If only \p{L} is used, you get just "เก" and "บ", so part of the word is missing.

That is why \p{L} alone will not work for every language.

So, to support almost every language, match letters and marks together:

 [\p{L}\p{M}] 

NOTE:

L stands for "Letter" (all letters in all languages, but not marks)

M stands for "Mark" (a mark cannot be displayed on its own; it needs a letter to attach to)
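The Thai example can be checked directly. This C# sketch (hypothetical class ThaiMarks and helper Runs, assuming .NET's Regex) spells "เก็บ" with escapes so the combining mark is visible in the source: U+0E40 and U+0E01 are letters, U+0E47 is a combining mark, U+0E1A is a letter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ThaiMarks
{
    // "เก็บ": SARA E (letter), KO KAI (letter), MAITAIKHU (mark), BO BAIMAI (letter)
    public const string Word = "\u0E40\u0E01\u0E47\u0E1A";

    // Returns the text of every match of pattern in input.
    public static string[] Runs(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        // Letters only: the mark U+0E47 splits the word into two runs.
        Console.WriteLine(Runs(Word, @"\p{L}+").Length);        // 2
        // Letters or marks: the whole word is a single run.
        Console.WriteLine(Runs(Word, @"[\p{L}\p{M}]+").Length); // 1
    }
}
```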

If you also need numbers, add this to the character class:

 \p{N} 

NOTE:

N stands for "Number" (decimal digits Nd, letter numbers Nl such as Roman numerals, and other numbers No such as ½)
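Putting the pieces together, a possible validator in C# looks like the sketch below (class and method names are hypothetical; the patterns just combine the classes discussed above with \s). It also shows why the choice between \p{N} and \p{Nd} matters: \p{N} accepts characters such as ½ (U+00BD) and Ⅻ (U+216B), which \p{Nd} rejects.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnicodeValidation
{
    // Letters, combining marks, any kind of number, whitespace.
    private static readonly Regex LettersMarksNumbers =
        new Regex(@"^[\p{L}\p{M}\p{N}\s]+$");

    // Same, but numbers restricted to decimal digits (any script).
    private static readonly Regex LettersMarksDigits =
        new Regex(@"^[\p{L}\p{M}\p{Nd}\s]+$");

    public static bool AllowsAnyNumber(string s) => LettersMarksNumbers.IsMatch(s);
    public static bool AllowsDecimalDigitsOnly(string s) => LettersMarksDigits.IsMatch(s);

    static void Main()
    {
        string[] samples =
        {
            "中文ABC123",
            "\u0E40\u0E01\u0E47\u0E1A 42", // Thai word plus ASCII digits
            "\u0663",                      // Arabic-Indic digit three (Nd)
            "\u00BD",                      // ½ (No)
            "\u216B",                      // Roman numeral twelve (Nl)
        };
        foreach (var s in samples)
            Console.WriteLine($"{AllowsAnyNumber(s)} {AllowsDecimalDigitsOnly(s)}");
    }
}
```

If the field should accept only ordinary digits, \p{Nd} (or even 0-9) is probably the safer choice.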


Thanks to this site for the very useful information:

https://www.regular-expressions.info/unicode.html

