Foreign language characters in a regular expression in C#

Question


In C# code, I am trying to validate a string that contains Chinese characters: "中文ABC123".

When I use a general alphanumeric pattern, "^[a-zA-Z0-9\s]+$",

validation fails for "中文ABC123".

What else do I need to add to the expression in C#?


c# regex non-english

user2683269 Jan 26 '15 at 18:54


2 answers

Thanks @Andie2302 for pointing out the right way to do this.

In addition, many of the world's languages use "extra characters" (combining marks) that attach to a base letter. For example, if only \p{L} is used on the Thai word "เก็บ", the result is "เก บ"; you can see that part of the word is missing.

That is why \p{L} alone will not work for all foreign languages.

So you need to use the pattern below to support almost any foreign language

 \p{L}\p{M}

NOTE:

L stands for "Letter" (all letters in all languages, but not including "Mark")

M stands for "Mark" (a "Mark" cannot be displayed on its own; it requires a "Letter" to attach to)
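The effect on the Thai word can be checked with a minimal sketch like the one below. The class name MarkDemo and the helper KeepOnly are made up for illustration, and placing \p{L}\p{M} inside a character class is an assumption about how the pattern is meant to be used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkDemo
{
    // The Thai word from this answer, written with escapes:
    // U+0E40 (letter), U+0E01 (letter), U+0E47 (combining mark), U+0E1A (letter).
    public const string Word = "\u0E40\u0E01\u0E47\u0E1A";

    // Hypothetical helper: removes every character that is NOT allowed
    // by the given character-class body, e.g. @"\p{L}" or @"\p{L}\p{M}".
    public static string KeepOnly(string input, string classBody) =>
        Regex.Replace(input, "[^" + classBody + "]", "");

    public static void Main()
    {
        // Letters only: the combining mark is dropped, so 3 characters remain.
        Console.WriteLine(KeepOnly(Word, @"\p{L}").Length);          // 3
        // Letters and marks: the word survives intact with 4 characters.
        Console.WriteLine(KeepOnly(Word, @"\p{L}\p{M}").Length);     // 4

        // The same difference shows up when validating the whole string.
        Console.WriteLine(Regex.IsMatch(Word, @"^\p{L}+$"));         // False
        Console.WriteLine(Regex.IsMatch(Word, @"^[\p{L}\p{M}]+$"));  // True
    }
}
```

Written as a bare sequence, \p{L}\p{M} matches one letter followed by exactly one mark, so for validation it is probably more useful inside brackets as shown here.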

In addition, if you need numbers, use the code below

 \p{N}

NOTE:

N stands for "Number"
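As a rough illustration of how broad \p{N} is, the sketch below compares it with \p{Nd} (decimal digits only). The class and method names are hypothetical, and the combined pattern ^[\p{L}\p{M}\p{N}\s]+$ is only one possible way to put the pieces of this answer together with the \s from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberDemo
{
    // \p{N} covers every Unicode number category (Nd, Nl and No).
    public static bool IsAnyNumber(string s) => Regex.IsMatch(s, @"^\p{N}+$");

    // \p{Nd} covers decimal digits only (in any script).
    public static bool IsDecimalDigits(string s) => Regex.IsMatch(s, @"^\p{Nd}+$");

    // Hypothetical combined validator: letters, marks, numbers and whitespace.
    public static bool IsLettersMarksNumbers(string s) =>
        Regex.IsMatch(s, @"^[\p{L}\p{M}\p{N}\s]+$");

    public static void Main()
    {
        Console.WriteLine(IsAnyNumber("123"));          // True
        Console.WriteLine(IsDecimalDigits("\u0663"));   // True  (Arabic-Indic digit three)
        Console.WriteLine(IsAnyNumber("\u00BD"));       // True  (vulgar fraction one half)
        Console.WriteLine(IsDecimalDigits("\u00BD"));   // False (a number, but not a decimal digit)
        Console.WriteLine(IsLettersMarksNumbers("中文ABC123"));  // True
        Console.WriteLine(IsLettersMarksNumbers("abc!"));        // False
    }
}
```

If fractions, Roman numerals and similar characters should be rejected, \p{Nd} or a plain 0-9 range may be the better fit.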

Thanks to this site for very useful information.

https://www.regular-expressions.info/unicode.html


Sruit A.Suk Jun 14 '19 at 18:55


Andie2302 · Accepted Answer · 2015-01-26T18:55:54+0000

To match any letter character from any language, use:

 \p{L}

If you also want to match numbers:

 [\p{L}\p{Nd}]+

\p{L} ... matches a character in the Unicode category Letter.
It is short for [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}]
\p{Ll} ... matches lowercase letters. (abc)
\p{Lu} ... matches uppercase letters. (ABC)
\p{Lt} ... matches titlecase letters.
\p{Lm} ... matches modifier letters.
\p{Lo} ... matches letters without case. (中文)

\p{Nd} ... matches a character in the Unicode category Decimal Digit Number.
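To see which letter subcategory a given character falls into, something like the following sketch could be used. The class LetterCategoryDemo and the method Classify are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterCategoryDemo
{
    // The five subcategories that together make up \p{L}.
    private static readonly string[] Subcategories = { "Ll", "Lu", "Lt", "Lm", "Lo" };

    // Returns the name of the first letter subcategory that matches the
    // single character in ch, or "not a letter" if none does.
    public static string Classify(string ch)
    {
        foreach (string name in Subcategories)
        {
            if (Regex.IsMatch(ch, @"^\p{" + name + "}$"))
            {
                return name;
            }
        }
        return "not a letter";
    }

    public static void Main()
    {
        Console.WriteLine(Classify("a"));       // Ll
        Console.WriteLine(Classify("A"));       // Lu
        Console.WriteLine(Classify("\u01C5"));  // Lt (a titlecase digraph)
        Console.WriteLine(Classify("\u02B0"));  // Lm (modifier letter small h)
        Console.WriteLine(Classify("\u4E2D"));  // Lo (the character 中)
        Console.WriteLine(Classify("1"));       // not a letter
    }
}
```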

Just replace: ^[a-zA-Z0-9\s]+$ with ^[\p{L}0-9\s]+$
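A minimal sketch of that replacement, using the string from the question, might look like this. The class name UnicodeValidation and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeValidation
{
    // Original pattern from the question: ASCII letters, digits and whitespace.
    private static readonly Regex AsciiOnly = new Regex(@"^[a-zA-Z0-9\s]+$");

    // Suggested replacement: letters from any language, ASCII digits and whitespace.
    private static readonly Regex AnyLetter = new Regex(@"^[\p{L}0-9\s]+$");

    public static bool IsValidAscii(string input) => AsciiOnly.IsMatch(input);

    public static bool IsValidAnyLanguage(string input) => AnyLetter.IsMatch(input);

    public static void Main()
    {
        string sample = "中文ABC123";
        Console.WriteLine(IsValidAscii(sample));        // False
        Console.WriteLine(IsValidAnyLanguage(sample));  // True
    }
}
```

Note that this pattern still rejects text that relies on combining marks, since \p{M} is not part of the character class.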
