Greek characters, regular expressions and C#

I am building a CMS for a scientific journal that uses a lot of Greek characters. I need to validate that a field contains only a specific set of characters plus Greek characters. Here is what I have now:

[^a-zA-Z0-9-()/\s] 

How can I extend this to allow Greek characters in addition to alphanumerics, '(', ')', '-' and '_'?

I am using C#, by the way.

+7
c# regex unicode internationalization utf-8
4 answers

In .NET languages you can use \p{IsGreekandCoptic} to match Greek characters. So the resulting regular expression is:

 [^a-zA-Z0-9-()/\s\p{IsGreekandCoptic}] 

\p{IsGreekandCoptic} matches the characters in the Greek and Coptic block (U+0370 to U+03FF), shown in this chart: http://img203.imageshack.us/img203/3760/greekcoptic.png
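A minimal sketch of how that pattern might be used to validate a field in C#. The names `FieldValidator` and `IsValid` are made up for illustration, and the hyphen is escaped here only to make it unambiguous that it is a literal character:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldValidator
{
    // Matches any single character that is NOT allowed.
    // Allowed: ASCII letters, digits, '-', '(', ')', '/', whitespace,
    // and anything in the Greek and Coptic block (U+0370..U+03FF).
    private static readonly Regex Disallowed =
        new Regex(@"[^a-zA-Z0-9\-()/\s\p{IsGreekandCoptic}]");

    // Hypothetical helper: a field is valid when no disallowed character is found.
    public static bool IsValid(string field)
    {
        return !Disallowed.IsMatch(field);
    }
}
```

With this, `FieldValidator.IsValid("α-helix (βΔ)")` should be true and `FieldValidator.IsValid("50% αβ")` should be false because of the '%'. Note that the pattern as posted does not allow '_', so it would need to be added to the class if underscores are wanted.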

+4

If you use a language that uses PCRE for regular expressions, with UTF-8 text, /[\x{0374}-\x{03FF}]+/u should match Greek characters. Greek characters fall between U+0374 and U+03FF (source), and the u modifier tells PCRE to treat the pattern and subject as Unicode. As pointed out below, /\p{Greek}+/u also works with PCRE.

If you use JavaScript, it uses \uXXXX instead of \x{XXXX}: /[\u0374-\u03FF]+/.

Also see this Unicode Regular Expressions guide for more information.
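Since the question is about C#, here is a rough sketch of the same explicit-range idea in .NET. As far as I know, the .NET regex engine accepts `\uXXXX` escapes (as JavaScript does) but not the PCRE-style `\x{XXXX}` form. The `GreekRuns` class and its `Find` method are names invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreekRuns
{
    // Same range as the PCRE/JavaScript patterns above: U+0374..U+03FF.
    private static readonly Regex GreekRun = new Regex(@"[\u0374-\u03FF]+");

    // Hypothetical helper: returns each run of consecutive characters in that range.
    public static List<string> Find(string text)
    {
        var runs = new List<string>();
        foreach (Match m in GreekRun.Matches(text))
        {
            runs.Add(m.Value);
        }
        return runs;
    }
}
```

For example, `GreekRuns.Find("alpha αβγ and omega ω")` should return the two runs "αβγ" and "ω".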

+3

For Java, from the Pattern Javadoc:

\p{InGreek} A character in the Greek block (simple block)

+1

As this is my first answer on SO, I can't downvote Daniel's answer about the JavaScript regex.

I know this is very late, but Daniel's answer is incorrect. It excludes the ancient characters listed below! This matters if you are working on a biblical app that deals with ancient Greek words!

This is the correct regular expression for finding Greek and Coptic in JS:

 /[\u0370-\u03FF]+/gm 

http://unicode.org/charts/PDF/U0370.pdf

Excerpt from the chart:

0370 Ͱ GREEK CAPITAL LETTER HETA → 2C75 Ⱶ latin capital letter half h

0371 ͱ GREEK SMALL LETTER HETA → 2C76 ⱶ latin small letter half h

0372 Ͳ GREEK CAPITAL LETTER ARCHAIC SAMPI

0373 ͳ GREEK SMALL LETTER ARCHAIC SAMPI
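To make the difference concrete, this sketch compares the two ranges on the four characters listed above. It is written in C# (the language of the question) rather than JS, and `RangeComparison`, `NarrowRange`, `FullBlock` and `ArchaicLetters` are illustrative names only:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RangeComparison
{
    // Range from Daniel's answer: starts at U+0374.
    public static readonly Regex NarrowRange = new Regex(@"^[\u0374-\u03FF]+$");

    // Range from this answer: the whole Greek and Coptic block, from U+0370.
    public static readonly Regex FullBlock = new Regex(@"^[\u0370-\u03FF]+$");

    // The four archaic letters from the chart excerpt (heta and archaic sampi).
    public const string ArchaicLetters = "\u0370\u0371\u0372\u0373";
}
```

Here `RangeComparison.FullBlock.IsMatch(RangeComparison.ArchaicLetters)` should be true, while the same call on `NarrowRange` should be false. Both patterns accept ordinary letters such as "αβγ".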

EDIT: Craig points out that Daniel's regular expression is correct for the OP. Although I cannot find where the OP says which kind of Greek text he is validating, I will concede that my answer only matters for ancient texts.

While I am editing this, I also want to point out that none of the regular expressions here match the accented Greek characters that Perseus uses in its texts. So if you need to work with http://www.perseus.tufts.edu/hopper/ or use any of its shared resources in your application, be careful with my regular expression.
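One possible way to cover accented text in C#, sketched under assumptions: precomposed polytonic letters live in the Greek Extended block (U+1F00 to U+1FFF), and decomposed text uses base letters followed by combining marks (U+0300 to U+036F). I have not checked how Perseus actually encodes its texts, and `PolytonicGreek` and `IsGreekWord` are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PolytonicGreek
{
    // Assumption: accented letters are either precomposed characters in the
    // Greek Extended block, or Greek letters followed by combining marks.
    private static readonly Regex GreekWord = new Regex(
        @"^[\p{IsGreekandCoptic}\p{IsGreekExtended}\p{IsCombiningDiacriticalMarks}]+$");

    // Hypothetical helper: true when every character falls in one of those blocks.
    public static bool IsGreekWord(string word)
    {
        return GreekWord.IsMatch(word);
    }
}
```

For instance, `PolytonicGreek.IsGreekWord("\u1F00\u03BD\u03AE\u03C1")` (ἀνήρ) should be true, whereas a pattern limited to U+0370..U+03FF rejects it because of the leading U+1F00.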

0
