What is the regular expression for a Spanish word?

Regular expression languages use \w to match A..Z, a..z, 0..9 and _, and \b is defined as the word boundary.

How can I write a regular expression that matches all valid Spanish words, including characters such as: á, í, ó, é, ñ, etc.?

I am using .NET.

3 answers

Set the language to Spanish and make your regular expression language-aware.

Your regex system must have something equivalent to Python's re.L (aka re.LOCALE) in order to make a regular expression locale-dependent, so that what counts as a word character, what does not, where the word boundaries fall, and so on all change with the locale. Are you asking for a way to compensate for a regex system that does not support locales, and to solve the problem anyway?
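
To make the idea concrete, here is a minimal sketch in Python 3, offered as an illustration rather than a .NET answer. It assumes str (not bytes) patterns, for which \w and \b are already Unicode-aware by default; re.ASCII switches that off, and re.LOCALE is only accepted with bytes patterns. The function names unicode_words and ascii_words are hypothetical.

```python
import re

def unicode_words(text):
    # str patterns are Unicode-aware by default in Python 3,
    # so \w covers accented letters and ñ as well as A-Z, a-z, 0-9 and _.
    return re.findall(r"\b\w+\b", text)

def ascii_words(text):
    # re.ASCII restricts \w and \b to the ASCII definition of a word character.
    return re.findall(r"\b\w+\b", text, flags=re.ASCII)

print(unicode_words("El niño comió jamón"))  # ['El', 'niño', 'comió', 'jamón']
print(ascii_words("niño"))                   # ['ni', 'o']
```

The second call shows the failure mode the question describes: with an ASCII-only \w, the ñ is treated as a word boundary and the word is split in two.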

It depends a lot on the language (and the regex engine) you use.

In Perl, \w matches all word characters, regardless of language or alphabet, and something like /\b(\w+)\b/ will (possibly) match Spanish words, as well as English words or Russian words.

In languages using PCRE, \w (and therefore possibly \b) does NOT match Unicode characters by default. You will probably need to create your own set. I suggest something like [\wáéíóúñ] (matches all word characters as well as the accented characters), and note that the PCRE library needs to be built with Unicode support before even that will work.
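
The "create your own set" approach can be sketched as follows, again in Python purely for illustration. The character class here is an assumption of mine, not the set from the answer: it lists letters only (no digits or underscore) and adds ü and the uppercase forms, which [\wáéíóúñ] leaves out. The names SPANISH_LETTERS and spanish_words are hypothetical.

```python
import re

# Explicit letter set: ASCII letters plus the accented vowels, ü and ñ,
# in both cases. Digits and underscore are deliberately excluded.
SPANISH_LETTERS = "A-Za-zÁÉÍÓÚÜÑáéíóúüñ"
SPANISH_WORD = re.compile(f"[{SPANISH_LETTERS}]+")

def spanish_words(text):
    # Return every maximal run of characters from the explicit set.
    return SPANISH_WORD.findall(text)

print(spanish_words("El pingüino comió jamón en 2020."))
# ['El', 'pingüino', 'comió', 'jamón', 'en']
```

Because the pattern matches maximal runs from its own set, it does not rely on \b at all, which sidesteps engines whose word boundary is ASCII-only. It assumes the text uses precomposed characters (for example a single ñ rather than n followed by a combining tilde).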

If you use something else, good luck. Some regex engines do not even support Unicode.
