How to write regular expressions for German character classes such as letters, vowels and consonants?

For example, I have set these up:

    L = /[a-z,A-Z,ßäüöÄÖÜ]/
    V = /[äöüÄÖÜaeiouAEIOU]/
    K = /[ßb-zB-Z&&[^#{V}]]/

With these, /(#{K}#{V}{2})/ matches "ᄚ" in "azAZᄚ".

Are there any better ways to deal with them?

Can I put these constants in a module in a file somewhere in my Ruby installation folder, so I can include / require them in any new script that I write on my computer? (I'm a newbie and I know that I am confusing this terminology, please correct me.)

Also, can I get metacharacters such as \L, \V and \K (or whichever ones are not already taken in Ruby) to stand for them in regexes, so that I don't have to do this string interpolation all the time?

+6
2 answers

You're off to a good start, but you should look through the Regexp class that ships with Ruby. There are tricks for writing patterns that build themselves using String interpolation. You write the bricks and let Ruby build the walls and the house with ordinary String operations, then turn the resulting strings into real Regexp instances for use in your code.

For instance:

    LOWER_CASE_CHARS = 'a-z'
    UPPER_CASE_CHARS = 'A-Z'
    CHARS = LOWER_CASE_CHARS + UPPER_CASE_CHARS
    DIGITS = '0-9'

    CHARS_REGEX = /[#{ CHARS }]/
    DIGITS_REGEX = /[#{ DIGITS }]/

    WORDS = "#{ CHARS }#{ DIGITS }_"
    WORDS_REGEX = /[#{ WORDS }]/

Keep building up from small atomic characters and character classes, and soon you will have large regular expressions. Try pasting these into IRB one at a time and you'll quickly get the hang of it.
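Applied to the German classes from the question, the same approach might look like the sketch below. The module name `GermanChars` and its constant names are made up for illustration. The character sets are kept as plain strings so they can be dropped inside a bracket expression; interpolating a Regexp object inside `[...]` would also pull in the characters of its `(?-mix:...)` wrapper.

```ruby
# Hypothetical module; the names are illustrative, not from the question.
module GermanChars
  # Plain strings: safe to interpolate inside a character class.
  VOWELS  = 'aeiouäöüAEIOUÄÖÜ'
  LETTERS = 'a-zA-ZäöüÄÖÜß'

  LETTER    = /[#{LETTERS}]/
  VOWEL     = /[#{VOWELS}]/
  # Character-class intersection (&&): any letter that is not a vowel.
  CONSONANT = /[#{LETTERS}&&[^#{VOWELS}]]/
end

# Regexp objects can still be interpolated into larger patterns,
# as long as they are used outside a character class.
pattern = /#{GermanChars::CONSONANT}#{GermanChars::VOWEL}{2}/

puts 'Haus'[pattern] # prints "Hau"
```

If this were saved as, say, `german_chars.rb` (a hypothetical file name) next to a script, `require_relative 'german_chars'` would make the constants available there.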

+1

A slight improvement on what you are doing now would be to use the Unicode regex support for categories or scripts.

If you mean L to be any letter, use \p{L}. Or use \p{Latin} if you want it to mean any letter in the Latin script (which covers all German letters).

I don't think there is anything built in for vowels and consonants.

See \p{L} matching your example.
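As a rough sketch of how the Unicode properties could be combined with a hand-written vowel list, consonants can be expressed as "Latin-script characters that are not vowels". The constant names below are illustrative, and the vowel list is an assumption covering only the German vowels from the question.

```ruby
# Any letter from any script.
LETTER = /\p{L}/
# Any character belonging to the Latin script.
LATIN = /\p{Latin}/

# Assumed vowel list: the German vowels from the question.
VOWELS = 'aeiouäöüAEIOUÄÖÜ'
VOWEL = /[#{VOWELS}]/
# Intersection: Latin-script characters that are not in the vowel list.
CONSONANT = /[\p{Latin}&&[^#{VOWELS}]]/

puts 'Straße'.scan(LETTER).length   # prints 6
puts 'Straße'.scan(CONSONANT).join  # prints "Strß"
```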

0