Regex in Python to find words that follow a pattern: vowel, consonant, vowel, consonant

I'm trying to learn regex in Python to find words whose letters alternate between vowels and consonants (vowel, consonant, vowel, consonant) or between consonants and vowels. How do I do this with a regex? If it isn't possible with a regex, is there an efficient way to do it in Python?

3 answers

I believe you should use a regex like this:

r"([aeiou][bcdfghjklmnpqrstvwxz])+" 

for matching vowels followed by a consonant and:

 r"([bcdfghjklmnpqrstvwxz][aeiou])+" 

to match a consonant followed by a vowel. For reference, + means "one or more repetitions", and because it is greedy it matches as many repetitions of the group as it can. For example, applying the first pattern to "ababab" matches the entire string, not individual occurrences of "ab".
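A minimal sketch of both patterns with Python's `re` module. One caveat not mentioned above: when a pattern contains a capturing group, `re.findall` returns the group's contents, and a repeated group only keeps its last repetition. The sketch therefore switches to a non-capturing group `(?:...)` and reads whole matches via `finditer`/`group(0)`. The sample strings are only illustrations.

```python
import re

# Same character classes as above; note that "y" is left out of both.
VC = re.compile(r"(?:[aeiou][bcdfghjklmnpqrstvwxz])+")
CV = re.compile(r"(?:[bcdfghjklmnpqrstvwxz][aeiou])+")

# The greedy + consumes every repetition it can, so the whole string matches.
print(VC.fullmatch("ababab").group(0))   # ababab

# With the original capturing group, findall reports only the last repetition.
print(re.findall(r"([aeiou][bcdfghjklmnpqrstvwxz])+", "ababab"))  # ['ab']

# A non-capturing group makes findall return the full matched runs instead.
print(VC.findall("ababab"))              # ['ababab']

# finditer + group(0) gives every consonant-vowel run in a longer string.
print([m.group(0) for m in CV.finditer("banana split")])  # ['banana', 'li']
```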

If you want to match one or more vowels followed by one or more consonants, repeated, the pattern might look like this:

 r"([aeiou]+[bcdfghjklmnpqrstvwxz]+)+" 
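For illustration, here is that pattern used with `re.fullmatch` to filter a word list, so only words made entirely of vowel-run/consonant-run pairs are kept. The word list is a made-up example, and the group is made non-capturing since only the overall match matters.

```python
import re

# One or more vowels, then one or more consonants, repeated ("y" left out as above).
RUNS = re.compile(r"(?:[aeiou]+[bcdfghjklmnpqrstvwxz]+)+")

words = ["eat", "onion", "oats", "street", "idea", "apple"]

# fullmatch: the whole word must consist of such pairs, so it has to
# start with a vowel and end with a consonant.
print([w for w in words if RUNS.fullmatch(w)])  # ['eat', 'onion', 'oats']
```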

Hope this helps.

 ^(([aeiou][^aeiou])+|([^aeiou][aeiou])+)$ 

 >>> import re
 >>> consec_re = re.compile(r'^(([aeiou][^aeiou])+|([^aeiou][aeiou])+)$')
 >>> consec_re.match('bale')
 <_sre.SRE_Match object at 0x01DBD1D0>
 >>> consec_re.match('bail')
 >>>
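In Python 3 the same check can be written with `re.fullmatch`, which anchors at both ends, so `^` and `$` are not needed. Note that `[^aeiou]` matches any character that is not a lowercase vowel (digits, punctuation, uppercase vowels), so this sketch lowercases the input and requires `isalpha()` first; that filtering step, and the helper name `alternates`, are my own additions rather than part of the answer.

```python
import re

# Either (vowel, non-vowel) repeated, or (non-vowel, vowel) repeated.
ALTERNATING = re.compile(r"(?:[aeiou][^aeiou])+|(?:[^aeiou][aeiou])+")

def alternates(word: str) -> bool:
    """Return True if the whole word alternates vowels and non-vowels in pairs."""
    word = word.lower()
    # Assumption: only alphabetic words count, so [^aeiou] effectively means "consonant".
    return word.isalpha() and ALTERNATING.fullmatch(word) is not None

print(alternates("bale"))  # True
print(alternates("bail"))  # False
print(alternates("Bale"))  # True
```

Because both alternatives are built from two-letter pairs, odd-length words such as "cat" never match.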

If you map consonant digraphs to single consonants, the longest such word is anatomopathological, a 10*VC string.

If you map y correctly, you get complete strings such as acetylacetonates as 8*VC and hypocotyledonary as 8*CV.

If the match doesn't have to cover the whole word, you get a 9*CV pattern in chemicomineralogical and a 9*VC pattern in overimaginativeness.

There are many 10* words if runs of consecutive consonants or vowels are allowed to alternate, as in (C+V+)+. These include laparocolpohysterotomy and ureterocystanastomosis.

The main trick is to first map all consonants to C and all vowels to V, then match VC or CV. For y, you have to use lookaheads and/or lookbehinds to determine whether it counts as a C or a V at that position.
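A minimal Python sketch of that two-step idea: map letters to C/V, then run the simple `(?:VC)+` / `(?:CV)+` patterns over the mapped string. To keep it short, it assumes y is always a consonant and does not merge digraphs, so it does not implement the y handling described above; the helper names `to_cv` and `alternation` are hypothetical.

```python
import re
from typing import Optional

VOWELS = set("aeiou")

def to_cv(word: str) -> str:
    """Map each letter to 'V' (vowel) or 'C' (consonant), dropping non-letters.

    Simplifying assumptions: 'y' is always a consonant here, and consonant
    digraphs (th, ch, ...) are not merged into a single C.
    """
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower() if ch.isalpha())

def alternation(word: str) -> Optional[str]:
    """Return 'VC' or 'CV' if the whole word alternates that way, else None."""
    cv = to_cv(word)
    if re.fullmatch(r"(?:VC)+", cv):
        return "VC"
    if re.fullmatch(r"(?:CV)+", cv):
        return "CV"
    return None

print(to_cv("anatomic"), alternation("anatomic"))  # VCVCVCVC VC
print(to_cv("banana"), alternation("banana"))      # CVCVCV CV
print(to_cv("bail"), alternation("bail"))          # CVVC None
```

Once the word is reduced to Cs and Vs, the patterns no longer need long character classes.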

I could show you the patterns I used, but you probably wouldn't thank me for it. :) For example:

  (?<= \p{IsVowel} ) [yY] (?= \p{IsVowel} )   # counts as a C
  (?<= \p{IsConsonant} ) [yY]                 # counts as a V
  [yY] (?= \p{IsVowel} )                      # counts as a C
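Python's built-in `re` module does not support `\p{...}` escapes at all, and `IsVowel`/`IsConsonant` appear to be custom properties from the answer's own program, so a rough Python translation has to spell the classes out. This sketch assumes lowercase ASCII letters, applies the three rules in the order listed (the first rule that fits a given y wins), and treats any y that none of the rules cover as a vowel; that fallback is my assumption, not something the answer specifies.

```python
import re

VOWEL_CLASS = "aeiou"
CONS_CLASS = "b-df-hj-np-tv-xz"  # consonants other than y

# Each y is tested against the three rules in order; the named group shows which fired.
Y_RULES = re.compile(
    rf"(?<=[{VOWEL_CLASS}])(?P<c1>y)(?=[{VOWEL_CLASS}])"  # between vowels: counts as a C
    rf"|(?<=[{CONS_CLASS}])(?P<v>y)"                      # after a consonant: counts as a V
    rf"|(?P<c2>y)(?=[{VOWEL_CLASS}])"                     # before a vowel: counts as a C
)

def to_cv(word: str) -> str:
    """Map a word of ASCII letters to 'C'/'V', classifying y by its neighbours."""
    # re.sub matches against the original string, so one replacement never
    # changes the lookaround context seen by the next y.
    marked = Y_RULES.sub(lambda m: "V" if m.group("v") else "C", word.lower())
    out = []
    for ch in marked:
        if ch in "CV":            # a y already classified by the rules
            out.append(ch)
        elif ch == "y":           # assumption: an unclassified y counts as a vowel
            out.append("V")
        elif ch in VOWEL_CLASS:
            out.append("V")
        else:
            out.append("C")
    return "".join(out)

print(to_cv("acetylacetonates"))  # VCVCVCVCVCVCVCVC
print(to_cv("hypocotyledonary"))  # CVCVCVCVCVCVCVCV
print(to_cv("kayak"))             # CVCVC
print(to_cv("yes"))               # CVC
```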

The main trick then becomes searching for overlapping matches of alternating VC or CV via

  (?= ( (?: \p{IsVowel} \p{IsConsonant} )+ ) )

and

  (?= ( (?: \p{IsConsonant} \p{IsVowel} )+ ) )

Then you count them all up and see which are the longest.
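In Python, the same overlapping search can be done by putting a capturing group inside a zero-width lookahead and iterating with `re.finditer`, which then reports the longest run starting at every position. This sketch works on an already-mapped C/V string; its `to_cv` helper is a deliberately simplified, hypothetical mapping (y always a consonant, no digraph merging), not the answer's own preprocessing.

```python
import re

# A capturing group inside a zero-width lookahead: finditer can report a match
# at every starting position, so overlapping runs are all found.
VC_RUNS = re.compile(r"(?=((?:VC)+))")
CV_RUNS = re.compile(r"(?=((?:CV)+))")

def longest_runs(cv: str) -> dict:
    """Return the largest number of VC and of CV repetitions found anywhere in cv."""
    result = {}
    for name, pattern in (("VC", VC_RUNS), ("CV", CV_RUNS)):
        # Each captured run is a whole number of two-letter pairs.
        counts = [len(m.group(1)) // 2 for m in pattern.finditer(cv)]
        result[name] = max(counts, default=0)
    return result

def to_cv(word: str) -> str:
    """Simplified mapping (assumption): aeiou -> V, every other letter, including y -> C."""
    return "".join("V" if ch in "aeiou" else "C" for ch in word.lower() if ch.isalpha())

print(longest_runs(to_cv("overimaginativeness")))  # {'VC': 9, 'CV': 8}
```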

However, since Python's regex support does not (by default or directly) support properties in regular expressions the way I used them in my own program, it is even more important to first preprocess the string into nothing but Cs and Vs. Otherwise, your patterns will look really ugly.

