Word capture with optional prefix

I need to extend an existing regular expression to catch some optional prefix as well. My current regex is working fine:

(?:\b)(?:mon|tue|wed|thu|fri|sat|sun)(?:\b) 

and matches any of these words, delimited by word boundaries. For example, given the string "mon-sun.sat", it matches mon, sun and sat individually.
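The behavior described above can be reproduced with a short Python sketch. The names DAY_RE and find_days are hypothetical and used only for illustration; the pattern itself is the one quoted in the question.

```python
import re

# The original pattern from the question, unchanged.
DAY_RE = re.compile(r"(?:\b)(?:mon|tue|wed|thu|fri|sat|sun)(?:\b)")

def find_days(text):
    """Return every day abbreviation that appears as a whole word."""
    # The pattern has no capturing groups, so findall returns full matches.
    return DAY_RE.findall(text)

print(find_days("mon-sun.sat"))  # ['mon', 'sun', 'sat']
```

Because \b is zero-width, "-" and "." act as separators without being part of any match.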

Now suppose the words above can optionally appear with a prefix such as "each", "only" or "any", for example: "mon. any-tue or only-wed. sat. each weekend".

I want to expand my regex to match and capture (in the above example) the terms mon, any tue, only wed and sat, but obviously not each, because it is not followed by a member of the list. In short, the pattern to capture is: an optional prefix followed by a day of the week.

I tried to expand my regular expression in several ways, but without success. I guess I messed up the word boundaries.

In other words: there are two sets of words, P={each,only,any} and W={mon,tue,wed,thu,fri,sat,sun}. I need to match every w in W, optionally preceded by a prefix p in P. The separator can be anything that produces a \b.

EDIT: my current attempt is (:?\b) ((any|only|each)?(:?\b)) (:?mon|tue|wed|thu|fri|sat|sun) (:?\b), but it only matches mon, tue, wed and sat.
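For reference, the result of this attempt can be reproduced in Python. This is a sketch that assumes the spaces between the groups are formatting artifacts and removes them; ATTEMPT_RE and attempt_matches are hypothetical names.

```python
import re

# The attempt from the EDIT, with the spaces between groups removed
# (assumption: they are display artifacts, not part of the pattern).
ATTEMPT_RE = re.compile(
    r"(:?\b)((any|only|each)?(:?\b))(:?mon|tue|wed|thu|fri|sat|sun)(:?\b)"
)

def attempt_matches(text):
    """Return the full text of every match produced by the attempt."""
    return [m.group(0) for m in ATTEMPT_RE.finditer(text)]

sample = "mon. any-tue or only-wed. sat. each weekend"
print(attempt_matches(sample))  # ['mon', 'tue', 'wed', 'sat']
```

Two things are worth noticing. (:?\b) is a capturing group containing an optional colon, not the non-capturing (?:\b). More importantly, \b is zero-width, so nothing in the pattern consumes the "-" between the prefix and the day, and the prefix can never be part of a match.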

1 answer

You can use

 \b(?:(any|only|each)\W+)?(mon|tue|wed|thu|fri|sat|sun)\b 


More details

  • \b - a leading word boundary
  • (?:(any|only|each)\W+)? - an optional non-capturing group that matches 1 or 0 occurrences of:
    • (any|only|each) - capture group 1: any, only or each as a whole word (the leading word boundary is already provided by the \b above, and the trailing boundary is guaranteed by \W+)
    • \W+ - 1 or more non-word characters
  • (mon|tue|wed|thu|fri|sat|sun)\b - capture group 2: mon, tue, wed, thu, fri, sat or sun as a whole word (thanks to the leading \b or \W+ before the group and the \b after it)

Note that a non-capturing group, (?:...)?, is used to wrap the optional subpattern because, unlike a capturing group, it does not create a capture buffer. The ? quantifier makes the whole group match 1 or 0 times. \W is a shorthand character class that matches any non-word character (so punctuation, symbols and even whitespace will match).
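As a quick check, here is a minimal Python sketch of the pattern applied to the sample string from the question. The names PREFIXED_DAY_RE and find_prefixed_days are hypothetical; only the regex comes from the answer.

```python
import re

# Pattern from the answer: optional prefix (group 1), then a day (group 2).
PREFIXED_DAY_RE = re.compile(
    r"\b(?:(any|only|each)\W+)?(mon|tue|wed|thu|fri|sat|sun)\b"
)

def find_prefixed_days(text):
    """Return (prefix, day) pairs; prefix is None when no prefix is present."""
    return [(m.group(1), m.group(2)) for m in PREFIXED_DAY_RE.finditer(text)]

sample = "mon. any-tue or only-wed. sat. each weekend"
for prefix, day in find_prefixed_days(sample):
    print(prefix, day)
# None mon
# any tue
# only wed
# None sat
```

"each weekend" produces no match because "weekend" is not in the day list, and an optional prefix on its own is never enough to match.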
