Regex (preg_match in PHP): the last groups in the output array are not captured correctly

With this pattern:

(how is\s)?(the\s)?(weather)\s?((on)\s)?(today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|sunday|this week)?(\s(in)\s(.*)\s?(on)?\s?(today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|sunday|this week)?)?

Here is what I'm trying to capture.

Input: how is the weather on tuesday in vienna

Output:

array(10
0   =>  how is the weather on tuesday in vienna
1   =>  how is 
2   =>  the 
3   =>  weather
4   =>  on 
5   =>  on
6   =>  tuesday
7   =>   in vienna
8   =>  in
9   =>  vienna
)

Here I can extract the day and the location from array[6] and array[9].

Input: how is the weather in vienna on tuesday

Output:

array(10
0   =>  how is the weather in vienna on tuesday
1   =>  how is 
2   =>  the 
3   =>  weather
4   =>  
5   =>  
6   =>  
7   =>  in vienna on tuesday
8   =>  in
9   =>  vienna on tuesday
)

But here the location and the day are captured together in array[9]. I want the day and the place to end up in separate elements. Is there something wrong with the grouping in the regex pattern?
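For reference, the two results above can be reproduced with a short sketch. It uses Python's re module instead of preg_match, purely so the example is self-contained; the names DAYS, PATTERN and day_and_place are hypothetical. It also assumes the pattern has to match the whole input (re.fullmatch), which is what the outputs shown above imply. The duplicate "sunday" alternative is dropped, which does not change the group numbers.

```python
import re

# Day names from the question's pattern (assumed helper constant).
DAYS = "today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|this week"

# Same grouping as the pattern in the question, so group 6 is the day
# before "in" and group 9 is whatever (.*) captured after "in".
PATTERN = re.compile(
    r"(how is\s)?(the\s)?(weather)\s?((on)\s)?(" + DAYS + r")?"
    r"(\s(in)\s(.*)\s?(on)?\s?(" + DAYS + r")?)?"
)

def day_and_place(text):
    """Return (group 6, group 9); unmatched groups are None in Python."""
    m = PATTERN.fullmatch(text)
    return m.group(6), m.group(9)

print(day_and_place("how is the weather on tuesday in vienna"))
# ('tuesday', 'vienna')
print(day_and_place("how is the weather in vienna on tuesday"))
# (None, 'vienna on tuesday')
```

In the second call the greedy (.*) runs to the end of the string, and because every group after it is optional, nothing forces it to give back "on tuesday".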

2 answers

Description

I recommend using optional lookaheads to look for the day and the location independently of each other, and to capture each one if it exists.

^(?=(?:.*?on\s(today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|sunday|this week))?)(?=(?:.*?in\s([a-z]+))?)


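This regex will do the following:

  • capture the day or timeframe that follows "on" into group 1, if there is one
  • capture the location that follows "in" into group 2, if there is one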

Live Demo

https://regex101.com/r/rN9hG2/1

Sample text

weather on sunday
weather on sunday in vienna
weather in vienna
weather in vienna on sunday

Sample matches (one entry per line of sample text; index 1 is the day, index 2 is the location)

[0][1] = sunday
[0][2] = 

[1][1] = sunday
[1][2] = vienna

[2][1] = 
[2][2] = vienna

[3][1] = sunday
[3][2] = vienna

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      .*?                      any character except \n (0 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
      on                       'on'
----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
      (                        group and capture to \1:
----------------------------------------------------------------------
        today                    'today'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tomorrow                 'tomorrow'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        sunday                   'sunday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        monday                   'monday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tuesday                  'tuesday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        wednesday                'wednesday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        thursday                 'thursday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        friday                   'friday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        saturday                 'saturday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        sunday                   'sunday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        this week                'this week'
----------------------------------------------------------------------
      )                        end of \1
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      .*?                      any character except \n (0 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
      in                       'in'
----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
      (                        group and capture to \2:
----------------------------------------------------------------------
        [a-z]+                   any character of: 'a' to 'z' (1 or
                                 more times (matching the most amount
                                 possible))
----------------------------------------------------------------------
      )                        end of \2
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
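As a rough illustration of the lookahead approach, here is a minimal sketch using Python's re module rather than preg_match (an assumption made only to keep the example self-contained; the pattern text is the one given above with the duplicate "sunday" removed). The names DAYS, LOOKAHEADS and extract_day_and_place are hypothetical.

```python
import re

# Day names from the answer's pattern (assumed helper constant).
DAYS = "today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|this week"

# Two optional lookaheads anchored at the start of the string:
# group 1 is the day after "on", group 2 is the word after "in".
LOOKAHEADS = re.compile(
    r"^(?=(?:.*?on\s(" + DAYS + r"))?)(?=(?:.*?in\s([a-z]+))?)"
)

def extract_day_and_place(text):
    """Return (day, place); a part that is absent comes back as None."""
    m = LOOKAHEADS.match(text)  # always matches, since both lookaheads are optional
    return m.group(1), m.group(2)

for line in (
    "weather on sunday",
    "weather on sunday in vienna",
    "weather in vienna",
    "weather in vienna on sunday",
):
    print(line, "->", extract_day_and_place(line))
```

Because each lookahead scans from the start of the string on its own, the order of "on ..." and "in ..." in the input does not matter. Note that the location group only matches a single lowercase word.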

If the request always follows a fixed word order, each word can be captured in its own group:

(\w+)\s+(\w+)\s+(\w+)(?:\s+(\w+)\s(\w+))?


MATCH 1
1.  [3-10]  `weather`
2.  [11-13] `on`
3.  [14-20] `sunday`
MATCH 2
1.  [25-32] `weather`
2.  [33-35] `on`
3.  [36-42] `sunday`
4.  [43-45] `in`
5.  [46-52] `vienna`
MATCH 3
1.  [57-64] `weather`
2.  [65-67] `in`
3.  [68-74] `vienna`
MATCH 4
1.  [79-86] `weather`
2.  [87-89] `in`
3.  [90-96] `vienna`
4.  [97-99] `on`
5.  [100-106]   `sunday`
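The five groups above still have to be mapped to "day" and "place". One possible way, sketched here in Python's re module (an assumption; the thread itself targets preg_match), is to key each captured word by the preposition in front of it. The helper name by_preposition is hypothetical and not part of the answer.

```python
import re

# The word-by-word pattern from the answer, unchanged.
WORDS = re.compile(r"(\w+)\s+(\w+)\s+(\w+)(?:\s+(\w+)\s(\w+))?")

def by_preposition(text):
    """Map each preposition ("on", "in") to the word that follows it."""
    m = WORDS.match(text)
    if m is None:
        return {}
    _first, prep1, word1, prep2, word2 = m.groups()
    found = {prep1: word1}
    if prep2 is not None:  # the optional second pair was present
        found[prep2] = word2
    return found

result = by_preposition("weather in vienna on sunday")
print(result.get("on"), result.get("in"))
# sunday vienna
```

This only works when the input starts at "weather" and every value is a single word, as in the sample lines; a longer lead-in such as "how is the weather ..." would shift the groups.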

To capture only the day and the location, and skip the other words, drop the groups around them:

\w+\s+\w+\s+(\w+)(?:\s+\w+\s(\w+))?


MATCH 1
1.  [14-20] `sunday`
MATCH 2
1.  [36-42] `sunday`
2.  [46-52] `vienna`
MATCH 3
1.  [68-74] `vienna`
MATCH 4
1.  [90-96] `vienna`
2.  [100-106]   `sunday`
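With this shorter pattern the two captures swap roles depending on the word order (compare MATCH 2 and MATCH 4). A minimal sketch of one way to sort them out afterwards, again in Python's re module as an assumption: compare each capture against the known day names. DAY_NAMES and split_day_and_place are hypothetical names, not part of the answer.

```python
import re

# The two-group pattern from the answer, unchanged.
VALUES = re.compile(r"\w+\s+\w+\s+(\w+)(?:\s+\w+\s(\w+))?")

# Single-word day names from the question ("this week" cannot match \w+).
DAY_NAMES = {
    "today", "tomorrow", "sunday", "monday", "tuesday",
    "wednesday", "thursday", "friday", "saturday",
}

def split_day_and_place(text):
    """Return (day, place), deciding by whether a capture is a known day name."""
    m = VALUES.match(text)
    day = place = None
    if m is None:
        return day, place
    for word in m.groups():
        if word is None:  # optional second capture missing
            continue
        if word in DAY_NAMES:
            day = word
        else:
            place = word
    return day, place

print(split_day_and_place("weather in vienna on sunday"))
# ('sunday', 'vienna')
```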