This should work:
$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>';
print_r($words);
echo '</pre>';
The output will be:
Array ( [0] => is [1] => is )
Before I explain the regex, a quick note on PREG_SPLIT_NO_EMPTY. This flag makes preg_split return only the pieces that are not empty. That ensures the $words array contains actual data, and not the empty values that can turn up when working with regex patterns and mixed data sources.
The regex itself can be broken down using this tool:
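To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern on the same input with and without PREG_SPLIT_NO_EMPTY. The variable names $pattern, $input, $withEmpty and $withoutEmpty are only illustrative.

```php
<?php
// Same pattern and input as in the answer.
$pattern = '/(?<=\w)\b\s*[!?.]*/';
$input = 'is is.';

// Without the flag: the empty piece after the final "." is kept.
$withEmpty = preg_split($pattern, $input);

// With the flag: empty pieces are dropped.
$withoutEmpty = preg_split($pattern, $input, -1, PREG_SPLIT_NO_EMPTY);

var_export($withEmpty);    // ['is', 'is', '']
echo "\n";
var_export($withoutEmpty); // ['is', 'is']
```

The trailing empty string appears because the last delimiter match (the ".") sits at the very end of the input, so nothing is left after it.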
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
  \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
)                        end of look-behind
--------------------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
--------------------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
[!?.]*                   any character of: '!', '?', '.' (0 or more times (matching the most amount possible))
A more readable explanation can be found by entering the full regex /(?<=\w)\b\s*[!?.]*/ in this other tool:
(?<=\w) Positive Lookbehind - Assert that the regex below can be matched
    \w match any word character [a-zA-Z0-9_]
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
\s* match any white space character [\r\n\t\f ]
    Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[!?.]* match a single character present in the list below
    !?. a single character in the list !?. literally
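To see what the pattern treats as a delimiter, a small sketch can list each match together with its offset. It uses preg_match_all with PREG_OFFSET_CAPTURE on the same input; the variable names are only illustrative.

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';

// Collect every delimiter match with its byte offset in the input.
preg_match_all($pattern, 'is is.', $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    printf("offset %d: %s\n", $offset, var_export($text, true));
}
// offset 2: ' '
// offset 5: '.'
```

Both matches start right after a word character (the lookbehind plus \b). The first consumes the space through \s*, and the second consumes the period through [!?.]*.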
That last regex explanation can be boiled down by a human, also known as me, as follows:
Split at the word boundary that follows any word character, and swallow whatever spaces and !?. punctuation marks come right after that boundary.
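As a rough illustration of that summary, the pattern can be wrapped in a small helper. splitWords is a hypothetical name, not something from the original answer, and the trim step is my own addition. It is there because the pattern expects whitespace before punctuation, so a space that comes after a punctuation mark stays attached to the next piece.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer).
function splitWords(string $text): array
{
    $parts = preg_split('/(?<=\w)\b\s*[!?.]*/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // A space after punctuation is not consumed by the pattern,
    // so strip surrounding whitespace from each piece.
    return array_map('trim', $parts);
}

print_r(splitWords('this   is a test!?')); // this, is, a, test
print_r(splitWords('Stop. Go!'));          // Stop, Go
```

Without the trim, the second call would return 'Stop' and ' Go' with a leading space.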
Jakegould