How to use preg_split() in PHP?

Can someone explain to me how to use the preg_split() function? I don't understand the pattern parameter, for example "/[\s,]+/".

For example, I have this subject string: is is. and I want the result to be:

    array (
      0 => 'is',
      1 => 'is',
    )

So it should ignore the space and the full stop. How can I do this?

+8
php preg-split
4 answers

preg means "Pcre REGexp", which is somewhat redundant because "PCRE" already stands for "Perl Compatible Regexp".

Regexps are a nightmare for beginners. I still don't fully understand them, and I have been working with them for years.

Basically, the example you have breaks down as follows:

    "/[\s,]+/"

    /       = start or end of the pattern string
    [ ... ] = a character class (a set of characters to choose from)
    +       = one or more of the preceding character or group
    \s      = any whitespace character (space, tab, newline)
    ,       = the literal comma character

So you have a search pattern that means "split on any part of the string that consists of one or more whitespace characters and/or commas".
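As a minimal sketch of that pattern in action (the sample string and variable names are made up for illustration):

```php
<?php
// Hypothetical sample input: words separated by a mix of commas, spaces and a tab.
$input = "apples, oranges ,pears\tplums";

// Split wherever one or more whitespace characters and/or commas occur.
$parts = preg_split('/[\s,]+/', $input);

print_r($parts);
// $parts is ['apples', 'oranges', 'pears', 'plums']
```

Because of the +, a run such as ", " or " ," counts as a single separator, so no empty strings appear between the words.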

Other common characters:

    .                      = any single character
    *                      = zero or more of the preceding character or group
    ^ (at start of pattern) = the start of the string
    $ (at end of pattern)   = the end of the string
    ^ (at start of [...])   = "NOT" any of the characters that follow in the class
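A short sketch of how these characters behave, assuming the subject string from the question; the variable names are only for illustration:

```php
<?php
$subject = 'is is.';

// ^ anchors the match to the start of the string.
$startsWithIs = preg_match('/^is/', $subject);    // 1

// $ anchors the match to the end; the string ends with ".", so "is" is not last.
$endsWithIs = preg_match('/is$/', $subject);      // 0

// . is any single character, * repeats it zero or more times, \. is a literal dot.
$iThenDot = preg_match('/^i.*\.$/', $subject);    // 1

// [^a-z] is a negated class: anything that is NOT a lowercase letter.
$words = preg_split('/[^a-z]+/', $subject, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
// $words is ['is', 'is']
```

The last call is one possible way to get the result asked for in the question, under the assumption that words consist only of lowercase ASCII letters.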

For PHP, there is good information in the official documentation.

+27

This should work:

    $words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);
    echo '<pre>';
    print_r($words);
    echo '</pre>';

The output will be:

    Array
    (
        [0] => is
        [1] => is
    )

Before I explain the regex, a quick word on PREG_SPLIT_NO_EMPTY. It tells preg_split to return only the pieces that are not empty. This ensures that the $words array contains actual data and not the empty strings that can crop up when regular expression patterns meet mixed data sources.
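To see what the flag changes, here is a small sketch that uses a simpler pattern than the one in this answer (the pattern /[\s.]+/ and the variable names are chosen only for illustration):

```php
<?php
$input = 'is is.';

// Without the flag, the trailing "." leaves an empty string at the end.
$withEmpty = preg_split('/[\s.]+/', $input);
// $withEmpty is ['is', 'is', '']

// With PREG_SPLIT_NO_EMPTY, empty pieces are dropped. The -1 means "no limit".
$withoutEmpty = preg_split('/[\s.]+/', $input, -1, PREG_SPLIT_NO_EMPTY);
// $withoutEmpty is ['is', 'is']

print_r($withEmpty);
print_r($withoutEmpty);
```

The flag is the fourth argument, so the third argument (the limit) has to be supplied as well; -1 leaves it unlimited.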

The regex itself can be broken down using this tool:

    NODE                     EXPLANATION
    --------------------------------------------------------------------------------
    (?<=                     look behind to see if there is:
    --------------------------------------------------------------------------------
      \w                       word characters (a-z, A-Z, 0-9, _)
    --------------------------------------------------------------------------------
    )                        end of look-behind
    --------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w) and
                             something that is not a word char
    --------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                             more times (matching the most amount possible))
    --------------------------------------------------------------------------------
    [!?.]*                   any character of: '!', '?', '.' (0 or more
                             times (matching the most amount possible))

A nicer explanation can be found by entering the full regex /(?<=\w)\b\s*[!?.]*/ in this other tool:

  • (?<=\w) Positive lookbehind: assert that the regex below can be matched
  • \w matches any word character [a-zA-Z0-9_]
  • \b asserts the position at a word boundary (^\w|\w$|\W\w|\w\W)
  • \s* matches any whitespace character [\r\n\t\f ]
  • Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  • [!?.]* matches a single character in the list !?. literally, with the same * quantifier

That last regex explanation can be boiled down by a human, also known as me, to the following:

Match, and split on, the word boundary that follows any word character, together with any whitespace and any of the punctuation marks !?. that come right after it.

+6

The documentation reads:

The preg_split() function operates exactly like split(), except that regular expressions are accepted as input for the pattern parameter.

So the following code ...

    <?php
    $ip = "123 ,456 ,789 ,000";
    $iparr = preg_split("/[\s,]+/", $ip);
    print "$iparr[0] <br />";
    print "$iparr[1] <br />";
    print "$iparr[2] <br />";
    print "$iparr[3] <br />";
    ?>

This produces the following result when rendered in a browser:

    123
    456
    789
    000

So, if you have this subject: is is and you want: array ( 0 => 'is', 1 => 'is', )

you need to change your regex to "/[\s]+/".

If your subject is is ,is instead, you need the regex you already have, "/[\s,]+/".
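A minimal sketch comparing the two patterns on the subjects mentioned here (variable names are illustrative only). Note that neither pattern removes a trailing full stop, so for the exact subject from the question the dot stays attached to the last word:

```php
<?php
// Whitespace-only pattern on a subject without commas.
$spaces = preg_split('/[\s]+/', 'is is');
// $spaces is ['is', 'is']

// Whitespace-or-comma pattern on a subject that contains a comma.
$commas = preg_split('/[\s,]+/', 'is ,is');
// $commas is ['is', 'is']

// The original subject ends with a full stop, which neither pattern treats as a separator.
$withDot = preg_split('/[\s]+/', 'is is.');
// $withDot is ['is', 'is.']

print_r($spaces);
print_r($commas);
print_r($withDot);
```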

+1

PHP's str_word_count may be a better choice here.

str_word_count($string, 2) returns an array of all the words in the string, including duplicates, keyed by each word's position in the string.
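A small sketch of the difference between format 1 and format 2, using the subject from the question (variable names are illustrative only):

```php
<?php
$string = 'is is.';

// Format 1: a plain list of the words, indexed 0, 1, 2, ...
$list = str_word_count($string, 1);
// $list is [0 => 'is', 1 => 'is']

// Format 2: the same words, but each key is the word's offset in the string.
$byPosition = str_word_count($string, 2);
// $byPosition is [0 => 'is', 3 => 'is']

print_r($list);
print_r($byPosition);
```

If the keys need to be 0 and 1, as in the question, format 1 is the closer match.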

+1