How to use preg_split() in PHP?

Question


Can someone explain to me how to use the preg_split() function? I do not understand the pattern parameter, for example "/[\s,]+/".

For example:

I have this subject: "is is." and I want the result to be:

 array ( 0 => 'is', 1 => 'is', )

So it should ignore the space and the full stop. How can I do this?

+8

php preg-split

MD.MD Jun 12 '14 at 16:42


4 answers

This should work:

    $words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);
    echo '<pre>';
    print_r($words);
    echo '</pre>';

The output will be:

    Array
    (
        [0] => is
        [1] => is
    )

Before I explain the regex, a quick note on PREG_SPLIT_NO_EMPTY. This flag tells preg_split to return only the pieces that are not empty. It ensures that every element of the $words array actually contains data, rather than the empty strings that can appear when a delimiter matches at the start or end of the subject or when two delimiters sit next to each other.
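To see what the flag changes, here is a small sketch that runs the same pattern on the question's subject with and without PREG_SPLIT_NO_EMPTY. The variable names are illustrative only.

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';
$subject = 'is is.';

// Default flags: the final "." is matched as a delimiter at the very end
// of the string, so preg_split reports an empty piece after it.
$withEmpty = preg_split($pattern, $subject);

// PREG_SPLIT_NO_EMPTY drops that empty piece. The -1 means "no limit".
$noEmpty = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);

var_export($withEmpty); // three elements: 'is', 'is', ''
echo PHP_EOL;
var_export($noEmpty);   // two elements: 'is', 'is'
```

The third argument has to be supplied (here as -1) because the flags are the fourth positional parameter.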

The regex itself can be broken down using this tool:

    NODE                     EXPLANATION
    --------------------------------------------------------------------------------
    (?<=                     look behind to see if there is:
    --------------------------------------------------------------------------------
      \w                       word characters (a-z, A-Z, 0-9, _)
    --------------------------------------------------------------------------------
    )                        end of look-behind
    --------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w) and
                             something that is not a word char
    --------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                             more times (matching the most amount possible))
    --------------------------------------------------------------------------------
    [!?.]*                   any character of: '!', '?', '.' (0 or more
                             times (matching the most amount possible))

A more readable explanation can be found by entering the full regex /(?<=\w)\b\s*[!?.]*/ in this other tool:

(?<=\w) Positive Lookbehind - Assert that the regex below can be matched
\w match any word character [a-zA-Z0-9_]
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
\s* match any whitespace character [\r\n\t\f ]
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[!?.]* match a single character present in the list below
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
!?. a single character in the list !?. literally

That last regex explanation can be boiled down by a human (also known as me) as follows:

Split at the word boundary that follows a word character, consuming any whitespace and any of the punctuation marks ! ? . that come directly after it.
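One consequence of this pattern is easier to see on a longer input. The sketch below uses a made-up sentence (not from the question). Because \s* comes before [!?.]* in the pattern, whitespace that follows a punctuation mark is not consumed. The plain character class in the second call is a substitution of mine for comparison, not part of the answer.

```php
<?php
$subject = 'Hello world! How are you?'; // hypothetical input

// The answer's pattern: the space after "!" stays attached to the next piece.
$fromAnswer = preg_split('/(?<=\w)\b\s*[!?.]*/', $subject, -1, PREG_SPLIT_NO_EMPTY);

// Alternative (assumption): treat any run of whitespace or ! ? . as one delimiter.
$fromCharClass = preg_split('/[\s!?.]+/', $subject, -1, PREG_SPLIT_NO_EMPTY);

print_r($fromAnswer);    // 'Hello', 'world', ' How', 'are', 'you'
print_r($fromCharClass); // 'Hello', 'world', 'How', 'are', 'you'
```

For the question's own subject, "is is.", both patterns give the same two-element result.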

+6

Jakegould Jun 12 '14 at 16:45


The documentation reads:

The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for the pattern.

So the following code ...

    <?php
    $ip = "123 ,456 ,789 ,000";
    $iparr = preg_split("/[\s,]+/", $ip);
    print "$iparr[0] <br />";
    print "$iparr[1] <br />";
    print "$iparr[2] <br />";
    print "$iparr[3] <br />";
    ?>

This produces the following result:

 123 456 789 000

So, if your subject is "is is" and you want array (0 => 'is', 1 => 'is'),

you need to change your regex to "/[\s]+/".

If your subject contains commas instead, such as "is ,is", you need the regex you already have: "/[\s,]+/".
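A short sketch comparing the two patterns on those subjects may help. The variable names are illustrative, and the last call is an addition of mine showing what whitespace-only splitting does with the full stop from the original question.

```php
<?php
// Whitespace-only delimiter on a subject without commas.
$spaces = preg_split('/[\s]+/', 'is is');

// Whitespace-or-comma delimiter on a subject with a comma.
$commas = preg_split('/[\s,]+/', 'is ,is');

// Whitespace-only delimiter on the comma subject: the comma is kept.
$commaKept = preg_split('/[\s]+/', 'is ,is');

// Whitespace-only delimiter on the question's subject: the "." is kept.
$dotKept = preg_split('/[\s]+/', 'is is.');

print_r($spaces);    // 'is', 'is'
print_r($commas);    // 'is', 'is'
print_r($commaKept); // 'is', ',is'
print_r($dotKept);   // 'is', 'is.'
```

Any character that should disappear from the result has to be part of the delimiter pattern.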

+1

Federico piazza Jun 12 '14 at 16:46


PHP's str_word_count may be a better choice here.

str_word_count($string, 2) returns an array of all the words in the string, including duplicates, keyed by each word's position in the string.
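As a minimal sketch on the question's subject, the two array-returning formats of str_word_count differ only in their keys. Format 1 gives the sequential keys the question asks for.

```php
<?php
$string = 'is is.';

// Format 2: keys are the byte offsets where each word starts.
$byPosition = str_word_count($string, 2);

// Format 1: plain list with sequential keys 0, 1, ...
$sequential = str_word_count($string, 1);

var_export($byPosition); // 0 => 'is', 3 => 'is'
echo PHP_EOL;
var_export($sequential); // 0 => 'is', 1 => 'is'
```

Note that str_word_count only treats letters (plus ' and -) as word characters by default, so digits are not counted as part of a word unless passed in its optional third argument.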

+1

ceejayoz Jun 12 '14 at 16:54


Majenko · Accepted Answer · 2014-06-12T16:50:09+0000

preg means "Pcre REGexp", which is somewhat redundant, because "PCRE" means "Perl Compatible Regexp".

Regexps are a nightmare for beginners. I still do not fully understand them, and I have been working with them for years.

Basically, the example you have breaks down as:

    "/[\s,]+/"

    /       = start or end of pattern string
    [ ... ] = grouping of characters
    +       = one or more of the preceding character or group
    \s      = any whitespace character (space, tab)
    ,       = the literal comma character

So you have a search pattern that says "split on any part of the string that is at least one whitespace character and/or one or more commas".
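To illustrate what the + contributes, here is a hedged sketch on a made-up subject. Without the quantifier, the comma and the space are two separate delimiters with an empty piece between them.

```php
<?php
$subject = 'a, b'; // hypothetical input

// No quantifier: "," and " " each count as their own delimiter.
$single = preg_split('/[\s,]/', $subject);

// With +: the run ", " is treated as one delimiter.
$run = preg_split('/[\s,]+/', $subject);

var_export($single); // 'a', '', 'b'
echo PHP_EOL;
var_export($run);    // 'a', 'b'
```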

Other common characters:

    .                      = any single character
    *                      = any number of the preceding character or group
    ^ (at start of pattern) = the start of the string
    $ (at end of pattern)   = the end of the string
    ^ (at start of [...])   = "NOT" the following characters
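Each of these can be tried out with preg_match, which returns 1 on a match and 0 otherwise. The inputs below are made up for illustration.

```php
<?php
// .  matches any single character (except a newline by default).
$dot = preg_match('/c.t/', 'cat');

// *  allows zero or more of the preceding character: "ac" has zero "b"s.
$star = preg_match('/^ab*c$/', 'ac');

// ^ and $ anchor the match to the start and end of the string.
$start = preg_match('/^is/', 'is is.');
$end   = preg_match('/is$/', 'is is.'); // the string ends with ".", not "is"

// [^...] matches any character NOT listed, here anything but a-z.
$words = preg_split('/[^a-z]+/', 'is is.', -1, PREG_SPLIT_NO_EMPTY);

var_dump($dot, $star, $start, $end); // int(1) int(1) int(1) int(0)
print_r($words);                     // 'is', 'is'
```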

For PHP, there is good information in the official documentation.
