How to capture HTML tag content?

Hey, so what I want to do is snag the contents of the first paragraph. $blog_post contains many paragraphs in the following format:

 <p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p> 

The problem I am facing is that I am writing a regex to capture everything between the first <p> and the first closing </p> tag. However, it matches from the first <p> tag to the last </p> tag, which leads to me capturing everything.

Here is my current code:

 if (preg_match("/[\\s]*<p>[\\s]*(?<firstparagraph>[\\s\\S]+)[\\s]*<\\/p>[\\s\\S]*/", $blog_post, $blog_paragraph))
     echo "<p>" . $blog_paragraph["firstparagraph"] . "</p>";
 else
     echo $blog_post;
+6
html php regex html-parsing
4 answers

Well, sysrqb's answer will let you match anything in the first paragraph, assuming there is no other HTML in the paragraph. You may want something more like this:

 <p>.*?</p> 

Placing a ? after your * makes it non-greedy, which means it will match only as little text as necessary before matching </p>.
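To make the difference concrete, here is a small sketch run against the sample markup from the question. The variable names $greedy and $lazy are only illustrative and do not come from the original post.

```php
<?php
// Sample input in the format described in the question.
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// Greedy: .* runs on to the LAST </p>, so the group swallows everything in between.
preg_match('/<p>(.*)<\/p>/', $blog_post, $greedy);

// Non-greedy: .*? stops at the FIRST </p>.
preg_match('/<p>(.*?)<\/p>/', $blog_post, $lazy);

echo $greedy[1], "\n"; // Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3
echo $lazy[1], "\n";   // Paragraph 1
```

If a paragraph can span several lines, the s modifier is also needed, since . does not match newlines by default.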

+18

If you use preg_match, use the "U" flag to make it ungreedy.

 preg_match("/<p>(.*)<\/p>/U", $blog_post, $matches); 

$matches[1] will contain the first paragraph.
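One thing worth knowing about the U modifier is that it inverts the meaning of ? on quantifiers for the whole pattern, so combining it with .*? brings the greedy behaviour back. A minimal sketch, with $ungreedy and $inverted as illustrative names only:

```php
<?php
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// With U, a plain quantifier is non-greedy by default ...
preg_match('/<p>(.*)<\/p>/U', $blog_post, $ungreedy);

// ... and a trailing ? makes it greedy again, the reverse of its usual meaning.
preg_match('/<p>(.*?)<\/p>/U', $blog_post, $inverted);

echo $ungreedy[1], "\n"; // Paragraph 1
echo $inverted[1], "\n"; // Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3
```

For that reason it is probably safest to pick one style, either the U modifier or explicit .*? quantifiers, and not mix them in the same pattern.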

+6

It would probably be easier and faster to use strpos() to find the position of the first <p> and the first </p>, then use substr() to extract the paragraph.

 // Locate the first opening tag and the first closing tag after it.
 $paragraph_start = strpos($blog_post, '<p>');
 $paragraph_end = strpos($blog_post, '</p>', $paragraph_start);
 // Extract the text between the two tags.
 $paragraph = substr($blog_post, $paragraph_start + strlen('<p>'), $paragraph_end - $paragraph_start - strlen('<p>'));
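The snippet above assumes both tags are present. strpos() returns false when the needle is missing, so a slightly more defensive version might look like the sketch below. The helper name first_paragraph is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the text of the first <p>...</p>, or null if none is found.
function first_paragraph(string $html): ?string
{
    $open = '<p>';
    $start = strpos($html, $open);
    if ($start === false) {
        return null; // no opening tag
    }
    $start += strlen($open); // skip past the opening tag itself
    $end = strpos($html, '</p>', $start);
    if ($end === false) {
        return null; // no closing tag after the opening one
    }
    return substr($html, $start, $end - $start);
}

$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';
var_dump(first_paragraph($blog_post));            // string(11) "Paragraph 1"
var_dump(first_paragraph('no paragraphs here'));  // NULL
```

Like the original, this only recognises a bare <p> tag; an opening tag with attributes, such as <p class="intro">, would not be found.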

Edit: Actually the regex in others' answers will be easier and faster ... your big complex regex in the question confused me ...

+1

Using regular expressions for HTML parsing will never be the right solution. You should use XPath for this particular case:

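 // SimpleXMLElement needs a single root element, so the paragraphs are wrapped in one.
 $string = '<div><p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p></div>';
 $xml = new SimpleXMLElement($string);
 /* Find the first <p> */
 $result = $xml->xpath('//p[1]');
 echo (string) $result[0];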
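Real blog markup is often not well-formed XML, in which case SimpleXMLElement will refuse to parse it. As an alternative sketch of the same XPath idea, DOMDocument::loadHTML() is more forgiving. This assumes the DOM extension is available; the variable names are illustrative.

```php
<?php
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($blog_post);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// (//p)[1] selects the first <p> in document order.
$nodes = $xpath->query('(//p)[1]');

if ($nodes !== false && $nodes->length > 0) {
    echo $nodes->item(0)->textContent, "\n"; // Paragraph 1
} else {
    echo $blog_post; // fall back to the whole post, as in the question
}
```

textContent returns the paragraph's text with any inner tags stripped, which may or may not be what you want if the paragraph contains links or emphasis.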
0