How to capture HTML tag content?

Question

Hey, so what I want to do is snag the contents of the first paragraph. The string $blog_post contains many paragraphs in the following format:

 <p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>

The problem I am facing is that I am writing a regex to capture everything between the first <p> tag and the first closing </p> tag. However, it matches from the first <p> tag to the last </p> tag, which leads to me capturing everything.

Here is my current code:

 if (preg_match("/[\\s]*<p>[\\s]*(?<firstparagraph>[\\s\\S]+)[\\s]*<\\/p>[\\s\\S]*/", $blog_post, $blog_paragraph))
     echo "<p>" . $blog_paragraph["firstparagraph"] . "</p>";
 else
     echo $blog_post;

+6

html php regex html-parsing

Andrew G. Johnson Sep 2 '08 at 1:41

4 answers

If you use preg_match, use the "U" flag to make the match ungreedy.

 preg_match("/<p>(.*)<\/p>/U", $blog_post, $matches);

$matches[1] will contain the first paragraph.
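
To see what the flag changes, here is a minimal sketch that runs the same pattern with and without "U" on the sample markup from the question. The variable names $greedy and $ungreedy are only illustrative.

```php
<?php
// Sample input taken from the question.
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// Greedy (default): (.*) runs on to the LAST </p> in the string.
preg_match('/<p>(.*)<\/p>/', $blog_post, $greedy);

// Ungreedy: the U modifier inverts greediness, so (.*) stops at the FIRST </p>.
preg_match('/<p>(.*)<\/p>/U', $blog_post, $ungreedy);

echo $greedy[1], "\n";   // Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3
echo $ungreedy[1], "\n"; // Paragraph 1
```

Note that "U" flips every quantifier in the pattern, so a quantifier written as .*? becomes greedy again when the flag is set.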

+6

Erik Öjebo Sep 2 '08 at 5:00

It would probably be easier and faster to use strpos() to find the position of the first <p> and the first </p>, then use substr() to extract the paragraph.

 $paragraph_start = strpos($blog_post, '<p>');
 $paragraph_end = strpos($blog_post, '</p>', $paragraph_start);
 $paragraph = substr($blog_post, $paragraph_start + strlen('<p>'), $paragraph_end - $paragraph_start - strlen('<p>'));
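
The snippet above assumes both tags are present. As a rough sketch, the same strpos()/substr() idea can be wrapped in a function that also handles the case where strpos() returns false. The function name first_paragraph is hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer).
// Returns the text inside the first <p>...</p>, or null if no complete pair exists.
function first_paragraph(string $html): ?string
{
    $open = '<p>';
    $close = '</p>';

    $start = strpos($html, $open);
    if ($start === false) {
        return null; // no opening tag at all
    }
    $start += strlen($open); // move past "<p>" to the first content character

    $end = strpos($html, $close, $start);
    if ($end === false) {
        return null; // opening tag without a closing tag
    }

    return substr($html, $start, $end - $start);
}

echo first_paragraph('<p>Paragraph 1</p><p>Paragraph 2</p>'), "\n"; // Paragraph 1
var_dump(first_paragraph('no paragraphs here'));                    // NULL
```

Like the original, this only recognises a bare <p> tag; an opening tag with attributes such as <p class="intro"> would not be found.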

Edit: Actually the regex in others' answers will be easier and faster ... your big complex regex in the question confused me ...

+1

Jeremy Ruten Sep 2 '08 at 1:47

Using regular expressions for HTML parsing will never be the right solution. You should use XPath for this particular case:

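 /* Sample markup from the question, wrapped in a single root element
    (<post> is an assumed name) because SimpleXMLElement needs well-formed XML */
 $string = '<post><p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p></post>';
 $xml = new SimpleXMLElement($string);
 /* Find the first <p> element */
 $resultado = $xml->xpath('//p[1]');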
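
SimpleXMLElement expects well-formed XML with a single root element, which a blog post fragment usually is not. As an alternative sketch, assuming PHP's DOM extension is available, DOMDocument::loadHTML() accepts an HTML fragment and DOMXPath can then pick out the first paragraph. The nested <b> tag in the sample input is added here only to show that inner markup survives.

```php
<?php
// Sample input modelled on the question, with one nested tag added for illustration.
$blog_post = '<p>Paragraph <b>1</b></p><p>Paragraph 2</p><p>Paragraph 3</p>';

$doc = new DOMDocument();
// loadHTML() tolerates fragments; libxml wraps them in <html><body>.
// The @ suppresses warnings libxml may emit for imperfect markup.
@$doc->loadHTML($blog_post);

$xpath = new DOMXPath($doc);
// (//p)[1] selects the first <p> in document order.
$nodes = $xpath->query('(//p)[1]');

if ($nodes->length > 0) {
    $first = $nodes->item(0);
    echo $first->textContent, "\n";    // Paragraph 1
    echo $doc->saveHTML($first), "\n"; // <p>Paragraph <b>1</b></p>
}
```

The parentheses matter: //p[1] selects every <p> that is the first <p> child of its parent, while (//p)[1] selects only the first <p> in the whole document.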

0

eLRuLL Dec 16 '17 at 10:53

Kibbee · Accepted Answer · 2008-09-02T01:48:04+0000

Well, sysrqb's answer will let you match anything in the first paragraph, assuming there is no other HTML in the paragraph. You may want something more like this:

 <p>.*?</p>

Placing the ? after your * makes it non-greedy, meaning it will only match as little text as necessary before matching the </p>.
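
Applied to the code in the question, a minimal sketch might look like the following. It keeps the question's firstparagraph group name; the s modifier, the # delimiters, and the multi-line sample input are additions made here for illustration.

```php
<?php
// Sample input with a line break inside the first paragraph.
$blog_post = "<p>Paragraph 1,\nsecond line</p><p>Paragraph 2</p>";

// .*? is lazy: it expands one character at a time until </p> can match.
// The s modifier lets . match newlines; # delimiters avoid escaping the slash.
if (preg_match('#<p>(?<firstparagraph>.*?)</p>#s', $blog_post, $blog_paragraph)) {
    echo '<p>' . $blog_paragraph['firstparagraph'] . '</p>';
} else {
    echo $blog_post;
}
```

Without the s modifier, the dot does not match a newline, so a paragraph that spans several lines would not be matched.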
