Match the text between the img tag and the closing a tag, and get line numbers

I am trying to get the text between the "img" tag and the closing "a" tag (hello i am from after img tag) from the line below.

<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>

Then I want to check whether that text matches the alt value. At the same time, I want to find the line number of this line. I tried the following code, which gives me the line numbers and the lines of the web page.

    $dom = new DOMDocument;
    $dom->loadHTMLFile('http://www.google.com');
    $dom->preserveWhiteSpace = true;
    $dom->formatOutput = true;
    // Escape the markup so it can be displayed as text.
    $new = htmlspecialchars($dom->saveHTML(), ENT_QUOTES);
    $lines = preg_split('/\r\n|\r|\n/', $new);
    foreach ($lines as $lineNumber => $line) {
        echo $lineNumber . "\r\r" . $line;
    }

The following code gives me the image source and the alt attribute. However, I can't get the line number or the text between the "img" tag and the closing "a" tag (hello i am from after img tag).

    $alts = array();
    $tags = $dom->getElementsByTagName('img');
    foreach ($tags as $tag) {
        $alts[$tag->attributes->getNamedItem('src')->nodeValue] = $tag->attributes->getNamedItem('alt')->nodeValue;
    }
    foreach ($alts as $key => $alt) {
        echo "{$key} => {$alt}<br/>";
    }

I know that regular expressions are not meant for HTML parsing, but I tried them to see whether they work on the line above:

    $alt = '<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>';

The regular expressions I have are:

// anything inside alt

 preg_match_all('|\s*alt[^>]*=[\'"](.*?)[\'"]|i', $alt, $altTag); 

// something between the img tag and the closing a tag.

 preg_match_all("#<\s*a[^>]*><\s*img[^>]*>(.*?)<\s*/\s*a>#s", $alt, $foo); 

This works fine, but when I use the same regular expressions on the $line variable (from the code above), I get nothing. Can someone help me, please? I really need to get this working. Thanks.

2 answers

Try

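    $str = '<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>';
    // The U modifier makes the quantifiers ungreedy, so each match stops at the first </a>.
    preg_match_all('#<a[^>]+>(.*)</a>#isU', $str, $match);
    // Drop the <img> tag and keep only the text.
    $result = array_map('strip_tags', $match[1]);
    print_r($result);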
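
To also compare the alt value with the text after the image, both can be captured with one pattern. The following is only a minimal sketch, assuming each anchor contains a single img tag followed by plain text and that the alt value itself contains no quote characters. The function name `extractAltAndText` is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: returns alt/text pairs for every <a><img ...>text</a> in $html.
function extractAltAndText(string $html): array
{
    // Group 1: the alt value. Group 2: everything between the img tag and </a>.
    $pattern = '#<a\b[^>]*>\s*<img\b[^>]*\balt\s*=\s*["\']([^"\']*)["\'][^>]*>(.*?)</a>#is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as $m) {
        // strip_tags() removes any nested markup left in the captured text.
        $pairs[] = ['alt' => $m[1], 'text' => trim(strip_tags($m[2]))];
    }
    return $pairs;
}

$str = '<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>';
foreach (extractAltAndText($str) as $pair) {
    $same = ($pair['alt'] === $pair['text']) ? 'match' : 'differ';
    echo "{$pair['alt']} | {$pair['text']} | {$same}\n";
}
```

For the sample line this prints `alt value | hello i am from after img tag | differ`, because the alt value and the trailing text are not identical.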

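Using regular expressions in your case is not a good idea, but if you really want to use them, you need to change your foreach loop. Because the lines were passed through htmlspecialchars(), the tags are stored as entities such as &lt;a, so your patterns never see a literal <a. Decode each line first. Here is the code: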

    foreach ($lines as $lineNumber => $l) {
        // Turn the entities back into real tags; ENT_QUOTES also decodes single quotes.
        $line = html_entity_decode($l, ENT_QUOTES);
    }

Now you can apply regExp to find what you want.
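
The steps above can be sketched as a single loop: decode each escaped line, apply the pattern from the question, and record the line number on a match. This is only a minimal sketch under some assumptions: the helper name `findAnchorTextByLine` is hypothetical, the sample HTML stands in for the output of `$dom->saveHTML()`, and the pattern only finds anchors that open and close on the same line.

```php
<?php
// Hypothetical helper: maps 1-based line numbers to the text found between
// an <img> tag and the closing </a> on that line.
function findAnchorTextByLine(string $escapedHtml): array
{
    // Same pattern as in the question, with the i modifier added.
    $pattern = '#<\s*a[^>]*><\s*img[^>]*>(.*?)<\s*/\s*a>#is';
    $found = [];
    $lines = preg_split('/\r\n|\r|\n/', $escapedHtml);
    foreach ($lines as $index => $l) {
        // Undo htmlspecialchars() so the line contains real tags again.
        $line = html_entity_decode($l, ENT_QUOTES);
        if (preg_match_all($pattern, $line, $m)) {
            // preg_split() keys start at 0, so add 1 for a human-readable line number.
            $found[$index + 1] = array_map('trim', $m[1]);
        }
    }
    return $found;
}

// Sample input standing in for htmlspecialchars($dom->saveHTML(), ENT_QUOTES).
$html = "<p>intro</p>\n"
      . '<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>' . "\n"
      . "<p>outro</p>";
$escaped = htmlspecialchars($html, ENT_QUOTES);

foreach (findAnchorTextByLine($escaped) as $lineNumber => $texts) {
    echo $lineNumber . ': ' . implode(', ', $texts) . "\n";
}
```

With the sample input this prints `2: hello i am from after img tag`. Anchors that span several lines would need the pattern applied to the whole decoded document instead of line by line.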
