I am trying to get the text between the "img" tag and the closing "a" tag ("hello i am from after img tag") from the line below.
<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>
Then I want to check whether that text and the alt value match. I also want to find the line number of this line. I tried the following code, which gives me the line numbers and the lines of the web page.
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.google.com');
$dom->preserveWhiteSpace = true;
$dom->formatOutput = true;
// Escape the markup so it can be displayed, then split it into lines
$new = htmlspecialchars($dom->saveHTML(), ENT_QUOTES);
$lines = preg_split('/\r\n|\r|\n/', $new);
foreach ($lines as $lineNumber => $line) {
    echo $lineNumber . "\r\r" . $line;
}
The following code gives me the image source and the alt attribute, but I can't get the line number or the text between the "img" tag and the closing "a" tag ("hello i am from after img tag").
$alts = array();
$tags = $dom->getElementsByTagName('img');
// Map each image's src to its alt text
foreach ($tags as $tag) {
    $alts[$tag->attributes->getNamedItem('src')->nodeValue] = $tag->attributes->getNamedItem('alt')->nodeValue;
}
foreach ($alts as $key => $alt) {
    echo "{$key} => {$alt}<br/>";
}
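For what it's worth, the missing pieces can probably be read from the DOM as well. The sketch below is only an illustration under a few assumptions: it parses an inline sample string (the variable `$html` and the `$results` array are made up for this example) instead of a remote page, it treats "the text after the img" as the text of every sibling that follows the `img` inside its parent `a`, and it relies on `DOMNode::getLineNo()`, which reports the line in the parsed source, not in the output of `saveHTML()`.

```php
<?php
// Hypothetical sample input: the line from the question (href quote closed),
// wrapped in a minimal page so it does not sit on the first line.
$html = "<html>\n<body>\n"
      . '<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>'
      . "\n</body>\n</html>";

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$results = array();
foreach ($dom->getElementsByTagName('img') as $img) {
    $parent = $img->parentNode;
    // Only look at images that sit directly inside an <a> element
    if ($parent === null || strtolower($parent->nodeName) !== 'a') {
        continue;
    }

    // Concatenate the text of every node that follows the img inside the <a>
    $text = '';
    for ($node = $img->nextSibling; $node !== null; $node = $node->nextSibling) {
        $text .= $node->textContent;
    }
    $text = trim($text);
    $alt  = $img->getAttribute('alt');

    $results[] = array(
        'src'   => $img->getAttribute('src'),
        'alt'   => $alt,
        'text'  => $text,
        'line'  => $img->getLineNo(),   // line of the img in the parsed source
        'match' => ($text === $alt),    // does the link text equal the alt value?
    );
}

foreach ($results as $r) {
    echo "line {$r['line']}: alt='{$r['alt']}' text='{$r['text']}' match="
       . ($r['match'] ? 'yes' : 'no') . "\n";
}
```

For the sample line, the alt value and the link text differ, so `match` comes out false. Whether the reported line number is useful depends on how the page is delivered; a minified page may put everything on one line.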
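I know that regular expressions are not meant for parsing HTML, but I tried them to see whether they work on the following line.
$alt = '<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>';
These are the regular expressions I have:
// anything inside alt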
preg_match_all('|\s*alt[^>]*=[\'"](.*?)[\'"]|i', $alt, $altTag);
// something between the img tag and the closing a tag.
preg_match_all("#<\s*a[^>]*><\s*img[^>]*>(.*?)<\s*/\s*a>#s", $alt, $foo);
This works fine, but when I use the same regular expressions on the $line variable (from the code above), I get nothing. Can someone help me, please? I really need to get this working. Thanks.
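One likely reason for the empty result, shown here as a small self-contained sketch: the lines in `$lines` come from `htmlspecialchars()`, which turns every `<` into `&lt;`, so a pattern that looks for a literal `<` can no longer match. The variable names `$raw`, `$escaped`, `$onRaw` and `$onEscaped` are made up for this illustration, and the sample string stands in for one line of the page.

```php
<?php
// One line of markup, as it would look before escaping
$raw = '<a href="products.html"><img src="image.jpg" alt="alt value">hello i am from after img tag</a>';

// The same line after the escaping step used in the line-splitting code
$escaped = htmlspecialchars($raw, ENT_QUOTES);

// Pattern from the question: anything between the img tag and the closing a tag
$pattern = "#<\s*a[^>]*><\s*img[^>]*>(.*?)<\s*/\s*a>#s";

$onRaw     = preg_match($pattern, $raw, $rawMatch);         // 1: the pattern matches
$onEscaped = preg_match($pattern, $escaped, $escapedMatch); // 0: no literal "<" is left

echo "raw: {$onRaw}, escaped: {$onEscaped}\n";
if ($onRaw === 1) {
    echo "text after img: {$rawMatch[1]}\n";
}
```

If this is indeed the cause, matching against the unescaped output of `saveHTML()` and calling `htmlspecialchars()` only when echoing a line should bring the matches back. Note that `formatOutput` may also move tags onto separate lines, in which case a per-line match can still miss.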