Removing URLs Using PHP

Question

I would like to remove only the anchor tags and any actual URLs, keeping the rest of the text.
For example, <a href="http://www.example.com">test www.example.com</a> should become "test ".

Thanks.

+4

php regex

Lior Apr 30 '11 at 20:31

3 answers

I often use:

$string = preg_replace('~</?a(\s[^>]*)?>~i', '', $string);

Also keep in mind that strip_tags removes all tags from a string except those you pass in its list of allowed tags. That is not what you asked for, but I mention it for completeness.

EDIT: I found the original source where I got this regex, and I want to credit the author: http://bavotasan.com/tutorials/using-php-to-remove-an-html-tag-from-a-string/
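As a rough illustration of the strip_tags behavior described above, here is a minimal sketch. The sample string is the one from the question with a <b> tag added purely for demonstration, and the variable names are made up for this example.

```php
<?php
// Sample input: the question's link plus a <b> tag added for illustration.
$html = '<p>See <a href="http://www.example.com">test www.example.com</a> <b>now</b></p>';

// Without a list of allowed tags, every tag is removed and the text is kept.
$plain = strip_tags($html);

// With a list of allowed tags, <b> survives and everything else is stripped.
$keepBold = strip_tags($html, '<b>');

echo $plain, "\n";
echo $keepBold, "\n";
```

Note that in both cases the URL inside the link text is still there, which is why strip_tags alone does not answer the question.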

+3

gd1 Apr 30 '11 at 20:33

You should consider using the PHP DOM library for this task.

Regex is not the best tool for parsing HTML.

Here is an example:

 // Create a new DOM Document to hold our webpage structure
 $xml = new DOMDocument();

 // Load the html contents into DOM
 $xml->loadHTML($html);
 $links = $xml->getElementsByTagName('a');

 // Loop through the <a> tags and replace each one with its text content
 for ($i = $links->length - 1; $i >= 0; $i--) {
     $linkNode = $links->item($i);
     $lnkText = $linkNode->textContent;
     $newTxtNode = $xml->createTextNode($lnkText);
     $linkNode->parentNode->replaceChild($newTxtNode, $linkNode);
 }

Note:

It is important to loop in reverse here. The node list returned by getElementsByTagName is live, so each replaceChild call that swaps an <a> element for a text node removes that element from the list. A forward loop would then skip some of the links and leave them unreplaced.
This code does not remove the URLs from the text inside the node. To do that, apply nico's preg_replace to $lnkText before the createTextNode line. It is generally better to isolate parts of the HTML using the DOM and then apply regular expressions to those text parts.
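The two notes above can be combined into one routine. The following is only a sketch under some assumptions: the function name unwrapLinksAndStripUrls is hypothetical, the input is treated as an HTML fragment (wrapped in a <div> so it can be extracted again), and the URL patterns are the two from nico's answer, applied to the link text only.

```php
<?php
// Hypothetical helper (not from the answers): unwraps every <a> element
// and strips URLs from the link text before re-inserting it.
function unwrapLinksAndStripUrls(string $html): string
{
    // Assumption: $html is a fragment, so wrap it in a <div> we can find again.
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $links = $doc->getElementsByTagName('a');
    // Reverse loop, because the live node list shrinks as links are replaced.
    for ($i = $links->length - 1; $i >= 0; $i--) {
        $linkNode = $links->item($i);
        // nico's two patterns, applied to the link text only.
        $lnkText = preg_replace('|https?://www\.[a-z\.0-9]+|i', '', $linkNode->textContent);
        $lnkText = preg_replace('|www\.[a-z\.0-9]+|i', '', $lnkText);
        $linkNode->parentNode->replaceChild($doc->createTextNode($lnkText), $linkNode);
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = unwrapLinksAndStripUrls('<a href="http://www.example.com">test www.example.com</a>');
echo $result, "\n";
```

URLs that appear in text outside of <a> elements are left alone by this sketch; handling them would mean walking the remaining text nodes as well.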

+2

Yann milin May 02 '11 at 9:31

nico · Accepted Answer · 2011-04-30T20:55:04+0000

To complement gd1's answer, this will remove all the URLs:

 // http(s)://
 $txt = preg_replace('|https?://www\.[a-z\.0-9]+|i', '', $txt);
 // only www.
 $txt = preg_replace('|www\.[a-z\.0-9]+|i', '', $txt);
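For reference, here is how a regex-only approach might look on the example from the question. This is a sketch, not a robust HTML handler: the tag pattern is an assumption of this example (it matches both opening and closing <a> tags), while the two URL patterns are the ones given above.

```php
<?php
// Input taken from the question.
$txt = '<a href="http://www.example.com">test www.example.com</a>';

// Assumed tag pattern: removes <a ...> and </a>, but not tags such as <abbr>.
$txt = preg_replace('~</?a(\s[^>]*)?>~i', '', $txt);

// http(s)://
$txt = preg_replace('|https?://www\.[a-z\.0-9]+|i', '', $txt);
// only www.
$txt = preg_replace('|www\.[a-z\.0-9]+|i', '', $txt);

var_dump($txt); // string(5) "test "
```

The character class only covers letters, digits, and dots, so hostnames containing hyphens or URLs with paths are removed only partially.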
