Can't seem to find a reliable regular expression to remove spam links

I cannot find a suitably simple regular expression to remove spam links. The one I tried works, but only if www.example.com is not followed by a period and another sentence. I have a good book on regular expressions, but I just don’t have time to learn all this.

Here is the regex that I use. I honestly am not sure that I am even doing it right.

$a = $_POST['msge'];
$b = preg_replace('^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$^', '[LINK REMOVED]', $a);
print $b;

So I was wondering whether the code looks right, and whether anyone has a better regex that I could use.

2 answers

Tim answered my question. He wrote:

Currently, your regex finds links only if they are at the end of the input string (because of the $ anchor). Also, you should not use ^ as a regex delimiter, because it is an important metacharacter in a regex. It is better to use ~ or % if you do not want to use the standard /.
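As a rough sketch of those two fixes applied to the pattern from the question: the `\b` word boundaries and the `i` flag are my own assumptions rather than part of Tim's answer, and the function name `remove_links` is hypothetical.

```php
<?php
// Hypothetical helper: the pattern from the question, with "~" as the
// delimiter and without the "$" anchor, so matches are found anywhere.
function remove_links(string $text): string
{
    // Assumption: \b keeps each match on word boundaries, and the "i" flag
    // replaces the duplicated upper-case TLD alternatives.
    return preg_replace(
        '~\b[a-z0-9.-]+\.(?:com|org|net|mil|edu)\b~i',
        '[LINK REMOVED]',
        $text
    );
}

echo remove_links('Visit www.example.com. Another sentence follows.'), "\n";
// prints: Visit [LINK REMOVED]. Another sentence follows.
```

This still only catches the five listed top-level domains, so it is a starting point rather than a complete filter.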


It is not possible to reliably detect all links, especially if you want to find links without a protocol (bit.ly/foo, etc.).

You can find more (but not all) links using:

 $result = preg_replace(
     '/\b
     (?:
       (?:https?|ftp|file):\/\/   # protocol (optional)
       |www\.|ftp\.|bit\.         # add more typical "link starters" here
     )
     [-A-Z0-9+&@#\/%=~_|$?!:,.]*
     [A-Z0-9+&@#\/%=~_|$]
     /ix',
     '[LINK REMOVED]',
     $subject);
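To see what this pattern does and does not catch, here is a small sketch. The sample text and the `strip_links` wrapper name are made up for illustration. Note that the pattern has to stay on multiple lines: with the `x` flag, a `#` comment runs to the end of the line.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function strip_links(string $subject): string
{
    return preg_replace(
        '/\b
        (?:
          (?:https?|ftp|file):\/\/   # protocol
          |www\.|ftp\.|bit\.         # typical "link starters"
        )
        [-A-Z0-9+&@#\/%=~_|$?!:,.]*
        [A-Z0-9+&@#\/%=~_|$]
        /ix',
        '[LINK REMOVED]',
        $subject
    );
}

echo strip_links('See http://example.com/page. Or bit.ly/foo, or example.org.'), "\n";
// prints: See [LINK REMOVED]. Or [LINK REMOVED], or example.org.
```

The bare `example.org` survives because it has neither a protocol nor one of the listed starters, which is the "not all" part of the caveat.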