PHP swear word filter

I am working on a WordPress plugin that replaces bad words in comments with random new ones from a list.

Now I have 2 arrays: one contains bad words and the other contains good words.

    $bad = array("bad", "words", "here");
    $good = array("good", "words", "here");

Since I'm new, I got stuck at some point.

To replace bad words, I used $newstring = str_replace($bad, $good, $string);

My first problem is that I want to turn off case sensitivity, so that I don't have to list variants like "bad", "Bad", "BAD", "bAd", "BAd", etc. However, I need the new word to preserve the format of the original word. For example, if I write "Bad", it should be replaced by "Words", but if I type "bad", it should be replaced by "words", etc.

My first attempt was to use str_ireplace, but it forgets whether the original word had a capital letter.

The second problem is that I do not know how to deal with users who type like this: "bad", "words", etc. I need an idea.

To select a random word, I think I can use $new = $good[rand(0, count($good)-1)]; and then $newstring = str_replace($bad, $new, $string);. If you have a better idea, I'm here to listen.

General view of my script:

    function noswear($string) {
        if ($string) {
            $bad = array("bad", "words");
            $good = array("good", "words");
            $newstring = str_replace($bad, $good, $string);
            return $newstring;
        }
    }

    echo noswear("I see bad words coming!");

Thank you in advance for your help!

2 answers

Caveats

There are (as was repeatedly noted in the comments) gaping holes for you, and/or your code, to fall into when implementing such a feature. To name a few:

  • People will add characters to trick the filter.
  • People will become creative (e.g. innuendo)
  • People will use passive aggression and sarcasm.
  • People will use sentences / phrases, not just single words

You would be better off implementing a moderation / flagging system, where people can flag offensive comments, which can then be edited / deleted by mods, users, etc.

With that understood, let's continue ...

Solution

Given that you:

  • Have a list of banned words $bad_words
  • Have a list of replacement words $good_words
  • Want to replace bad words regardless of case
  • Want to replace bad words with random good words
  • Have a correctly escaped list of bad words: see http://php.net/preg_quote

You can very easily use PHP's preg_replace_callback function:

    $input_string = 'This Could be interesting but should it be? Perhaps this \'would\' work; or couldn\'t it?';

    $bad_words  = array('could', 'would', 'should');
    $good_words = array('might', 'will');

    function replace_words($matches){
        global $good_words;
        return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
    }

    echo preg_replace_callback('/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i', 'replace_words', $input_string);

So what the preg_replace_callback call does is compile a regex pattern consisting of all the bad words. Matches will then be in the format:

 /(START OR WORD_BOUNDARY OR WHITE_SPACE)(BAD_WORD)(WORD_BOUNDARY OR WHITE_SPACE OR END)/i 

The i modifier makes the match case-insensitive, so both bad and Bad match.

The replace_words function takes the matched word and its boundaries (either empty or white space), and replaces it with the boundaries and a random good word.

    global $good_words;                         <-- Makes the $good_words variable accessible from within the function
    $matches[1]                                 <-- The word boundary before the matched word
    $matches[3]                                 <-- The word boundary after the matched word
    $good_words[rand(0, count($good_words)-1)]  <-- Selects a random good word from $good_words
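As a side note on the random selection: the index arithmetic with rand() can also be expressed with PHP's built-in array_rand(), which returns a random key of an array. The sketch below is only an illustration; the helper name random_good_word is hypothetical and not part of the answer's code.

```php
<?php
// Hypothetical helper (not from the original answer): pick one replacement word at random.
function random_good_word(array $good_words): string
{
    // array_rand() returns a randomly chosen key of the array.
    return $good_words[array_rand($good_words)];
}

$good_words = array('might', 'will');
echo random_good_word($good_words), "\n"; // prints either "might" or "will"
```

This behaves the same as $good_words[rand(0, count($good_words)-1)] for a plain list, and also works if the array keys are not consecutive integers.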

Anonymous function

You can rewrite the above as a one-liner using an anonymous function in preg_replace_callback:

    echo preg_replace_callback(
        '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
        function ($matches) use ($good_words){
            return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
        },
        $input_string
    );

Function wrapper

If you are going to use it multiple times, you can also write it as a self-contained function, although in that case you will most likely want to pass the good / bad words in when you call it (or hard code them inside the function permanently), but that depends on how you obtain them ...

    function clean_string($input_string, $bad_words, $good_words){
        return preg_replace_callback(
            '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
            function ($matches) use ($good_words){
                return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
            },
            $input_string
        );
    }

    echo clean_string($input_string, $bad_words, $good_words);

Output

Running the above functions in sequence with the input string and word lists shown in the first example:

    This will be interesting but might it be? Perhaps this 'will' work; or couldn't it?
    This might be interesting but might it be? Perhaps this 'might' work; or couldn't it?
    This might be interesting but will it be? Perhaps this 'will' work; or couldn't it?

Of course, the replacement words are chosen randomly, so if I refreshed the page I would get something else ... but this shows what does / does not get replaced.
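The question also asked for the replacement to keep the capitalisation of the original word ("Bad" becomes "Words", "bad" becomes "words"), which the functions above do not do. Here is a minimal sketch of one way to add that, assuming only three styles matter (all upper case, first letter upper case, everything else lower case). The names match_case and clean_string_keep_case are hypothetical, and array_rand() is used in place of rand() for the random pick.

```php
<?php
// Hypothetical helper: copy the capitalisation style of $original onto $replacement.
function match_case(string $original, string $replacement): string
{
    // All letters upper case, e.g. "BAD" -> "GOOD".
    if ($original === strtoupper($original) && $original !== strtolower($original)) {
        return strtoupper($replacement);
    }
    // First letter upper case, e.g. "Bad" -> "Good".
    if ($original !== '' && ctype_upper($original[0])) {
        return ucfirst(strtolower($replacement));
    }
    // Anything else (including mixed case such as "bAd") becomes lower case.
    return strtolower($replacement);
}

// Hypothetical variant of clean_string() that preserves the case of the matched word.
function clean_string_keep_case(string $input, array $bad_words, array $good_words): string
{
    // Escape each bad word so regex metacharacters are treated literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);
    $pattern = '/(^|\b|\s)(' . implode('|', $escaped) . ')(\b|\s|$)/i';

    return preg_replace_callback($pattern, function ($matches) use ($good_words) {
        $new = $good_words[array_rand($good_words)];
        // $matches[2] is the bad word exactly as the user typed it.
        return $matches[1] . match_case($matches[2], $new) . $matches[3];
    }, $input);
}

echo clean_string_keep_case('Bad dog, bad cat, BAD bird', array('bad'), array('good')), "\n";
// prints "Good dog, good cat, GOOD bird"
```

Mixed-case input such as "bAd" falls through to the lower-case branch here; mapping it letter by letter would need extra logic and is rarely worth it.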

NB

Escaping $bad_words

    foreach($bad_words as $key=>$word){
        // Pass '/' so the pattern delimiter is escaped as well
        $bad_words[$key] = preg_quote($word, '/');
    }

Word Boundaries \b

In this code, I used \b, \s, and ^ or $ as word boundaries, and there is a good reason for this. While white space, start of string and end of string are all considered word boundaries, \b will not match in all cases, for example:

 \b\$h1t\b <---Will not match 

This is because \b only matches at the position between a word character (i.e. [a-zA-Z0-9_]) and a non-word character, and characters like $ do not count as word characters.
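The difference can be checked directly with preg_match. The snippet below is a small illustration; the sample sentence is made up, and the word is escaped with preg_quote so that $ is treated literally.

```php
<?php
$text = 'What a $h1t day';
$word = preg_quote('$h1t', '/'); // escapes the "$", giving \$h1t

// \b alone: there is no word boundary between the space and "$", so nothing matches.
var_dump(preg_match('/\b' . $word . '\b/i', $text)); // int(0)

// With the extra alternatives, \s matches the space in front of "$" instead.
var_dump(preg_match('/(^|\b|\s)' . $word . '(\b|\s|$)/i', $text)); // int(1)
```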

Miscellaneous

Depending on the size of your word list, there are a couple of potential hiccups. From a system design perspective, it is generally bad form to have huge regular expressions, for a couple of reasons:

  • Hard to maintain
    • It is hard to read / understand what it is doing
    • Hard to find errors
  • It can be memory intensive if the list is too long.

Given that the regex pattern is compiled by PHP, the first reason is negated. The second should be negated as well; if your word list is large, with a dozen permutations of each bad word, then I suggest you stop and rethink your approach (read: use a flagging / moderation system).

To clarify, I do not see a problem with a small word list to filter out specific expletives, as it serves a purpose: stopping users from having outbursts at each other. The problem comes when you try to filter out too much, including permutations. Stick to filtering common swear words, and if that does not work, then (for the last time) implement a flagging / moderation system.


I came up with this method and it works fine. It returns true if the input is one of the bad words.

Example:

    function badWordsFilter($inputWord) {
        $badWords = Array("bad","words","here");
        for($i=0;$i<count($badWords);$i++) {
            if($badWords[$i] == strtolower($inputWord))
                return true;
        }
        return false;
    }

Usage:

    if (badWordsFilter("bad")) {
        echo "Bad word was found";
    } else {
        echo "No bad words detected";
    }

Since the word "bad" is blacklisted, "Bad word was found" will be echoed.

Online example 1

EDIT 1:

As suggested, you can also perform a simple in_array check:

    function badWordsFilter($inputWord) {
        $badWords = Array("bad","words","here");
        if(in_array(strtolower($inputWord), $badWords) ) {
            return true;
        }
        return false;
    }
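Both versions above compare one single word against the list. To check a whole comment, one option is to split it into words first. The following is a rough sketch under the assumption that the bad word list is stored in lower case; the function name containsBadWord is hypothetical.

```php
<?php
// Hypothetical helper: true if any whole word of $text is in $badWords.
// Assumes the entries of $badWords are lower case.
function containsBadWord(string $text, array $badWords): bool
{
    // Split on runs of characters that are not letters or digits, dropping empty pieces.
    $words = preg_split('/[^a-z0-9]+/i', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Any overlap between the words of the text and the list means a hit.
    return count(array_intersect($words, $badWords)) > 0;
}

$badWords = array('bad', 'words', 'here');
var_dump(containsBadWord('I see BAD things coming!', $badWords)); // bool(true)
var_dump(containsBadWord('Badminton is fun', $badWords));         // bool(false)
```

Because the comparison is on whole words, "Badminton" is not flagged even though it starts with "bad".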

Online Example 2

EDIT 2:

As I promised, I came up with a slightly different idea for replacing bad words with good words, as you mentioned in your question. I hope this helps you a little; it is the best that I can offer at the moment, as I am not entirely sure what you are trying to do.

Example:

1. Let's combine the arrays of good and bad words into one

    $wordsTransform = array(
        'shit' => 'ship'
    );

2. Your imaginary user input

 $string = "Rolling In The Deep by Adel\n \n There a fire starting in my heart\n Reaching a fever pitch, and it bringing me out the dark\n Finally I can see you crystal clear\n Go ahead and sell me out and I'll lay your shit bare"; 

3. Replacing bad words with good words

 $string = strtr($string, $wordsTransform); 

4. Getting the desired result

    Rolling In The Deep by Adel

     There a fire starting in my heart
     Reaching a fever pitch, and it bringing me out the dark
     Finally I can see you crystal clear
     Go ahead and sell me out and I'll lay your ship bare

Online Example 3

EDIT 3:

Following the valid comment from Wrikken: I completely forgot that strtr is case sensitive and that it is better to respect word boundaries. I took the following example from PHP: strtr - Manual and modified it slightly.

The same idea as in my second edit, but case-insensitive; it checks word boundaries and places a backslash in front of each character that is part of the regular expression syntax:

1. Method:

    // Written by Patrick Rauchfuss
    // Note: named StringUtil here because "String" is a reserved class name since PHP 7
    class StringUtil
    {
        public static function stritr(&$string, $from, $to = NULL)
        {
            if(is_string($from))
                // preg_quote() adds a backslash to regex special characters in the search word
                $string = preg_replace('/\b'.preg_quote($from, '/').'\b/i', $to, $string);
            else if(is_array($from))
            {
                foreach ($from as $key => $val)
                    self::stritr($string, $key, $val);
            }
            return $string;
        }
    }

2. An array with good and bad words

    $wordsTransform = array(
        'shit' => 'ship'
    );

3. Replacement

    StringUtil::stritr($string, $wordsTransform);

Online Example 4
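The same idea (case-insensitive, whole-word replacement driven by a bad => good map) can also be written as a plain function that returns the new string instead of modifying its argument by reference. This is only a sketch; the name replace_whole_words and the sample sentence are made up for illustration.

```php
<?php
// Hypothetical function: replace each key of $map with its value,
// case-insensitively and only where the key stands as a whole word.
function replace_whole_words(string $text, array $map): string
{
    foreach ($map as $from => $to) {
        // preg_quote() escapes regex special characters in the search word.
        $pattern = '/\b' . preg_quote((string) $from, '/') . '\b/i';
        $text = preg_replace($pattern, $to, $text);
    }
    return $text;
}

echo replace_whole_words("I'll lay your SHIT bare, shitake stays", array('shit' => 'ship')), "\n";
// prints "I'll lay your ship bare, shitake stays"
```

Note that "SHIT" is replaced regardless of case, while "shitake" is left alone because the word boundary after "shit" does not match there.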
