PHP swear word filter

Question


I am working on a WordPress plugin that replaces bad words in comments with random new ones from a list.

Now I have 2 arrays: one contains bad words and the other contains good words.

 $bad = array("bad", "words", "here");
 $good = array("good", "words", "here");

Since I'm new, I got stuck at some point.

To replace the bad words, I used $newstring = str_replace($bad, $good, $string);

My first problem is that I want the matching to be case-insensitive, so that I don't have to list "bad", "Bad", "BAD", "bAd", "BAd", etc. However, the new word needs to preserve the case of the original word: for example, if I write "Bad", it should be replaced with "Words", but if I type "bad", it should be replaced with "words", etc.

My first attempt was to use str_ireplace, but it loses track of whether the original word was capitalized.
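Neither answer below covers the case-preservation part, so here is a minimal sketch of one way to do it, assuming only three case patterns matter (ALL CAPS, Capitalised, lower case). The helper names match_case and replace_keep_case are hypothetical, and the pattern uses plain \b boundaries for simplicity:

```php
<?php
// Hypothetical helper: copies the capitalisation pattern of $original onto
// $replacement. Handles ALL CAPS, Capitalised and lower case; any other
// mixed case (e.g. "bAd") falls back to lower case.
function match_case($original, $replacement)
{
    if ($original === strtoupper($original)) {
        return strtoupper($replacement);              // "BAD" -> "GOOD"
    }
    $first = substr($original, 0, 1);
    if ($first !== strtolower($first)) {
        return ucfirst(strtolower($replacement));     // "Bad" -> "Good"
    }
    return strtolower($replacement);                  // "bad", "bAd" -> "good"
}

// Replace every bad word case-insensitively with a random good word,
// keeping the case pattern of the word that was replaced.
function replace_keep_case($text, array $bad, array $good)
{
    $quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $bad);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) use ($good) {
        return match_case($m[1], $good[array_rand($good)]);
    }, $text);
}

echo replace_keep_case("Bad words are BAD, bad!", array('bad'), array('good'));
// Good words are GOOD, good!
```

The /i modifier does the case-insensitive matching, and the callback decides the case of the output, which is exactly the information str_ireplace throws away.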

The second problem is that I don't know how to deal with users who type words like "bad", "words", etc. I need an idea.

To select a random word, I think I can use $new = $good[rand(0, count($good)-1)]; and then $newstring = str_replace($bad, $new, $string);. If you have any ideas, I'm listening.
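A minimal sketch of that idea, using made-up word lists. Because $new is picked once before str_replace runs, every bad word in that call gets the same replacement; choosing a word per occurrence needs a callback instead (preg_replace_callback, as in the answers):

```php
<?php
$bad  = array("bad", "words");
$good = array("good", "nice");

// Pick ONE random good word (equivalent to $good[array_rand($good)]).
$new = $good[rand(0, count($good) - 1)];

// str_replace() accepts an array of search strings with a single
// replacement string, so every bad word becomes $new.
$newstring = str_replace($bad, $new, "I see bad words coming!");
echo $newstring;  // either "I see good good coming!" or "I see nice nice coming!"
```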

Here is the general outline of my script:

 function noswear($string) {
     if ($string) {
         $bad = array("bad", "words");
         $good = array("good", "words");
         $newstring = str_replace($bad, $good, $string);
         return $newstring;
     }
 }
 echo noswear("I see bad words coming!");

Thank you in advance for your help!


php preg-replace preg-match wordpress

Rawrrr1337 Oct 14 '13 at 11:03


2 answers

Steven · Answer 1 · 2013-10-14T21:14:58+0000

Preface

As has been noted repeatedly in the comments, there are plenty of pitfalls for you (and/or your code) to fall into when implementing a feature like this. To name a few:

People will add characters to trick the filter.
People will get creative (e.g. innuendo).
People will use passive aggression and sarcasm.
People will use sentences/phrases, not just single words.

You would be better off implementing a flagging system, where people can mark offensive comments, which can then be edited or deleted by moderators, users, etc.
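A rough sketch of the flagging idea, assuming a comment is held for moderation once a fixed number of distinct users have flagged it. The threshold, the FLAG_THRESHOLD and flag_comment names, and the array-based "storage" are all hypothetical; a real plugin would persist flags in the database:

```php
<?php
// Hypothetical threshold: number of distinct users needed to hide a comment.
const FLAG_THRESHOLD = 3;

function flag_comment(array &$comment, $user)
{
    // Keyed by user name, so the same user flagging twice counts once.
    $comment['flagged_by'][$user] = true;
    if (count($comment['flagged_by']) >= FLAG_THRESHOLD) {
        $comment['status'] = 'held_for_moderation';
    }
}

$comment = array('text' => 'some comment', 'status' => 'visible', 'flagged_by' => array());
foreach (array('alice', 'bob', 'bob', 'carol') as $user) {
    flag_comment($comment, $user);
}
echo $comment['status'];  // held_for_moderation
```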

With that in mind, let's continue...

Solution

Given that you:

Have a list of banned words, $bad_words
Have a list of replacement words, $good_words
Want to replace bad words regardless of case
Want to replace bad words with random good words
Have a correctly escaped list of bad words: see http://php.net/preg_quote

You can easily use PHP's preg_replace_callback function:

 $input_string = 'This Could be interesting but should it be? Perhaps this \'would\' work; or couldn\'t it?';
 $bad_words = array('could', 'would', 'should');
 $good_words = array('might', 'will');
 function replace_words($matches){
     global $good_words;
     return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
 }
 echo preg_replace_callback('/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i', 'replace_words', $input_string);

The code builds a single regex pattern out of all the bad words, and preg_replace_callback calls replace_words for every match. The pattern has the form:

 /(START OR WORD_BOUNDARY OR WHITE_SPACE)(BAD_WORD)(WORD_BOUNDARY OR WHITE_SPACE OR END)/i

The i modifier makes the match case-insensitive, so both bad and BaD match.
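To make this concrete, here is the pattern the answer's code builds from the example word list, checked against a few inputs (the extra inputs are only for illustration):

```php
<?php
$bad_words = array('could', 'would', 'should');

// Build the same alternation pattern the answer uses.
$pattern = '/(^|\b|\s)(' . implode('|', $bad_words) . ')(\b|\s|$)/i';
echo $pattern, "\n";   // /(^|\b|\s)(could|would|should)(\b|\s|$)/i

// Thanks to the i modifier, differently-cased words all match.
foreach (array('could', 'Could', 'COULD', 'cOuLd') as $word) {
    echo $word, ': ', preg_match($pattern, $word), "\n";   // 1 each time
}

// "couldn't" does not match: there is no boundary between "d" and "n".
echo preg_match($pattern, "couldn't"), "\n";   // 0
```

The last line is why "couldn't" survives untouched in the sample output further down.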

The replace_words function takes the matched word and its boundaries (either empty or white space) and replaces it with the same boundaries around a random good word.

 global $good_words;                          <-- Makes the $good_words variable accessible from within the function
 $matches[1]                                  <-- The word boundary before the matched word
 $matches[3]                                  <-- The word boundary after the matched word
 $good_words[rand(0, count($good_words)-1)]   <-- Selects a random good word from $good_words
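To see what the callback actually receives in $matches, the sketch below runs the same pattern once with preg_match on a short made-up sentence. Note that the leading space is captured by the \s branch, while the trailing boundary is an empty \b match, so the space after the word is left alone rather than consumed:

```php
<?php
$pattern = '/(^|\b|\s)(could|would|should)(\b|\s|$)/i';

preg_match($pattern, 'This could work', $matches);
foreach ($matches as $i => $m) {
    echo $i, ': [', $m, "]\n";
}
// 0: [ could]
// 1: [ ]
// 2: [could]
// 3: []
```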

Anonymous function

You can rewrite the above as a one-liner by passing an anonymous function to preg_replace_callback (PHP 5.3+):

 echo preg_replace_callback(
     '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
     function ($matches) use ($good_words){
         return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
     },
     $input_string
 );

Function wrapper

If you intend to use it several times, you can also write it as a standalone function. In that case you most likely want to pass the good/bad words to the function when you call it (or hard-code them inside it), but that depends on where the word lists come from...

 function clean_string($input_string, $bad_words, $good_words){
     return preg_replace_callback(
         '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
         function ($matches) use ($good_words){
             return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
         },
         $input_string
     );
 }
 echo clean_string($input_string, $bad_words, $good_words);

Output

Running the three versions above in sequence, with the input string and word lists from the first example, produces:

 This will be interesting but might it be? Perhaps this 'will' work; or couldn't it?
 This might be interesting but might it be? Perhaps this 'might' work; or couldn't it?
 This might be interesting but will it be? Perhaps this 'will' work; or couldn't it?

Of course, the replacement words are chosen randomly, so refreshing the page gives something different... but this shows what does and does not get replaced.
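If you want reproducible output (e.g. for testing), one option is to pass in the function that picks the replacement. The sketch below is a hypothetical variant of clean_string with an extra $pick parameter; the cycling picker is only for demonstration:

```php
<?php
// Hypothetical variant of clean_string() that takes the word picker as a parameter.
function clean_string_with($input_string, array $bad_words, array $good_words, callable $pick)
{
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $bad_words);
    return preg_replace_callback(
        '/(^|\b|\s)(' . implode('|', $quoted) . ')(\b|\s|$)/i',
        function ($matches) use ($good_words, $pick) {
            return $matches[1] . $pick($good_words) . $matches[3];
        },
        $input_string
    );
}

// Picker that cycles through the good words in order instead of choosing randomly.
$i = 0;
$cycle = function (array $words) use (&$i) {
    return $words[$i++ % count($words)];
};

$input_string = 'This Could be interesting but should it be? Perhaps this \'would\' work; or couldn\'t it?';
echo clean_string_with($input_string, array('could', 'would', 'should'), array('might', 'will'), $cycle);
// This might be interesting but will it be? Perhaps this 'might' work; or couldn't it?
```

In production you would pass a random picker instead, e.g. function ($w) { return $w[array_rand($w)]; }.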

NB

Escaping `$bad_words`

 foreach($bad_words as $key=>$word){
     // The second argument also escapes the "/" delimiter used in the patterns above
     $bad_words[$key] = preg_quote($word, '/');
 }
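A short illustration of why the escaping matters, using two made-up entries. Unescaped, "." in a word list entry means "any character", which can cause false positives:

```php
<?php
$bad_words = array('$h1t', 'a.s.s');

// Escape regex metacharacters; the second argument also escapes the
// pattern delimiter "/".
foreach ($bad_words as $key => $word) {
    $bad_words[$key] = preg_quote($word, '/');
}
echo implode('|', $bad_words), "\n";   // \$h1t|a\.s\.s

// Unescaped, "a.s.s" also matches unrelated text such as "assess".
var_dump(preg_match('/a.s.s/', 'assess'));    // int(1)
var_dump(preg_match('/a\.s\.s/', 'assess'));  // int(0)
```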

Word Boundaries `\b`

In this code I used \b, \s, and ^ or $ as word boundaries, and there is a good reason for this. White space, the start of the string, and the end of the string all delimit words, but \b will not match in all cases, for example:

 \b\$h1t\b <---Will not match

This is because \b matches the position between a word character ([a-zA-Z0-9_]) and a non-word character (or the start/end of the string). Since $ is not a word character, there is no boundary between a preceding space and $.
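A quick check of that claim on a made-up sentence, comparing a pure \b pattern with the answer's boundary groups:

```php
<?php
$text = 'what a $h1t day';

// \b needs a word character on one side; "$" is not one, so there is no
// boundary between the space and "$", and this fails.
var_dump(preg_match('/\b\$h1t\b/', $text));                // int(0)

// Accepting start-of-string or white space as the left boundary works.
var_dump(preg_match('/(^|\b|\s)\$h1t(\b|\s|$)/', $text));  // int(1)
```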

Misc

Depending on the size of the word list, there are several potential hiccups. From a system-design point of view, it is bad form to have huge regular expressions, for several reasons:

It is hard to maintain
- It is hard to read/understand what it is doing
- It is hard to find errors
It can be memory-intensive if the list is too long

Given that the regex pattern is compiled by PHP from the word list, the first reason is negated. The second should also be a non-issue; if you have a large list of words with a dozen permutations of each bad word, then I suggest you stop and rethink your approach (read: use a flagging/moderation system).

To clarify, I don't see a problem with a small word list for filtering out specific curses, as that serves its purpose: stopping users from flaming each other. The problem comes when you try to filter out too much, including permutations. Stick to filtering common swear words, and if that doesn't work, then (for the last time) implement a flagging/moderation system.

Ilia Rostovtsev · Answer 2 · 2013-10-14T11:24:05+0000

I came up with this approach and it works great. It returns true if the input word is in the list of bad words.

Example:

 function badWordsFilter($inputWord) {
     $badWords = Array("bad","words","here");
     for($i=0;$i<count($badWords);$i++) {
         if($badWords[$i] == strtolower($inputWord))
             return true;
     }
     return false;
 }

Using:

 if (badWordsFilter("bad")) { echo "Bad word was found"; } else { echo "No bad words detected"; }

Since the word "bad" is blacklisted, this will echo "Bad word was found".

Online example 1

EDIT 1:

As suggested by removal, you can also perform a simple in_array check:

 function badWordsFilter($inputWord) {
     $badWords = Array("bad","words","here");
     if(in_array(strtolower($inputWord), $badWords) ) {
         return true;
     }
     return false;
 }
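badWordsFilter checks one word at a time, so to check a whole comment you would need to split it first. A minimal sketch, assuming that splitting on runs of non-word characters (\W+) is good enough; containsBadWord is a hypothetical name:

```php
<?php
function badWordsFilter($inputWord) {
    $badWords = array("bad", "words", "here");
    return in_array(strtolower($inputWord), $badWords);
}

// Split the comment into words and test each one with badWordsFilter().
function containsBadWord($comment) {
    $words = preg_split('/\W+/', $comment, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (badWordsFilter($word)) {
            return true;
        }
    }
    return false;
}

var_dump(containsBadWord("This is BAD!"));   // bool(true)
var_dump(containsBadWord("This is fine."));  // bool(false)
```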

Online Example 2

EDIT 2:

As promised, here is a slightly different idea for replacing bad words with good words, as described in your question. I hope it helps a little; it's the best I can offer at the moment, as I'm not entirely sure what you are trying to do.

Example:

1. Combine the good and bad words into a single array (bad word => good word):

 $wordsTransform = array( 'shit' => 'ship' );

2. Your imaginary user input

 $string = "Rolling In The Deep by Adel\n \n There a fire starting in my heart\n Reaching a fever pitch, and it bringing me out the dark\n Finally I can see you crystal clear\n Go ahead and sell me out and I'll lay your shit bare";

3. Replacing bad words with good words

 $string = strtr($string, $wordsTransform);

4. Getting the desired result:

 Rolling In The Deep by Adel

 There a fire starting in my heart
 Reaching a fever pitch, and it bringing me out the dark
 Finally I can see you crystal clear
 Go ahead and sell me out and I'll lay your ship bare

Online example 3
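One caveat of the strtr approach: it is case-sensitive and knows nothing about word boundaries. Two made-up inputs show both effects:

```php
<?php
$wordsTransform = array('shit' => 'ship');

// Case-sensitive: the capitalised word is not replaced.
echo strtr("Shit happens", $wordsTransform), "\n";  // Shit happens
// No word boundaries: the substring inside a longer word is replaced.
echo strtr("shitake", $wordsTransform), "\n";       // shipake
```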

EDIT 3:

Following the valid comment from Wrikken: I completely forgot that strtr is case-sensitive, and that it is better to respect word boundaries. I took the following example from the PHP strtr manual page and slightly modified it.

The same as in my second edit, but case-insensitive, respecting word boundaries, and escaping any regular-expression special characters in the bad words:

1. Method:

 // Written by Patrick Rauchfuss
 // Renamed from "String", which is a reserved class name since PHP 7
 class StringHelper {
     public static function stritr(&$string, $from, $to = NULL) {
         if (is_string($from)) {
             // preg_quote() escapes regex special characters in the bad word
             $string = preg_replace('/\b' . preg_quote($from, '/') . '\b/i', $to, $string);
         } else if (is_array($from)) {
             foreach ($from as $key => $val) {
                 self::stritr($string, $key, $val);
             }
         }
         return $string;
     }
 }

2. An array with good and bad words:

 $wordsTransform = array( 'shit' => 'ship' );

3. Replacement:

 StringHelper::stritr($string, $wordsTransform);
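The same idea can be sketched as a plain function (replace_words_map is a hypothetical name). It uses preg_replace_callback so that characters like $ or \ in a replacement word are inserted literally instead of being read as back-references, which preg_replace would do:

```php
<?php
// Case-insensitive, whole-word replacement driven by a bad => good map.
function replace_words_map($string, array $map)
{
    foreach ($map as $from => $to) {
        $pattern = '/\b' . preg_quote($from, '/') . '\b/i';
        // The callback returns $to as-is, so "$1" or "\\" in it stay literal.
        $string = preg_replace_callback($pattern, function ($m) use ($to) {
            return $to;
        }, $string);
    }
    return $string;
}

$wordsTransform = array('shit' => 'ship');
echo replace_words_map("Shit, SHIT and shitake", $wordsTransform);
// ship, ship and shitake
```

Like stritr, this does not preserve the case of the replaced word; combining it with a case-copying step would be needed for that.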
