Censor selected words (replacing them with ****) with a single replaceAll?

I would like to censor some of the words in a string, replacing each character of the word with the character "*". Basically I would like to do something like this (pseudocode):

    String s = "lorem ipsum dolor sit";
    s = s.replaceAll("ipsum|sit", $0.length() number of *);

so the resulting s is "lorem ***** dolor ***" .

I know how to do this with repeated replaceAll invocations, but I wonder whether it is possible with a single replaceAll?


Update: This is part of a case study, and the reason is mainly that I would like to get away with a one-liner, as it simplifies the generated bytecode a bit. This is not for a serious web page or anything like that.

+2
java regex
4 answers

This is a variation of aioobe's answer, using nested assertions instead of a nested loop to generate the assertions:

    import java.util.regex.Pattern;

    public class Test {
        public static void main(String... args) {
            String s = "lorem ipsum dolor sit blah $10 bleh";
            System.out.println(s.replaceAll(censorWords("ipsum", "sit", "$10"), "*"));
            // prints "lorem ***** dolor *** blah *** bleh"
        }

        public static String censorWords(String... words) {
            StringBuilder sb = new StringBuilder();
            for (String w : words) {
                if (sb.length() > 0) sb.append("|");
                // One alternative per word: any single character that sits
                // 0 to length-1 positions after a point where the word begins.
                sb.append(
                    String.format("(?<=(?=%s).{0,%d}).",
                        Pattern.quote(w),
                        w.length() - 1
                    )
                );
            }
            return sb.toString();
        }
    }

Some key points:

  • StringBuilder.append in a loop instead of String +=
  • Pattern.quote to escape any regex metacharacters (such as $ or \) in the censored words

However, this is not the best solution to the problem. This is just a fun regular expression game to play, really.

Related Questions

  • codingBat plusOut using regex

How it works

Each match is replaced with a single "*", so we need to match one character at a time. The question is which characters to match.

The answer: a character from which, if you go back far enough and then look ahead, you see a censored word.

Here's the regular expression in a more abstract form:

 (?<=(?=something).{0,N}) 

This matches the positions from which you can go back up to N characters and then look ahead and see something.
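As a small illustration of that abstract form, here is a sketch for a single word. The class name LookaroundDemo and the helper censorOne are made up for this example; the pattern itself is the one built by censorWords above, with N set to the word length minus one.

```java
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Hypothetical helper: builds the pattern for one word.
    // It matches any single character that lies 0..length-1 positions
    // after a point where the (quoted) word begins.
    static String censorOne(String word) {
        return String.format("(?<=(?=%s).{0,%d}).", Pattern.quote(word), word.length() - 1);
    }

    public static void main(String[] args) {
        String regex = censorOne("sit");
        System.out.println(regex);
        // prints (?<=(?=\Qsit\E).{0,2}).
        System.out.println("lorem ipsum dolor sit".replaceAll(regex, "*"));
        // prints lorem ipsum dolor ***
    }
}
```

For "sit", N is 2: the "s" is reached by going back 0 characters, the "i" by going back 1, and the "t" by going back 2, so all three characters match and each becomes one "*".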

+4

It is possible with zero-width assertions (lookarounds):

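    public class Test {
        public static void main(String... args) {
            String s = "lorem ipsum dolor sit";
            System.out.println(s.replaceAll(censorWords("ipsum", "sit"), "*"));
        }

        public static String censorWords(String... words) {
            String re = "";
            // For every character of every word, emit one alternative that matches
            // that character only when the rest of the word surrounds it.
            // Note: the words are not escaped, so they must not contain regex metacharacters.
            for (String w : words)
                for (int i = 0; i < w.length(); i++)
                    re += String.format("|((?<=%s)%s(?=%s))",
                            w.substring(0, i), w.charAt(i), w.substring(i + 1));
            return re.substring(1); // drop the leading "|"
        }
    }

Prints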

 lorem ***** dolor *** 

The generated regular expression is not very pretty, but it does the trick :-)

+4

This is not the best way to censor text. Jeff Atwood has an excellent article on censoring text this way:

http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html

Unless you are going to spend a lot of time on this censoring function, it is likely to censor things that it should not.

Another note:
Cramming Java code into a one-liner will not necessarily simplify the bytecode. By that logic, you could put your censoring code into one method and then just call it.
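A minimal sketch of that "one method" idea, assuming nothing beyond java.util.regex. The class name Censor and the method censor are hypothetical names chosen for this example, not something from the question:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Censor {
    // Hypothetical helper: replaces each occurrence of any given word
    // with as many asterisks as the word has characters.
    static String censor(String text, String... words) {
        // Quote each word so characters such as $ or \ are matched literally.
        StringBuilder alternation = new StringBuilder();
        for (String w : words) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(w));
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Build a run of '*' with the same length as the matched word.
            char[] stars = new char[m.group().length()];
            Arrays.fill(stars, '*');
            m.appendReplacement(out, new String(stars));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(censor("lorem ipsum dolor sit", "ipsum", "sit"));
        // prints lorem ***** dolor ***
    }
}
```

The call site is then a single expression, censor(s, "ipsum", "sit"), and the loop lives in one place.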

+3

Java's String.replaceAll does not accept a callback as an argument, so it is not easy. But since profanity filters are mostly used on the web, I assume you can use JavaScript for this.

    var s = "this is some sample text to play with";
    var r = s.replace(/\b(some|sample|to)\b/g, function() {
        var star = "*";
        var len = arguments[1].length;
        while (--len) star += "*";
        return star;
    });
    console.log(r); // this is **** ****** text ** play with
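
For comparison, a rough Java sketch of the same callback idea, assuming Java 11 or later: Matcher.replaceAll has accepted a function since Java 9, and String.repeat exists since Java 11. The class name CallbackCensor and the hard-coded word list are illustrative only.

```java
import java.util.regex.Pattern;

public class CallbackCensor {
    // Hypothetical helper mirroring the JavaScript callback above.
    static String censor(String text) {
        return Pattern.compile("\\b(some|sample|to)\\b")
                .matcher(text)
                // The lambda receives each MatchResult and returns its replacement.
                .replaceAll(match -> "*".repeat(match.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(censor("this is some sample text to play with"));
        // prints this is **** ****** text ** play with
    }
}
```

This goes through Matcher rather than String.replaceAll, so it is not literally the single replaceAll on a String that the question asks for.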
+2