The String class has some methods whose implementation I cannot understand. replace is one of them:

    public String replace(CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
                this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }

Does this have any significant advantage over a simpler and more efficient (faster!) method such as the following?

    public static String replace(String string, String searchFor, String replaceWith) {
        StringBuilder result = new StringBuilder();
        int index = 0;
        int beginIndex = 0;
        while ((index = string.indexOf(searchFor, index)) != -1) {
            result.append(string.substring(beginIndex, index) + replaceWith);
            index += searchFor.length();
            beginIndex = index;
        }
        result.append(string.substring(beginIndex, string.length()));
        return result.toString();
    }

Statistics with Java 7:
1,000,000 iterations
replace "b" with "x" in "abc"
result: "axc"
Times:
string.replace: 485ms
string.replaceAll: 490ms
optimized replacement: 180ms
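For anyone who wants to reproduce numbers of this kind, here is a rough sketch of a timing loop. The class name `ReplaceTiming` and the helpers `manualReplace` and `timeMillis` are made up for illustration, and the lambdas need Java 8 or later, so this is not the harness that produced the figures above. A naive `System.nanoTime` loop is sensitive to JIT warm-up and dead-code elimination, so treat its output as indicative only; a dedicated tool such as JMH would be more trustworthy.

```java
import java.util.function.Supplier;

final class ReplaceTiming {
    private ReplaceTiming() {}

    // Same algorithm as the hand-written method above; assumes searchFor is non-empty.
    static String manualReplace(String string, String searchFor, String replaceWith) {
        StringBuilder result = new StringBuilder();
        int index = 0;
        int beginIndex = 0;
        while ((index = string.indexOf(searchFor, index)) != -1) {
            result.append(string, beginIndex, index).append(replaceWith);
            index += searchFor.length();
            beginIndex = index;
        }
        result.append(string, beginIndex, string.length());
        return result.toString();
    }

    // Runs the task `iterations` times and returns the elapsed wall-clock time in ms.
    static long timeMillis(Supplier<String> task, int iterations) {
        int sink = 0; // consume every result so the JIT cannot discard the work
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == -1) {
            System.out.println(sink); // never true in practice; keeps `sink` live
        }
        return elapsed / 1_000_000;
    }

    public static void main(String[] args) {
        final int iterations = 1_000_000;
        final String input = "abc";

        // Warm-up pass so the measured pass runs mostly compiled code.
        timeMillis(() -> input.replace("b", "x"), iterations);
        timeMillis(() -> input.replaceAll("b", "x"), iterations);
        timeMillis(() -> manualReplace(input, "b", "x"), iterations);

        System.out.println("string.replace: "
                + timeMillis(() -> input.replace("b", "x"), iterations) + "ms");
        System.out.println("string.replaceAll: "
                + timeMillis(() -> input.replaceAll("b", "x"), iterations) + "ms");
        System.out.println("optimized replacement: "
                + timeMillis(() -> manualReplace(input, "b", "x"), iterations) + "ms");
    }
}
```

The absolute times will differ by machine and JDK version; only the relative ordering is of interest here.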
By contrast, code such as the Java 7 split method is heavily optimized to avoid compiling a pattern and running the regex engine whenever possible:

    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
           (1)one-char String and this character is not one of the
              RegEx meta characters ".$|()[{^?*+\\", or
           (2)two-char String and the first char is the backslash and
              the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if (((regex.value.length == 1 &&
              ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0)
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0)
                    resultSize--;
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }

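The fast-path condition in split is dense, so here is a more readable restatement of it as a standalone predicate. The class `SplitFastPath` and the method `takesFastPath` are hypothetical names used only for this sketch; they are not part of the JDK. The bit tricks such as `((ch-'0')|('9'-ch)) < 0` are rewritten as plain range checks, which should be equivalent for `char` values.

```java
final class SplitFastPath {
    private SplitFastPath() {}

    // Returns true if, per the Java 7 source quoted above, split(regex, ...)
    // would skip Pattern.compile and split on a single literal character.
    static boolean takesFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            // Case (1): one character that is not a regex metacharacter.
            ch = regex.charAt(0);
            if (".$|()[{^?*+\\".indexOf(ch) != -1) {
                return false;
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            // Case (2): backslash followed by something that is neither an
            // ASCII digit nor an ASCII letter, i.e. an escaped literal.
            ch = regex.charAt(1);
            boolean asciiDigit = ch >= '0' && ch <= '9';
            boolean asciiLetter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            if (asciiDigit || asciiLetter) {
                return false;
            }
        } else {
            return false;
        }
        // Surrogate halves are excluded in both cases.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }
}
```

Under this reading, `","` and `"\\."` take the fast path, while `"."`, `"\\d"` and `",,"` fall through to `Pattern.compile`.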
Following the logic of the replaceAll method:

    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }

one would expect the split implementation to be simply:

    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }

The performance loss in the replacement methods is far from negligible. For some reason, Oracle provides a fast path for some methods but not for others.
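To make the comparison concrete, this is roughly what a fast path for a literal replace could look like. It is only a sketch under my own assumptions, not what the JDK does: the class `FastReplace` is a made-up name, and the empty-target case is simply delegated to the existing `String.replace` so that its behavior there is preserved.

```java
final class FastReplace {
    private FastReplace() {}

    // Literal (non-regex) replacement of every occurrence of target in text.
    static String replace(String text, String target, String replacement) {
        if (target.isEmpty()) {
            // An empty target matches at every position; defer to the
            // library method rather than re-implementing that case here.
            return text.replace(target, replacement);
        }
        int index = text.indexOf(target);
        if (index == -1) {
            return text; // nothing to replace, so no allocation at all
        }
        StringBuilder result = new StringBuilder(text.length());
        int begin = 0;
        do {
            // Copy the unmatched stretch directly, without a temporary substring.
            result.append(text, begin, index).append(replacement);
            begin = index + target.length();
            index = text.indexOf(target, begin);
        } while (index != -1);
        result.append(text, begin, text.length());
        return result.toString();
    }
}
```

Since neither argument is ever interpreted as a pattern, characters such as `$` or `\` in the target or the replacement need no quoting, which is the work `Pattern.LITERAL` and `Matcher.quoteReplacement` do in the current implementation.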