Delete specific rows from StringBuffer

The inherited application program has a huge String Buffer (sometimes up to Mb in size) and is processed sequentially to change the contents. I have to make a change in which I need to update the line buffer to remove some lines starting with certain specific words. What are the possible ways to implement it?

Example:

ABC:djfk kdjf kdsjfk# ABC:jfue eijf iefe# DEL:kdjfi efe eei # DEL:ieeif dfddf dfdf# HJU:heuir fwer ouier# ABC:dsf erereree ererre # 

I need to delete lines starting with DEL . Splitting a string buffer into a string, processing the strings, and re-joining the strings to create a string buffer would be quite expensive. Pls tell me the possible solutions.

thanks

+4
source share
3 answers

This can be done efficiently on site. You will need to rewrite the characters in the buffer at appropriate intervals, and then you will logically reduce the buffer with setLength . It will be quite difficult, but it will be in place and O(N) .

The reason you want to overwrite instead of using delete / insert is because it will be O(N^2) instead, because things need to be biased unnecessarily.

The implementation of this non-standard behavior is quite trivial and O(N) , but this will require an additional buffer that doubles the required space.


Proof of concept

Here is a simple proof of concept. removeIntervals takes a StringBuffer and a int[][] intervals . Each int[] is considered a pair of values { start, end } (half-open range, exceptional upper bound). In linear time and in place, these intervals are removed from StringBuffer simple overwrite . This works when intervals are sorted and not overlapping, and processed from left to right.

The buffer is then shortened with setLength , cutting off as many characters that have been deleted.

 static void overwrite(StringBuffer sb, int dst, int srcFrom, int srcTo) { for (int i = srcFrom; i < srcTo; i++) { sb.setCharAt(dst++, sb.charAt(i)); } } static int safeGet(int[][] arr, int index, int defaultValue) { return (index < arr.length) ? arr[index][0] : defaultValue; } static void removeIntervals(StringBuffer sb, int[][] intervals) { int dst = safeGet(intervals, 0, 0); int removed = 0; for (int i = 0; i < intervals.length; i++) { int start = intervals[i][0]; int end = intervals[i][1]; int nextStart = safeGet(intervals, i+1, sb.length()); overwrite(sb, dst, end, nextStart); removed += end - start; dst += nextStart - end; } sb.setLength(sb.length() - removed); } 

Then we can check it as follows:

  String text = "01234567890123456789"; int[][][] tests = { { { 0, 5, }, }, // simple test, removing prefix { { 1, 2, }, { 3, 4, }, { 5, 6, } }, // multiple infix removals { { 3, 7, }, { 18, 20, }, }, // suffix removal { }, // no-op { { 0, 20 }, }, // remove whole thing { { 7, 10 }, { 10, 13 }, {15, 15 }, }, // adjacent intervals, empty intervals }; for (int[][] test : tests) { StringBuffer sb = new StringBuffer(text); System.out.printf("> '%s'%n", sb); System.out.printf("- %s%n", java.util.Arrays.deepToString(test)); removeIntervals(sb, test); System.out.printf("= '%s'%n%n", sb); } 

Prints ( as seen on ideone.com ):

 > '01234567890123456789' - [[0, 5]] = '567890123456789' > '01234567890123456789' - [[1, 2], [3, 4], [5, 6]] = '02467890123456789' > '01234567890123456789' - [[3, 7], [18, 20]] = '01278901234567' > '01234567890123456789' - [] = '01234567890123456789' > '01234567890123456789' - [[0, 20]] = '' > '01234567890123456789' - [[7, 10], [10, 13], [15, 15]] = '01234563456789' 

Getting Intervals

In this particular case, the intervals can be constructed in a preliminary pass (using indexOf ), or the whole process can be performed in one pass, if absolutely necessary. The fact is that this can definitely be done on-site in linear time (and, if absolutely necessary, in one pass).


Onsite solution

This is out of place using a secondary buffer and regular expression. This has been proposed for consideration because of its simplicity. If further optimization is not required (after the results of evidence-based profiling), this should be good enough:

  String text = "DEL: line1\n" + "KEP: line2\r\n" + "DEL: line3\n" + "KEP: line4\r" + "DEL: line5\r" + "DEL: line6\r" + "KEP: line7\n" + "DEL: line8"; StringBuffer sb = new StringBuffer(text); Pattern delLine = Pattern.compile("(?m)^DEL:.*$"); String cleanedUp = delLine.matcher(sb).replaceAll("<deleted!>"); System.out.println(cleanedUp); 

This prints ( as seen on ideone.com ):

 <deleted!> KEP: line2 <deleted!> KEP: line4 <deleted!> <deleted!> KEP: line7 <deleted!> 

References

+2
source

Splitting a string buffer into a string, processing the lines and rejoining the strings to create a string buffer will be a little expensive.

Deleting lines will actually be much more expensive because you will end up copying the rest of the buffer for each line you delete.

The fastest way is java.util.regex.Matcher.replaceAll () to get a copy of the buffer without all that you don't want.

+2
source

If the lines in the line buffer are separated by newline characters, you can read it and create a new buffer. For a 1 megabyte buffer, this completes in tens of milliseconds and faster than a regular expression. You can create your own version of StringReader to read StringBuffer directly, rather than converting to a string to save a bit more time.

 final String NEWLINE = System.getProperty("line.separator"); StringBuffer nuBuffer = new StringBuffer(); BufferedReader br = new BufferedReader(new StringReader(sbData.toString())); String line; while ( (line = br.readLine()) != null) { if (!line.startsWith("DEL:")) { // don't copy lines starting with DEL: nuBuffer.append(line).append(NEWLINE); } } br.close(); 
+1
source

Source: https://habr.com/ru/post/1316021/


All Articles