Regular expression for finding and replacing text in a large file

I am searching for a multi-line pattern in a huge file and replacing it wherever it occurs. I want to do this efficiently, without holding the whole file in memory. My current implementation reads the file in chunks of 4096 bytes, applies the search and replace to each chunk, and writes the result to the output stream. This keeps memory use low because the entire file is never loaded, but it does a lot of I/O through the map and flush calls. I would like suggestions for improving the code further. In addition, the algorithm fails when a match is split across two adjacent chunks. Any ideas on how to search and replace text efficiently when it spans adjacent chunks? Assumption: the search text is always less than 4096 bytes.

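import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void searchAndReplace(String inputFilePath, String outputFilePath) throws IOException {

    // Lazy .*? with DOTALL lets the link text span lines. The earlier class [^(</a>)]*
    // excluded the single characters ( < / a > ) rather than the string "</a>".
    Pattern HEADER_PATTERN = Pattern.compile("<a [^>]*>.*?</a>", Pattern.DOTALL);
    Charset UTF8 = StandardCharsets.UTF_8;
    File outputFile = new File(outputFilePath);

    // try-with-resources closes both streams even if an exception is thrown;
    // FileOutputStream creates the output file if it does not exist yet.
    try (FileInputStream inputStream = new FileInputStream(new File(inputFilePath));
         FileOutputStream outputStream = new FileOutputStream(outputFile)) {

        FileChannel inputChannel = inputStream.getChannel();

        final long length = inputChannel.size();
        long pos = 0;
        while (pos < length) {
            // Compare as long before narrowing to int, so files over 2 GB do not overflow.
            int remaining = (int) Math.min(4096, length - pos);
            MappedByteBuffer map = inputChannel.map(FileChannel.MapMode.READ_ONLY, pos, remaining);
            // Known limitation: a multi-byte character or a match that straddles
            // a chunk boundary is not handled by this loop.
            CharBuffer cbuf = UTF8.newDecoder().decode(map);
            Matcher matcher = HEADER_PATTERN.matcher(cbuf);
            StringBuffer sb = new StringBuffer();
            while (matcher.find()) {
                matcher.appendReplacement(sb, "Some text");
            }
            matcher.appendTail(sb);
            // Encode explicitly as UTF-8 instead of the platform default charset.
            outputStream.write(sb.toString().getBytes(UTF8));
            pos = pos + remaining;
        }
    }
}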

/[^ ¬] * myRegExHere [^\¬]/g
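One way to handle matches that are split across chunks is to read through a Reader and hold back the tail of each chunk until the next one arrives. Below is a minimal sketch of that idea, not a definitive implementation. The class name `StreamingReplace`, the method names, and the `chunkSize` and `maxMatch` parameters are hypothetical. It assumes that no match is longer than `maxMatch` chars (the question states the search text is under 4096 bytes), that the pattern never matches the empty string, that the pattern uses no lookaround or end anchors, and that the replacement is literal text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingReplace {

    /**
     * Copies 'in' to 'out', replacing every match of 'pattern' with the literal
     * 'replacement'. Assumes no match is longer than maxMatch chars.
     */
    static void replaceAll(Reader in, Writer out, Pattern pattern, String replacement,
                           int chunkSize, int maxMatch) throws IOException {
        char[] chunk = new char[chunkSize];
        StringBuilder buf = new StringBuilder();
        boolean eof = false;
        while (!eof) {
            int n = in.read(chunk);
            if (n < 0) {
                eof = true;
            } else {
                buf.append(chunk, 0, n);
            }
            // A match starting at or after 'safe' might continue into the next
            // chunk, so it is left in the buffer until more input has arrived.
            int safe = eof ? buf.length() : Math.max(0, buf.length() - (maxMatch - 1));
            Matcher m = pattern.matcher(buf);
            int last = 0;
            while (m.find() && m.start() < safe) {
                out.append(buf, last, m.start()); // text before the match
                out.write(replacement);
                last = m.end();
            }
            // Emit everything that can no longer be part of a match, keep the rest.
            int keepFrom = Math.max(last, safe);
            out.append(buf, last, keepFrom);
            buf.delete(0, keepFrom);
        }
    }

    /** File-based wrapper; the Reader decodes UTF-8 across read boundaries. */
    static void replaceInFile(Path input, Path output, Pattern pattern,
                              String replacement) throws IOException {
        try (Reader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            replaceAll(in, out, pattern, replacement, 8192, 4096);
        }
    }
}
```

With this layout the buffer never grows much beyond `chunkSize + maxMatch` chars, and the buffered reader and writer replace the per-chunk map and flush calls. A call could look like `StreamingReplace.replaceInFile(Path.of("in.html"), Path.of("out.html"), Pattern.compile("<a [^>]*>.*?</a>", Pattern.DOTALL), "Some text")`, where both file names are placeholders.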
