Short answer, do the following:
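    public static String readFile( String filePath ) throws IOException
    {
        Reader reader = new FileReader( filePath );
        StringBuilder sb = new StringBuilder();
        char buffer[] = new char[16384]; // read 16k blocks
        int len; // how much content was read?
        while( ( len = reader.read( buffer ) ) > 0 ){
            sb.append( buffer, 0, len );
        }
        reader.close();
        return sb.toString();
    }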
It is very simple, very fast and works well even for unreasonably large text files (100+ MB).
Long answer:
(Code at the end)
In many cases this does not matter, but this method is both fast and readable. In fact, its time complexity is better than that of @Raceimation's answer: O(n) instead of O(n^2).
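To give a feel for the O(n) versus O(n^2) difference, here is a rough sketch that counts how many characters get copied in each approach. It assumes the quadratic cost comes from repeated string concatenation (like the concat method tested in this answer), and it uses a simplified doubling model for StringBuilder growth rather than the exact JDK growth policy. The class and method names are made up for this illustration.

```java
// Simplified cost model, not a measurement: counts copied characters only.
class ConcatCost {

    // str += line creates a new String every time, so the i-th
    // concatenation copies all i * lineLen characters built so far.
    static long copiedByConcat(int lines, int lineLen) {
        long total = 0;
        for (int i = 1; i <= lines; i++) {
            total += (long) i * lineLen;
        }
        return total;
    }

    // Assumed model of a growing buffer: start at 16 chars, at least
    // double the capacity when full, and copy the old contents over.
    static long copiedByBuilder(int lines, int lineLen) {
        long total = 0;
        long length = 0;
        long capacity = 16;
        for (int i = 0; i < lines; i++) {
            long needed = length + lineLen;
            if (needed > capacity) {
                capacity = Math.max(capacity * 2, needed);
                total += length; // existing contents move to the bigger array
            }
            total += lineLen;    // the appended line itself
            length = needed;
        }
        return total;
    }

    public static void main(String[] args) {
        int lines = 1000, lineLen = 100; // a 100,000 character "file"
        System.out.println("concat  copies " + copiedByConcat(lines, lineLen) + " chars");
        System.out.println("builder copies " + copiedByBuilder(lines, lineLen) + " chars");
    }
}
```

In this model, 1000 lines of 100 characters cost 50,050,000 copied characters with +=, while the builder stays within a small multiple of the 100,000 characters actually read.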
I tested six methods (from slow to fast):
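- concat: read line by line and concatenate with str += ... This is alarmingly slow even for small files (about 70 seconds for a 3 MB file), because every += copies the whole string built so far.
- guessing length strbuilder: StringBuilder initialized with the file size as its capacity. I guess it is slow because it has to find such a huge chunk of contiguous memory (though in the results it is about as fast as the plain line-by-line variant, which reads the file the same way).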
- strbuilder with line buffer: StringBuilder, file read line by line
- strbuffer with char[] buffer: append to a StringBuffer, read the file in 16k blocks
- strbuilder with char[] buffer: append to a StringBuilder, read the file in 16k blocks
- preallocate byte[filesize] buffer: allocate a byte[] buffer of the file's size and let the Java API decide how to buffer individual blocks.
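One caveat for the byte[filesize] variant: a single InputStream.read(byte[]) call is allowed to return fewer bytes than the buffer holds, so a careful version keeps reading until the buffer is full. The following is a minimal sketch of that idea, not the code that was benchmarked. The class and method names are invented here, and decoding as UTF-8 is an assumption (the benchmarked code uses the platform default charset).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadFully {

    // Reads the whole file into one preallocated buffer, looping because
    // read() may deliver the data in several smaller pieces.
    static String readWholeFile(String filePath) throws IOException {
        File file = new File(filePath);
        byte[] buffer = new byte[(int) file.length()]; // only valid for files < 2 GB
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            while (offset < buffer.length) {
                int len = in.read(buffer, offset, buffer.length - offset);
                if (len < 0) {
                    break; // file ended earlier than its reported size
                }
                offset += len;
            }
            // Assumption: the file is UTF-8 encoded text.
            return new String(buffer, 0, offset, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWholeFile(args[0]).length() + " chars read");
    }
}
```

The try-with-resources block also closes the stream, which matters if the method is called many times in a row.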
Outcome:
Preallocating a buffer for the entire file was the fastest method for very large files, but it is not very versatile because the total file size must be known in advance. That's why I suggest using StringBuilder with a char[] buffer: it is still simple and, if necessary, easy to adapt to accept any input stream, not just files. It is also fast enough for all reasonable cases.
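As a sketch of what "any input stream, not just files" could look like, the same loop can take a Reader (or an InputStream wrapped in one) instead of a file path. The names below are hypothetical, and UTF-8 is an assumed encoding; pass whatever charset your data actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReadAll {

    // Same StringBuilder + char[] loop, but for any Reader.
    // The caller stays responsible for closing the reader.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[16384]; // read 16k blocks
        int len;
        while ((len = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, len);
        }
        return sb.toString();
    }

    // Convenience overload for byte streams (sockets, zip entries, ...).
    // Assumption: the bytes are UTF-8 encoded text. Closes the stream.
    static String readAll(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return readAll(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: read everything piped into standard input.
        System.out.println(readAll(System.in).length() + " chars read");
    }
}
```

Since no file size is needed, this works for sources whose length is unknown up front.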
Test Results + Code
    import java.io.*;

    public class Test {

        static final int N = 5;

        public final static void main( String args[] ) throws IOException{
            test( "1k.txt", true );
            test( "10k.txt", true );
            // concat with += would take ages here, so we skip it
            test( "100k.txt", false );
            test( "2142k.txt", false );
            test( "pruned-names.csv", false );
            // ah, what the heck, why not try a binary file
            test( "/Users/hansi/Downloads/xcode46graphicstools6938140a.dmg", false );
        }

        public static void test( String file, boolean includeConcat ) throws IOException{
            System.out.println( "Reading " + file + " (~" + (new File(file).length()/1024) + "Kbytes)" );
            strbuilderwithchars( file );
            strbuilderwithchars( file );
            strbuilderwithchars( file );
            tick( "Warm up... " );

            if( includeConcat ){
                for( int i = 0; i < N; i++ )
                    concat( file );
                tick( "> Concat with +=                  " );
            }
            else{
                tick( "> Concat with += **skipped**      " );
            }

            for( int i = 0; i < N; i++ )
                strbuilderguess( file );
            tick( "> StringBuilder init with length  " );

            for( int i = 0; i < N; i++ )
                strbuilder( file );
            tick( "> StringBuilder with line buffer  " );

            for( int i = 0; i < N; i++ )
                strbuilderwithchars( file );
            tick( "> StringBuilder with char[] buffer" );

            for( int i = 0; i < N; i++ )
                strbufferwithchars( file );
            tick( "> StringBuffer with char[] buffer " );

            for( int i = 0; i < N; i++ )
                singleBuffer( file );
            tick( "> Allocate byte[filesize]         " );

            System.out.println();
        }

        public static long now = System.currentTimeMillis();

        // prints the average time per run since the previous tick
        public static void tick( String message ){
            long t = System.currentTimeMillis();
            System.out.println( message + ": " + ( t - now )/N + " ms" );
            now = t;
        }

        // StringBuilder with char[] buffer
        // + works if filesize is unknown
        // + pretty fast
        public static String strbuilderwithchars( String filePath ) throws IOException
        {
            Reader reader = new FileReader( filePath );
            StringBuilder sb = new StringBuilder();
            char buffer[] = new char[16384]; // read 16k blocks
            int len; // how much content was read?
            while( ( len = reader.read( buffer ) ) > 0 ){
                sb.append( buffer, 0, len );
            }
            reader.close();
            return sb.toString();
        }

        // StringBuffer with char[] buffer
        // + works if filesize is unknown
        // + faster than stringbuilder on my computer
        // - should be slower than stringbuilder, which confuses me
        public static String strbufferwithchars( String filePath ) throws IOException
        {
            Reader reader = new FileReader( filePath );
            StringBuffer sb = new StringBuffer();
            char buffer[] = new char[16384]; // read 16k blocks
            int len; // how much content was read?
            while( ( len = reader.read( buffer ) ) > 0 ){
                sb.append( buffer, 0, len );
            }
            reader.close();
            return sb.toString();
        }

        // StringBuilder init with length
        // + works if filesize is unknown
        // - not faster than any of the other methods, but more complicated
        // - drops line terminators (readLine strips them)
        public static String strbuilderguess(String filePath) throws IOException
        {
            File file = new File( filePath );
            BufferedReader reader = new BufferedReader(new FileReader(file));
            String line;
            StringBuilder sb = new StringBuilder( (int)file.length() );
            while( ( line = reader.readLine() ) != null)
            {
                sb.append( line );
            }
            reader.close();
            return sb.toString();
        }

        // StringBuilder with line buffer
        // + works if filesize is unknown
        // + pretty fast
        // - speed may (!) vary with line length
        // - drops line terminators (readLine strips them)
        public static String strbuilder(String filePath) throws IOException
        {
            BufferedReader reader = new BufferedReader(new FileReader(filePath));
            String line;
            StringBuilder sb = new StringBuilder();
            while( ( line = reader.readLine() ) != null)
            {
                sb.append( line );
            }
            reader.close();
            return sb.toString();
        }

        // Concat with +=
        // - slow
        // - slow
        // - really slow
        public static String concat(String filePath) throws IOException
        {
            BufferedReader reader = new BufferedReader(new FileReader(filePath));
            String line, results = "";
            while( ( line = reader.readLine() ) != null)
            {
                results += line;
            }
            reader.close();
            return results;
        }

        // Allocate byte[filesize]
        // + seems to be the fastest for large files
        // - only works if filesize is known in advance, so less versatile for a not significant performance gain
        // + shortest code
        public static String singleBuffer(String filePath ) throws IOException{
            FileInputStream in = new FileInputStream( filePath );
            byte buffer[] = new byte[(int) new File( filePath).length()]; // buffer for the entire file
            int len = in.read( buffer );
            in.close(); // the stream was left open in the first version of this test
            return new String( buffer, 0, len );
        }
    }

    /**
     *** RESULTS ***

    Reading 1k.txt (~31Kbytes)
    Warm up...                        : 0 ms
    > Concat with +=                  : 37 ms
    > StringBuilder init with length  : 0 ms
    > StringBuilder with line buffer  : 0 ms
    > StringBuilder with char[] buffer: 0 ms
    > StringBuffer with char[] buffer : 0 ms
    > Allocate byte[filesize]         : 1 ms

    Reading 10k.txt (~313Kbytes)
    Warm up...                        : 0 ms
    > Concat with +=                  : 708 ms
    > StringBuilder init with length  : 2 ms
    > StringBuilder with line buffer  : 2 ms
    > StringBuilder with char[] buffer: 1 ms
    > StringBuffer with char[] buffer : 1 ms
    > Allocate byte[filesize]         : 1 ms

    Reading 100k.txt (~3136Kbytes)
    Warm up...                        : 7 ms
    > Concat with += **skipped**      : 0 ms
    > StringBuilder init with length  : 19 ms
    > StringBuilder with line buffer  : 21 ms
    > StringBuilder with char[] buffer: 9 ms
    > StringBuffer with char[] buffer : 9 ms
    > Allocate byte[filesize]         : 8 ms

    Reading 2142k.txt (~67204Kbytes)
    Warm up...                        : 181 ms
    > Concat with += **skipped**      : 0 ms
    > StringBuilder init with length  : 367 ms
    > StringBuilder with line buffer  : 372 ms
    > StringBuilder with char[] buffer: 208 ms
    > StringBuffer with char[] buffer : 202 ms
    > Allocate byte[filesize]         : 199 ms

    Reading pruned-names.csv (~11200Kbytes)
    Warm up...                        : 23 ms
    > Concat with += **skipped**      : 0 ms
    > StringBuilder init with length  : 54 ms
    > StringBuilder with line buffer  : 57 ms
    > StringBuilder with char[] buffer: 32 ms
    > StringBuffer with char[] buffer : 31 ms
    > Allocate byte[filesize]         : 32 ms

    Reading /Users/hansi/Downloads/xcode46graphicstools6938140a.dmg (~123429Kbytes)
    Warm up...                        : 1665 ms
    > Concat with += **skipped**      : 0 ms
    > StringBuilder init with length  : 2899 ms
    > StringBuilder with line buffer  : 2978 ms
    > StringBuilder with char[] buffer: 2702 ms
    > StringBuffer with char[] buffer : 2684 ms
    > Allocate byte[filesize]         : 1567 ms
    **/
P.S. You may have noticed that StringBuffer is a little faster than StringBuilder. That makes little sense, because the classes are the same except that StringBuilder is not synchronized. If someone can (or cannot) reproduce this ... I am very curious :)
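For anyone who wants to try reproducing it without the test files, here is a small in-memory sketch that appends the same 16k blocks to both classes and prints the timings. It is not a rigorous benchmark (a harness such as JMH would be the proper tool), the class and method names are invented for this example, and differences of a few milliseconds are likely just noise.

```java
import java.util.Arrays;

class AppendTiming {

    // Appends the same block repeatedly to an unsynchronized StringBuilder.
    static String withBuilder(char[] block, int blocks) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < blocks; i++) {
            sb.append(block, 0, block.length);
        }
        return sb.toString();
    }

    // Same loop with the synchronized StringBuffer.
    static String withBuffer(char[] block, int blocks) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < blocks; i++) {
            sb.append(block, 0, block.length);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] block = new char[16384]; // same block size as the file tests
        Arrays.fill(block, 'x');
        int blocks = 1000;              // roughly 16 million chars per run

        // The first rounds double as JIT warm-up; compare the later ones.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            withBuilder(block, blocks);
            long t1 = System.nanoTime();
            withBuffer(block, blocks);
            long t2 = System.nanoTime();
            System.out.println("round " + round
                    + ": StringBuilder " + (t1 - t0) / 1_000_000 + " ms"
                    + ", StringBuffer " + (t2 - t1) / 1_000_000 + " ms");
        }
    }
}
```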