To avoid reading in the entire file, which may not be possible in your case, you can use RandomAccessFile instead of the standard Java FileInputStream. With RandomAccessFile you can use the seek(long position) method to jump to an arbitrary byte offset in the file and start reading from there. The code will look something like this.
    import java.io.RandomAccessFile;
    import java.util.HashMap;

    RandomAccessFile raf = new RandomAccessFile("path-to-file", "r"); // read-only access is enough
    HashMap<Integer,String> sampledLines = new HashMap<Integer,String>();
    for (int i = 0; i < numberOfRandomSamples; i++) {
        // seek to a random point in the file
        raf.seek((long)(Math.random() * raf.length()));
        // skip from the random location to the beginning of the next line
        int nextByte = raf.read();
        while (nextByte != -1 && nextByte != '\n')
            nextByte = raf.read();
        // wrap around to the beginning of the file if you reach the end
        if (raf.getFilePointer() >= raf.length())
            raf.seek(0);
        // read the line into a buffer
        StringBuilder lineBuffer = new StringBuilder();
        nextByte = raf.read();
        while (nextByte != -1 && nextByte != '\n') {
            lineBuffer.append((char) nextByte);
            nextByte = raf.read(); // advance to the next byte, otherwise the loop never ends
        }
        // ensure uniqueness
        String line = lineBuffer.toString();
        if (sampledLines.get(line.hashCode()) != null)
            i--; // duplicate, so draw again
        else
            sampledLines.put(line.hashCode(), line);
    }
    raf.close();
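Here sampledLines should hold your randomly selected lines at the end. A seek that lands in the last line runs into the end of the file, so that case has to be handled to avoid errors. Note also that the loop only terminates if the file contains at least numberOfRandomSamples distinct lines.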
EDIT: The code now wraps around to the beginning of the file if it reaches the end. It was a pretty simple check.
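EDIT 2: The code now checks the uniqueness of the sampled lines using a HashMap keyed on each line's hash code. Be aware that two different lines with the same hash code would be treated as duplicates.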
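For reference, here is a minimal self-contained sketch of the same idea that stores the lines themselves in a HashSet, so hash collisions cannot cause a distinct line to be dropped. The class name LineSampler, the method sampleLines, and the attempt cap are hypothetical names and choices made for this illustration, not part of the code above. It also uses RandomAccessFile.readLine(), which maps each byte directly to a char, so it assumes single-byte text such as ASCII.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class LineSampler {
    // Hypothetical helper: returns up to `count` distinct lines picked via random seeks.
    static Set<String> sampleLines(String path, int count, Random rng) throws IOException {
        Set<String> sampled = new HashSet<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            if (length == 0) {
                return sampled; // nothing to sample from an empty file
            }
            // Assumed safety cap so the loop ends even when the file
            // has fewer than `count` distinct lines.
            long maxAttempts = 1000L * Math.max(count, 1);
            for (long attempt = 0; sampled.size() < count && attempt < maxAttempts; attempt++) {
                // Seek to a random byte offset in [0, length).
                raf.seek((long) (rng.nextDouble() * length));
                // Skip the remainder of the line we landed in.
                int b = raf.read();
                while (b != -1 && b != '\n') {
                    b = raf.read();
                }
                // Wrap around to the first line if we ran into the end of the file.
                if (raf.getFilePointer() >= length) {
                    raf.seek(0);
                }
                String line = raf.readLine();
                if (line != null) {
                    sampled.add(line); // the set silently ignores duplicates
                }
            }
        }
        return sampled;
    }
}
```

A call such as LineSampler.sampleLines("path-to-file", 100, new Random()) would return the sampled lines. One caveat of this seek-based approach in general: a line is chosen whenever the random offset falls inside the line before it, so lines that follow long lines are picked more often than lines that follow short ones.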