Reading a file using a Java Scanner

One of the lines in the Java file that I am trying to understand is as follows:

return new Scanner(file).useDelimiter("\\Z").next(); 

According to the java.util.regex.Pattern docs, \Z matches "The end of the input but for the final terminator, if any", so I expect this line to return the whole file. What actually happens is that it returns only the first 1024 characters of the file. Is this a limitation of regular expressions? Can it be overcome? For now I am proceeding with a FileReader, but I would like to know the reason for this behavior.

+6
java java.util.scanner regex file-io filereader
4 answers

Try wrapping the File object in a FileInputStream.
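A minimal sketch of what that suggestion might look like. The class name WrapInStream and the method name readAll are made up for illustration; only the Scanner and FileInputStream calls come from the JDK. The empty-input guard is an addition, because next() throws when there is no token.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical helper, not part of the original question.
final class WrapInStream {

    // Reads everything the stream provides as a single token.
    // Uses the platform default charset, like new Scanner(InputStream).
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in)) {
            scanner.useDelimiter("\\Z");
            // next() throws NoSuchElementException on empty input, so guard it.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // Same idea, but starting from a File wrapped in a FileInputStream.
    static String readAll(File file) throws FileNotFoundException {
        return readAll(new FileInputStream(file));
    }
}
```

Closing the Scanner also closes the wrapped stream, so the caller does not need to close it separately.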

+2

I myself could not reproduce this. But I think I can shed light on what is happening.

Internally, the Scanner uses a character buffer of 1024 characters. By default it reads up to 1024 characters from your Readable, if that many are available, and then applies the delimiter pattern.

The problem is in your pattern: it will always match the end of the input, but that does not have to be the end of your input stream or data. When Java applies your pattern to the buffered data, it tries to find the first occurrence of the end of input. Since there are 1024 characters in the buffer, the matching engine treats position 1024 as the first match of the delimiter, and everything before it is returned as the first token.

For this reason, I do not think the end-of-input anchor is valid for use in a Scanner. After all, it could be reading from an endless stream.
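Since I could not reproduce the problem, here is a small experiment for checking this explanation on a given JVM. It feeds the Scanner more characters than fit in one buffer and reports how long the first token is. The names DelimiterProbe and firstTokenLength are made up for this sketch, and it assumes plain in-memory text rather than a file, so charset decoding plays no part.

```java
import java.io.StringReader;
import java.util.Scanner;

// Hypothetical probe, only meant for experimenting with the \Z delimiter.
final class DelimiterProbe {

    // Builds a string of totalChars 'a' characters, scans it with the
    // \Z delimiter, and returns the length of the first token (0 if none).
    static int firstTokenLength(int totalChars) {
        StringBuilder input = new StringBuilder(totalChars);
        for (int i = 0; i < totalChars; i++) {
            input.append('a');
        }
        try (Scanner scanner = new Scanner(new StringReader(input.toString()))) {
            scanner.useDelimiter("\\Z");
            return scanner.hasNext() ? scanner.next().length() : 0;
        }
    }
}
```

Calling DelimiterProbe.firstTokenLength(5000) and comparing the result with 1024 shows whether the buffer size alone cuts the token short.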

+5

Scanner is designed to read a series of primitive values from a file. It is not really intended for reading a whole file.

If you do not want to include third-party libraries, you are better off looping over a BufferedReader that wraps a FileReader / InputStreamReader for text, or looping over a FileInputStream for binary data.
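The BufferedReader loop for text could be sketched roughly like this. WholeFileReader, readAll and readFile are illustrative names, and the 4096-character chunk size is an arbitrary choice. The explicit Charset parameter is an addition so that the result does not depend on the platform default.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper showing a plain read loop without Scanner.
final class WholeFileReader {

    // Drains the reader chunk by chunk and returns all characters,
    // line terminators included. Closes the reader when done.
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[4096];
        try (BufferedReader reader = new BufferedReader(source)) {
            int count;
            while ((count = reader.read(chunk)) != -1) {
                text.append(chunk, 0, count);
            }
        }
        return text.toString();
    }

    // Reads a text file, decoding its bytes with the given charset.
    static String readFile(File file, Charset charset) throws IOException {
        return readAll(new InputStreamReader(new FileInputStream(file), charset));
    }
}
```

Unlike Scanner, this loop propagates any IOException to the caller instead of swallowing it.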

If using a third-party library is acceptable, Apache commons-io has FileUtils, which contains the static methods readFileToString and readLines for text and readFileToByteArray for binary data.

+1

You can use the Scanner class; just specify the charset when opening the scanner, that is:

 Scanner sc = new Scanner(file, "ISO-8859-1"); 

Java converts the bytes read from the file into characters using the specified charset, which is the platform default (taken from the underlying OS) if nothing is specified. It’s still not clear to me why the Scanner reads only 1024 bytes with the default charset but reads to the end of the file with the other one. Anyway, it works great!
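One possible way to investigate, offered as a guess rather than a confirmed explanation: Scanner does not throw an IOException from its underlying source but records it, and ioException() returns the last one. If decoding fails partway through the file, the scanner may stop early without any visible error. The sketch below (CharsetScan and readAll are made-up names) reads with an explicit charset and rethrows any recorded exception.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper; the idea that a decoding error caused the
// truncation is an assumption, not something stated in the question.
final class CharsetScan {

    // Reads the whole file as one token using the named charset and
    // surfaces any IOException the Scanner recorded while reading.
    static String readAll(File file, String charsetName) throws IOException {
        try (Scanner scanner = new Scanner(file, charsetName)) {
            scanner.useDelimiter("\\Z");
            String text = scanner.hasNext() ? scanner.next() : "";
            IOException recorded = scanner.ioException();
            if (recorded != null) {
                throw recorded; // the read stopped early for this reason
            }
            return text;
        }
    }
}
```

Calling it once with the platform default charset name and once with "ISO-8859-1" on the same file would show whether the short read comes with a recorded exception.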

0