Strange behavior when reading files with Java Scanner

I just ran into an interesting problem when using the Scanner class to read the contents of files. I'm trying to read several output files, generated by a parsing application, from a directory in order to calculate some accuracy metrics.

My code simply iterates over the files in the directory and opens each one with a Scanner to process its contents. For some reason, some of the files (all UTF-8 encoded) were not read by the Scanner: even though they were not empty, scanner.hasNextLine() returned false on the very first call (I confirmed this in the debugger). Each time, I initialized the Scanner directly with a File object (the File objects were created successfully), i.e.:

  File file = new File(pathName);
  ...
  Scanner scanner = new Scanner(file);

I tried a couple of things and eventually managed to fix this problem by initializing the scanner as follows:

  Scanner scanner = new Scanner(new FileInputStream(file)); 

I'm happy to have solved the problem, but I'm still wondering what was going wrong before. Any ideas? Many thanks!

2 answers

According to the Scanner.java source in Java 6u23, line separators are detected with these patterns:

  private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
  private static final String LINE_PATTERN = ".*("+LINE_SEPARATOR_PATTERN+")|.+$";

So you could check whether the contents of the files that were not read match the following regular expression:

  .*(\r\n|[\n\r\u2028\u2029\u0085])|.+$
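A minimal sketch of that check: it decodes each file named on the command line as UTF-8 (the encoding the question says the files use) and tests whether the start of the text matches the pattern above. The class and method names are hypothetical, and lookingAt() only approximates how Scanner searches its internal buffer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class LinePatternCheck {
    // Same patterns as the Scanner source quoted above; U+2028, U+2029 and
    // U+0085 are the extra line separators Scanner recognizes.
    private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
    private static final Pattern LINE_PATTERN =
            Pattern.compile(".*(" + LINE_SEPARATOR_PATTERN + ")|.+$");

    /** True if the text starts with something Scanner's line pattern accepts. */
    public static boolean startsWithLine(String text) {
        return LINE_PATTERN.matcher(text).lookingAt();
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path path = Paths.get(name);
            // Decode explicitly as UTF-8, the encoding the files are said to use.
            String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            System.out.println(path + ": " + (startsWithLine(content) ? "matches" : "no match"));
        }
    }
}
```

If a non-empty file matches here but Scanner still reports no lines, the pattern is probably not the culprit and the problem lies in how the bytes were decoded.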

I would also check whether any exception was raised.
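Note that Scanner does not rethrow I/O errors from its source: its methods simply behave as if the input had ended, and the last exception is only available through scanner.ioException(). A minimal sketch of checking it; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerExceptionCheck {
    /**
     * Reads every line the scanner yields into lines and returns the
     * IOException (if any) that the scanner caught and hid from the caller.
     */
    public static IOException readLines(Scanner scanner, List<String> lines) {
        while (scanner.hasNextLine()) {
            lines.add(scanner.nextLine());
        }
        // hasNextLine() returns false on a read error, so this is the only way to see it.
        return scanner.ioException();
    }

    public static void main(String[] args) throws FileNotFoundException {
        for (String name : args) {
            List<String> lines = new ArrayList<>();
            // Same constructor as in the question: new Scanner(File).
            try (Scanner scanner = new Scanner(new File(name))) {
                IOException hidden = readLines(scanner, lines);
                System.out.println(name + ": " + lines.size() + " lines"
                        + (hidden == null ? "" : ", stopped by " + hidden));
            }
        }
    }
}
```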

UPDATE: I was curious and looked for an answer. It seems your question has already been asked and resolved here: Java Scanner(File) misbehaving, but Scanner(FileInputStream) always works with the same file

To summarize: for files containing non-ASCII characters, Scanner behaves differently depending on whether you construct it from a File or from a FileInputStream.
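As far as I can tell from the JDK source, the difference lies in how bytes are decoded. Scanner(File) builds a CharsetDecoder for the platform default charset, and such a decoder reports malformed input as an error by default; Scanner swallows that error, so hasNextLine() returns false. Scanner(InputStream) wraps the stream in an InputStreamReader, which replaces malformed bytes with U+FFFD and keeps going. The sketch below assumes a platform whose default charset does not match the UTF-8 files, and simulates that by passing "US-ASCII" explicitly to both constructors; the class name and the temp file contents are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

public class ScannerCharsetDemo {
    /** Counts the lines the scanner yields before hasNextLine() turns false. */
    static int countLines(Scanner scanner) {
        int count = 0;
        while (scanner.hasNextLine()) {
            scanner.nextLine();
            count++;
        }
        return count;
    }

    /** Scanner(File, charset): strict decoder, malformed input is reported (and hidden). */
    public static int linesViaFile(File file, String charset) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(file, charset)) {
            return countLines(scanner);
        }
    }

    /** Scanner(InputStream, charset): InputStreamReader replaces malformed input with U+FFFD. */
    public static int linesViaStream(File file, String charset) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(new FileInputStream(file), charset)) {
            return countLines(scanner);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file: UTF-8 text whose first line has one non-ASCII character.
        File file = File.createTempFile("scanner-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "caf\u00e9\nsecond line\n".getBytes(StandardCharsets.UTF_8));

        // US-ASCII stands in for a platform default charset that does not match the file.
        System.out.println("Scanner(File, US-ASCII):            " + linesViaFile(file, "US-ASCII"));
        System.out.println("Scanner(FileInputStream, US-ASCII): " + linesViaStream(file, "US-ASCII"));
        System.out.println("Scanner(File, UTF-8):               " + linesViaFile(file, "UTF-8"));
    }
}
```

Passing the real encoding to the File-based constructor (new Scanner(file, "UTF-8")) seems the cleaner fix, since the FileInputStream variant only "works" by silently turning undecodable bytes into U+FFFD. On JDK 18 and later the default charset is UTF-8 (JEP 400), so this particular mismatch is less likely there.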


I would check whether you always close the scanner after reading each file. Also, do you call only hasNextLine() and nextLine(), or do you also call other nextXXX() methods on these scanners?
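To rule out both points, here is a minimal sketch (hypothetical class name; the charset is passed in by the caller) that closes each Scanner with try-with-resources, uses only hasNextLine() and nextLine(), and rethrows any error the Scanner recorded instead of silently treating it as end of input.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class DirectoryLineReader {
    /**
     * Reads every regular file in dir line by line, in sorted path order.
     * Each Scanner is closed by try-with-resources, and only
     * hasNextLine()/nextLine() are called on it.
     */
    public static List<String> readAll(File dir, String charset) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        Arrays.sort(files); // listFiles() order is unspecified
        List<String> lines = new ArrayList<>();
        for (File file : files) {
            if (!file.isFile()) {
                continue;
            }
            try (Scanner scanner = new Scanner(file, charset)) {
                while (scanner.hasNextLine()) {
                    lines.add(scanner.nextLine());
                }
                // Surface read/decoding errors that Scanner would otherwise hide.
                if (scanner.ioException() != null) {
                    throw scanner.ioException();
                }
            }
        }
        return lines;
    }
}
```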

