Is there a standard API for checking line separators in Java?

I am using Java SE 6.

My program reads several types of files, ranging from DOS to Unix and from ASCII to Unicode, and I have to make sure that the line separators in the output file match those in the input file.

To do this, I read a sample of the input with BufferedReader's read() method, find the first line separator, and save it in a String. That way, I can reuse it later whenever I need to write a new line.
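
For illustration, the read()-based approach described above might look roughly like the sketch below. The class and method names (FirstSeparatorDetector, detectFirst) are hypothetical, not part of any standard API, and the code sticks to Java SE 6 syntax. It assumes the caller passes a buffered Reader and is prepared to reopen or reset the stream afterwards, because the scan consumes characters.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not a standard API.
final class FirstSeparatorDetector {

    /**
     * Scans the reader character by character and returns the first line
     * separator it meets, or null if the input contains none.
     */
    static String detectFirst(Reader in) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            switch (c) {
                case '\n':
                    return "\n";
                case '\r':
                    // Look at the next character to tell "\r\n" from a lone "\r".
                    int next = in.read();
                    return next == '\n' ? "\r\n" : "\r";
                case '\u0085':
                case '\u2028':
                case '\u2029':
                    return String.valueOf((char) c);
                default:
                    break; // ordinary character, keep scanning
            }
        }
        return null;
    }
}
```

A caller would typically wrap the file in a BufferedReader, call detectFirst once, and then reopen the file for the real processing pass.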

I checked the Scanner class and noticed that possible line breaks might include the following:

\r\n \r \n \u2028 \u2029 \u0085 

Is there a library for checking these characters? Or, even better, is there a library function that detects which line separator the input uses?

Are there any other ways around this?

EDIT: If possible, I would like to use the standard Java API instead of third-party libraries, but all suggestions are welcome.

EDIT: Just to clarify.
1) The input files do not depend on where this program runs. For example, if I run the program on DOS, I can still receive a Unix input file.
2) My goal is not to read each line regardless of its line separator; that part is simple. What I really need is to write an output file with the same line separators as the input file. For example, if I run the program on DOS and receive a Unix input file, I want to write the output file with Unix line separators. That is why I am asking whether there is a standard API for detecting line separators based on the input file rather than on the OS the program runs on.
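
To make point 2 concrete, here is a rough sketch of the writing side, assuming the separator has already been detected somehow. SeparatorPreservingCopy and copyLines are made-up names for illustration. The key point is that the separator is written explicitly instead of relying on BufferedWriter.newLine() or println(), which use the platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper, not a standard API.
final class SeparatorPreservingCopy {

    /**
     * Copies every line from in to out, ending each one with the given
     * separator rather than the separator of the current platform.
     */
    static void copyLines(BufferedReader in, Writer out, String separator)
            throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line);      // a real program would transform the line here
            out.write(separator); // explicit separator, independent of the OS
        }
        out.flush();
    }
}
```

One limitation of this sketch: readLine() does not report whether the last line had a terminator, so the output always ends with one.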

Thanks.

+6
java
4 answers

The previous three answers do not really address the question. The OP wants to determine, for a given file, which line separator that file uses.

This question cannot be answered definitively for a given file, since a file can use more than one kind of line separator. It may seem far-fetched, but it is possible.

Thus, the best approach, it seems to me, is to analyze the input file yourself: count the occurrences of each possible line-ending sequence and choose the one that appears most often as the line separator for that file.
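
A minimal sketch of that counting idea, assuming the file content (or a large enough sample of it) is already available as a CharSequence. The names LineSeparatorCounter and mostFrequent are my own, and the tie-breaking rule (the candidate listed first wins) is an arbitrary choice:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a standard API.
final class LineSeparatorCounter {

    // "\r\n" comes first so that it is counted once, not as "\r" plus "\n".
    private static final Pattern LINE_BREAK =
            Pattern.compile("\r\n|[\n\r\u2028\u2029\u0085]");

    private static final String[] CANDIDATES =
            { "\r\n", "\n", "\r", "\u2028", "\u2029", "\u0085" };

    /** Returns the most frequent separator in text, or null if there is none. */
    static String mostFrequent(CharSequence text) {
        int[] counts = new int[CANDIDATES.length];
        Matcher m = LINE_BREAK.matcher(text);
        while (m.find()) {
            String found = m.group();
            for (int i = 0; i < CANDIDATES.length; i++) {
                if (CANDIDATES[i].equals(found)) {
                    counts[i]++;
                    break;
                }
            }
        }
        int best = -1;
        for (int i = 0; i < counts.length; i++) {
            // Strict ">" means ties go to the candidate listed first.
            if (counts[i] > 0 && (best < 0 || counts[i] > counts[best])) {
                best = i;
            }
        }
        return best < 0 ? null : CANDIDATES[best];
    }
}
```

For very large files, the same counting could be done on a fixed-size prefix instead of the whole content, at the cost of missing separators that only appear later.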

I have not come across a library that implements this functionality.

+3

BufferedReader's readLine() automatically handles at least the first three of those end-of-line markers (\r\n, \r and \n).
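
A small sketch to show what that means in practice (ReadLineDemo and readAll are illustrative names only). readLine() returns the line content without its terminator, so it splits mixed input correctly but cannot tell you which separator was used:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not a standard API.
final class ReadLineDemo {

    /** Reads all lines with readLine(); the terminators are discarded. */
    static List<String> readAll(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        List<String> lines = new ArrayList<String>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }
}
```

Calling readAll("a\r\nb\rc\nd") yields the four lines a, b, c and d, whichever of the three terminators ended each one.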

+1

You can get the operating system's line separator with System.getProperty("line.separator"). See the System Properties documentation.

+1

I spent a long time searching for an API for this but could not find one.

I use a similar approach: I look for the first line separator with a regex.

I had to spend some time getting the regex to work and wished an answer here had included the code, so I wrote something myself:

    /**
     * <h1> Identify which line delimiter is used in a string </h1>
     *
     * This is useful when processing files that were created on different operating systems.
     *
     * @param str - the string with the mystery line delimiter.
     * @return the line delimiter for windows, {@code \r\n}, <br>
     *         unix/linux {@code \n} or legacy mac {@code \r} <br>
     *         if none can be identified, it falls back to unix {@code \n}
     */
    public static String identifyLineDelimiter(String str) {
        if (str.matches("(?s).*(\\r\\n).*")) {      // Windows //$NON-NLS-1$
            return "\r\n"; //$NON-NLS-1$
        } else if (str.matches("(?s).*(\\n).*")) {  // Unix/Linux //$NON-NLS-1$
            return "\n"; //$NON-NLS-1$
        } else if (str.matches("(?s).*(\\r).*")) {  // Legacy mac os 9. Newer OS X use \n //$NON-NLS-1$
            return "\r"; //$NON-NLS-1$
        } else {
            return "\n"; // fallback onto '\n' if nothing matches. //$NON-NLS-1$
        }
    }
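
Note that the method above checks by priority (\r\n anywhere in the string, then \n, then \r), not by position. If you specifically want the delimiter that occurs first, a variant along these lines might work. This is only a sketch; FirstDelimiterRegex and firstLineDelimiter are made-up names, and the fallback to \n is carried over from the code above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant, not a standard API.
final class FirstDelimiterRegex {

    // "\r\n" is tried before "\r" so a Windows ending is matched as one unit.
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\n|\r");

    /** Returns the first delimiter in str, or "\n" when there is none. */
    static String firstLineDelimiter(String str) {
        Matcher m = LINE_BREAK.matcher(str);
        return m.find() ? m.group() : "\n";
    }
}
```

The two versions differ on mixed input: for "a\rb\nc" this variant returns \r, because it appears first, whereas the priority-based check returns \n.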
0
