Java: splitting a string on non-alphabetic characters and newlines?

I have a test.txt file containing several lines, for example:

"h3llo, @my name is, bob! (how are you?)" "i am fine@ @@@@" 

I want to split the text on every non-alphabetic character and on newlines, and collect the pieces in an ArrayList, so that the output is

 output = ["h", "llo", "my", "name", "is", "bob", "how", "are", "you", "i", "am", "fine"] 

At the moment, I tried to split my text with

 output.split("\\P{Alpha}+") 

But for some reason this puts an empty string in the first position of the ArrayList and replaces the newline with another empty string:

 output = ["", "h", "llo", "my", "name", "is", "bob", "how", "are", "you", "", "i", "am", "fine"] 

Is there any other way to fix this? Thanks!

-

EDIT: How can I make sure it ignores a new line?

+6
3 answers

The behavior of Java's String.split() is rather confusing. A much better utility for splitting is Guava's Splitter. Its documentation describes the problems with String.split() in more detail:

The JDK's built-in string splitting utilities can have some quirky behaviors. For example, String.split silently discards trailing separators, and StringTokenizer respects exactly five whitespace characters and nothing else.

Quiz: ",a,,b,".split(",") returns ...

  • "", "a", "", "b", ""
  • null, "a", null, "b", null
  • "a", null, "b"
  • "a", "b"
  • None of the above

The correct answer is none of the above: "", "a", "", "b" . Only trailing empty strings are skipped. I don't even know what to make of that.
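The quiz result can be checked with a few lines of plain Java. This is only a small sketch; the class and method names (SplitQuirkDemo, defaultSplit, splitKeepingTrailing) are made up for illustration. It also shows that passing a negative limit to String.split keeps the trailing empty strings.

```java
import java.util.Arrays;
import java.util.List;

public class SplitQuirkDemo {
    // Default split: the leading empty string is kept, trailing ones are dropped.
    static List<String> defaultSplit(String s) {
        return Arrays.asList(s.split(","));
    }

    // A negative limit tells split to keep trailing empty strings as well.
    static List<String> splitKeepingTrailing(String s) {
        return Arrays.asList(s.split(",", -1));
    }

    public static void main(String[] args) {
        System.out.println(defaultSplit(",a,,b,"));          // prints [, a, , b]
        System.out.println(splitKeepingTrailing(",a,,b,"));  // prints [, a, , b, ]
    }
}
```

The leading empty string in the first result is the same effect that produces the "" at the start of the list in the question: the input begins with a delimiter.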

In your case, this should work:

 Splitter.onPattern("\\P{Alpha}+").omitEmptyStrings().splitToList(output); 
+2

Use your regular expression, put the result in an ArrayList (which is where you want the data to end up anyway), and then use removeIf to remove any empty strings.

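 // needs: import java.util.ArrayList; import java.util.Arrays;
 String input = "\"h3llo, @my name is, bob! (how are you?)\"\n\n\"i am fine@ @@@@\"";
 // split on runs of non-letters; the leading quote produces an empty first element
 ArrayList<String> arrayList = new ArrayList<>(Arrays.asList(input.split("\\P{Alpha}+")));
 // drop the empty strings left behind by split
 arrayList.removeIf(""::equals);
 System.out.println(arrayList);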

Result:

[h, llo, my, name, is, bob, how, are, you, i, am, fine]
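The same idea also works if the file is processed line by line, which may be where the extra "" for the blank line in the question comes from (this is an assumption; the question does not show how test.txt is read). Splitting an empty line returns an array containing a single empty string, so the filter has to run on the tokens of every line. A minimal sketch, with a hypothetical helper wordsFromLines and an in-memory list standing in for the file contents:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LineWords {
    // Hypothetical helper: splits each line on runs of non-letters
    // and drops the empty tokens that split leaves behind.
    static List<String> wordsFromLines(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\P{Alpha}+")))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the lines of test.txt, including the blank line.
        List<String> lines = Arrays.asList(
                "\"h3llo, @my name is, bob! (how are you?)\"",
                "",
                "\"i am fine@ @@@@\"");
        // prints [h, llo, my, name, is, bob, how, are, you, i, am, fine]
        System.out.println(wordsFromLines(lines));
    }
}
```

With a real file, the list could come from Files.readAllLines(Paths.get("test.txt")) instead of the hard-coded values.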

0

Another solution is to use the Matcher and Pattern classes from the java.util.regex package and match the words themselves instead of splitting on the delimiters.

 String input = "h3llo, @my name is, bob! (how are you?)\n" + "i am fine@ @@@@";
 Pattern p = Pattern.compile("([a-zA-Z]+)");
 Matcher m = p.matcher(input);
 List<String> tokens = new ArrayList<String>();
 while (m.find()) {
     System.out.println("Found a " + m.group());
     tokens.add(m.group());
 }
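On Java 9 or later, the find loop can also be written as a stream, because Matcher.results() returns a stream of MatchResult objects. This is a sketch of that variant rather than a replacement for the code above; the names MatchWords and findWords are made up for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MatchWords {
    // One or more ASCII letters; everything else (digits, punctuation,
    // newlines) is simply never matched, so no empty strings appear.
    private static final Pattern LETTERS = Pattern.compile("[a-zA-Z]+");

    static List<String> findWords(String input) {
        return LETTERS.matcher(input)
                .results()                 // requires Java 9+
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "h3llo, @my name is, bob! (how are you?)\n" + "i am fine@ @@@@";
        // prints [h, llo, my, name, is, bob, how, are, you, i, am, fine]
        System.out.println(findWords(input));
    }
}
```

Because this approach collects matches rather than the text between delimiters, leading delimiters and blank lines never produce empty entries.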

P.S. A good tool for testing your regular expression pattern is https://regex101.com/

0