Split a string on a separator

I want to split a string on a space separator, but it should handle quoted strings sensibly. For example, for a string like

"John Smith" Ted Barry 

It should return three strings: John Smith, Ted and Barry.

+7
5 answers

Having had a go at it: you can use a regex for this. Run the equivalent of "match all" with:

 ((?<=("))[\w ]*(?=("(\s|$))))|((?<!")\w+(?!")) 

Java example:

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    public class Test {
        public static void main(String[] args) {
            String someString = "\"Multiple quote test\" not in quotes \"inside quote\" \"A work in progress\"";
            Pattern p = Pattern.compile("((?<=(\"))[\\w ]*(?=(\"(\\s|$))))|((?<!\")\\w+(?!\"))");
            Matcher m = p.matcher(someString);
            // Each find() yields the next token: a quoted phrase or a bare word
            while (m.find()) {
                System.out.println("'" + m.group() + "'");
            }
        }
    }

Output:

    'Multiple quote test'
    'not'
    'in'
    'quotes'
    'inside quote'
    'A work in progress'

A breakdown of the regex with the above example can be seen here:

http://regex101.com/r/wM6yT9


That said, regular expressions should not be the solution for everything; I was just having fun. This example has many edge cases, such as the handling of Unicode characters, symbols, etc. You would be better off using a tried and true library for this kind of task. Take a look at the other answers before using this one.
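For comparison, here is a minimal sketch of a simpler pattern that uses capturing groups instead of lookarounds. It is not the regex from this answer: the class name `QuoteAwareSplit`, the `split` helper and the pattern itself are my own assumptions for illustration. Because the quoted branch uses `[^"]*` rather than `[\w ]*`, it also accepts non-ASCII text inside quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Group 1: the text between a pair of double quotes.
    // Group 2: a run of characters that are neither whitespace nor quotes.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|([^\\s\"]+)");

    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("\"John Smith\" Ted Barry")); // prints [John Smith, Ted, Barry]
    }
}
```

Note that an unmatched quote is silently skipped here rather than reported, and escaped quotes are not supported.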

+10

Try this ugly bit of code.

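    import java.util.ArrayList;
    import java.util.List;

    String str = "hello my dear \"John Smith\" where is Ted Barry";
    List<String> resultList = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    boolean inQuotes = false;
    for (String s : str.split("\\s")) {
        if (!inQuotes && s.startsWith("\"")) {
            // Opening quote: start collecting words
            inQuotes = true;
            s = s.substring(1);
        }
        if (!inQuotes) {
            resultList.add(s);
        } else if (s.endsWith("\"")) {
            // Closing quote: emit the collected phrase
            builder.append(s, 0, s.length() - 1);
            resultList.add(builder.toString());
            builder.setLength(0);
            inQuotes = false;
        } else {
            // Word inside the quotes (also covers phrases of three or more words)
            builder.append(s).append(" ");
        }
    }
    System.out.println(resultList);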
+4

OK, I wrote a little snippet that does what you want, and a bit more. Since you did not specify any further conditions, I did not go to the trouble of handling them. I know this is a dirty way, and you can probably get better results with something that already exists, but for the pleasure of programming, here is an example:

    String example = "hello\"John Smith\" Ted Barry lol\"Basi German\"hello";
    int wordQuoteStartIndex = 0;
    int wordQuoteEndIndex = 0;
    int wordSpaceStartIndex = 0;
    int wordSpaceEndIndex = 0;
    boolean foundQuote = false;
    for (int index = 0; index < example.length(); index++) {
        if (example.charAt(index) == '\"') {
            if (foundQuote == true) {
                wordQuoteEndIndex = index + 1;
                // Print the quoted word
                // (here you can remove quotes by changing to (wordQuoteStartIndex+1, wordQuoteEndIndex-1))
                System.out.println(example.substring(wordQuoteStartIndex, wordQuoteEndIndex));
                foundQuote = false;
                if (index + 1 < example.length()) {
                    wordSpaceStartIndex = index + 1;
                }
            } else {
                wordSpaceEndIndex = index;
                if (wordSpaceStartIndex != wordSpaceEndIndex) {
                    // print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
                }
                wordQuoteStartIndex = index;
                foundQuote = true;
            }
        }
        if (foundQuote == false) {
            if (example.charAt(index) == ' ') {
                wordSpaceEndIndex = index;
                if (wordSpaceStartIndex != wordSpaceEndIndex) {
                    // print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
                }
                wordSpaceStartIndex = index + 1;
            }
            if (index == example.length() - 1) {
                if (example.charAt(index) != '\"') {
                    // print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, example.length()));
                }
            }
        }
    }

It also handles words that are not separated by a space from the quotation marks before or after them, such as the words "hello" before "John Smith" and after "Basi German".

When the string is changed to "John Smith" Ted Barry, the output consists of three lines: 1) "John Smith" 2) Ted 3) Barry

The string in the example is hello"John Smith" Ted Barry lol"Basi German"hello and prints 1) hello 2) "John Smith" 3) Ted 4) Barry 5) lol 6) "Basi German" 7) hello

Hope this helps

+3

commons-lang has a StrTokenizer class to do this for you, and there is also a java-csv library.

Example with StrTokenizer:

    import org.apache.commons.lang3.text.StrTokenizer; // in commons-lang 2.x: org.apache.commons.lang.text.StrTokenizer

    String params = "\"John Smith\" Ted Barry";
    // Initialize tokenizer with input string, delimiter character, quote character
    StrTokenizer tokenizer = new StrTokenizer(params, ' ', '"');
    for (String token : tokenizer.getTokenArray()) {
        System.out.println(token);
    }

Output:

    John Smith
    Ted
    Barry
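If adding a dependency is not an option, the JDK's own `java.io.StreamTokenizer` can be configured to do something similar. This is only a sketch under my own assumptions: the class name `StreamTokenizerSplit` and the `split` helper are made up for illustration, and the character classes below are one possible configuration, not the only one.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamTokenizerSplit {
    static List<String> split(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();            // drop the default number and comment handling
        st.wordChars('!', 255);      // every visible character is part of a word
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        st.quoteChar('"');           // set last so that '"' is a quote, not a word character

        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // For both plain words and quoted strings the text is in sval
                tokens.add(st.sval);
            }
        } catch (IOException e) {
            // Not expected when reading from a String
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("\"John Smith\" Ted Barry")); // prints [John Smith, Ted, Barry]
    }
}
```

Be aware that `StreamTokenizer` interprets backslash escapes inside quoted strings and quietly accepts a quote that is never closed, so it is not a drop-in replacement for `StrTokenizer`.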
+1

This is my own version, cleaned up from http://pastebin.com/aZngu65y (posted in a comment). It can handle Unicode. It strips all unnecessary spaces (even inside quotes); this can be good or bad depending on the need. There is no support for escaped quotes.

    import java.util.Arrays;

    private static String[] parse(String param) {
        // Pad every quote with spaces so that each one becomes its own fragment
        param = param.replaceAll("\"", " \" ").trim();
        String[] fragments = param.split("\\s+");
        int curr = 0;
        boolean matched = fragments[curr].matches("[^\"]*");
        if (matched) curr++;
        for (int i = 1; i < fragments.length; i++) {
            // Inside an open quote: keep gluing fragments onto the current token
            if (!matched)
                fragments[curr] = fragments[curr] + " " + fragments[i];
            if (!fragments[curr].matches("(\"[^\"]*\"|[^\"]*)"))
                matched = false;
            else {
                matched = true;
                // Complete quoted token: strip the quotes and the padding
                if (fragments[curr].matches("\"[^\"]*\""))
                    fragments[curr] = fragments[curr].substring(1, fragments[curr].length() - 1).trim();
                // Empty tokens are dropped
                if (fragments[curr].length() != 0)
                    curr++;
                if (i + 1 < fragments.length)
                    fragments[curr] = fragments[i + 1];
            }
        }
        if (matched) {
            return Arrays.copyOf(fragments, curr);
        }
        return null; // Parameter failure (double-quotes do not match up properly).
    }

Example input for comparison:

 "sdfskjf" sdfjkhsd "hfrif ehref" "fksdfj sdkfj fkdsjf" sdf sfssd asjdhj sdf ffhj "fdsf fsdjh"日本語 中文 "Tiếng Việt" "English" dsfsd sdf " s dfs fsd f " sd f fs df fdssf "日本語 中文" "" "" "" " sdfsfds " "f fsdf 

(The 2nd line is empty, the 3rd line is spaces, and the last line is malformed.) Judge the expected results for yourself, as they can vary, but as a baseline the 1st case should return [sdfskjf, sdfjkhsd, hfrif ehref, fksdfj sdkfj fkdsjf, sdf, sfssd].
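If you would rather keep the spaces inside quotes exactly as written, a single pass over the characters is enough. The following is a rough sketch, not a rewrite of the method above: `QuoteScanner` and its `parse` method are hypothetical names, and the behaviour differs in two ways that I chose for illustration (an empty pair of quotes yields an empty token, and text touching a quote is joined to the quoted part, shell-style). Like the version above, it returns null when the quotes do not match up.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteScanner {
    // Returns the tokens, or null when a double quote is left unclosed.
    static List<String> parse(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false; // true once a token has started, so "" still counts as a token
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // A quote toggles the state and is not copied into the token
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (!inQuotes && Character.isWhitespace(c)) {
                // Whitespace outside quotes ends the current token, if any
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (inQuotes) {
            return null; // unmatched quote
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"sdfskjf\" sdfjkhsd \"hfrif ehref\" \"fksdfj sdkfj fkdsjf\" sdf sfssd"));
        // prints [sdfskjf, sdfjkhsd, hfrif ehref, fksdfj sdkfj fkdsjf, sdf, sfssd]
    }
}
```

For the malformed last case ("f fsdf with no closing quote) this sketch returns null, and for a line of only spaces it returns an empty list.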

+1
