What is a good way to split lines here?

Question

I have the following line:

    A:B:1111;domain:80;a;b

A is optional, so B:1111;domain:80;a;b is also valid. :80 is optional too, so B:1111;domain;a;b or :1111;domain;a;b are also valid input. I want to end up with a String[] that contains:

    s[0] = "A";
    s[1] = "B";
    s[2] = "1111";
    s[3] = "domain:80";
    s[4] = "a";
    s[5] = "b";

I did it as follows:

    List<String> tokens = new ArrayList<String>();
    String[] values = s.split(";");
    String[] actions = values[0].split(":");
    for (String a : actions) {
        tokens.add(a);
    }
    // Start from 1 to skip A:B:1111
    for (int i = 1; i < values.length; i++) {
        tokens.add(values[i]);
    }
    // toArray() without an argument returns Object[], so pass a typed array
    String[] finalResult = tokens.toArray(new String[0]);

I was wondering if there is a better way to do this? How else can I do this more efficiently?

+7

java optimization string regex

Jim May 16 '12 at 12:58

5 answers

With some assumptions about valid characters, this regular expression validates the input and splits it into the groups you want.

    Pattern p = Pattern.compile("^((.+):)?(.+):(\\d+);(.+):(\\d+);(.+);(.+)$");

    Matcher m = p.matcher("A:B:1111;domain:80;a;b");
    if (m.matches()) {
        for (int i = 0; i <= m.groupCount(); i++)
            System.out.println(m.group(i));
    }

    m = p.matcher("B:1111;domain:80;a;b");
    if (m.matches()) {
        for (int i = 0; i <= m.groupCount(); i++)
            System.out.println(m.group(i));
    }

gives:

    A:B:1111;domain:80;a;b  // ignore this
    A:                      // ignore this
    A                       // This is the optional A, check for null
    B
    1111
    domain
    80
    a
    b

and

    B:1111;domain:80;a;b    // ignore this
    null                    // ignore this
    null                    // This is the optional A, check for null
    B
    1111
    domain
    80
    a
    b
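
Note that the pattern above requires a port after the domain and returns domain and port as separate groups, whereas the question treats :80 as optional and wants domain:80 kept as one token. A minimal sketch of a variant that covers both points, assuming no field contains ':' or ';' itself (the class name HostLineParser and the method parse are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostLineParser {
    // Groups: 1 = optional A, 2 = B, 3 = number,
    //         4 = domain with optional :port, 5 = a, 6 = b.
    private static final Pattern LINE = Pattern.compile(
            "^(?:([^:;]*):)?([^:;]*):(\\d+);([^:;]+(?::\\d+)?);([^;]*);([^;]*)$");

    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + line);
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            // Group 1 is null when the optional A is missing; skip it.
            if (m.group(i) != null) {
                tokens.add(m.group(i));
            }
        }
        return tokens.toArray(new String[0]);
    }
}
```

Calling HostLineParser.parse("B:1111;domain;a;b") should then return the five tokens B, 1111, domain, a, b, and input that does not fit the format is rejected with an exception instead of being split incorrectly.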

+1

Ina May 16 '12 at 13:06

You can do something like this:

    String str = "A:B:1111;domain:80;a;b";
    String[] temp;

    /* delimiter */
    String delimiter = ";";

    /* given string will be split by the argument delimiter provided. */
    temp = str.split(delimiter);

    /* print substrings */
    for (int i = 0; i < temp.length; i++)
        System.out.println(temp[i]);

0

newSpringer May 16 '12 at 13:03

Unless this is a bottleneck in your code and you have verified that it is, do not worry much about efficiency; the logic here is reasonable. You can avoid creating the temporary list and build the array directly, since you know the required size.

0

Ashwinee k jha May 16 '12 at 13:03

If you want to keep the domain and port together, then I believe you will need two splits. You might be able to do it with some regular expression magic, but I doubt you would see any real benefit from it.

If you do not mind the separation of domain and port, then:

    String s = "A:B:1111;domain:80;a;b";
    List<String> tokens = new ArrayList<String>();
    String[] values = s.split(";|:");
    for (String a : values) {
        tokens.add(a);
    }
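
For reference, the "regular expression magic" for a single split that keeps domain and port together could look roughly like the sketch below. It relies on a lookahead and on the assumption that every line has exactly four ';'-separated fields, so a ':' belongs to the first field only while at least three ';' still follow it. The class name SingleSplit is made up for illustration.

```java
import java.util.regex.Pattern;

class SingleSplit {
    // ';' always separates. ':' separates only if at least three ';' follow it,
    // which (with exactly four fields per line) means it sits in the first field.
    private static final Pattern SEPARATOR =
            Pattern.compile(";|:(?=(?:[^;]*;){3})");

    static String[] split(String s) {
        return SEPARATOR.split(s);
    }
}
```

As the answer says, this is harder to read than two plain splits, and it breaks as soon as the number of fields changes, so it is mostly of interest as a curiosity.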

0

Konstantin naryshkin May 16, '12 at 13:11

Anony-mousse · Accepted Answer · 2012-05-16T13:08:21+0000

There is not much of an efficiency problem here; everything I see is linear.

Either way, you could use a regex or a hand-written tokenizer.

You can avoid the list. You know the lengths of values and actions, so you can do:

    String[] values = s.split(";");
    String[] actions = values[0].split(":");
    String[] result = new String[actions.length + values.length - 1];
    // Copy the ':'-separated parts of the first field, then the remaining fields.
    System.arraycopy(actions, 0, result, 0, actions.length);
    System.arraycopy(values, 1, result, actions.length, values.length - 1);
    return result;

This should be efficient enough, unless you insist on implementing split yourself.

An untested low-level approach (make sure to unit test and benchmark it before use):

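    // Separator characters, as char, not String.
    static final char s1 = ':';
    static final char s2 = ';';

    // Next separator at or after 'from', or -1 if none is left.
    // ':' counts only before the first ';' (all of s if there is no ';').
    static int nextSeparator(String s, int from, int firstSemi) {
        if (firstSemi < 0 || from < firstSemi) {
            int colon = s.indexOf(s1, from);
            if (colon > -1 && (firstSemi < 0 || colon < firstSemi)) {
                return colon;
            }
        }
        return s.indexOf(s2, from);
    }

    static String[] split(String s) {
        int firstSemi = s.indexOf(s2);

        // Compute required size: one more component than separators.
        int components = 1;
        for (int p = nextSeparator(s, 0, firstSemi); p > -1;
                p = nextSeparator(s, p + 1, firstSemi)) {
            components++;
        }
        String[] result = new String[components];

        // Build result.
        int in = 0, i = 0, out = nextSeparator(s, 0, firstSemi);
        while (out > -1) {
            result[i] = s.substring(in, out);
            i++;
            in = out + 1;
            out = nextSeparator(s, in, firstSemi);
        }
        // out is -1 here, so the last component runs to the end of the string.
        assert i == result.length - 1;
        result[i] = s.substring(in);
        return result;
    }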

Note: this code is optimized in a rather hacky way: it treats : as a separator only in the first component. Handling the last component is slightly tricky, because out is -1 at that point.

I would usually not use this latter approach unless performance and memory are extremely critical. Most likely it still has some bugs in it, and the code is fairly unreadable, especially compared to the one above.
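
Since the low-level variant is explicitly flagged as untested, one cheap safeguard is to compare any hand-written tokenizer against the simple two-split version on the inputs from the question before trusting it. The sketch below shows the idea with a stand-in single-pass scanner. The names SplitCrossCheck, twoSplit, scanSplit and agree are illustrative, and scanSplit is a simplified substitute, not the code from this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitCrossCheck {
    // Reference: the two-split version. The limit -1 keeps trailing empty
    // fields, so both implementations treat input such as "a;" the same way.
    static String[] twoSplit(String s) {
        String[] values = s.split(";", -1);
        String[] actions = values[0].split(":", -1);
        String[] result = new String[actions.length + values.length - 1];
        System.arraycopy(actions, 0, result, 0, actions.length);
        System.arraycopy(values, 1, result, actions.length, values.length - 1);
        return result;
    }

    // Candidate: one pass over the characters; ':' separates only
    // until the first ';' has been seen.
    static String[] scanSplit(String s) {
        List<String> tokens = new ArrayList<>();
        boolean inFirstField = true;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ';' || (c == ':' && inFirstField)) {
                tokens.add(s.substring(start, i));
                start = i + 1;
                if (c == ';') {
                    inFirstField = false;
                }
            }
        }
        tokens.add(s.substring(start));
        return tokens.toArray(new String[0]);
    }

    // True if both implementations produce the same tokens for s.
    static boolean agree(String s) {
        return Arrays.equals(twoSplit(s), scanSplit(s));
    }
}
```

Running agree over the sample lines from the question, plus a few edge cases such as an empty string, gives a quick regression check. A benchmark would still be needed to show that the hand-written version is actually faster.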
