Regular expression: how to split on | but not when it is preceded by \

I have the following text

aaa|bbbb|cccc|dddd\|eeee|ffff 

and I want to split it on |, except when the | is preceded by \, to get

aaa

bbbb

cccc

dddd\|eeee

ffff

Thanks.

PS: I tried a regular expression generator (e.g. http://txt2re.com/ ), but frankly regular expressions are anything but friendly.

Update: I finally gave up. The regexp was not fast (I ran a test), and none of the suggestions is as clear as a function that everyone can follow, so I am skipping regex and using plain code instead.
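The update does not show the code that was finally used. For comparison, a hand-written splitter could look roughly like the sketch below. The class and method names (`PipeSplitter`, `split`) are invented for illustration, and the sketch assumes that a backslash escapes whatever character follows it and that escape sequences should stay in the tokens unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the question.
final class PipeSplitter {
    // Splits on '|' unless the '|' is escaped by a preceding backslash.
    // Escape sequences are copied into the tokens verbatim (no unescaping).
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // A backslash escapes the next character: copy both untouched.
                current.append(c).append(input.charAt(++i));
            } else if (c == '|') {
                // Unescaped delimiter: close the current token.
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString()); // last field (may be empty)
        return tokens;
    }

    public static void main(String[] args) {
        // prints [aaa, bbbb, cccc, dddd\|eeee, ffff]
        System.out.println(split("aaa|bbbb|cccc|dddd\\|eeee|ffff"));
    }
}
```

Because the loop consumes escape pairs from left to right, it also treats a doubled backslash before a pipe as an escaped backslash followed by a real delimiter.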

+4
3 answers

I tried to add this as a comment on another answer, but I don't know how to format code there...

In any case, that answer looks correct to me:

    String str = "aaa|bbbb|cccc|dddd\\|eeee|ffff";
    // Regex (?<!\\)\| : a | that is not preceded by a backslash.
    String[] tokens = str.split("(?<!\\\\)\\|");
    System.out.println(Arrays.toString(tokens));

which prints:

 [aaa, bbbb, cccc, dddd\|eeee, ffff] 
+2

This should do it:

 (?<!\\\\)\\| 

If you want backslashes themselves to be escapable with a backslash, you can use:

 (?<!(?<!\\\\)\\\\)\\| 

Thus, for the line aaa|bbbb|cccc|dddd\|eeee\\|ffff the split will give:

 aaa bbbb cccc dddd|eeee\* ffff 

* Or dddd\|eeee\\ , if for some reason you do not unescape the backslashes.

Edit: I am not familiar with Java regular expressions; the Java string escapes were added following a comment from ratchet.
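To see how the two patterns differ, they can be run side by side with `String.split`. This is a small sketch; the class and constant names (`LookbehindSplitDemo`, `SIMPLE`, `NESTED`) are made up for the example, and the input is the line used above written as a Java string literal.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the answer.
final class LookbehindSplitDemo {
    // Regex (?<!\\)\| : a pipe not preceded by a backslash.
    static final String SIMPLE = "(?<!\\\\)\\|";
    // Regex (?<!(?<!\\)\\)\| : a pipe not preceded by a backslash
    // that itself has no backslash in front of it.
    static final String NESTED = "(?<!(?<!\\\\)\\\\)\\|";

    public static void main(String[] args) {
        // The text aaa|bbbb|cccc|dddd\|eeee\\|ffff
        String input = "aaa|bbbb|cccc|dddd\\|eeee\\\\|ffff";

        // prints [aaa, bbbb, cccc, dddd\|eeee\\|ffff]
        System.out.println(Arrays.toString(input.split(SIMPLE)));

        // prints [aaa, bbbb, cccc, dddd\|eeee\\, ffff]
        System.out.println(Arrays.toString(input.split(NESTED)));
    }
}
```

Note that the nested lookbehind only looks two characters back. With three backslashes before a pipe (an escaped backslash followed by an escaped pipe) it still splits, so it covers single and double backslashes but not arbitrary runs.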

+2

Do not use split() for this. (You could if Java supported unbounded repetition inside lookbehind assertions, but it does not.)

It is better to collect all the matches between the |s:

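    List<String> matchList = new ArrayList<String>();
    // A token is a run of escaped characters (\ plus any character) or of
    // characters other than \ and |. Using + instead of * keeps find() from
    // also returning an empty match at every |.
    Pattern regex = Pattern.compile("(?:\\\\.|[^\\\\|])+");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        matchList.add(regexMatcher.group());
    }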

This correctly splits aaa|bbbb\\|cccc|dddd\|eeee|ffff\\\|ggg\\\\|hhhh into

 aaa bbbb\\ cccc dddd\|eeee ffff\\\|ggg\\\\ hhhh 
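With a find() loop, genuinely empty fields (such as the middle one in a||b) are hard to tell apart from the empty matches the regex engine can report at each delimiter. If empty fields matter, one option is to match one field at a time and step over the delimiter explicitly. The sketch below is an assumption-laden variant, not the answer's code: the names `FieldMatcher` and `fields` are invented, and the pattern is changed slightly (DOTALL, and the escaped character made optional) so that every match ends either at a | or at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown only to illustrate the idea.
final class FieldMatcher {
    // One field: any run of "backslash plus optional next character"
    // or characters that are neither backslash nor pipe. May be empty.
    private static final Pattern FIELD =
            Pattern.compile("(?s)(?:\\\\.?|[^\\\\|])*");

    static List<String> fields(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        int pos = 0;
        while (true) {
            m.region(pos, input.length());
            m.lookingAt();      // cannot fail: the pattern may match empty
            result.add(m.group());
            pos = m.end();
            if (pos == input.length()) {
                break;          // no delimiter left
            }
            pos++;              // step over the '|' that ended the field
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a, , b]
        System.out.println(fields("a||b"));
    }
}
```

On the test line above this yields the same six tokens, and an input such as a||b produces three fields with an empty one in the middle.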
+1
