Split string with regex using release character and delimiters

I need to parse an EDI file where the delimiters are the + , : and ' signs, and the escape (release) character is ? . First you split the data into segments

 var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";
 var segments = data.Split('\'');

then each segment is split into data elements by + , and each data element is split into component data elements by :

 var dataElements = segments[0].Split('+'); 

The example line above is not processed correctly because of the release character. I have special-case code that handles this, but I think it can all be done with

 Regex.Split(data, separator); 

I am not familiar with regular expressions and have not found a way to do this so far. The best I've come up with is

 string[] lines = Regex.Split(data, @"[^?]\+"); 

which also swallows the character preceding each + sign, and does not split at a + that directly follows another + :

 NA
 U
 ABC2378::9
 +XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 7
 Duzc
 Seferihisar / IZMI
 +3546
 TR

The correct result should be:

 NAD
 UC
 ABC2378::92
 XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71
 Duzce
 Seferihisar / IZMIR
 35460
 TR

So the question is: can this be done with Regex.Split, and what should the separator regular expression look like?

c# regex
2 answers

I see that you want to split on plus signs + only if they are not preceded (escaped) by a question mark ? . This can be done using the following:

 (?<!\?)\+ 

This matches a single + sign only if it is not immediately preceded by a question mark ? .
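As a minimal sketch of how that lookbehind behaves with Regex.Split, the snippet below wraps it in a helper. The class and method names (EdiSplitDemo, SplitElements) are made up for illustration and are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EdiSplitDemo
{
    // Split on '+' unless the character immediately before it is the
    // release character '?'. The lookbehind consumes no characters, so
    // nothing is cut off the end of the preceding element.
    public static string[] SplitElements(string segment)
    {
        return Regex.Split(segment, @"(?<!\?)\+");
    }
}
```

Called on "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555", this should yield five elements: "NAD", "UC", "ABC2378::92", an empty string for the gap between the two consecutive + signs, and "XYZ Corp.:Tel ?: ?+90 555" with the escaped + left in place.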

Edit: A problem with the previous expression is that it does not handle situations like ??+ , ???+ or ????+ ; in other words, it does not handle the case where ? is itself escaped.

We can solve this problem by noting that if an odd number of ? precede a + , the last one escapes the + , so we should not split; but if an even number of ? precede the + , they cancel each other out in pairs, leaving the + unescaped, so we must split there.

Following that observation, we need an expression that matches + only if it is preceded by an even number (including zero) of question marks ? . Here it is:

 (?<!(^|[^?])(\?\?)*\?)\+ 
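The odd/even rule can be checked with a short sketch like the one below. It uses the same expression with the groups made non-capturing, which is my own variation so that Regex.Split cannot add captured text to the result; the names ReleaseAwareSplit and Split are hypothetical. It relies on .NET allowing variable-length lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ReleaseAwareSplit
{
    // A '+' is a separator unless it is preceded by an odd number of '?'.
    // The lookbehind reads: start of input or a non-'?' character, then any
    // number of "??" pairs, then one more '?' directly before the '+'.
    private static readonly Regex UnescapedPlus =
        new Regex(@"(?<!(?:^|[^?])(?:\?\?)*\?)\+");

    public static string[] Split(string segment)
    {
        return UnescapedPlus.Split(segment);
    }
}
```

With this, "a?+b" and "a???+b" stay in one piece (odd number of ? , so the + is escaped), while "a??+b" splits into "a??" and "b" (even number, so the + is a real separator). The release characters themselves are left in the output; removing them would be a separate step.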

Would this meet the requirement?

 string[] lines = Regex.Split(data, @"\+");

Here is an edit that handles a '?' escaping the "+":

 string[] lines = Regex.Split(data, @"(?<!\?)[\+]+"); 

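The trailing "+" quantifier at the end matches several consecutive occurrences of the separator "+", so a run of separators is treated as a single split. If you need to keep the empty elements between consecutive separators, use: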

 string[] lines = Regex.Split(data, @"(?<!\?)[\+]"); 
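To make the difference between the two variants concrete, here is a small side-by-side sketch. The class and method names are invented for the example. Note that in EDI data an empty element usually still occupies a position, so collapsing runs of separators is probably not what the question needs; treat that as an assumption to check against your message specification.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EmptyElementDemo
{
    // A run of '+' counts as one separator, so empty elements disappear.
    public static string[] SplitCollapsing(string s)
    {
        return Regex.Split(s, @"(?<!\?)[\+]+");
    }

    // Every unescaped '+' is its own separator, so "++" yields an empty string.
    public static string[] SplitKeepingEmpty(string s)
    {
        return Regex.Split(s, @"(?<!\?)[\+]");
    }
}
```

For the input "ABC::92++XYZ", SplitCollapsing should return "ABC::92" and "XYZ", while SplitKeepingEmpty should return "ABC::92", an empty string, and "XYZ".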