Split a string at each final state of a deterministic finite state machine?

I have a problem that can be solved by iteration, but I wonder whether there is a more elegant solution using regular expressions and split().

I have a line (which Excel puts on the clipboard) that is essentially comma-separated. The caveat is that when a cell value contains a comma, the whole cell is surrounded by quotation marks (presumably to escape the commas inside it). An example line looks like this:

123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"

Now I want to elegantly split this string into separate cells, but the trick is that I cannot use a plain comma as the separator, because that would also split the cells that contain a comma in their value. Another way to look at the problem: I can ONLY split on a comma if an EVEN number of quotes precedes it.

This is easy to solve with a loop, but I wonder whether there is a regex split function capable of capturing this logic. In an attempt to solve this problem, I built a deterministic finite automaton (DFA) for the logic.
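For reference, the loop-based approach mentioned above can be sketched as a two-state machine. This is only a minimal Python sketch, not the asker's code: the function name `split_cells` is hypothetical, and it assumes quotes are always balanced and never escaped inside a cell.

```python
def split_cells(line: str) -> list[str]:
    """Split one comma-separated line, treating commas inside quotes as data."""
    cells = []
    current = []
    in_quotes = False  # True while an odd number of quotes has been seen
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # toggle state; the quote itself is dropped
        elif ch == "," and not in_quotes:
            cells.append("".join(current))  # separator: close the current cell
            current = []
        else:
            current.append(ch)
    cells.append("".join(current))  # the last cell has no trailing comma
    return cells


line = '123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"'
print(split_cells(line))
```

A comma only acts as a separator while `in_quotes` is false, which is the same condition as "an even number of quotes precedes it".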

[image: diagram of the DFA]

: , (/s) , ( 4) DFA?


(unescaped): (?:(?:"[^"]*")|(?:[^,]*))

Regex.Matches(), .NET, .
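The matching idea (collect the cells instead of splitting on the separators) can also be tried outside .NET. The following is a hedged Python sketch, not a translation of the answer: the `(?:^|,)` prefix and the two capture groups are additions made here so that `finditer` does not produce empty matches between cells, and `match_cells` is a hypothetical name.

```python
import re

# A cell is either a quoted run (may contain commas) or a run without commas.
# Each match must begin at the start of the line or right after a separator.
CELL = re.compile(r'(?:^|,)(?:"([^"]*)"|([^,]*))')


def match_cells(line):
    """Return the cells of one line, with surrounding quotes removed."""
    cells = []
    for m in CELL.finditer(line):
        quoted, bare = m.groups()
        # Exactly one of the two groups participates in each match.
        cells.append(quoted if quoted is not None else bare)
    return cells


line = '123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"'
print(match_cells(line))
```

Like the pattern above, this assumes a quoted cell never contains a quote character itself.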

: ^(?:(?:"(?<Value>[^"]*)")|(?<Value>[^,]*))(?:,(?:(?:"(?<Value>[^"]*)")|(?<Value>[^,]*)))*$

1 , ( .NET).


, VBScript lookaheads. :

",(?=(?:[^""]*""[^""]*"")*[^""]*$)"
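The same lookahead works with a split function in other regex engines. Below is a minimal Python sketch, with the doubled quotes of the string literal above written as single quotes; `lookahead_split` is a hypothetical name. The lookahead accepts a comma only if an even number of quotes follows it up to the end of the line, which for balanced quotes is equivalent to an even number preceding it.

```python
import re

# Split on a comma only when the rest of the line contains an even number of quotes.
SEPARATOR = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def lookahead_split(line):
    """Split one line on unquoted commas and drop the surrounding quotes."""
    return [cell.strip('"') for cell in SEPARATOR.split(line)]


line = '123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"'
print(lookahead_split(line))
```

Because the lookahead rescans the remainder of the line at every comma, this is quadratic in the worst case; for very long lines the loop-based approach is likely cheaper.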
