I have a regex that splits a string into a list of words, numbers, and punctuation. How can I make it keep "a-z" and "0-9" as single list items?

It looks like this:

string[] lines = Regex.Split(line, @"\s+|(?!^)(?=\p{P})|(?<=\p{P})(?!$)"); 

It splits "ASds22d. asd ,156" into "ASds22d" + "." + "asd" + "," + "156".

The problem is with strings like "a-z", "0-9", or variations like "a-c" and "4-5". My regex splits "a-z 1-9" into "a" + "-" + "z" + "1" + "-" + "9", but I need "a-z" + "1-9".

Can someone fix this regex?

1 answer
 \s+|(?!^|-)(?=\p{P})|(?<=\p{P})(?<!-)(?!$) 

You can try something like this. It does not split on - . If you have inputs that do need to be split on - , that case can be added back as another alternative with | .

See the demo.

https://regex101.com/r/iS6jF6/3
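For a quick local check, here is a minimal sketch that applies the pattern above through Regex.Split. The Tokenize helper and the filtering of empty entries are my own additions, not part of the answer: Regex.Split can return empty strings when two matches sit next to each other, so the sketch drops them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    // Pattern from the answer: split on whitespace and around punctuation,
    // but never directly before or directly after a hyphen.
    const string Pattern = @"\s+|(?!^|-)(?=\p{P})|(?<=\p{P})(?<!-)(?!$)";

    // Hypothetical helper (not in the original post): split and drop empty entries.
    static string[] Tokenize(string line)
    {
        return Regex.Split(line, Pattern)
                    .Where(s => s.Length > 0) // adjacent matches produce empty strings
                    .ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" | ", Tokenize("a-z 1-9")));
        // prints: a-z | 1-9

        Console.WriteLine(string.Join(" | ", Tokenize("ASds22d. asd ,156")));
        // prints: ASds22d | . | asd | , | 156
    }
}
```

Note that with this pattern a hyphen is never a split point, so a standalone or leading "-" stays attached to its neighbor as well.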


Source: https://habr.com/ru/post/1215132/

