This isn't perfect (you should definitely test it against some real examples to make sure it does what you want), but it's a start:
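```csharp
using System.Text.RegularExpressions;

// Split only on runs of whitespace/punctuation that sit between an Arabic
// character and a Basic Latin character, in either order.
string[] splitArray = Regex.Split(subjectString,
    @"(?<=\p{IsArabic})      # preceded by an Arabic character
      [\s\p{P}]+             # one or more whitespace/punctuation characters
      (?=\p{IsBasicLatin})   # followed by a Basic Latin character
    |
      (?<=\p{IsBasicLatin})  # or the same thing with the order reversed
      [\s\p{P}]+
      (?=\p{IsArabic})",
    RegexOptions.IgnorePatternWhitespace);
```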
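This splits on a run of whitespace and/or punctuation, but only where the preceding character is from the Arabic Unicode block and the following character is from the Basic Latin block (or vice versa). Whitespace or punctuation between two Latin words or two Arabic words is left alone.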
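If you want a quick way to try it, here is a small console program (C# 9+ top-level statements) using a compacted version of the pattern with `[\s\p{P}]+` on both sides. The input string is just a made-up sample; with it, the split should happen at ", " (Arabic to Latin) and at ". " (Latin to Arabic), but not at the space inside "hello world".

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample input: Arabic, then English, then Arabic again.
string input = "مرحبا, hello world. شكرا";

// IgnorePatternWhitespace lets the pattern be spread out with spaces/newlines.
string[] parts = Regex.Split(input,
    @"(?<=\p{IsArabic}) [\s\p{P}]+ (?=\p{IsBasicLatin})
    | (?<=\p{IsBasicLatin}) [\s\p{P}]+ (?=\p{IsArabic})",
    RegexOptions.IgnorePatternWhitespace);

// Print each piece in brackets so leftover spaces would be visible.
foreach (string part in parts)
    Console.WriteLine($"[{part}]");
```

This should give three pieces: the first Arabic word, "hello world", and the last Arabic word. Keep in mind that `\p{IsArabic}` only covers the main Arabic block (U+0600 to U+06FF); Arabic presentation-form characters live in separate Unicode blocks and won't be matched by it.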