I have a regular expression, ([-@.\/,':\w]*[\w])*, and it matches all the words in a text (including punctuated words such as I.B.M.), but I want it to exclude underscores, and I cannot figure out how to do this. I tried adding ^[_] (e.g. (^[_][-@.\/,':\w]*[\w])*), but that just breaks every word into single letters. I want to keep the current word matching, but without matching words that contain underscores or words that consist entirely of underscores.
What is the right way to do this?
PS
- My application is written in C# (if that matters).
- I can’t use A-Za-z0-9 because I need to match words regardless of the language (it could be Chinese, Russian, Japanese, German, or English).
Update
Here is an example:
"I.B.M. should be parsed as one word w_o_r_d! Russian should work too: the multiplex of historical events."
Matches must be:
I.B.M.
should
be
parsed
as
one
word
Russian
should
work
too
Please note that w_o_r_d must not match.
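To make the expected behaviour concrete, here is a minimal sketch of a check in C#. The pattern in it is only an assumption, one candidate built on [^\W_] ("any \w character except the underscore") plus lookarounds, and not a confirmed solution. WordMatchSketch and FindWords are hypothetical names, and the Cyrillic phrase is a placeholder, not the original sample text. Like the pattern above, it ends on a word character, so a trailing dot is not part of a match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordMatchSketch
{
    // Assumed candidate pattern, not a confirmed fix:
    //   [^\W_]    one \w character that is not an underscore (Unicode-aware in .NET)
    //   (?<!\w)   the match must not start right after a word character or underscore
    //   (?!\w)    the match must not end right before a word character or underscore
    // Together the lookarounds drop w_o_r_d entirely instead of splitting it into letters.
    private const string Pattern = @"(?<!\w)[^\W_](?:[-@./,':]*[^\W_])*(?!\w)";

    // Returns every match of the candidate pattern, in order of appearance.
    public static string[] FindWords(string text) =>
        Regex.Matches(text, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Placeholder sample; the Cyrillic words are hypothetical test data.
        const string sample =
            "I.B.M. should be parsed as one word w_o_r_d! Russian should work too: привет мир";

        foreach (var word in FindWords(sample))
        {
            Console.WriteLine(word);
        }
    }
}
```

If a sketch like this is close to the right direction, the open question is whether the lookarounds are the proper way to reject the whole underscored token, or whether there is a simpler construct for it.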
Kiril