Removing punctuation except intra-word dashes in Python

An approximate answer already exists for R, gsub("[^[:alnum:]['-]", " ", my_string), but it does not work in Python:

my_string = 'compactified on a calabi-yau threefold @ ,.'
re.sub("[^[:alnum:]['-]", " ", my_string)

gives 'compactified on a calab yau threefold @ ,.'

So it not only removes the intra-word dash, but also removes the last letter of the word preceding the dash. And it does not remove the punctuation.

Expected result (string without any punctuation except intra-word dashes): 'compactified on a calabi-yau threefold'

1 answer

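R uses either the TRE (POSIX) or the PCRE regex engine, depending on the perl option (or on the function used). Python uses a different, Perl-like flavor in its re module. Python's re does not support POSIX character classes such as [:alnum:], which matches alpha (letters) and num (digits).

In Python, [:alnum:] can be replaced with [^\W_] (or, for ASCII only, [a-zA-Z0-9]), and the negated [^[:alnum:]] with [\W_] ([^a-zA-Z0-9] for ASCII only).

In R, [^[:alnum:]['-] matches any 1 character other than an alphanumeric (letter or digit), [, ' or -. This means that the R answer you refer to is not correct either, since it keeps every ' and - rather than only the intra-word ones.

You can use the following solution: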
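As a quick check of these substitutes, the sketch below applies both classes to a made-up sample string (the string and the variable names are only illustrative):

```python
import re

sample = "calabi-yau_3 @ ,."

# [^\W_] : a word character that is not an underscore, i.e. a letter or a digit
alnum_chars = re.findall(r"[^\W_]", sample)

# [\W_] : a non-word character or an underscore, i.e. the negation of the class above
non_alnum_chars = re.findall(r"[\W_]", sample)

print("".join(alnum_chars))      # calabiyau3
print("".join(non_alnum_chars))  # -_ @ ,.
```

Note that \w is Unicode-aware for str patterns in Python 3, so [^\W_] also accepts non-ASCII letters and digits unless re.ASCII is passed.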
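For comparison, here is a sketch of how Python's re appears to read the same pattern, which would explain the 'calab yau' output in the question. Without POSIX class support, [^[:alnum:] is an ordinary negated set that ends at the first ], and ['-] is a second set, so the pattern matches two characters. The name explicit is only illustrative; the FutureWarning about a possible nested set is silenced here to keep the output clean.

```python
import re
import warnings

my_string = 'compactified on a calabi-yau threefold @ ,.'

# Compile the R-style pattern as Python sees it, hiding the "nested set" warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    broken = re.compile("[^[:alnum:]['-]")

# The same pattern written explicitly: one character that is not one of
# "[", ":", "a", "l", "n", "u", "m", followed by one "'" or "-".
explicit = re.compile(r"[^\[:alnum]['-]")

print(broken.findall(my_string))    # ['i-']
print(explicit.findall(my_string))  # ['i-']
print(explicit.sub(" ", my_string))
```

The only match is "i-", so the substitution eats the last letter of "calabi" together with the dash and leaves the remaining punctuation untouched.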

import re
p = re.compile(r"(\b[-']\b)|[\W_]")
test_str = "No -  d'Ante compactified on a calabi-yau threefold @ ,."
result = p.sub(lambda m: (m.group(1) if m.group(1) else " "), test_str)
print(result)

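The (\b[-']\b)|[\W_] regex matches and captures an intra-word - or ', which is restored in re.sub by returning m.group(1) from the replacement function, and matches any other non-word character or underscore (without capturing it), which is replaced with a space.

To replace whole sequences of non-word characters with a single space, use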

p = re.compile(r"(\b[-']\b)|[\W_]+") 
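A minimal sketch of how the + variant could be wrapped into a helper and applied to the string from the question. The names INTRAWORD_OR_PUNCT and strip_punctuation are hypothetical, and the final strip() is an added assumption to drop the trailing space left where the closing punctuation was:

```python
import re

# Group 1 captures an intra-word - or '; the second branch consumes any run
# of non-word characters or underscores.
INTRAWORD_OR_PUNCT = re.compile(r"(\b[-']\b)|[\W_]+")

def strip_punctuation(text):
    # Keep the captured intra-word character, replace everything else with one space.
    cleaned = INTRAWORD_OR_PUNCT.sub(lambda m: m.group(1) or " ", text)
    # Assumption: leading and trailing spaces are not wanted in the result.
    return cleaned.strip()

print(strip_punctuation('compactified on a calabi-yau threefold @ ,.'))
# compactified on a calabi-yau threefold
```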