I have a long string containing a paragraph, but some of the periods in it are not followed by a space. For example:
para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It is in black and white but saves the colour for one shocking shot.At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene.Avoid."
I am trying to use re.sub to solve this problem, but the result is not the one I expected.
This is what I did:
re.sub("(?<=\.).", " \1", para)
I match the first character of each sentence, and I want to put a space before it. My pattern is (?<=\.). , which (presumably) matches any character that appears right after a period. I found out from other Stack Overflow questions that \1 refers to the last matched pattern, so I wrote the replacement as " \1": a space followed by the previously matched string.
Here is the result:
"I saw this film about 20 years ago and remember it as being particularly nasty. \x01I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women. \x01t is in black and white but saves the colour for one shocking shot. \x01t the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. \x01void. \x01
Instead of matching any character following the period and adding a space before it, re.sub replaced the matched character with \x01. Why? How can I add a character before a matched string?
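For reference, here is a minimal sketch of what seems to be going on and two possible ways around it. In a normal (non-raw) Python string literal, "\1" is an octal escape for the character chr(1), so re.sub receives " \x01" as the replacement and never sees a backreference. The pattern also contains no capturing group for \1 to refer to. The sample string below is a shortened, made-up stand-in for the paragraph above, and the names fixed and fixed_alt are only illustrative.

```python
import re

# Shortened stand-in for the paragraph in the question (hypothetical sample).
para = "It was nasty. I believe it is true.It is in black and white.Avoid."

# In a non-raw string literal, "\1" is the octal escape for chr(1),
# which is why "\x01" shows up in the output.
assert " \1" == " \x01"

# Option 1: match the empty position after a period that is directly
# followed by a non-whitespace character, and insert a space there.
fixed = re.sub(r"(?<=\.)(?=\S)", " ", para)

# Option 2: keep the original idea, but capture the character in a group
# and use raw strings so that \1 is a real backreference.
fixed_alt = re.sub(r"(?<=\.)(\S)", r" \1", para)

print(fixed)
```

Both variants leave periods that are already followed by a space untouched. Note that either one would also split things like "3.14" or "e.g.", so real text may need a stricter pattern.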