How to replace dashes between characters with a space using a regular expression

I want to replace the dashes that appear between letters with a space, using a regular expression. For example, ab-cd should become ab cd.

The following matches a character-dash-character sequence, but it also replaces the characters on either side of the dash [i.e. ab-cd becomes a d, not ab cd as I want]:

    new_term = re.sub(r"[A-z]\-[A-z]", " ", original_term)

How do I adapt the above to replace only the - part?

4 answers

You need to capture the characters before and after the - in groups and reuse them in the replacement, i.e.:

    import re

    subject = "ab-cd"
    # \1 and \2 put the captured letters back around the space
    subject = re.sub(r"([a-z])\-([a-z])", r"\1 \2", subject, flags=re.IGNORECASE)
    print(subject)  # ab cd

Demo

http://ideone.com/LAYQWT


REGEX EXPLANATION

 ([A-z])\-([A-z])

 Match the regex below and capture its match into backreference number 1 «([A-z])»
    Match a single character in the range between "A" and "z" «[A-z]»
 Match the character "-" literally «\-»
 Match the regex below and capture its match into backreference number 2 «([A-z])»
    Match a single character in the range between "A" and "z" «[A-z]»

 \1 \2

 Insert the text that was last matched by capturing group number 1 «\1»
 Insert the character " " literally « »
 Insert the text that was last matched by capturing group number 2 «\2»

Use references to capture groups:

    >>> original_term = 'ab-cd'
    >>> re.sub(r"([A-z])\-([A-z])", r"\1 \2", original_term)
    'ab cd'

This assumes, of course, that you cannot just do original_term.replace('-', ' ') for whatever reason. Perhaps your text uses hyphens where it should use en dashes, or something similar.
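To make the difference from a plain str.replace concrete, here is a minimal sketch. The sample string is made up for illustration, and I use [A-Za-z] rather than [A-z] so that only ASCII letters count as "letters":

```python
import re

# Hypothetical input: one dash between letters, one spaced dash, one between digits.
text = "ab-cd - ef 3-4"

# str.replace touches every dash, wherever it appears.
replaced_all = text.replace("-", " ")

# The regex only touches a dash that has an ASCII letter on both sides.
replaced_between_letters = re.sub(r"([A-Za-z])-([A-Za-z])", r"\1 \2", text)

print(repr(replaced_all))
print(repr(replaced_between_letters))
```

If every dash in your data sits between letters anyway, the two approaches give the same result and str.replace is the simpler choice.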


re.sub() always replaces the entire matched sequence with the replacement.

One way to replace only the dash is to use lookahead and lookbehind assertions. They check the surrounding characters but do not count as part of the matched sequence.

    new_term = re.sub(r"(?<=[A-z])\-(?=[A-z])", " ", original_term)

The syntax is explained in the Python documentation for the re module.
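One practical consequence of the assertions being zero-width, sketched below with a made-up input: because the letters are not consumed, a letter can sit between two dashes and both dashes are still replaced. With capture groups the letters are part of each match, and matches cannot overlap, so the second dash is skipped.

```python
import re

# Hypothetical input where the middle letter borders two dashes.
term = "a-b-c"

# Capture groups consume "a-b", so the "b" is no longer available to match "b-c".
with_groups = re.sub(r"([a-z])-([a-z])", r"\1 \2", term)

# Lookarounds consume only the dash, so every dash between letters is replaced.
with_lookarounds = re.sub(r"(?<=[a-z])-(?=[a-z])", " ", term)

print(with_groups)
print(with_lookarounds)
```

If your terms can contain single letters between dashes, the lookaround form is probably the safer of the two.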


You need to use look-arounds:

    new_term = re.sub(r"(?i)(?<=[A-Z])-(?=[A-Z])", " ", original_term)

Or capture groups:

    new_term = re.sub(r"(?i)([A-Z])-([A-Z])", r"\1 \2", original_term)

See the IDEONE demo

Note that [A-z] also matches some non-letters (namely [ , \ , ] , ^ , _ and ` ), so I suggest replacing it with [A-Z] and using the case-insensitive modifier (?i).

Note that you do not need to escape a hyphen outside a character class.
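A quick way to check the [A-z] point for yourself is sketched below. The test string "a_-_b" is made up for illustration; it puts underscores, not letters, on both sides of the dash.

```python
import re

# Characters in the A..z code-point range that are not letters.
non_letters = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(non_letters)

# Hypothetical input: the dash is surrounded by underscores, not letters.
term = "a_-_b"

# [A-z] accepts "_" as a neighbour, so the dash is replaced.
loose = re.sub(r"(?<=[A-z])-(?=[A-z])", " ", term)

# [A-Z] with (?i) accepts only letters, so the dash is left alone.
strict = re.sub(r"(?i)(?<=[A-Z])-(?=[A-Z])", " ", term)

print(loose)
print(strict)
```

The hyphen is written unescaped in both patterns, which works because it is outside a character class.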
