Remove commas in string surrounded by comma and double quotes / Python

I found several similar topics on Stack Overflow, but I'm new to Python and regular expressions.

I have a line like this:

,"Completely refurbished in 2009, the 2-star superior Ibis Hotel Berlin Messe, with its 168 air-conditioned rooms, is located next to the Berlin ICC and the exhibition center. All rooms have Wi-Fi. Internet surfing is free on two iPoint-PCs in foyer. We have a 24-hour bar, snacks and reception. Enjoy a breakfast buffet from 4 AM to 12 PM on the 8th floor, where you have fantastic views of Berlin. You will find a free car park directly next to the hotel. ",

The pattern is: comma, double quote | any text with commas | double quote, comma. I need to replace the commas inside the double quotes, for example with the @ symbol. What regular expression should I use?

I tried this:

 r',"([.*]*,[.*]*)*",' 

with different options, but it does not work.

Thanks for the answers, the problem has been resolved.

+7
python regex
4 answers

If all you have to do is replace the commas with the @ character, you should look at str.replace() rather than a regular expression.

 str_a = "Completely renovated in 2009, the 2-star Superior Hotel Ibis Berlin Messe, with its 168 air-conditioned rooms, is located right next to Berlin ICC and exhibition center. All rooms have Wi-Fi, and you can surf the Internet free of charge at two iPoint-PCs in the lobby. We provide a 24-hour bar, snacks and reception service. Enjoy our breakfast buffet from 4am to 12pm on the 8th floor, where you have a fantastic view across Berlin. You will find free car parking directly next to the hotel."
 str_a = str_a.replace('","', '@')  # replace each "," sequence (quote, comma, quote)
 str_a = str_a.replace(',', '@')    # replace every remaining comma
 print(str_a)

Edit: Alternatively, you can make a list of the substrings you want to replace, then loop over it and perform each replacement. Example:

 to_replace = ['""', ',', '"']
 str_a = "Completely renovated in 2009, the 2-star Superior Hotel Ibis Berlin Messe, with its 168 air-conditioned rooms, is located right next to Berlin ICC and exhibition center. All rooms have Wi-Fi, and you can surf the Internet free of charge at two iPoint-PCs in the lobby. We provide a 24-hour bar, snacks and reception service. Enjoy our breakfast buffet from 4am to 12pm on the 8th floor, where you have a fantastic view across Berlin. You will find free car parking directly next to the hotel."
 for a in to_replace:
     str_a = str_a.replace(a, '@')
 print(str_a)
+2

Hmm, your regular expression is suspicious.

 ,"([.*]*,[.*]*)*", 

[.*] will match a literal dot or asterisk ( . and * become literals in character classes).
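
To make this concrete, here is a small sketch showing that `[.*]` only ever matches a literal dot or asterisk, which is why the pattern from the question finds nothing. The `sample` string is a shortened, hypothetical stand-in for the real line.

```python
import re

# Inside a character class, '.' and '*' lose their special meaning.
literal_class = re.compile(r'[.*]')

print(literal_class.findall('a.b*c,d'))  # ['.', '*']
print(literal_class.search('abc, def'))  # None

# The pattern from the question, applied to a hypothetical sample line.
sample = ',"rooms, bar, snacks",'
question_pattern = re.compile(r',"([.*]*,[.*]*)*",')
print(question_pattern.search(sample))   # None
```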

Also, even if this pattern could match something in the line, you would be able to replace only one comma: the rest of the line (including the other commas) would be consumed by the match and, once consumed, cannot be matched again unless you run the substitution in a loop until no commas are left to replace.

What you can do to replace these commas with re.sub is to use lookarounds (you can look them up; there is plenty of documentation on them, I think). If you have only one pair of double quotes, you can make sure that only commas followed by exactly one double quote are replaced:

 ,(?=[^"]*"[^"]*$) 

[^"] means a character that is not a double quote. [^"]* means that it will be repeated 0 or more times.

$ means end of line.

Now, the lookahead (?= ... ) ensures that what is inside it appears ahead of the comma, without consuming it.


After that, you can simply replace the commas with whatever value you want.

 str = re.sub(r',(?=[^"]*"[^"]*$)', '@', str) 
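
As a quick check of the single-pair case, the sketch below applies that lookahead to a shortened, hypothetical version of the line from the question. The names `line` and `single_pair` are illustrative only.

```python
import re

# Hypothetical sample: one quoted field delimited by commas.
line = ',"Refurbished in 2009, with 168 rooms, next to the ICC. ",'

# Replace a comma only if exactly one double quote follows it.
single_pair = re.compile(r',(?=[^"]*"[^"]*$)')
result = single_pair.sub('@', line)

print(result)
# ,"Refurbished in 2009@ with 168 rooms@ next to the ICC. ",
```

The leading comma is left alone because two double quotes follow it, and the trailing comma is left alone because none do.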

If, however, there are several double quotes, you must ensure that there is an odd number of double quotes ahead. This can be done using regex:

 ,(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$) 

(?: ... ), by the way, is a non-capturing group.
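
A short sketch of the multi-quote case, assuming the double quotes in the input are balanced. The `row` string is a made-up example with two quoted fields.

```python
import re

# Hypothetical row with two quoted fields and several unquoted commas.
row = 'a, b,"x, y",c,"p, q, r",'

# Match a comma only when an odd number of double quotes follows it,
# i.e. one quote plus any number of further quote pairs up to the end.
odd_quotes_ahead = re.compile(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)')
replaced = odd_quotes_ahead.sub('@', row)

print(replaced)
# a, b,"x@ y",c,"p@ q@ r",
```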

+2

You can try this (rather hacky). The trick is that any character inside a pair of double quotes is followed by an odd number of double quotes, assuming, of course, that your double quotes are balanced:

 import re
 s = 'some comma , outside "Some comma , inside" , "Completely , renovated in 2009",'
 s = re.sub(r',(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)', "@", s)
 print(s)

Output

 some comma , outside "Some comma @ inside" , "Completely @ renovated in 2009", 
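
Since the trick depends on balanced double quotes, one possible safeguard is to check the quote count before substituting. This is only a sketch: `replace_quoted_commas` and `QUOTED_COMMA` are hypothetical names, not part of the answer above.

```python
import re

# Same lookahead as above: a comma followed by an odd number of quotes.
QUOTED_COMMA = re.compile(r',(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)')


def replace_quoted_commas(s, replacement='@'):
    """Hypothetical helper: replace commas inside double-quoted spans."""
    # An odd quote count means the parity trick would give wrong results.
    if s.count('"') % 2 != 0:
        raise ValueError('unbalanced double quotes')
    return QUOTED_COMMA.sub(replacement, s)


print(replace_quoted_commas('x , "a , b" ,'))
# x , "a @ b" ,
```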
+2

If the line always follows the pattern you described, the following code snippet will do what you want:

 text = ',' + text[1:-1].replace(',', '@') + ','

Discussion

  • text[1:-1] gives you the original string minus the first and last characters (the delimiter commas)
  • Then we call .replace() to turn all remaining commas into @ characters
  • Finally, we add the first and last commas back to form the resulting string
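
A minimal sketch of the slicing approach, assuming the line begins and ends with a delimiter comma. A trailing newline, if present, is stripped first so that the slice removes exactly the two delimiters. The helper name `replace_inner_commas` is hypothetical.

```python
def replace_inner_commas(text, replacement='@'):
    """Hypothetical helper: keep the two delimiter commas, replace the rest."""
    text = text.rstrip('\n')
    if len(text) < 2 or not (text.startswith(',') and text.endswith(',')):
        raise ValueError('expected a line that starts and ends with a comma')
    # text[1:-1] drops the first and last characters (the delimiters).
    return ',' + text[1:-1].replace(',', replacement) + ','


print(replace_inner_commas(',"rooms, bar, snacks",\n'))
# ,"rooms@ bar@ snacks",
```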
+2