Replace patterns that are inside delimiters using a regex call

Question

I need to strip out all instances of the '--' pattern that are inside single quotes in a long line (leaving those outside single quotes intact).

Is there a regex way of doing this? (Using it with an iterator from the language is OK.)

For example, starting with

"xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"

I need to end up with:

 "xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"

So I am looking for a regex that can be run from the following languages, as shown:

JavaScript input.replace(/someregex/g, "")
PHP preg_replace('/someregex/', "", input)
Python re.sub(r'someregex', "", input)
Ruby input.gsub(/someregex/, "")

+2

regex

Mike berrow Oct 7 '08 at 23:13

5 answers

This cannot be done with regular expressions, because you need to maintain state on whether you are inside or outside single quotes, and a regular expression is essentially stateless. (Also, as far as I understand, single quotes can be escaped without ending the "inside" region.)

Your best bet is to iterate through the string character by character, keeping a boolean flag for whether or not you are inside a quoted region, and remove the '--' occurrences that way.
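A minimal sketch of that character-by-character approach in Python. The function name strip_dashes_in_quotes is made up for illustration, and the sketch assumes that quotes are never escaped, so every single quote simply toggles the inside/outside flag.

```python
def strip_dashes_in_quotes(text):
    """Remove every "--" found between a pair of single quotes."""
    out = []
    inside = False  # True while the scan is between an opening and a closing quote
    i = 0
    while i < len(text):
        if text[i] == "'":
            inside = not inside      # assumption: no escaped quotes, each one flips the flag
            out.append("'")
            i += 1
        elif inside and text.startswith("--", i):
            i += 2                   # skip the pair of dashes
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


line = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(strip_dashes_in_quotes(line))
```

Handling escaped quotes would need an extra branch in the loop; how they are escaped depends on the input format, which the question does not specify.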

+2

levik Oct 7 '08 at 23:16

If bending the rules a little is allowed, this might work:

 import re

 p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
 txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
 # Replace each run of two or more dashes (group 2) with a single dash
 print(re.sub(p, r'\1-', txt))

Output:

 xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '-ggh-' vcbcvb

Regular expression:

 (                  # Group 1
   (?:^[^']*')?     # Start of string, up till the first single quote
   [^']*?           # Inside the single quotes, as few characters as possible
   (?:
     '[^']*'        # No double dashes inside these single quotes, jump to the next.
     [^']*?
   )*?              # as few as possible
 )
 (-{2,})            # The dashes themselves (Group 2)

If there are different delimiters for the beginning and the end, you can use something like this:

 -{2,}(?=[^'`]*`)
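For illustration, a small Python check of that pattern. Reading the pattern itself, it appears to assume that ' opens a region and ` closes it; the sample string is made up, and the dashes are removed outright here rather than collapsed to a single dash.

```python
import re

# Lookahead: after the dashes, the next delimiter encountered must be a closing backtick.
pattern = re.compile(r"-{2,}(?=[^'`]*`)")

sample = "a -- 'b--c` d -- 'e--` f"  # hypothetical input: ' opens, ` closes
result = pattern.sub("", sample)
print(result)
```

Because the opening and closing delimiters differ, one lookahead to the next delimiter is enough to tell inside from outside, with no need to count quotes.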

Edit: I realized that if the line does not contain quotation marks, it will match all double dashes in the line. One way to fix it would be to change

 (?:^[^']*')?

at the beginning to

 (?:^[^']*'|(?!^))

Regular expression updated:

 ((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})

+1

Markus jarderot Oct 7 '08 at 23:41

Hm. There might be a way in Python if there are no quoted apostrophes, given that there is the (?(id/name)yes-pattern|no-pattern) construct in regular expressions, but it goes way over my head currently.

Does this help?

 def remove_double_dashes_in_apostrophes(text):
     return "'".join(
         part.replace("--", "") if (ix & 1) else part
         for ix, part in enumerate(text.split("'")))

Seems to work for me. What it does is split the input text into parts on apostrophes, and replace the "--" only when the part is odd-numbered (i.e. there has been an odd number of apostrophes before the part). Note about "odd-numbered": part numbering starts from zero!
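To make the odd/even numbering concrete, here is a short sketch that prints each part of the question's sample line together with its index, then applies the same replacement. The variable names are illustrative only.

```python
text = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"

parts = text.split("'")
for ix, part in enumerate(parts):
    # Odd index: an odd number of apostrophes precede this part, so it is quoted.
    where = "inside quotes" if ix & 1 else "outside quotes"
    print(ix, repr(part), where)

# Only the odd-numbered (quoted) parts lose their "--".
cleaned = "'".join(
    part.replace("--", "") if ix & 1 else part
    for ix, part in enumerate(parts))
print(cleaned)
```

Like the function above, this assumes the apostrophes are balanced and never escaped.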

0

tzot Oct 7 '08 at 23:33

You can use the following sed script, I believe:

 :again
 s/'\(.*\)--\(.*\)'/'\1\2'/g
 t again

Save this in a file (rmdashdash.sed) and use whatever exec magic your scripting language offers to run the equivalent of the following shell command:

sed -f rmdashdash.sed < file containing your input

What the script does:

:again <- just a label

s/'\(.*\)--\(.*\)'/'\1\2'/g

substitute, for the pattern ' followed by anything followed by -- followed by anything followed by ', just the two anythings within quotes.

t again <- if a substitution was made, jump back to the label and run the resulting string through the substitution again.

Note that this script converts '----' to '', since it is a sequence of two '--' within quotes. However, '---' will be converted to '-'.

There is no school like the old school.

0

bog Oct 08 '08 at 12:28

Mike berrow · Accepted Answer · 2008-10-08T03:01:39+0000

I found another way to do this in Greg Hugill's answer to Qn138522.
It is based on this regex (adapted to contain the pattern I was looking for):

 --(?=[^\']*'([^']|'[^']*')*$)

Greg explains:

"This uses the non-consuming match (?=...) to check that the character x is within a quoted string. It looks for some non-quote characters up to the next quote, then looks for a sequence of either single characters or quoted groups of characters until the end of the string. This relies on your assumption that the quotes are always balanced. It is also not very efficient."

Usage examples:

JavaScript: input.replace(/--(?=[^']*'([^']|'[^']*')*$)/g, "")
PHP: preg_replace("/--(?=[^']*'([^']|'[^']*')*$)/", "", $input)
Python: re.sub(r"--(?=[^']*'([^']|'[^']*')*$)", "", input)
Ruby: input.gsub(/--(?=[^\']*'([^']|'[^']*')*$)/, "")

I tested this in Ruby and it gave the desired result.
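As a quick check outside Ruby, the same regex can be run in Python against the sample line from the question. This is only a sketch; the names INSIDE_QUOTES, line, and expected are illustrative.

```python
import re

# Double-quoted raw string, so the single quotes in the pattern need no escaping.
INSIDE_QUOTES = re.compile(r"--(?=[^']*'([^']|'[^']*')*$)")

line = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
expected = "xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"

result = INSIDE_QUOTES.sub("", line)
print(result)
print(result == expected)
```

The lookahead counts quotes to the end of the string, so an input with an unbalanced quote would be misclassified, which is the balanced-quotes assumption noted in the quoted explanation.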
