Match Match Prevention

I am trying to match every part of a string literal that is enclosed in quotation marks.

(?<=[\"]).*?(?=(?<=[^\\])[\"]{1})

The expression above does this, with one exception: it matches every part of the string that has a quotation mark to its left and to its right, regardless of how the quotation marks pair up.

For example (an asterisk indicates a matching character):

Hello "my" name is "Andy", nice to meet you.
       ** ********* ****

The part " name is " is matched here simply because it has a quote character on either side of it. That is not what I am looking for. The desired result:

Hello "my" name is "Andy", nice to meet you.
       **           ****
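
For reference, a minimal Ruby sketch that reproduces the behaviour shown above. Ruby is only an assumption here (the question is not tied to a particular engine), and the names `pattern`, `text` and `matches` are illustrative.

```ruby
# The pattern from the question, written as a Ruby regex literal.
pattern = /(?<=").*?(?=(?<=[^\\])")/
text = 'Hello "my" name is "Andy", nice to meet you.'

# scan returns every non-overlapping match, left to right.
matches = text.scan(pattern)
p matches
# → ["my", " name is ", "Andy"]
```

The unwanted middle element appears because the quote that closes "my" is only looked at, never consumed, so it can also satisfy the lookbehind of the next match.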

I fully understand that this could, and perhaps should, be done by writing a state machine. My question is, purely in terms of regular expressions and if it is possible at all: how can I prevent the lookbehind from matching a string literal previously matched by the lookahead?

+4
4 answers

Prelude

Ruby, , , , . , , , (, javascript) , Ruby. Perl, Sublime Text .


...

: ! !


, ... . , - .

, :

Instead of a numbered group, (group_contents), you can write a named group, (?<group_name>group_contents).

A group's pattern can then be called again, like a subroutine, with \g<group_name_or_number>. For example:

(?<three_letter_word>\b\w{3}\b) \g<three_letter_word>

This matches xyz abc.
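
A small Ruby sketch of the difference between calling a group and backreferencing it. The variable names are illustrative, and `\k<...>` is shown only for contrast; it is not used in this answer.

```ruby
# \g<name> re-runs the group's pattern; \k<name> requires the same text again.
call    = /(?<three_letter_word>\b\w{3}\b) \g<three_letter_word>/
backref = /(?<three_letter_word>\b\w{3}\b) \k<three_letter_word>/

call_result    = call.match?('xyz abc')     # pattern reused: two different words are fine
backref_result = backref.match?('xyz abc')  # backreference: needs the same word twice
backref_same   = backref.match?('xyz xyz')

p call_result     # → true
p backref_result  # → false
p backref_same    # → true
```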

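A group followed by {0} matches nothing at the place where it is written, but it can still be called from elsewhere, so it serves as a definition. For example: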

(?<even>[02468]){0}7\g<even>8\g<even>9\g<even>0

This matches 7x8y9z0, where x, y and z are even digits.
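
A quick Ruby check of the definition idiom, assuming the pattern is meant to call `\g<even>` at each of the three positions. The test strings are made up for illustration.

```ruby
# (?<even>...){0} defines the group without matching anything at that position;
# each \g<even> then runs the group's pattern once.
pattern = /(?<even>[02468]){0}7\g<even>8\g<even>9\g<even>0/

all_even = pattern.match?('7284960')  # 2, 4 and 6 are even
one_odd  = pattern.match?('7184960')  # 1 is odd

p all_even  # → true
p one_odd   # → false
```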

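Lookbehinds are restricted in many engines. Some of them (java, for example) do not allow a lookbehind of unbounded length, so you cannot write something like (?<=x*).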

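\K offers a way around this. Everything matched before \K is still required, but it is excluded from the reported match. So, instead of (?<=x*)y, you can write x*\Ky.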
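
A short Ruby sketch of the two forms side by side. Whether the lookbehind version compiles depends on the engine and version, so it is wrapped in a rescue here rather than assumed either way; `lookbehind_ok` and `m` are illustrative names.

```ruby
# An unbounded lookbehind is expected to be rejected by Ruby's engine;
# the rescue keeps this sketch running whichever way it goes.
lookbehind_ok =
  begin
    Regexp.new('(?<=x*)y')
    true
  rescue RegexpError
    false
  end
p lookbehind_ok

# x* is matched first, then \K drops it from the reported match.
m = /x*\Ky/.match('xxxy')
p m[0]
# → "y"
```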

, .


-, "" ( № 3).
  1. escaped_quote

- ", (\). , (aka \\= ).

An even number of backslashes is \\{2}* (that is, two backslashes zero or more times: 2 * n). An odd number is one more backslash in front of that: \\\\{2}* (2 * n + 1).

, . , , , . \\\" , \\", . , lookbehind : (?<!\\)\\\\{2}*

escaped_quote "" :

(?<escaped_quote>(?<!\\)\\\\{2}*"){0}
  2. non_quoting

, , - . , .

, lookahead escaped_quote. , \ escaped_quote, .

(?<non_quoting>(?:\g<escaped_quote>|(?!\g<escaped_quote>)[^"])*){0}
  3. balanced_quotes

"", , - , . , :

(?<balanced_quotes>\g<non_quoting>|(?:\g<non_quoting>"\g<non_quoting>){2}+){0}


.

. . , . (?:^|")

: , . , , \K lookbehind. , - . , , , : (?:^|"|)

non_quoting, ( # 4) :

(?:^|"|)\g<non_quoting>"\K

non_quoting:

(?:^|"|)\g<non_quoting>"\K\g<non_quoting>

, , balanced_quotes :

(?:^|"|)\g<non_quoting>"\K\g<non_quoting>(?="\g<balanced_quotes>$)


!

"" , :

(?<escaped_quote>(?<!\\)\\\\{2}*"){0}(?<non_quoting>(?:\g<escaped_quote>|(?!\g<escaped_quote>)[^"])*){0}(?<balanced_quotes>\g<non_quoting>|(?:\g<non_quoting>"\g<non_quoting>){2}+){0}(?:^|"|)\g<non_quoting>"\K\g<non_quoting>(?="\g<balanced_quotes>$)


, regex, , . , , \K.

, , .

+5

.NET lookbehind, :

(?<!(.|\n)\G")(?<!(^|[^\\])(\\\\)*\\")(?:(?<=")(?:(?:\\\\|\\"|[^"])+?)(?=")|(?<=")(?="))

.NET, Java ( ).


, , Java, :

(?<!(.|\n)\G")(?<!(^|[^\\])(\\\\){0,20}\\")(?:(?<=")(?:(?:\\\\|\\"|[^"])+?)(?=")|(?<=")(?="))

, ( ) Java, lookbehind, , .

:

Regex lookbehind, , / .

(?<!(.|\n)\G") - , . lookbehind :

  • . \n (, DOTALL Java, .),
  • \G - , ", ,
  • " - ,

(?<!(^|[^\\])(\\\\){0,20}\\") , . :

  • (^|[^\\]) - , ( , , \\\\\\\\"xxx"),
  • (\\\\){0,20} - ( 20) ( , quotation),
  • \\ - ,

lookbehind , , (+, *, ?, {2,4}). Java ? min max length. , 20 (\\\\){0,20} , , , ( ) 20 . , . , , , .

- . ( ) : (?<=")(?=")), , - (?<!(.|\n)\G") , , (, """). :

(?<=")(?:(?:\\\\|\\"|[^"])+?)(?=") , . :

  • (?<=") - lookbehind ,
  • (?:(?:\\\\|\\"|[^"])+?) - , ,
  • (?=") - ,

(?:\\\\|\\"|[^"])+?)* :

  • \\\\ - \", , \\", \",
  • \\" - , [^"], \" ;
  • [^"] ,

Java.

Regex RegexPlanet - Java

+1

, .

(?<!.\G")(?<="|\\\\")(?<![^\\]\\")((?>\\.|[^"])*?)(?=")

Regex101

0

. puts, , .

str = 'Hello "my" name is "Andy", nice to meet "Sally"'

r = /
    (       # start capture group 1
    .*?     # match >= 0 characters lazily 
    (?<=\") # match " in a positive lookbehind
    (.*?)   # match >= 0 characters lazily in capture group 2
    (?=\")  # match " in a positive lookahead
    .       # match one character
    )       # close capture group 1
    /x      # extended mode

a = []
s = str.dup
loop do
  break a unless s =~ r
  puts
  puts "$1 = |#{$1}|"
  puts "$2 = |#{$2}|"
  a << $2
  puts "a  = #{a}"
  s = s[$1.size..-1]
  puts "s  = |#{s}|"
end

$1 = |Hello "my"|
$2 = |my|
a  = ["my"]
s  = | name is "Andy", nice to meet "Sally"|

$1 = | name is "Andy"|
$2 = |Andy|
a  = ["my", "Andy"]
s  = |, nice to meet "Sally"|

$1 = |, nice to meet "Sally"|
$2 = |Sally|
a  = ["my", "Andy", "Sally"]
s  = ||
  #=> ["my", "Andy", "Sally"] 

, . :

["my", " name is ", "Andy", ", nice to meet ", "Sally"]

, $1 , , , . , ', nice to meet "Sally"' , 'Sally' .
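
An alternative sketch, not part of the loop above: if the quotes are consumed by the match rather than tested with lookarounds, a closing quote can never double as the opening quote of the next match, and `String#scan` returns just the captured contents. The pattern and the name `quoted` are my own illustration, and it assumes that a quote inside a literal is escaped as `\"`.

```ruby
str = 'Hello "my" name is "Andy", nice to meet "Sally"'

# Opening quote (consumed), then group 1: escaped pairs or any character
# other than a quote or backslash, then the closing quote (consumed).
quoted = /"((?:\\.|[^"\\])*)"/

a = str.scan(quoted).flatten
p a
# → ["my", "Andy", "Sally"]
```

Because each match ends after its closing quote, the next search starts beyond it, so " name is " and ", nice to meet " are never candidates.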

I did not understand what the . matches. @ndd explained that it matches the " in s, which makes perfect sense, given that the lookarounds are zero-width.

0