Regex replace, but only between two patterns

OK, I have a multi-line string that I'm trying to clean up.

Each line may or may not be part of a large block of quoted text. Example:

 This line is not quoted.
 This part of the line is not quoted “but this is.”
 This one is not quoted either.
 “This entire line is quoted”
 Not quoted.
 “This line is quoted
 and so is this one
 and so is this one.”
 This is not quoted “but this is
 and so is this.”

I need a Regex replacement that will unwrap hard-wrapped lines, i.e. replace "\r\n" with a space, but only between curly quotes.

Here's what it should look like after replacement:

 This line is not quoted.
 This part of the line is not quoted “but this is.”
 This one is not quoted either.
 “This entire line is quoted”
 Not quoted.
 “This line is quoted and so is this one and so is this one.”
 This is not quoted “but this is and so is this.”

(Notice how the last two lines each spanned several lines in the input.)

Constraints

  • Ideally, a single Regex replace call
  • Using the .NET Regex library
  • The quotes are always curly opening and closing quotes (“ and ”), not plain double ticks ("), which should make this a little easier.

Important constraint

This isn't direct .NET code: I fill in a table of "search for / replace with" string pairs, each of which is then passed to Regex.Replace. I have no way to add custom code such as a MatchEvaluator, loops over captured groups, etc.

My current attempt is something like:

 r.Replace("(?<=“)\r\n(?=”)", " ") 

Obviously, I'm not even close yet.

The same logic would apply to, say, syntax-coloring block comments in source code: anything inside a block comment is treated differently from the material outside it. (Code is a bit trickier, since the start/end delimiters of a block comment can also legitimately appear inside a string literal, a problem I don't have to deal with here.)

+4
5 answers

Assuming all curly quotes are properly balanced, this regex should do what you want:

 @"[\r\n]+(?=[^“”]*”)" 

[\r\n]+ matches one or more line separators of any kind: Unix (\n), DOS (\r\n), or old Mac (\r). The lookahead then asserts that there is a closing quote up ahead, and that there is no opening quote between here and there. Your replacement text can then be a single space.
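A minimal sketch of this approach, using Python's `re` module as a stand-in for .NET's `Regex` (the pattern uses only a character class and a lookahead, which behave the same way in both flavors). It assumes the curly quotes “ and ” are balanced and that the input is the question's sample with CRLF line breaks; the name `unwrap_quoted` is hypothetical.

```python
import re

# One or more CR/LF characters, but only if a closing curly quote (”) follows
# before any other curly quote, i.e. we are inside a “...” block.
# Assumes every “ has a matching ” (balanced quotes).
QUOTED_BREAK = re.compile(r"[\r\n]+(?=[^“”]*”)")

def unwrap_quoted(text: str) -> str:
    """Replace each run of line breaks inside curly quotes with one space."""
    return QUOTED_BREAK.sub(" ", text)

sample = (
    "This line is not quoted.\r\n"
    "This part of the line is not quoted “but this is.”\r\n"
    "This one is not quoted either.\r\n"
    "“This entire line is quoted”\r\n"
    "Not quoted.\r\n"
    "“This line is quoted\r\n"
    "and so is this one\r\n"
    "and so is this one.”\r\n"
    "This is not quoted “but this is\r\n"
    "and so is this.”"
)

print(unwrap_quoted(sample))
```

Because `[^“”]*` is allowed to cross line breaks, the lookahead can find a closing quote several lines ahead, so a single call handles a quoted block of any length.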

+4

NB: for testing regular expressions I use http://gskinner.com/RegExr/, which is very useful.

I don't think you can write a single expression that replaces an arbitrary number of newlines. However, you can write an expression that replaces one (or a few), and either run it repeatedly, or extend it to handle the maximum number of newlines you expect in a single quoted section.

First, turn on single-line mode, in which . also matches newline characters, so the expression can work across the whole input string rather than line by line. Put this at the start of the expression to enable it:

 (?s) 

Then you want a lookbehind to match the opening quote:

 (?<=“) 

And a lookahead to match the closing quote:

 (?=”) 

Now an expression to match some text, then an optional newline, then more text:

 ([^”\r\n]*)\r?\n?([^”\r\n]*) 

Note that there are two capture groups for the bits of text around the newline, so you can reuse that text in the replacement. This matches quoted text containing at most one newline. To extend it to two newlines, simply add another optional newline and optional following text:

 (?s)(?<=“)([^”\r\n]*)\r?\n?([^”\r\n]*)\r?\n?([^”\r\n]*)(?=”) 

You can extend it to as many lines as you think you'll need. Not perfect, but maybe good enough. Or, if you can run the expression repeatedly over the text, just replace one newline at a time.

That leaves your expression looking like this:

 r.Replace("(?s)(?<=“)([^”\r\n]*)\r?\n?([^”\r\n]*)(?=”)", "$1 $2") 

(This isn't quite right, since it adds a space after the text even when the second group matches nothing... but it's a start.)
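The pieces of this answer can be tried with a small sketch, again using Python's `re` in place of .NET (Python writes group references as `\1`, `\2` where .NET uses `$1`, `$2`). The patterns follow the answer's structure, assuming `\r?\n?` for the line break and excluding both `\r` and `\n` from the text groups; the names `ONE_BREAK` and `TWO_BREAKS` are hypothetical.

```python
import re

# Lookbehind for the opening quote, lookahead for the closing quote, and one
# capture group for the text on each side of an optional line break.
# \r?\n? accepts CRLF, LF or CR, or nothing when the block is one line.
ONE_BREAK = re.compile(r"(?s)(?<=“)([^”\r\n]*)\r?\n?([^”\r\n]*)(?=”)")

# Same idea, extended to at most two line breaks per quoted block.
TWO_BREAKS = re.compile(
    r"(?s)(?<=“)([^”\r\n]*)\r?\n?([^”\r\n]*)\r?\n?([^”\r\n]*)(?=”)"
)

two_lines = "“but this is\r\nand so is this.”"
three_lines = "“This line is quoted\r\nand so is this one\r\nand so is this one.”"

print(ONE_BREAK.sub(r"\1 \2", two_lines))                   # “but this is and so is this.”
print(ONE_BREAK.sub(r"\1 \2", three_lines) == three_lines)  # True: too many breaks, no match
print(TWO_BREAKS.sub(r"\1 \2 \3", three_lines))             # joined into one line
print(ONE_BREAK.sub(r"\1 \2", "“abc”"))                     # “abc ” (extra space)
```

The last line shows the drawback mentioned above: a block with no line break gains a space before the closing quote, because `\2` is empty.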

+1

So you need to find a string that starts with an opening quote, followed by text that contains no closing quote and no \r or \n characters, followed by a run of one or more line-break characters; capture everything except the trailing line-break characters, and replace the whole match with the captured part.
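One possible reading of this description, sketched with Python's `re` in place of .NET: the pattern captures an opening quote plus the rest of its line, and the replacement keeps the capture. Appending a space to the capture (`\1 `) is an assumption added here so the joined lines match the question's desired output; `join_once` and `join_all` are hypothetical names.

```python
import re

# An opening quote plus following text with no closing quote and no line-break
# characters (group 1), then one or more line-break characters.
JOIN_ONE = re.compile(r"(“[^”\r\n]*)[\r\n]+")

def join_once(text: str) -> str:
    """Replace the whole match with the capture plus a space (the space is an assumption).

    Removes at most one run of line breaks per quoted block per call.
    """
    return JOIN_ONE.sub(r"\1 ", text)

def join_all(text: str) -> str:
    """Repeat join_once until the text stops changing."""
    while True:
        joined = join_once(text)
        if joined == text:
            return text
        text = joined

block = "“a\nb\nc”"
print(repr(join_once(block)))  # '“a b\nc”'
print(join_all(block))         # “a b c”
```

Each pass removes only one run of line breaks per quoted block, so `join_all` loops until the text stops changing, which is the kind of repetition a fixed table of replacements cannot express.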

- MarkusQ

0

I think the easiest way would be to match the quoted sections with “(?s:.*?)” and use a MatchEvaluator to remove any newlines. The MatchEvaluator code can be as simple as

 m => Regex.Replace(m.Value, @"\s+", " ") 

You could, of course, refine this to match only the quoted sections that actually contain newlines, and to replace only the line breaks inside those sections rather than all whitespace, but that's probably not worth the effort.
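Python's `re.sub` accepts a callable as the replacement, which plays the same role as a .NET `MatchEvaluator`; the sketch below uses that to illustrate this answer's idea, even though the asker's setup rules out custom code. The function name `collapse_quoted_whitespace` is hypothetical.

```python
import re

# Each quoted section, matched lazily; the scoped (?s:...) flag lets . cross line breaks.
QUOTED = re.compile(r"“(?s:.*?)”")

def collapse_quoted_whitespace(text: str) -> str:
    # The lambda acts like a .NET MatchEvaluator: it receives each match
    # and returns the text that replaces it.
    return QUOTED.sub(lambda m: re.sub(r"\s+", " ", m.group(0)), text)

print(collapse_quoted_whitespace("Not quoted.\n“but this is\nand so is this.”\nDone."))
# Not quoted.
# “but this is and so is this.”
# Done.
```

As the answer says, `\s+` also collapses ordinary runs of spaces inside the quotes, not just line breaks.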

0

You cannot do what you want within the constraints you have described.

Proof:

  • Your fixed table of entries will make a fixed number of calls to Replace (call it n).
  • Each replacement can remove only a fixed number of line breaks from a quoted block (call this number m).

Therefore:

  • A quoted block with m * n + 1 line breaks will not be handled correctly.

You need to increase the power of your setup (for example, through more complex replacements, recursive replacements, a repeat-until-unchanged flag, or ...?), or accept that this task can't be done by your engine.
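A small illustration of the counting argument, under stated assumptions: the asker's table is modeled as a hypothetical list of `(pattern, replacement)` pairs applied once each, in order, and every entry uses a pattern that removes one run of line breaks per quoted block (so m = 1). With n = 3 entries, a block with n + 1 = 4 line breaks keeps one break.

```python
import re

# Hypothetical model of the asker's table: (pattern, replacement) pairs,
# each applied once, in order.
def apply_table(table, text):
    for pattern, replacement in table:
        text = re.sub(pattern, replacement, text)
    return text

# An entry that removes one run of line breaks per quoted block (m = 1).
JOIN_ONE = (r"(“[^”\r\n]*)[\r\n]+", r"\1 ")

n = 3
table = [JOIN_ONE] * n

# A quoted block with m * n + 1 = 4 line breaks.
block = "“" + "\n".join(f"line {i}" for i in range(n + 2)) + "”"

result = apply_table(table, block)
print(block.count("\n"), "->", result.count("\n"))  # 4 -> 1
```

This fixes m = 1 by the choice of pattern; it illustrates the argument rather than proving it for every possible pattern.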

- MarkusQ

0
