Why is carriage return/line feed (CR LF) not processed correctly by TPerlRegEx when given as the replacement?

I am trying to replace spaces with a new line using the TPerlRegEx class.

    with RegExp do
    begin
      Subject := Memo1.Lines.Text;
      RegEx := ' ';
      Replacement := '\r\n';
      ReplaceAll;
      Memo1.Lines.Text := Subject;
    end;

The problem is that it treats the replacement \r\n as literal text.

3 answers

Use #13#10
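As a minimal sketch of that suggestion, the question's replacement can be written with Pascal character codes instead of a C-style escape. It reuses only the TPerlRegEx members that already appear on this page; the program name and sample subject are illustrative assumptions.

```delphi
program UseCrLf;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

var
  RegEx: TPerlRegEx;

begin
  RegEx := TPerlRegEx.Create;
  try
    RegEx.Subject := 'FirstLine SecondLine';
    RegEx.RegEx := ' ';
    // #13#10 is CR LF; the Delphi compiler builds the two characters,
    // so the regex engine never sees a backslash sequence.
    RegEx.Replacement := #13#10;
    RegEx.ReplaceAll;
    WriteLn(RegEx.Subject);
  finally
    RegEx.Free;
  end;
end.
```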

    program Project29;

    {$APPTYPE CONSOLE}

    uses
      SysUtils, PerlRegEx;

    var
      RegEx: TPerlRegEx;

    function CStyleEscapes(const InputText: string): string;
    var
      i, j: Integer;
    begin
      SetLength(Result, Length(InputText));
      i := 1; // input cursor
      j := 1; // output cursor
      while i <= Length(InputText) do
        if InputText[i] = '\' then
          if i = Length(InputText) then
          begin
            // Erroneous quotation: a lone backslash at the end is copied as-is
            Result[j] := '\';
            Inc(i);
            Inc(j);
          end
          else
          begin
            case InputText[i+1] of
              'r', 'R': Result[j] := #13;
              'n', 'N': Result[j] := #10;
              't', 'T': Result[j] := #9;
              '\':
                begin
                  // Keep both backslashes for the regex engine to handle
                  Result[j] := '\';
                  Inc(j);
                  Result[j] := '\';
                end;
            else
              begin
                // Unknown escape (e.g. a backreference): copy it unchanged
                Result[j] := '\';
                Inc(j);
                Result[j] := InputText[i+1];
              end;
            end;
            Inc(i, 2);
            Inc(j);
          end
        else
        begin
          Result[j] := InputText[i];
          Inc(i);
          Inc(j);
        end;
      SetLength(Result, j-1);
    end;

    begin
      RegEx := TPerlRegEx.Create;
      try
        RegEx.RegEx := ' ';
        RegEx.Replacement := CStyleEscapes('\t\t\t');
        RegEx.Subject := 'FirstLine SecondLine';
        RegEx.ReplaceAll;
        WriteLn(RegEx.Subject);
        ReadLn;
      finally
        RegEx.Free;
      end;
    end.

I really wanted to know why this does not work as expected.

Processing of \ in the Replacement text is performed in TPerlRegEx.ComputeReplacement. If you look at the code, you will see that no sequences produce carriage return and line feed characters. In fact, ComputeReplacement is all about backreferences.

The regular expression matching phase is processed by the PCRE code. However, the replacement phase is pure Pascal code. And it's easy enough to check the code to see what it does. And it does not do what you think and expect from it.

The conclusion is that you cannot specify the characters you want using escape sequences. I think you will need to devise your own rules for escaping non-printable characters and apply those rules in the OnReplace event.


Edit, because I learned something new today.

Some time ago I ran into the same problem as in the question, and wrongly concluded that TRegEx does not perform backslash expansion at all.

The correct conclusion should have been that TRegEx does not perform C-style backslash expansion in replacement string parameters, and I should have investigated whether it does so in pattern string parameters.

I knew that support for character escape mechanisms depends on the development tool.

For example, C, C#, Java, Perl, PHP, Ruby, bash and many others perform backslash expansion. The Delphi compiler does not, since it is not a C-style compiler.
It does expand Pascal-style escapes (e.g. #13#10 or ^M^J) into CRLF.
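The difference can be checked with a few constants. This is a small sketch, not taken from the original answer; the program and constant names are made up for illustration.

```delphi
program EscapeStyles;

{$APPTYPE CONSOLE}

const
  Literal    = '\r\n';  // the compiler keeps all four characters: \ r \ n
  CrLfCodes  = #13#10;  // two characters: CR and LF, by character code
  CrLfCarets = ^M^J;    // the same two characters in caret notation

begin
  WriteLn(Length(Literal));         // 4
  WriteLn(Length(CrLfCodes));       // 2
  WriteLn(CrLfCodes = CrLfCarets);  // TRUE
end.
```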

So I did this research today (thanks to David for pointing out my original mistake) and came up with two examples (one in Delphi and one in C#), each with a function that basically does this:

  • show the result of matching a known CRLF string against a pattern given as a string
  • show the result of replacing a space with a replacement given as a string

The sample function is then called with:

  • a string written as the backslash escape sequence \r\n in the source code, so that it can be parsed by the compiler
  • a string composed so that it holds the backslash escape sequence \r\n at run time, so that it can be parsed by the RegEx engine

From the output in both examples, you will see that:

  • The Delphi compiler does not parse the \r\n string
  • The C# compiler parses the \r\n string
  • The RegEx engine in both Delphi and C# parses the \r\n pattern string at run time ( RegEx documentation )
  • The RegEx engine in both Delphi and C# does not parse the \r\n replacement string at run time ( RegEx documentation )
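The original examples are not reproduced on this page. The following is only a rough sketch of what the Delphi test might look like, assuming Delphi XE2 or later with the System.RegularExpressions unit. ShowCase is a hypothetical helper, and no expected output is claimed here; run it to see what your version does.

```delphi
program RegExEscapeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// ShowCase is a hypothetical helper, not part of any library.
procedure ShowCase(const Caption, Value: string);
var
  Replaced: string;
begin
  WriteLn(Caption);
  // Used as a pattern: does Value match a real CR LF pair?
  WriteLn('  matches CR LF as pattern: ', TRegEx.IsMatch(#13#10, Value));
  // Used as a replacement: does the result contain a real CR LF pair?
  Replaced := TRegEx.Replace('First Second', ' ', Value);
  WriteLn('  replacement yields CR LF: ', Pos(#13#10, Replaced) > 0);
end;

begin
  // The compiler leaves this as four characters, so any expansion
  // that happens is done by the regex engine at run time.
  ShowCase('Literal ''\r\n''', '\r\n');
  // The compiler turns this into CR LF before the engine sees it.
  ShowCase('Pascal-style #13#10', #13#10);
  ReadLn;
end.
```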

The recommendation is still:

Either use Pascal-style escape sequences, or use a C-style backslash expansion function such as the one Cosmin wrote.

As a side note: when using any expansion function, keep in mind that it changes the meaning of the text. Delphi users may not expect C-style string expansion.
