ColdFusion ReReplace "&" but not htmlspecialchars

I need to replace every & with &amp; in a line like this:

 &Uuml;bung 1: &Uuml; & &Auml; 

or in html

 Übung 1: Ü & Ä 

As you can see, there are htmlspecialchars (entities) in the line, but the & itself is not written as &amp;, so I need to exclude the entities from my replacement. I am not very familiar with regular expressions. All I need is an expression that does the following:

Find every & that is either followed by a space, or is not followed by a run of non-space characters ending in ; . Then replace it with &amp; .

I tried something like this:

 <cfset data = ReReplace(data, "&[ ]|[^(?*^( ));]", "&amp;", "ALL") /> 

but that replaces every character with &amp; ... ^^'

Sorry, I really don't understand regular expressions.

3 answers

Problem with existing attempt

The reason your attempted pattern &[ ]|[^(?*^( ));] fails is that you have a | but no group bounding it. That means you are replacing &[ ] OR [^(?*^( ));] , and the latter matches most things. You are also misunderstanding how character classes work.

Inside [ .. ] (the character class) there are a few simple rules:

  • if it starts with ^ , the class is negated; anywhere else ^ is a literal.
  • if there is a hyphen, it is treated as a range (for example, a-z or 1-5).
  • if there is a backslash, it either marks a shorthand class (for example, \w ), or escapes the next character (inside a char class, this is only required for [ ] ^ - \ ).
  • it matches only a single character (subject to any quantifier); there is no order or sequence within the class, and duplicates of the same character are ignored.

Also, you do not need to put a space inside a character class; a literal space works fine (unless you are in free-spacing comment mode, which has to be explicitly enabled).
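
To make the points above concrete, here is a minimal sketch in JavaScript (chosen only because it is easy to run; the alternation and character-class rules shown should behave the same way in ReReplace). The input string and variable names are made up for illustration.

```javascript
// The pattern from the question, unchanged.
const attempt = /&[ ]|[^(?*^( ));]/g;

// Without a group, the | splits the whole pattern in two:
//   "&[ ]"          an ampersand followed by a space
//   "[^(?*^( ));]"  any ONE character that is not ( ? * ^ space ) or ;
const sample = "a & b;";
const attemptResult = sample.replace(attempt, "&amp;");
console.log(attemptResult);
// "&amp; &amp;&amp;;"  ("a", "& " and "b" were each replaced)

// Order and duplicates inside a class do not matter:
// both classes match exactly the characters a, b and c.
const classA = /^[abc]+$/;
const classB = /^[cbaabc]+$/;
console.log(classA.test("cab"), classB.test("cab"));
// true true
```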

Hopefully that helps you understand what was going wrong.

Regarding the actual solution to your problem ...

Solution

To match an ampersand that does not start an HTML entity, you can use:

 &(?![a-z][a-z0-9]+;|#(?:\d+|x[\dA-F]+);) 

That is, an ampersand followed by a negative lookahead for either of:

  • a letter, then one or more letters or digits, then a semicolon, i.e. a named entity reference.

  • a hash, then either a number or an x followed by a hexadecimal number, and finally a semicolon, i.e. a numeric entity reference.

To use this in CFML to replace & with &amp; would be:

 <cfset data = rereplaceNoCase( data , '&(?![a-z][a-z0-9]+;|##(?:\d+|x[\dA-F]+);)' , '&amp;' , 'all' ) /> 
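
For anyone who wants to try the lookahead outside ColdFusion, here is a rough JavaScript sketch of the same idea. The i flag stands in for rereplaceNoCase, and escapeBareAmpersands is a hypothetical helper name, not part of any library.

```javascript
// Ampersand NOT followed by something shaped like a named or numeric entity.
const bareAmpersand = /&(?![a-z][a-z0-9]+;|#(?:\d+|x[\dA-F]+);)/gi;

// Hypothetical helper: escapes only the ampersands that do not start an entity.
function escapeBareAmpersands(text) {
  return text.replace(bareAmpersand, "&amp;");
}

console.log(escapeBareAmpersands("&Uuml;bung 1: &Uuml; & &Auml;"));
// "&Uuml;bung 1: &Uuml; &amp; &Auml;"

console.log(escapeBareAmpersands("AT&T &#220; &#xDC;"));
// "AT&amp;T &#220; &#xDC;"
```

Note that the pattern only checks the shape of what follows the ampersand, not whether the name is a real entity. Because &amp; itself has that shape, running the replacement a second time leaves the text unchanged.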

I think it would be easier to just replace all occurrences of & with &amp; and then revert the ones that were replaced incorrectly:

 <cfset data = ReReplace(ReReplace(data, "&", "&amp;", "ALL"), "&amp;([^;&]*;)", "&\1", "ALL") /> 

I have not tested this in ColdFusion (since I don't know it), but it should work, because the expression itself works in JavaScript:

 var s = "I we&nt out on 1 se&123;p 2012 and& it was be&tter & than 15 jan 2012";
 console.log(s.replace(/&/g, '&amp;').replace(/&amp;([^;&]*;)/g, '&$1'));
 // "I we&amp;nt out on 1 se&123;p 2012 and&amp; it was be&amp;tter &amp; than 15 jan 2012"

So I assume the regex will do the trick in CF as well.


Another option is to not use regex at all. For the example line you provided, you simply need to replace the bare ampersand ("&") without affecting the HTML entities. This can be done with REPLACE alone.

Remember that an entity has no spaces around its ampersand, whereas an ampersand that needs converting to an HTML entity usually has a leading and trailing space. REPLACE will find each occurrence of " & " and update it without affecting any of the "&Uuml;" strings (since those have no leading and trailing space).

 <cfset html = "&Uuml;bung 1: &Uuml; & &Auml;">
 <cfset parsedHtml = REPLACE(html, " & ", " &amp; ", "All")>
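
The same idea as a small JavaScript sketch, in case it helps to see what it does and does not touch. The second input string is made up for illustration, and replaceSpacedAmpersands is a hypothetical helper name.

```javascript
// Plain string replacement, no regex: only " & " (with both spaces) is changed.
function replaceSpacedAmpersands(text) {
  // split/join replaces every occurrence of the literal " & ".
  return text.split(" & ").join(" &amp; ");
}

console.log(replaceSpacedAmpersands("&Uuml;bung 1: &Uuml; & &Auml;"));
// "&Uuml;bung 1: &Uuml; &amp; &Auml;"

// An ampersand without surrounding spaces is left alone.
console.log(replaceSpacedAmpersands("AT&T & Co"));
// "AT&T &amp; Co"
```

This relies on the assumption stated above, that every ampersand needing conversion has a space on both sides.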