JavaScript RegExp that matches text while ignoring HTML
Is it possible to match "the dog is really really fat" in "The <strong>dog</strong> is really <em>really</em> fat!" and wrap it as "<span class="highlight">WHAT WAS MATCHED</span>"?
I don't mean this case specifically. In general, is it possible to search the text while ignoring the HTML, keep the HTML in the end result, and just add the span above around everything that matched?
EDIT:
Given the problem of overlapping HTML tags, would it be possible to match a phrase and just add a span around each of the matched words? The catch is that I do not want the word "dog" to match when it is outside the searched context, in this case "the dog is really really fat."
Update:
Here is a working fiddle that does what you want. However, you will need to update htmlTagRegEx to handle matching for any HTML tag, as it just does a simple match and will not handle all cases.
http://jsfiddle.net/briguy37/JyL4J/
The code is also below. Basically, it takes out the HTML elements one by one, then does a replace on the text to wrap the matched selection in the highlight span, and then puts the HTML elements back one by one. It is ugly, but it is the easiest way I could think of to make it work...
    function highlightInElement(elementId, text) {
        var element = document.getElementById(elementId);
        var elementHtml = element.innerHTML;
        var tags = [];
        var tagLocations = [];
        // Simple match only: does not handle tags with attributes
        var htmlTagRegEx = /<\/?\w+>/;

        // Strip the tags from elementHtml and keep track of them
        var htmlTag;
        while ((htmlTag = elementHtml.match(htmlTagRegEx))) {
            tagLocations.push(htmlTag.index);
            tags.push(htmlTag[0]);
            elementHtml = elementHtml.replace(htmlTag[0], '');
        }

        // Search for the text in the stripped html
        // (indexOf returns -1 when not found; 0 is a valid match position)
        var textLocation = elementHtml.indexOf(text);
        if (textLocation !== -1) {
            // Add the highlight
            var highlightHTMLStart = '<span class="highlight">';
            var highlightHTMLEnd = '</span>';
            var textEndLocation = textLocation + text.length;
            elementHtml = elementHtml.substring(0, textLocation) +
                highlightHTMLStart + text + highlightHTMLEnd +
                elementHtml.substring(textEndLocation);

            // Shift the recorded tag positions past the inserted span markup
            for (var j = 0; j < tagLocations.length; j++) {
                if (tagLocations[j] > textEndLocation) {
                    tagLocations[j] += highlightHTMLStart.length + highlightHTMLEnd.length;
                } else if (tagLocations[j] > textLocation) {
                    tagLocations[j] += highlightHTMLStart.length;
                }
            }
        }

        // Plug the HTML tags back in, last one first, whether or not text matched
        for (var i = tagLocations.length - 1; i >= 0; i--) {
            var location = tagLocations[i];
            elementHtml = elementHtml.substring(0, location) + tags[i] + elementHtml.substring(location);
        }

        // Update the innerHTML of the element
        element.innerHTML = elementHtml;
    }

Naah... just use the good old RegExp ;)
    var htmlString = "The <strong>dog</strong> is really <em>really</em> fat!";
    var regexp = /<\/?\w+((\s+\w+(\s*=\s*(?:\".*?"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/gi;
    var result = '<span class="highlight">' + htmlString.replace(regexp, '') + '</span>';

An easy way with jQuery would be:
    // keyword is assumed to hold the text to highlight
    var originalHtml = $("#div").html();
    var newHtml = originalHtml.replace(new RegExp(keyword + "(?![^<>]*>)", "g"), function (e) {
        return "<span class='highlight'>" + e + "</span>";
    });
    $("#div").html(newHtml);

This works great for me.
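To see what the `(?![^<>]*>)` lookahead buys you, here is a minimal sketch of the same idea as a plain string function, so it can be tried without jQuery or a page. The names `escapeRegExp` and `highlightKeyword` are made up for this illustration, and the escaping step is an addition that the snippet above does not have.

```javascript
// Hypothetical helper: escape characters that are special in a RegExp so the
// keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every occurrence of `keyword` that sits in text,
// skipping occurrences inside a tag such as <strong title="dog">.
function highlightKeyword(html, keyword) {
  // (?![^<>]*>) rejects a match when the next angle bracket after it is ">",
  // which means the match is inside a tag.
  var pattern = new RegExp(escapeRegExp(keyword) + '(?![^<>]*>)', 'g');
  return html.replace(pattern, function (match) {
    return "<span class='highlight'>" + match + "</span>";
  });
}

var sample = 'The <strong title="dog">dog</strong> is really <em>really</em> fat!';
console.log(highlightKeyword(sample, 'dog'));
```

Note that this only protects text inside tags. A phrase that is split by markup, such as "really really" in the sample, still does not match, because the search runs against the raw HTML string.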
Here is a working example of a regex that excludes matches inside HTML tags as well as inside JavaScript:
Use this regular expression in your replace() call (the leading (a) is the text being searched for).

    /(a)(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi

You can do a string replace with the expression </?\w*> and you will get the plain string.
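As a rough sketch of that last suggestion, the following strips tags with `</?\w*>` and then searches the remaining text. The global flag and the function name `stripSimpleTags` are assumptions added for the example. The pattern is deliberately simple and does not match tags that carry attributes.

```javascript
// The simple tag pattern from above, with the global flag added so that every
// tag is removed. Tags with attributes (e.g. <span class="x">) are NOT matched.
var simpleTag = /<\/?\w*>/g;

// Hypothetical helper name, used only for this illustration.
function stripSimpleTags(html) {
  return html.replace(simpleTag, '');
}

var html = 'The <strong>dog</strong> is really <em>really</em> fat!';
var plain = stripSimpleTags(html);

console.log(plain); // The dog is really really fat!
console.log(/the dog is really really fat/i.test(plain)); // true
```

This tells you whether the phrase occurs, but the markup is gone, so it does not by itself put the highlight back into the original HTML.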
If you use jQuery, you can call the text() method on the element containing the text you are looking for. Given this markup:

    <p id="the-text"> The <strong>dog</strong> is really <em>really</em> fat! </p>

this will give "The dog is really really fat!" (plus the surrounding whitespace):

    $('#the-text').text();

You can run your regular expression search on that text rather than trying to do it in the markup.
Without jQuery, I'm not sure of an easy way to extract and concatenate the text nodes of all the children.
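For what it is worth, plain DOM can do this too: browsers expose a `textContent` property on elements, and the same result can be built by walking `childNodes` by hand. Below is a minimal sketch of that walk. `collectText` and `fakeParagraph` are names invented for the example, and the plain objects only stand in for DOM nodes so the code can run outside a page.

```javascript
// Recursively concatenate the text of every text node under `node`.
// nodeType 3 is a text node (Node.TEXT_NODE) in the DOM.
function collectText(node) {
  if (node.nodeType === 3) {
    return node.nodeValue;
  }
  var text = '';
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}

// In a browser, assuming the markup above is on the page:
//   collectText(document.getElementById('the-text'));

// Plain objects standing in for DOM nodes (illustration only).
var fakeParagraph = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, nodeValue: 'The ' },
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: 'dog' }] },
    { nodeType: 3, nodeValue: ' is really ' },
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: 'really' }] },
    { nodeType: 3, nodeValue: ' fat!' }
  ]
};

console.log(collectText(fakeParagraph)); // The dog is really really fat!
```

Walking the nodes by hand is only worth it if you also want to keep a reference to each text node, for example to wrap matched words one node at a time. For a plain search, `textContent` is enough.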