If what you want is to remove all the text from inside every HTML tag, then only the real DOM is going to cut it.
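    function removeAllTextNodes(node) {
        if (node.nodeType === 3) {
            // nodeType 3 is a text node: detach it from its parent
            node.parentNode.removeChild(node);
        } else if (node.childNodes) {
            // walk the children backwards so removals do not shift unvisited indexes
            for (var i = node.childNodes.length; i--;) {
                removeAllTextNodes(node.childNodes[i]);
            }
        }
    }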
Unlike assigning to textContent or innerHTML, this preserves the entire existing element structure and deletes only the text.
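To make that difference concrete, here is a small sketch. It repeats the function so the snippet stands alone; the `demo` helper and the sample markup are invented for illustration, and the demo only runs where `document` exists (that is, in a browser).

```javascript
// Repeated from the answer so this snippet is self-contained.
function removeAllTextNodes(node) {
    if (node.nodeType === 3) {
        node.parentNode.removeChild(node);
    } else if (node.childNodes) {
        for (var i = node.childNodes.length; i--;) {
            removeAllTextNodes(node.childNodes[i]);
        }
    }
}

// Hypothetical demo: the helper name and the markup are assumptions.
function demo() {
    var markup = "<p>Hello <b>world</b></p><ul><li>one</li></ul>";
    var a = document.createElement("div");
    var b = document.createElement("div");
    a.innerHTML = markup;
    b.innerHTML = markup;

    removeAllTextNodes(a); // elements stay, text goes
    b.textContent = "";    // everything goes, elements included

    return { textNodesRemoved: a.innerHTML, textContentCleared: b.innerHTML };
}

// Only run the demo when a DOM is available.
if (typeof document !== "undefined") {
    console.log(demo());
    // textNodesRemoved:   "<p><b></b></p><ul><li></li></ul>"
    // textContentCleared: ""
}
```

Note that whitespace-only text nodes between tags are text nodes too, so they are removed as well.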
If what you actually have is a string, you are running client-side JavaScript in the browser, and the string represents part of a document's content (not the entire document; that is, it contains no doctype, <html>, <head>, or <body> elements), then you can parse it simply by assigning it to an element's innerHTML:
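    // Illustrative wrapper; the function name is not part of the original snippet.
    function removeTextFromHtmlString(htmlString) {
        var container = document.createElement("div");
        container.innerHTML = htmlString; // the browser parses the fragment
        removeAllTextNodes(container);
        return container.innerHTML;       // serialized markup with the text removed
    }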
Otherwise, you will probably need an HTML parser for JavaScript. Regular expressions, as noted, are not great at parsing HTML.
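For the whole-document case in a browser, one option is the built-in `DOMParser`. The following is only a sketch under that assumption: the name `removeTextFromDocumentString` and the optional `parser` argument are hypothetical, added here so that code outside the browser can pass in any object with a compatible `parseFromString` method.

```javascript
// Repeated from the answer so this snippet is self-contained.
function removeAllTextNodes(node) {
    if (node.nodeType === 3) {
        node.parentNode.removeChild(node);
    } else if (node.childNodes) {
        for (var i = node.childNodes.length; i--;) {
            removeAllTextNodes(node.childNodes[i]);
        }
    }
}

// Hypothetical helper (name and `parser` parameter are assumptions).
// `parser` defaults to the browser's DOMParser when omitted.
function removeTextFromDocumentString(htmlString, parser) {
    parser = parser || new DOMParser();
    var doc = parser.parseFromString(htmlString, "text/html");
    removeAllTextNodes(doc.documentElement);
    // outerHTML covers <html>...</html> only; the doctype is not included.
    return doc.documentElement.outerHTML;
}
```

In a browser, calling it with `"<!DOCTYPE html><html><head><title>Hi</title></head><body><p>Text</p></body></html>"` should return `"<html><head><title></title></head><body><p></p></body></html>"`. Keep in mind that the contents of `<script>` and `<style>` elements are text nodes as well, so they are removed too.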
Ryan