Javascript regex: remove text between HTML tags

Question


I want to remove text that is between any HTML tags:

Example:

<div> <h1>Title</h1> </div>

The resulting variable should be:

 <div> <h1></h1> </div>


javascript regex

badaboum Jan 6 '14 at 17:34


5 answers

Ryan · Answer 1 · 2014-01-06T18:20:53+0000

If you really want to remove all text from inside arbitrary HTML tags, only the real DOM is going to cut it.

    function removeAllTextNodes(node) {
        if (node.nodeType === 3) { // 3 is Node.TEXT_NODE
            node.parentNode.removeChild(node);
        } else if (node.childNodes) {
            // Walk backwards so removals do not shift the indexes still to visit.
            for (var i = node.childNodes.length; i--;) {
                removeAllTextNodes(node.childNodes[i]);
            }
        }
    }

Unlike clearing textContent or innerHTML, this preserves the entire existing element structure and deletes only the text.
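To watch the traversal without a browser, here is a minimal sketch that runs the same function over hand-built node-like objects. The helpers makeText, makeElement and serialize are hypothetical stand-ins written for this illustration, not part of the DOM API; they mimic only the few properties the function touches.

```javascript
// Hypothetical stand-in for a DOM text node (nodeType 3).
function makeText(text) {
  return { nodeType: 3, data: text, parentNode: null };
}

// Hypothetical stand-in for a DOM element (nodeType 1) with just enough API.
function makeElement(tagName, children) {
  var el = {
    nodeType: 1,
    tagName: tagName,
    childNodes: [],
    parentNode: null,
    removeChild: function (child) {
      var index = this.childNodes.indexOf(child);
      if (index !== -1) {
        this.childNodes.splice(index, 1);
        child.parentNode = null;
      }
      return child;
    }
  };
  children.forEach(function (child) {
    child.parentNode = el;
    el.childNodes.push(child);
  });
  return el;
}

// Turn the stand-in tree back into markup so the result can be inspected.
function serialize(node) {
  if (node.nodeType === 3) {
    return node.data;
  }
  return '<' + node.tagName + '>' +
    node.childNodes.map(serialize).join('') +
    '</' + node.tagName + '>';
}

// Same traversal as in the answer above.
function removeAllTextNodes(node) {
  if (node.nodeType === 3) {
    node.parentNode.removeChild(node);
  } else if (node.childNodes) {
    for (var i = node.childNodes.length; i--;) {
      removeAllTextNodes(node.childNodes[i]);
    }
  }
}

// Tree equivalent to: <div> <h1>Title</h1> </div>
var tree = makeElement('div', [
  makeText(' '),
  makeElement('h1', [makeText('Title')]),
  makeText(' ')
]);

removeAllTextNodes(tree);
console.log(serialize(tree)); // <div><h1></h1></div>
```

Note that whitespace between tags is also a text node, so it is removed along with "Title"; the elements themselves stay in place.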

If all you have is a string, you are running client-side JavaScript in the browser, and the string represents a fragment of a document rather than an entire document (that is, it contains no doctype, <html>, <head>, or <body> elements), then you can parse it by simply assigning it to an element's innerHTML:

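    // Parse the fragment, strip its text nodes, and serialize it again
    // (the function name is illustrative).
    function removeTextFromHtml(htmlString) {
        var container = document.createElement("div");
        container.innerHTML = htmlString;
        removeAllTextNodes(container);
        return container.innerHTML;
    }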

Otherwise, you probably need an HTML parser for JavaScript. Regular expressions, as noted, are not great for parsing HTML.
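For comparison, here is a rough sketch of what a regex attempt might look like. The function name stripTextWithRegex is made up for this illustration. It produces the output the question asks for on the sample input, but it goes wrong as soon as a ">" appears inside an attribute value, which is the kind of case a real parser handles for you.

```javascript
// Naive approach: drop any non-whitespace text found between a ">" and the
// next "<". Whitespace-only runs are kept so the spacing of the markup survives.
function stripTextWithRegex(html) {
  return html.replace(/>([^<]*)</g, function (match, text) {
    return '>' + (text.trim() === '' ? text : '') + '<';
  });
}

// Works on the sample from the question.
var simple = stripTextWithRegex('<div> <h1>Title</h1> </div>');
console.log(simple); // <div> <h1></h1> </div>

// Breaks when an attribute value contains ">": the regex treats it as the
// end of a tag and eats the rest of the attribute.
var broken = stripTextWithRegex('<a title="a > b">link</a>');
console.log(broken); // <a title="a ></a>
```

This sketch also ignores text before the first tag or after the last one, as well as comments and script contents.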

Sterling archer · Answer 2 · 2014-01-06T17:53:03+0000

Vanilla JS to the rescue:

    var x = document.getElementsByTagName("h1");
    for (var i = 0; i < x.length; i++) {
        x[i].innerHTML = "";
    }

Just substitute whatever tag you need, and voilà: no regex or 90 KB library required.

16807 · Answer 3 · 2014-01-06T17:43:14+0000

JavaScript can already do this with built-in DOM functions, which are conceptually a better fit than regex:

    <div> <h1 id="foo">Title</h1> </div>
    <script>
        document.getElementById("foo").textContent = "";
    </script>

iConnor · Answer 4 · 2014-01-06T17:56:50+0000

You might want to do something like this:

    var elements = document.getElementsByTagName('*');
    for (var i = 0; i < elements.length; i++) {
        var element = elements[i];
        // Only clear elements that have no child elements.
        if (element.children.length === 0) {
            element.textContent = '';
        }
    }

It:

Finds all elements
Loops through them
Clears the text content of each element that has no child elements


You can also make it reusable, for example:

    var removeAllText = function() {
        var elements = document.getElementsByTagName('*');
        for (var i = 0; i < elements.length; i++) {
            var element = elements[i];
            // Only clear elements that have no child elements.
            if (element.children.length === 0) {
                element.textContent = '';
            }
        }
    };

Then call it whenever you need to:

 removeAllText();

benlaird · Answer 5 · 2014-01-06T17:40:54+0000

Do not use regex. Use something like loadXMLDoc() to parse the markup into a DOM and print only the tags, instead of trying to strip the values out of the tags.
