JavaScript: How to get text from all descendants of an element, not counting scripts?

Question

My current project involves collecting textual content from an element and all its descendants based on the provided selector.

For example, when the selector is #content and it is run against this HTML:

    <div id="content">
        <p>This is some text.</p>
        <script type="text/javascript">
            var test = true;
        </script>
        <p>This is some more text.</p>
    </div>

my script will return (after a little whitespace cleanup):

This is some text. var test = true; This is some more text.

However, I need to ignore the text nodes that occur inside the <script> elements.

This is a snippet of my current code (technically, it matches one or more provided selectors):

    // get text content of all matching elements
    for (x = 0; x < selectors.length; x++) { // 'selectors' is an array of CSS selectors from which to gather text content
        matches = Sizzle(selectors[x], document);
        for (y = 0; y < matches.length; y++) {
            match = matches[y];
            if (match.innerText) { // IE
                content += match.innerText + ' ';
            } else if (match.textContent) { // other browsers
                content += match.textContent + ' ';
            }
        }
    }

This is a little too simplistic, as it simply returns all the text nodes inside the element (and its descendants) that matches the provided selector. The solution I'm looking for would return all text nodes except those inside <script> elements. It doesn't need to be especially high-performance, but it does need to end up being cross-browser compatible.

I assume I need to somehow iterate over all children of the element that matches the selector and accumulate every text node other than those inside <script> elements; there doesn't seem to be any way to identify the JavaScript once it has already been rolled into the string accumulated from all the text nodes.

I cannot use jQuery (for performance / bandwidth reasons), although you may have noticed that I do use its Sizzle selector engine, so jQuery's selector logic is available.

Thanks in advance for your help!

+6

javascript string dom text textnode

Bungle Mar 28 '10 at 5:51

2 answers

EDIT

OK, first let me say that I'm not too familiar with Sizzle on its own, just within the libraries that use it. That said...

If I had to do this, I would do something like:

    var selectors = new Array('#main-content', '#side-bar');

    function findText(selectors) {
        var rText = '';
        // accept either an array of selectors or a single selector / node
        var sNodes = selectors instanceof Array ? $(selectors.join(',')) : $(selectors);
        for (var i = 0; i < sNodes.length; i++) {
            var nodes = sNodes[i].childNodes;
            for (var j = 0; j < nodes.length; j++) {
                if (nodes[j].nodeType === 3) {
                    // text node: collect its content
                    rText += nodes[j].nodeValue;
                } else if (nodes[j].nodeType === 1 && nodes[j].nodeName.toLowerCase() !== 'script') {
                    /* recursion - this would work in jQ; not sure if
                     * Sizzle takes a node in place of a selector, so you
                     * may need to tweak. */
                    rText += findText(nodes[j]);
                }
            }
        }
        return rText;
    }

I haven't tested any of this, but it should give you the idea. Hopefully someone else will chime in with more direction :-)

Could you just take the parent node and check the nodeName in your loop ... for example:

    match = matches[y]; // assign before the check that reads it
    if (match.parentNode.nodeName.toLowerCase() != 'script' && match.nodeName.toLowerCase() != 'script') {
        if (match.innerText) { // IE
            content += match.innerText + ' ';
        } else if (match.textContent) { // other browsers
            content += match.textContent + ' ';
        }
    }

Of course, jQuery supports the :not() syntax in selectors, so could you just do $(':not(script)')?

+2

prodigitalson Mar 28 '10 at 6:40

bobince · Accepted Answer · 2010-03-28T10:06:16+0000

    function getTextContentExceptScript(element) {
        var text = [];
        for (var i = 0, n = element.childNodes.length; i < n; i++) {
            var child = element.childNodes[i];
            if (child.nodeType === 1 && child.tagName.toLowerCase() !== 'script')
                text.push(getTextContentExceptScript(child));
            else if (child.nodeType === 3)
                text.push(child.data);
        }
        return text.join('');
    }
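To see the recursive walk at work without a browser, here is a minimal sketch that runs it against plain objects standing in for DOM nodes. The `el` and `txt` helpers and the whitespace-collapsing step are assumptions made for illustration, not part of the answer; in a page you would pass each element returned by Sizzle instead.

```javascript
// Recursive walk from the answer, repeated so this block is self-contained.
function getTextContentExceptScript(element) {
    var text = [];
    for (var i = 0, n = element.childNodes.length; i < n; i++) {
        var child = element.childNodes[i];
        if (child.nodeType === 1 && child.tagName.toLowerCase() !== 'script')
            text.push(getTextContentExceptScript(child));
        else if (child.nodeType === 3)
            text.push(child.data);
    }
    return text.join('');
}

// Hypothetical stand-ins for DOM nodes: only the properties the walk reads.
function el(tagName, childNodes) {
    return { nodeType: 1, tagName: tagName, childNodes: childNodes };
}
function txt(data) {
    return { nodeType: 3, data: data };
}

// Mirrors the #content example from the question.
var content = el('DIV', [
    txt(' '),
    el('P', [txt('This is some text.')]),
    txt(' '),
    el('SCRIPT', [txt(' var test = true; ')]),
    txt(' '),
    el('P', [txt('This is some more text.')]),
    txt(' ')
]);

// Collapse runs of whitespace and trim, like the cleanup the question mentions.
var result = getTextContentExceptScript(content)
    .replace(/\s+/g, ' ')
    .replace(/^ | $/g, '');

console.log(result);
```

Element nodes (nodeType 1) other than scripts are recursed into, text nodes (nodeType 3) are collected, and everything else, including the script's own text, is skipped.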

Or, if you are allowed to modify the DOM to remove the <script> elements (which usually has no noticeable side effects), a faster approach:

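    // body of a function that receives `element`
    var scripts = element.getElementsByTagName('script');
    // the collection is live, so it shrinks as each script is removed
    while (scripts.length !== 0)
        scripts[0].parentNode.removeChild(scripts[0]);
    return 'textContent' in element ? element.textContent : element.innerText;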
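If modifying the page is not acceptable, one possible variation is to apply the same removal to a deep copy made with `cloneNode(true)`. This is only a sketch: the function name `getTextWithoutScripts` is made up for illustration, the extra copy costs some of the speed advantage, and `innerText` on a detached copy may not behave identically in every older browser.

```javascript
// Sketch: strip <script> elements from a detached deep copy instead of the live DOM.
// getTextWithoutScripts is a hypothetical name, not part of the answer.
function getTextWithoutScripts(element) {
    var copy = element.cloneNode(true); // deep clone, not attached to the document
    var scripts = copy.getElementsByTagName('script');
    // Walk backwards so removal is safe whether the list is live or static.
    for (var i = scripts.length - 1; i >= 0; i--) {
        scripts[i].parentNode.removeChild(scripts[i]);
    }
    return 'textContent' in copy ? copy.textContent : copy.innerText;
}
```

The original element keeps its scripts, and only the throwaway copy is altered.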
