How to remove <div> and <br> using Cheerio js?
I have the following html that I would like to parse with Cheerio.
    var $ = cheerio.load('<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>This works well.</div><div><br clear="none"/></div><div>So I have been doing this for several hours. How come the space does not split? Thinking that this could be an issue.</div><div>Testing next paragraph.</div><div><br clear="none"/></div><div>Im testing with another post. This post should work.</div><div><br clear="none"/></div><h1>This is for test server.</h1></body></html>', {
      normalizeWhitespace: true,
    });

    // trying to parse the html
    // the goals are to
    // 1. remove all the 'div'
    // 2. clean up <br clear="none"/> into <br>
    // 3. Have all the new 'empty' element added with 'p'
    var testData = $('div').map(function(i, elem) {
      var test = $(elem)
      if ($(elem).has('br')) {
        console.log('spaceme');
        var test2 = $(elem).removeAttr('br');
      } else {
        var test2 = $(elem).removeAttr('div').add('p');
      }
      console.log(i +' '+ test2.html());
      return test2.html()
    })

    res.send(test2.html())

My ultimate goals are to parse the html and:
- delete all div
- clean up <br clear="none"/> and change it to <br>
- and finally, wrap all the now "empty" elements (the sentences that were inside a "div") in <p>...</p>
I am trying to start with a smaller goal in the code above. I tried to remove all the "div" elements (this succeeded), but I cannot find the "br". I have been trying for several days without making any headway.
So, I am writing here to find some help and tips on how I can achieve my goal.
Thanks :D
It's easier than it sounds. First you iterate over all the divs

    $('div').each(function() { ...

and for each div you check whether it contains a <br> tag

    $(this).find('br').length

If so, you remove the attribute

    $(this).find('br').removeAttr('clear');

If not, you create a P with the same content

    var p = $('<p>' + $(this).html() + '</p>');

and then just replace the DIV with the P

    $(this).replaceWith(p);

and finally send the result

    res.send($.html());

All together
    $('div').each(function() {
      if ( $(this).find('br').length ) {
        $(this).find('br').removeAttr('clear');
      } else {
        var p = $('<p>' + $(this).html() + '</p>');
        $(this).replaceWith(p);
      }
    });

    res.send($.html());

You do not want to remove an attribute, you want to remove the element itself, so you want to switch removeAttr to remove, for example:
    var testData = $('div').map(function(i, elem) {
      var test = $(elem)
      if ($(elem).has('br')) {
        console.log('spaceme');
        var test2 = $(elem).remove('br');
      } else {
        var test2 = $(elem).remove('div').add('p');
      }
      console.log(i +' '+ test2.html());
      return test2.html()
    })