How to remove <div> and <br> using Cheerio js?

I have the following HTML that I would like to parse with Cheerio.

 var $ = cheerio.load('<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>This works well.</div><div><br clear="none"/></div><div>So I have been doing this for several hours. How come the space does not split? Thinking that this could be an issue.</div><div>Testing next paragraph.</div><div><br clear="none"/></div><div>Im testing with another post. This post should work.</div><div><br clear="none"/></div><h1>This is for test server.</h1></body></html>', {
     normalizeWhitespace: true,
 });

 // trying to parse the html
 // the goals are to
 // 1. remove all the 'div'
 // 2. clean up <br clear="none"/> into <br>
 // 3. Have all the new 'empty' element added with 'p'
 var testData = $('div').map(function(i, elem) {
     var test = $(elem)
     if ($(elem).has('br')) {
         console.log('spaceme');
         var test2 = $(elem).removeAttr('br');
     } else {
         var test2 = $(elem).removeAttr('div').add('p');
     }
     console.log(i +' '+ test2.html());
     return test2.html()
 })

 res.send(test2.html())

My ultimate goals in parsing the HTML are to:

  • remove all the div elements
  • clean up <br clear="none"/> and change it to <br>
  • and finally, wrap the text left behind (the sentences that were inside a div) in <p> ... </p>

I am starting with a smaller goal in the code above. I managed to remove all the div elements, but I cannot find the br elements. I have been trying for several days without making any headway.

So, I am writing here to find some help and tips on how I can achieve my goal.

Thanks :D

+5
2 answers

It's easier than it sounds. First, iterate over all the divs:

 $('div').each(function() { ... 

and for each div, check whether it contains a <br> tag:

 $(this).find('br').length 

If it does, remove the clear attribute from the <br>:

 $(this).find('br').removeAttr('clear'); 

If it does not, create a P with the same content:

 var p = $('<p>' + $(this).html() + '</p>'); 

and then just replace the DIV with P

 $(this).replaceWith(p); 

and finally, output the result:

 res.send($.html()); 

All together:

 $('div').each(function() {
     if ( $(this).find('br').length ) {
         $(this).find('br').removeAttr('clear');
     } else {
         var p = $('<p>' + $(this).html() + '</p>');
         $(this).replaceWith(p);
     }
 });

 res.send($.html());
+8

You do not want to remove an attribute; you want to remove the tag itself, so you should switch removeAttr to remove, for example:

 var testData = $('div').map(function(i, elem) {
     var test = $(elem)
     if ($(elem).has('br')) {
         console.log('spaceme');
         var test2 = $(elem).remove('br');
     } else {
         var test2 = $(elem).remove('div').add('p');
     }
     console.log(i +' '+ test2.html());
     return test2.html()
 })
0

Source: https://habr.com/ru/post/1214412/