Remove HTML Tags in JavaScript with Regex

I am trying to remove all HTML tags from a string in JavaScript. Here's what I have ... I can't understand why it doesn't work ... does anyone know what I am doing wrong?

    <script type="text/javascript">
    var regex = "/<(.|\n)*?>/";
    var body = "<p>test</p>";
    var result = body.replace(regex, "");
    alert(result);
    </script>

Thank you so much!

+101
javascript regex
Sep 30 '09 at 18:31
11 answers

Try this, keeping in mind that HTML's grammar is too complex for regular expressions to be correct 100% of the time:

    var regex = /(<([^>]+)>)/ig,
        body = "<p>test</p>",
        result = body.replace(regex, "");
    console.log(result);

If you want to use a library like jQuery, you can simply do this:

 console.log($('<p>test</p>').text()); 
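For context on why the code in the question appears to do nothing: the pattern there is wrapped in quotes, so it is a plain string rather than a regex, and replace() searches for that literal text. A minimal sketch of the difference, reusing the question's own pattern (the variable names asString and asRegex are only illustrative):

```javascript
var body = "<p>test</p>";

// Quoted pattern: replace() looks for the literal text "/<(.|\n)*?>/",
// which does not occur in body, so nothing is replaced.
var asString = body.replace("/<(.|\n)*?>/", "");

// Regex literal (no quotes). The g flag makes it replace every match,
// not just the first one.
var asRegex = body.replace(/<(.|\n)*?>/g, "");

console.log(asString); // "<p>test</p>"
console.log(asRegex);  // "test"
```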
+220
Sep 30 '09 at 18:36

This is an old question, but I stumbled upon it and thought I would share the method I used:

    var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
    var temp = document.createElement("div");
    temp.innerHTML = body;
    var sanitized = temp.textContent || temp.innerText;

sanitized will now contain: "some text and some more text"

It's simple, jQuery is not required, and it shouldn't let you down even in more complex cases :)

James

+31
Oct 17

It worked for me.

    // tt holds the input HTML string (it is not defined in this snippet)
    var regex = /(&nbsp;|<([^>]+)>)/ig,
        body = tt,
        result = body.replace(regex, "");
    alert(result);
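One thing to watch with this pattern: it deletes &nbsp; entities outright, so words they separated end up joined. A small sketch on a made-up sample string (sample, removed and spaced are illustrative names, not from the answer), with a variant that turns each &nbsp; into a regular space first:

```javascript
var sample = "Hello&nbsp;<b>world</b>";

// The pattern from the answer: tags and &nbsp; are both deleted.
var removed = sample.replace(/(&nbsp;|<([^>]+)>)/ig, "");

// Variant: convert &nbsp; to a space, then delete the tags.
var spaced = sample.replace(/&nbsp;/ig, " ").replace(/<[^>]+>/g, "");

console.log(removed); // "Helloworld"
console.log(spaced);  // "Hello world"
```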
+8
Sep 17 '14 at 8:39

This is how textAngular (a WYSIWYG editor) does it. I also found this to be the most consistent answer that does not rely on regex.

    /*
     * @license textAngular
     * Author : Austin Anderson
     * License : 2013 MIT
     * Version 1.5.16
     */
    // turn html into pure text that shows visibility
    function stripHtmlToText(html) {
        var tmp = document.createElement("DIV");
        tmp.innerHTML = html;
        var res = tmp.textContent || tmp.innerText || '';
        res.replace('\u200B', ''); // zero width space (result is not assigned, so res is unchanged)
        res = res.trim();
        return res;
    }
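A note on the res.replace('\u200B', '') line in that function: strings are immutable in JavaScript, so replace() returns a new string and leaves the original alone, and a string pattern only replaces the first occurrence. A short sketch of that behavior on a made-up input (the variable names are illustrative):

```javascript
var text = "a\u200Bb\u200Bc";

// Return value ignored: text itself is not modified.
text.replace("\u200B", "");

// String pattern: only the first zero-width space is removed.
var firstOnly = text.replace("\u200B", "");

// Regex with the g flag: every zero-width space is removed.
var all = text.replace(/\u200B/g, "");

console.log(text.length);      // 5
console.log(firstOnly.length); // 4
console.log(all);              // "abc"
```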
+5
Mar 29 '17 at 21:24

My simple JavaScript library, FuncJS, has a function called strip_tags() that performs this task for you without requiring any regular expressions.

For example, let's say that you want to remove tags from a sentence - with this function you can do it just like this:

 strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!"); 

This will return "This string contains a lot of tags!".

For a better understanding, read the FuncJS documentation on GitHub.

Also, if you like, please leave some feedback through the form. It would be very helpful!

+2
Nov 23 '12 at 23:22

You can use the powerful string manipulation library underscore.string.js:

 _('a <a href="#">link</a>').stripTags() 

=> 'a link'

 _('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags() 

=> 'a linkalert("hello world!")'

Remember to import this library as follows:

    <script src="underscore.js" type="text/javascript"></script>
    <script src="underscore.string.js" type="text/javascript"></script>
    <script type="text/javascript"> _.mixin(_.str.exports())</script>
+2
Mar 28 '13 at 16:13

For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer

0
01 Oct '09 at 0:02

The accepted answer does not always guarantee that the HTML is stripped, because it is still possible to get an HTML string out of it by crafting an input like the following:

  "<<h1>h1>foo<<//</h1>h1/>" 

With this input, the stripping itself assembles a set of tags for you, and the result is:

  "<h1>foo</h1>" 

Additionally, jQuery's text function will strip text that is not surrounded by tags.

Here's a function that uses jQuery but should be more robust in both cases:

    var stripHTML = function(s) {
        var lastString;
        do {
            s = $('<div>').html(lastString = s).text();
        } while (lastString !== s);
        return s;
    };
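The loop in that function keeps stripping until the string stops changing. A minimal jQuery-free sketch of the same idea follows. It assumes a deliberately strict tag regex that is hypothetical and not taken from any answer here, and the names stripOnce and stripUntilStable are illustrative. It shows the repeat-until-stable technique only and is not a sanitizer:

```javascript
// Hypothetical strict pattern: "<", optional "/", a letter, then no "<" or ">" until ">".
var TAG = /<\/?[a-z][^<>]*>/gi;

// One pass: remove everything that currently looks like a tag.
function stripOnce(s) {
    return s.replace(TAG, "");
}

// Repeat the pass until the output equals the input (a fixed point).
function stripUntilStable(s) {
    var last;
    do {
        last = s;
        s = stripOnce(s);
    } while (s !== last);
    return s;
}

// A single pass removes the inner "<b>" and leaves a new tag behind.
console.log(stripOnce("<<b>b>x"));        // "<b>x"
console.log(stripUntilStable("<<b>b>x")); // "x"
```

With a pattern like this, one pass can leave behind a tag assembled from the leftover pieces, which is why the comparison against the previous value drives the loop.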
0
Apr 04 '13 at 15:31
    <html>
    <head>
    <script type="text/javascript">
    // Strips tags from the value of every form field passed in
    function striptag() {
        var html = /(<([^>]+)>)/gi;
        for (var i = 0; i < arguments.length; i++)
            arguments[i].value = arguments[i].value.replace(html, "");
    }
    </script>
    </head>
    <body>
    <form name="myform">
    <textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
    <input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
    </form>
    </body>
    </html>
0
Aug 02 '14 at 8:20

The way I do this is practically a one-liner.

The function creates a Range object and then creates a DocumentFragment from the Range, with the string as its child content.

Then it grabs the fragment's text, removes any "invisible" / zero-width characters, and trims off any leading or trailing whitespace.

I understand that this question is old, I just thought that my solution was unique, and I wanted to share it. :)

    function getTextFromString(htmlString) {
        return document
            .createRange()
            // Creates a fragment and turns the supplied string into HTML nodes
            .createContextualFragment(htmlString)
            // Gets the text from the fragment
            .textContent
            // Removes the Zero-Width Space, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
            .replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
            // Trims off any extra space on either end of the string
            .trim();
    }
    var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');
    alert(cleanString);
0
Jul 16 '19 at 4:48


