Remove HTML Tags in Javascript with Regex

Question


I am trying to remove all HTML tags from a string in JavaScript. Here's what I have... I can't understand why it doesn't work. Does anyone know what I am doing wrong?

    <script type="text/javascript">
        var regex = "/<(.|\n)*?>/";
        var body = "<p>test</p>";
        var result = body.replace(regex, "");
        alert(result);
    </script>

Thank you so much!

+101

javascript regex

Gabe Sep 30 '09 at 18:31


11 answers

This is an old question, but I stumbled upon it and thought I would share the method I used:

    var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
    var temp = document.createElement("div");
    temp.innerHTML = body;
    var sanitized = temp.textContent || temp.innerText;

sanitized will now contain: "some text and some more text"

Simple, no jQuery needed, and it should not let you down even in more complex cases :)

James

+31

jsdw Oct 17


It worked for me.

    var regex = /(&nbsp;|<([^>]+)>)/ig,
        body = tt, // tt holds the HTML string to clean
        result = body.replace(regex, "");
    alert(result);
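As a rough illustration of what this pattern does, here is a small sketch on a made-up input (the `sample` string and the variable names are assumptions, not from the answer). Note that `&nbsp;` is deleted rather than turned into a space, so neighbouring words can run together.

```javascript
// Hypothetical input; the answer's own `tt` variable is not shown in the post.
var sample = "<p>Hello&nbsp;<b>world</b></p>";
var tagOrNbsp = /(&nbsp;|<([^>]+)>)/ig;

// Tags and &nbsp; entities are both removed outright.
var stripped = sample.replace(tagOrNbsp, "");
console.log(stripped); // "Helloworld"

// One possible variation: turn &nbsp; into a space first, then strip the tags.
var spaced = sample.replace(/&nbsp;/ig, " ").replace(/<[^>]+>/g, "");
console.log(spaced); // "Hello world"
```

Which behaviour is preferable depends on whether the spacing in the output matters for your use.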

+8

d689p Sep 17 '14 at 8:39


This is how textAngular (a WYSIWYG editor) does it. I also found this to be the most consistent answer that does not use regex.

    /*
     @license textAngular
     Author : Austin Anderson
     License : 2013 MIT
     Version 1.5.16
    */

    // turn html into pure text that shows visibility
    function stripHtmlToText(html) {
        var tmp = document.createElement("DIV");
        tmp.innerHTML = html;
        var res = tmp.textContent || tmp.innerText || '';
        res.replace('\u200B', ''); // zero width space (as written, the result is discarded)
        res = res.trim();
        return res;
    }
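One detail worth checking in the snippet above: `res.replace('\u200B', '')` does not assign its result, and a string pattern only replaces the first occurrence anyway. A minimal sketch of the difference, using a made-up input (the variable names are mine, not textAngular's):

```javascript
// Hypothetical input with two zero-width spaces (U+200B).
var withZeroWidth = "a\u200Bb\u200Bc";

// replace() returns a new string; calling it without assigning changes nothing.
withZeroWidth.replace("\u200B", "");
console.log(withZeroWidth.length); // 5

// A string pattern replaces only the first occurrence.
var firstOnly = withZeroWidth.replace("\u200B", "");
console.log(firstOnly.length); // 4

// A regex with the g flag removes every occurrence.
var allRemoved = withZeroWidth.replace(/\u200B/g, "");
console.log(allRemoved); // "abc"
```

If the intent is to drop every zero-width space, assigning the result of the global-regex form is probably what you want.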

+5

Rentering.com Mar 29 '17 at 21:24


My simple JavaScript library, FuncJS, has a function called strip_tags() that performs this task for you without requiring any regular expressions.

For example, let's say that you want to remove tags from a sentence - with this function you can do it just like this:

 strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!");

This will return "This string contains a lot of tags!".

For a better understanding, read the FuncJS documentation on GitHub.

In addition, if you like, please leave some feedback through the form. It would be very helpful!

+2

Sharikul Islam Nov 23 '12 at 23:22


You can use the powerful string manipulation library underscore.string.js:

 _('a <a href="#">link</a>').stripTags()

=> 'a link'

 _('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags()

=> 'a linkalert("hello world!")'

Remember to import this library as follows:

    <script src="underscore.js" type="text/javascript"></script>
    <script src="underscore.string.js" type="text/javascript"></script>
    <script type="text/javascript"> _.mixin(_.str.exports())</script>

+2

Abdennour TOUMI Mar 28 '13 at 16:13


For the correct HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer

0

Mike Samuel 01 Oct '09 at 0:02


The selected answer does not always guarantee that the HTML is stripped, since it is still possible to get HTML back out of it by crafting an invalid HTML string like the one below.

  "<<h1>h1>foo<<//</h1>h1/>"

This input makes the stripping step assemble a set of tags for you, and will result in:

  "<h1>foo</h1>"

Additionally, jQuery's text function will drop text that is not surrounded by tags.

Here's a function that uses jQuery but should be more robust against both of these cases:

    var stripHTML = function(s) {
        var lastString;

        do {
            s = $('<div>').html(lastString = s).text();
        } while (lastString !== s);

        return s;
    };
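The same repeat-until-nothing-changes idea can be sketched without jQuery or a DOM. This is only an illustration: the `stripOnce` regex below is a hypothetical, stricter tag pattern (it refuses `<` inside a tag), chosen because a single pass over nested input can then leave a new tag behind. It is not the code from this answer, and the sample string is made up.

```javascript
// Hypothetical single-pass stripper: a tag is "<", an optional "/",
// a letter, then any run of characters that are neither "<" nor ">".
function stripOnce(s) {
    return s.replace(/<\/?[a-z][^<>]*>/gi, "");
}

// Keep stripping until a pass changes nothing (a fixed point).
function stripUntilStable(s) {
    var last;
    do {
        last = s;
        s = stripOnce(s);
    } while (s !== last);
    return s;
}

var nested = "<<b>b>bold<</b>/b>";
console.log(stripOnce(nested));        // "<b>bold</b>"
console.log(stripUntilStable(nested)); // "bold"
```

The loop costs one extra pass on well-behaved input, since it has to confirm that nothing changed.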

0

Rick Moynihan Apr 04 '13 at 15:31


    <html>
    <head>
    <script type="text/javascript">
    // Strip tags from the value of each form field passed in
    function striptag() {
        var html = /(<([^>]+)>)/gi;
        for (var i = 0; i < arguments.length; i++)
            arguments[i].value = arguments[i].value.replace(html, "");
    }
    </script>
    </head>
    <body>
    <form name="myform">
        <textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
        <input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
    </form>
    </body>
    </html>

0

Surya R Praveen Aug 02 '14 at 8:20


The way I do this is practically a one-liner.

The function creates a Range object and then creates a DocumentFragment from the Range, with the string as its child content.

Then it grabs the text of the fragment, removes any "invisible" / zero-width characters, and trims off any leading / trailing whitespace.

I understand that this question is old, I just thought that my solution was unique, and I wanted to share it. :)

    function getTextFromString(htmlString) {
        return document
            .createRange()
            // Creates a fragment and turns the supplied string into HTML nodes
            .createContextualFragment(htmlString)
            // Gets the text from the fragment
            .textContent
            // Removes the Zero-Width Space, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
            .replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
            // Trims off any extra space on either end of the string
            .trim();
    }

    var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');
    alert(cleanString);

0

ElijahFowler Jul 16 '19 at 4:48


As others have said, regex will not work. Take a minute to read an article on why you cannot and should not try to parse HTML with a regular expression, which is what you are doing when you try to strip HTML from your source string.
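A couple of small cases show the kind of input that trips up a simple tag regex. This is a sketch using the pattern from the accepted answer; the sample strings are made up for illustration.

```javascript
var tagPattern = /(<([^>]+)>)/ig;

// A ">" inside an attribute value ends the match too early,
// so part of the attribute leaks into the output.
var attr = '<a title="a > b">link</a>'.replace(tagPattern, "");
console.log(attr); // ' b">link'

// Plain text that merely contains "<" and ">" is treated as a tag
// and removed along with everything in between.
var math = "1 < 2 and 3 > 2".replace(tagPattern, "");
console.log(math); // "1  2"
```

Neither input is exotic, which is the main argument for letting a real HTML parser do the work when one is available.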

-1

Cole Mar 17 '17 at 15:28


karim79 · Accepted Answer · 2009-09-30 18:36

Try this, keeping in mind that the HTML grammar is too complex for regular expressions to be correct 100% of the time:

    var regex = /(<([^>]+)>)/ig,
        body = "<p>test</p>",
        result = body.replace(regex, "");

    console.log(result);

If you're willing to use a library such as jQuery, you could simply do this:

 console.log($('<p>test</p>').text());
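For what it's worth, the code in the question fails for two reasons that are easy to check: the pattern is wrapped in quotes, so `replace` looks for that literal text instead of treating it as a regular expression, and there is no `g` flag, so even a real regex would stop after the first tag. A minimal sketch (the variable names are mine):

```javascript
var body = "<p>test</p>";

// Quoted pattern: a plain string, searched for literally, so nothing matches.
var asString = body.replace("/<(.|\n)*?>/", "");
console.log(asString); // "<p>test</p>"

// Regex literal without the g flag: only the first tag is removed.
var firstTagOnly = body.replace(/<(.|\n)*?>/, "");
console.log(firstTagOnly); // "test</p>"

// Regex literal with the g flag: every tag is removed.
var allTags = body.replace(/<(.|\n)*?>/g, "");
console.log(allTags); // "test"
```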
