Extract all links from a string

I have a JavaScript variable containing the source code of an HTML page (not the source of the current page), and I need to extract all the links from it. Any tips on the best way to do this?

Is it possible to build a DOM from the HTML in that variable and then traverse it?

4 answers

I don't know if this is recommended, but it works (plain JavaScript, no libraries):

    var rawHTML = '<html><body><a href="foo">bar</a><a href="narf">zort</a></body></html>';
    // Parse the string into a detached <html> element (it is never added to the page)
    var doc = document.createElement("html");
    doc.innerHTML = rawHTML;
    var links = doc.getElementsByTagName("a");
    var urls = [];
    for (var i = 0; i < links.length; i++) {
        urls.push(links[i].getAttribute("href"));
    }
    alert(urls);
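A variation on the same idea, as a sketch: if you'd rather not create elements in the live page at all, `document.implementation.createHTMLDocument` gives you a separate, inert document, so images referenced in `rawHTML` should not start downloading while you look for links. `extractLinks` is a hypothetical helper name, and this assumes a browser (there is no `document` in plain Node.js).

```javascript
// Hypothetical helper: returns the raw href values of every <a> in rawHTML.
// Browser-only: relies on document.implementation, which plain Node.js lacks.
function extractLinks(rawHTML) {
  // A separate document with no browsing context; nothing in it is rendered.
  var doc = document.implementation.createHTMLDocument("");
  doc.documentElement.innerHTML = rawHTML;

  var urls = [];
  var links = doc.getElementsByTagName("a");
  for (var i = 0; i < links.length; i++) {
    var href = links[i].getAttribute("href");
    // Skip anchors without an href, e.g. <a name="top">
    if (href !== null) {
      urls.push(href);
    }
  }
  return urls;
}
```

Called with the `rawHTML` string from the answer above, `extractLinks(rawHTML)` should give `["foo", "narf"]`. Using `getAttribute("href")` returns the value exactly as written; the `href` property would instead return a URL resolved against a base URL.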

If you use jQuery, you can do this very easily:

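    // $.parseHTML drops the <html>/<body> wrappers (and, by default, <script> tags),
    // so the <a> elements can end up at the top level; wrapping everything in a
    // <div> lets .find() reach every link.
    var doc = $('<div>').append($.parseHTML(rawHTML));
    var links = doc.find('a');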

http://docs.jquery.com/Core/jQuery#htmlownerDocument


A regular expression is handy if you also need to replace the links:

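    // Non-greedy [\s\S]*? so each <a>...</a> is matched on its own instead of one
    // match running from the first <a> to the last </a>; the i flag also covers <A>.
    var linkReg = /<a\s[^>]*>[\s\S]*?<\/a>/gi;
    var linksInText = text.match(linkReg); // array of matched links, or null if none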
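If you stick with regular expressions, here is a minimal sketch of both uses, pulling out the `href` values and rewriting them, under the assumption that every `href` is quoted with `"` or `'`. A regex does not understand HTML (comments, `>` inside attribute values, unquoted attributes), so treat this as a quick hack rather than a parser. `HREF_RE`, `getHrefs`, `rewriteHrefs` and `mapUrl` are hypothetical names.

```javascript
// Matches href="..." or href='...' inside an opening <a> tag.
// Group 1 = tag text up to the quote, group 2 = the quote, group 3 = the URL.
// (?:[^>]*?\s)? requires whitespace right before "href", so data-href is skipped.
// Assumption: attribute values are quoted; unquoted hrefs are not matched.
var HREF_RE = /(<a\s+(?:[^>]*?\s)?href\s*=\s*)(["'])(.*?)\2/gi;

// Hypothetical helper: return all href values in document order.
function getHrefs(html) {
  var urls = [];
  var m;
  HREF_RE.lastIndex = 0; // the regex is global, so start each call from 0
  while ((m = HREF_RE.exec(html)) !== null) {
    urls.push(m[3]);
  }
  return urls;
}

// Hypothetical helper: replace every href value with mapUrl(oldValue),
// keeping the original quote character.
function rewriteHrefs(html, mapUrl) {
  return html.replace(HREF_RE, function (whole, prefix, quote, url) {
    return prefix + quote + mapUrl(url) + quote;
  });
}
```

For example, `getHrefs('<a href="foo">bar</a><a class="x" href=\'narf\'>zort</a>')` returns `["foo", "narf"]`, and `rewriteHrefs(html, function (u) { return "/go?u=" + u; })` turns `<a href="foo">` into `<a href="/go?u=foo">`.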

If you use Firefox: yes, you can! It's called DOMParser; check it out:

 DOMParser is mainly useful for applications and extensions based on the Mozilla platform. While it is available to web pages, it is not part of any standard, and the level of support in other browsers is unknown.
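That note has aged: DOMParser is now defined in the HTML Living Standard and is supported by all current major browsers, including parsing with the `"text/html"` type. A minimal sketch, assuming a browser environment (Node.js has no built-in `DOMParser`); `getLinks` is a hypothetical helper name:

```javascript
// Hypothetical helper: parse rawHTML into its own Document and collect hrefs.
// Browser-only: DOMParser is not a built-in global in Node.js.
function getLinks(rawHTML) {
  // The resulting document is separate from the page; its scripts do not run.
  var doc = new DOMParser().parseFromString(rawHTML, "text/html");
  // Only anchors that actually carry an href attribute
  var anchors = doc.querySelectorAll("a[href]");
  // NodeList has no .map in older browsers, so borrow Array.prototype.map
  return Array.prototype.map.call(anchors, function (a) {
    return a.getAttribute("href");
  });
}
```

This avoids creating any elements in the live page, and `querySelectorAll("a[href]")` skips named anchors without a link target.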
