Python BeautifulSoup equivalent to lxml make_links_absolute

lxml has a very convenient method, make_links_absolute:

    doc = lxml.html.fromstring(some_html_page)
    doc.make_links_absolute(url_for_some_html_page)

After that, all links in doc are absolute. Is there a simple equivalent in BeautifulSoup, or do I just need to pass each link through urlparse and normalize it myself:

    soup = BeautifulSoup(some_html_page)
    for tag in soup.findAll('a', href=True):
        url_data = urlparse(tag['href'])
        # An empty scheme means the href is not already an absolute URL.
        if url_data[0] == "":
            full_url = url_for_some_html_page + tag['href']
1 answer

In my answer to "What is an easy way to retrieve a list of URLs in a web page using python?" I covered this in passing as part of the extraction step. You can just as easily write a function that rewrites the links on the soup in place, rather than only extracting them.

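    import urlparse  # Python 2; in Python 3 use: from urllib import parse as urlparse

    def make_links_absolute(soup, url):
        # Rewrite every <a href> in place, resolving it against the page URL.
        for tag in soup.findAll('a', href=True):
            tag['href'] = urlparse.urljoin(url, tag['href'])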
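To see why urljoin is enough here, the following is a minimal sketch of how it resolves the kinds of href values that plain string concatenation would get wrong. It assumes Python 3, where urljoin lives in urllib.parse. The base URL and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin  # Python 3 location of urlparse.urljoin

# Hypothetical page URL, used only for illustration.
base = "http://example.com/docs/index.html"

# Typical href values found in a page (all made up for this sketch).
samples = [
    "page2.html",                  # relative to the current directory
    "/about",                      # relative to the site root
    "../img/logo.png",             # parent directory
    "http://other.example.org/x",  # already absolute
    "//cdn.example.com/lib.js",    # protocol-relative
    "#top",                        # fragment only
]

# Resolve each href against the page URL.
resolved = {href: urljoin(base, href) for href in samples}

for href, absolute in resolved.items():
    print(f"{href!r:32} -> {absolute}")
```

Links that are already absolute come back unchanged, so there is no need to check the scheme with urlparse before joining.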

Source: https://habr.com/ru/post/650232/

