lxml has a very convenient function, make_links_absolute:
    import lxml.html

    doc = lxml.html.fromstring(some_html_page)
    doc.make_links_absolute(url_for_some_html_page)
After that, all links in the doc are absolute. Is there a simple equivalent in BeautifulSoup, or do I just need to pass each link through urlparse and normalize it myself, like this:
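    from urllib.parse import urlparse
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(some_html_page, 'html.parser')
    for tag in soup.find_all('a', href=True):
        url_data = urlparse(tag['href'])
        if url_data.scheme == "":  # no scheme, so treat the link as relative
            full_url = url_for_some_html_page + tag['href']
            tag['href'] = full_url  # write the rewritten URL back into the tag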
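For comparison, the standard library's urllib.parse.urljoin resolves a link against a base URL the way a browser would, which plain string concatenation does not (sibling paths, `../`, root-relative and scheme-relative links, fragments). Below is a minimal sketch, assuming BeautifulSoup 4 tags, which support `tag['href']` reads and assignment; the helper name `make_hrefs_absolute` is hypothetical, and plain dicts stand in for tags so the snippet runs without bs4. Unlike lxml's make_links_absolute, it only rewrites `href` on the tags it is given (not `src` or other link attributes) and does not look at a `<base href>` element.

```python
from urllib.parse import urljoin


def make_hrefs_absolute(tags, base_url):
    """Rewrite each tag's href in place, resolved against base_url.

    Hypothetical helper: `tags` is any iterable of objects supporting
    item access, e.g. (assumed) soup.find_all('a', href=True) in bs4.
    """
    for tag in tags:
        tag['href'] = urljoin(base_url, tag['href'])


base = "http://example.com/docs/page.html"

# Plain dicts stand in for BeautifulSoup tags here.
tags = [
    {'href': 'other.html'},            # sibling of the current page
    {'href': '../img/a.png'},          # parent-relative path
    {'href': '/root'},                 # root-relative path
    {'href': '//cdn.example.net/x.js'},  # scheme-relative URL
    {'href': 'http://other.org/x'},    # already absolute, left unchanged
    {'href': '#top'},                  # fragment on the current page
]
make_hrefs_absolute(tags, base)

# What naive concatenation produces for the first link, for contrast.
naive = base + "other.html"

for tag in tags:
    print(tag['href'])
print(naive)
```

With BeautifulSoup the call would be `make_hrefs_absolute(soup.find_all('a', href=True), url_for_some_html_page)`, replacing the urlparse check entirely, since urljoin already leaves absolute URLs alone.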
python lxml beautifulsoup
bigredbob