I am using the following code, which works fine, but it fetches the page without letting its JavaScript run first, so I do not get the HTML I want from the web page.
I looked at DryScrape, but as far as I can see it does not support POST requests, which I need for the auto_login() function below. The same goes for PyQt4.
The website has four chunks of JSON "listings" that build the page when it loads and renders. If I look at the raw page source, it is messy and I cannot easily find the content in it. However, if I use "Inspect Element" on the page, the HTML looks clean, and I could easily parse it with BeautifulSoup.
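To illustrate what I mean by JSON embedded in the raw source, here is a minimal sketch of pulling such a chunk out without rendering anything. The sample HTML, the variable name `listings`, and the helper `extract_listings` are all hypothetical; I am only assuming the data sits in an inline script as a plain variable assignment, which may not match the real site.

```python
import json
import re

# Hypothetical sample of raw page source; the variable name "listings" and
# the <script> layout are assumptions, not taken from the real site.
SAMPLE_HTML = """
<html><body>
<script>var listings = [{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}];</script>
</body></html>
"""


def extract_listings(html, var_name="listings"):
    """Return the JSON value assigned to var_name in an inline script, or None."""
    # Capture everything between "var <name> =" and the ";" that closes the script.
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(.*?);\s*</script>"
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


listings = extract_listings(SAMPLE_HTML)
```

This only helps if the JSON is present verbatim in the response body; if the page fetches it with a separate request, the same SESSION would have to request that URL instead.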
I know that I can use Selenium, but I would rather not, mainly because it opens a visible browser window instead of running in the background. I could work around that with PhantomJS or PyVirtualDisplay, but that would only be a last resort.
    import requests
    from bs4 import BeautifulSoup

    # Browser-like headers sent with every request
    HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
    }
    SESSION = requests.session()
    RESPONSE = None  # filled in by auto_login()

    email = "mail@mail.com"
    password = "password123"


    def auto_login():
        """Log in with a POST; the site then redirects to the listings page."""
        global RESPONSE
        url = "https://website.com/log.php"
        payload = {
            "log": email,
            "pwd": password,
            "finish": "https://website.com/listed/public/gen1/",
        }
        RESPONSE = SESSION.post(url, data=payload, headers=HEADERS, verify=False)


    def process_html():
        """Parse the raw (un-rendered) HTML of the last response."""
        return BeautifulSoup(RESPONSE.content, 'html.parser')


    def main():
        auto_login()
        PROCESSED_HTML = process_html()


    if __name__ == "__main__":
        main()
How can I easily render a JavaScript-driven web page with a modified version of my script, preferably with PyQt4 (DryScrape will not install correctly on my Windows 10, Python 2.7 setup) and without using Selenium?
Any ideas would be appreciated.