Python - manipulate and read browser from current browser

Question


I am trying to find a method in Python that lets me read data from the web browser I am currently using. Specifically, I am trying to load a massive dataframe from a locally controlled company web page and work with it as a pandas dataframe. The problem is that the website has a rather complicated authentication process that I could not get past using Selenium with several web drivers, requests, urllib, and cookielib, with a variety of user parameters. I have given up on that front entirely, as I am fairly sure the authentication process involves more than can easily be achieved with these libraries.

However, I did manage to get around the required tokenization process when I quickly tested opening a new tab, using webbrowser, in the current browser, which was already logged in. The webbrowser module does not offer a read function, which means that even though the page can be opened, the data on the page cannot be read into a pandas dataframe. This made me think I could use win32com to open a browser, log in, and then run the rest of the script, but again there is no general read capability for Internet Explorer, which means I cannot pass the information I want to pandas. I'm at a dead end. Any ideas?

I could request the necessary authentication token scripts, but I am sure it would be a week or two before anything happens on that front. Obviously, I would prefer to have something working in the meantime while I wait for the actual authorization scripts from the company.

Update: I received the authentication tokens from the company; however, using them requires a Python package on another server that I do not have access to, mainly because it is unusual for anyone in my department to use Python. So the above still applies: I need a method for reading and manipulating an open browser.

+8

python authentication web-scraping

Wolves Oct 10 '17 at 18:35


1 answer

Alexander Chzhen · Accepted Answer · 2017-11-11T07:48:32+0000

Step by step

1) Launch a browser with Selenium.

2) The script starts waiting for a certain element that indicates you have reached the required page and are logged in.

3) Use this new browser window to log in to the page manually.

4) The script detects that you are on the required page and logged in.

5) The script processes the page as you like.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Start the webdriver (opens Chrome in a new window).
    chrome = webdriver.Chrome()

    # Initialize a waiter that waits at most 300 seconds.
    waiter = WebDriverWait(chrome, 300)

    # Wait for the #logout element to appear.
    # I assume its presence shows that you are logged in.
    # The locator is passed as a single (By.ID, "logout") tuple.
    waiter.until(EC.presence_of_element_located((By.ID, "logout")))

    # Extract data etc.
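For the "extract data" step, one possible approach is to take the HTML that Selenium exposes through the driver's `page_source` property and turn it into rows. The sketch below is only an illustration: it assumes the data sits in a plain HTML `<table>` (the question does not say how the page is structured), and the names `TableRows` and `rows_from_html` are hypothetical helpers, not part of Selenium or pandas. It uses only the standard library and does not handle nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_html(html):
    """Return all table rows found in an HTML string as lists of strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

After the wait succeeds, `rows = rows_from_html(chrome.page_source)` would give a list of rows. If the first row holds the column names, `pandas.DataFrame(rows[1:], columns=rows[0])` builds the dataframe. `pandas.read_html` can also parse tables directly from HTML, but it needs an extra parser library such as lxml installed.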

It might be easier if you use your Chrome user profile. This way you might have a previous session, so you won’t need to do any login steps.

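    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Reuse an existing Chrome profile; replace the placeholder with the full path.
    options.add_argument("user-data-dir=FULL_PATH__TO_PROFILE")
    # Selenium 4 takes options=; older releases used chrome_options=.
    chrome = webdriver.Chrome(options=options)
    chrome.get("https://your_page_here")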
