Python selenium, find out when download is complete?

I used Selenium to start a download. Certain steps need to be taken once the download has finished. Is there an easy way to find out when it is complete? (I am using the Firefox driver.)

+12
python selenium
6 answers

Selenium has no built-in way to wait for a download to complete.


The general idea here is to wait until the file appears in your Downloads directory.

This can be done by checking for the file repeatedly in a loop:

  • Check and wait until a file exists to read it

Or by using something like watchdog to monitor the directory:

  • How to watch a directory for changes?
  • Monitoring contents of files / directories?
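
The polling approach mentioned above could look roughly like the sketch below. The function name `wait_for_file` and its parameters are made up for illustration, and treating "size unchanged between two polls" as "finished" is an assumption: a stalled download would look the same as a completed one.

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll_interval=0.5):
    """Return True once `path` exists and its size has stopped changing.

    Returns False if that has not happened within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = None  # file is not there (yet)
        # Same size on two consecutive polls: assume the writer is done.
        if size is not None and size == last_size:
            return True
        last_size = size
        time.sleep(poll_interval)
    return False
```

A call such as `wait_for_file("/path/to/Downloads/report.pdf", timeout=60)` (the path is a placeholder) would then block until the file looks complete or the timeout expires.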
+15

I ran into this problem recently. I was downloading several files at once and needed a timeout in case a download failed.

The code checks the file names in the download directory once per second and exits when the downloads are complete or when more than 20 seconds have passed. The returned elapsed time was then used to tell whether the downloads succeeded or timed out.

    import time
    import os

    def download_wait(path_to_downloads):
        seconds = 0
        dl_wait = True
        while dl_wait and seconds < 20:
            time.sleep(1)
            dl_wait = False
            for fname in os.listdir(path_to_downloads):
                if fname.endswith('.crdownload'):
                    dl_wait = True
            seconds += 1
        return seconds

I believe this only works with Chrome, since it gives unfinished downloads the .crdownload extension. Other browsers may offer a similar way of checking.

Edit: I recently changed this function to handle cases where .crdownload does not appear as an extension. Essentially, it also waits for the expected number of files.

    import os
    import time

    def download_wait(directory, timeout, nfiles=None):
        """
        Wait for downloads to finish with a specified timeout.

        Args
        ----
        directory : str
            The path to the folder where the files will be downloaded.
        timeout : int
            How many seconds to wait until timing out.
        nfiles : int, defaults to None
            If provided, also wait for the expected number of files.

        """
        seconds = 0
        dl_wait = True
        while dl_wait and seconds < timeout:
            time.sleep(1)
            dl_wait = False
            files = os.listdir(directory)
            if nfiles and len(files) != nfiles:
                dl_wait = True

            for fname in files:
                if fname.endswith('.crdownload'):
                    dl_wait = True

            seconds += 1
        return seconds
+7

I know this answer is late, but I would like to share a hack for future readers.

You can create a thread, say thread1, from the main thread and start the download there. Then create another thread, say thread2, and have it wait for thread1 to finish using the join() method. Execution can then continue once the download is complete.

However, make sure that you do not start the download with Selenium itself. Instead, extract the link with Selenium and use the requests module to download the file.

Download using the requests module

For example:

    import threading

    def downloadit():
        # download code here
        pass

    def after_dwn():
        dwn_thread.join()  # waits until the download thread has finished
        # next chunk of code, to run after the download, goes here

    dwn_thread = threading.Thread(target=downloadit)
    dwn_thread.start()

    metadata_thread = threading.Thread(target=after_dwn)
    metadata_thread.start()
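
The download step left as a placeholder above might be filled in along the following lines. This is only a sketch: `download_file`, its parameters, and the optional `session` argument are hypothetical names introduced here, and the URL would be whatever link was extracted with Selenium. The third-party requests package is imported lazily so that a stand-in object can be passed in its place.

```python
def download_file(url, dest_path, session=None, chunk_size=8192):
    """Stream `url` to `dest_path`; returns only after the file is fully written."""
    if session is None:
        import requests  # third-party package; assumed to be installed
        session = requests.Session()
    # stream=True avoids loading the whole body into memory at once.
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # fail loudly on HTTP errors
        with open(dest_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return dest_path
```

Because the call blocks until the last chunk has been written, a thread started with `threading.Thread(target=download_file, args=(url, dest_path))` is finished exactly when the download is, which is what makes the join() call meaningful.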
+3

In Chrome, files that have not finished downloading have the extension .crdownload. If you set your download directory, you can wait until the file you want no longer has this extension. In principle this is not very different from waiting for the file to exist (as suggested by alecxe), but at least you can track the progress this way.
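
A rough sketch of that idea follows. It assumes the partial file is named after the final file with `.crdownload` appended, which may not hold at every stage of a Chrome download, and the names `wait_for_chrome_download` and `on_progress` are invented for this example.

```python
import os
import time


def wait_for_chrome_download(final_path, timeout=60.0, poll_interval=0.5,
                             on_progress=None):
    """Wait until `final_path` exists and its '.crdownload' twin is gone.

    Returns True on success and False on timeout. If `on_progress` is given,
    it is called with the current size in bytes of the partial file.
    """
    partial_path = final_path + ".crdownload"  # assumed naming scheme
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(partial_path):
            if on_progress is not None:
                try:
                    on_progress(os.path.getsize(partial_path))
                except OSError:
                    pass  # partial file was renamed between the two calls
        elif os.path.exists(final_path):
            return True
        time.sleep(poll_interval)
    return False
```

Passing `on_progress=print` would log the growing size of the partial file on each poll, which is the progress tracking that waiting for the final file alone cannot give.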

0
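    import os
    import time

    downloading = True
    while downloading:
        # count the files that Chrome is still writing
        count = 0
        for fname in os.listdir("directorypath"):
            if fname.endswith(".crdownload"):
                count += 1
        downloading = count > 0
        if downloading:
            time.sleep(1)  # pause between checks instead of spinning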

This works if you need to check whether a set of files (more than one) has finished downloading.

0

As already mentioned, Selenium has no built-in way to check whether a download is complete. So here is a helper function that does the job for Firefox and Chrome. One trick is to clear the temp download folder before starting a new download. Also, use pathlib so that paths work across platforms.

    from pathlib import Path

    def is_download_finished(temp_folder):
        firefox_temp_file = sorted(Path(temp_folder).glob('*.part'))
        chrome_temp_file = sorted(Path(temp_folder).glob('*.crdownload'))
        downloaded_files = sorted(Path(temp_folder).glob('*.*'))
        if (len(firefox_temp_file) == 0) and \
           (len(chrome_temp_file) == 0) and \
           (len(downloaded_files) >= 1):
            return True
        else:
            return False
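
A helper like this reports the state of the folder at a single moment, so it has to be called repeatedly. The sketch below shows one possible way to do that, together with the folder-clearing trick. `clear_folder` and `wait_until` are hypothetical names, and the timeout values are arbitrary defaults.

```python
import time
from pathlib import Path


def clear_folder(folder):
    """Delete the regular files directly inside `folder` (subfolders are kept)."""
    for entry in Path(folder).iterdir():
        if entry.is_file():
            entry.unlink()


def wait_until(check, timeout=30.0, poll_interval=0.5):
    """Call `check()` until it returns a true value or `timeout` seconds pass.

    Returns True if the check succeeded in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll_interval)
    return False
```

The intended order would be: call `clear_folder(temp_folder)`, trigger the download in the browser, then call `wait_until(lambda: is_download_finished(temp_folder))` and continue only if it returns True.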
0
