Prevent ftplib from downloading a file that is still being uploaded?

We have an FTP setup that monitors and downloads files from remote FTP servers that are not under our control. The script connects to the remote FTP server and captures the names of the files on it, and then we check whether each file has already been downloaded. If it has not, we download the file and add it to the list.

Recently we ran into a problem. Someone on the remote FTP side copied in a single massive file (> 1 GB), and then the script woke up, saw a new file, and started downloading it while it was still being copied.

What is the best way to detect this? I was thinking about capturing the file size, waiting a few seconds, checking the file size again, and seeing whether it increased. If it had not, we would download it. But since time is a concern, we cannot wait a few seconds for every set of files to see whether their sizes have increased.

What would be the best way to do this? Everything is currently done with Python's ftplib. How can we do this other than with the method above?

Once again, let me stress this: we have zero control over the remote FTP sites.

Thanks.

Update1:

I was thinking of trying to rename it... since we have full permissions on the FTP server, will the rename command fail if the file is still being uploaded?

We have no real options... do we?

UPDATE2: Well, here's something interesting: some of the FTP servers we tested seem to allocate the full file size as soon as the transfer starts.

For example, if I upload a 200 MB file to the FTP server and, while the transfer is still active, connect to the server and check the size, it shows 200 MB, even though only 10% of the file has been written.

Permissions are inconsistent too: the FTP server that comes with IIS seems to set permissions AFTER the file completes, while some of the other, older FTP servers set them as soon as you start sending the file.

: '(

4 answers

"Damn the torpedoes! Full speed ahead!"

Just download the file. If it is a large file, then after the download completes, wait as long as is reasonable for your scenario and resume the download from the point where it stopped. Repeat until there is nothing more to download.
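A minimal sketch of the download-then-resume idea with ftplib, assuming the server supports the REST command for binary transfers (not all do, and some may refuse a REST offset that sits exactly at end of file). `download_until_stable`, `wait_seconds` and `max_rounds` are hypothetical names; `FTP.retrbinary` and its `rest` argument are the real ftplib API.

```python
import os
import time


def download_until_stable(ftp, remote_name, local_path, wait_seconds=30, max_rounds=10):
    """Download remote_name, then keep resuming until a resume pass adds nothing.

    `ftp` is a connected ftplib.FTP object. The local file is opened in append
    mode, so an existing partial local file is resumed rather than restarted.
    """
    for round_no in range(max_rounds):
        before = os.path.getsize(local_path) if os.path.exists(local_path) else 0
        with open(local_path, "ab") as fh:
            # rest=<offset> makes ftplib send "REST <offset>" before RETR,
            # so the server only sends the bytes we do not have yet.
            ftp.retrbinary("RETR " + remote_name, fh.write, rest=before or None)
        after = os.path.getsize(local_path)
        if round_no > 0 and after == before:
            return after  # a later pass brought no new bytes: treat as complete
        time.sleep(wait_seconds)  # give the uploader time to write more
    raise TimeoutError(f"{remote_name} was still growing after {max_rounds} rounds")
```

This only helps when the remote file grows as it is written. On servers that preallocate the full size (see UPDATE2 in the question), the first pass can pull down bytes that have not actually been written yet.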


You cannot know when the OS-level copy will finish. It may slow down or stall.

For absolute certainty, you really need two files.

  • Massive file.
  • And a small trigger file.

They can take as long as they need with the massive file. But when they touch the trigger file, you download both.
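A sketch of the trigger-file check, assuming the uploader agrees to a hypothetical naming convention where `big.dat` counts as complete once `big.dat.done` appears. The suffix and the function names are illustrative; only `FTP.nlst` and `FTP.retrbinary` come from ftplib.

```python
import os

TRIGGER_SUFFIX = ".done"  # hypothetical convention: big.dat is ready once big.dat.done exists


def ready_files(names, suffix=TRIGGER_SUFFIX):
    """Return the data files whose trigger file also appears in the listing."""
    present = set(names)
    return sorted(n for n in names if not n.endswith(suffix) and n + suffix in present)


def download_ready(ftp, local_dir, suffix=TRIGGER_SUFFIX):
    """Download each ready data file plus its trigger; `ftp` is a connected ftplib.FTP."""
    fetched = []
    for name in ready_files(ftp.nlst(), suffix):
        for remote in (name, name + suffix):
            with open(os.path.join(local_dir, remote), "wb") as fh:
                ftp.retrbinary("RETR " + remote, fh.write)
        fetched.append(name)
    return fetched
```

Files without a trigger are simply skipped on this pass and picked up on a later poll, so a half-copied file is never touched.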


If you cannot get a trigger file, you must balance the time spent polling against the time spent downloading.

Do this:

  • Get the listing. Check the timestamps.

  • Compare the sizes with the previous file size. If the size is not even close, the file is being copied right now. Wait at this step until the size is close to the previous size.

  • Repeat until you are done:

    a. Get the file.

    b. Get the listing AGAIN. Compare the sizes in the new listing, the previous listing, and your local file. If they agree, everything is ready. If they do not agree, the file was modified during the download; you are not finished.
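The steps above might look roughly like this with ftplib. It is a simplified sketch: it uses the SIZE command (`FTP.size`) instead of parsing the LIST output, skips the timestamp check, and compares sizes exactly rather than "close to". `remote_size`, `fetch_when_stable`, `poll_seconds` and `max_attempts` are hypothetical names.

```python
import os
import time


def remote_size(ftp, name):
    """Ask the server for the file size; many servers only answer SIZE in binary mode."""
    ftp.voidcmd("TYPE I")
    return ftp.size(name)


def fetch_when_stable(ftp, name, local_path, poll_seconds=10, max_attempts=20):
    """Poll until the size stops changing, download, then re-check the size.

    `ftp` is a connected ftplib.FTP object. Uses exact size equality where the
    answer says "close to"; loosen the comparison if that is too strict.
    """
    previous = remote_size(ftp, name)
    for _ in range(max_attempts):
        time.sleep(poll_seconds)
        current = remote_size(ftp, name)
        if current != previous:
            previous = current  # still being copied: keep waiting
            continue
        # a. Get the file.
        with open(local_path, "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)
        # b. Get the size AGAIN and compare remote-before, remote-after and local.
        after = remote_size(ftp, name)
        local = os.path.getsize(local_path)
        if previous == after == local:
            return local
        previous = after  # modified during the download: not finished
    raise TimeoutError(f"{name} did not settle after {max_attempts} attempts")
```

As UPDATE2 in the question points out, servers that preallocate the full size will pass this check immediately, so it is only a heuristic.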


As you say, you have zero control over the servers, and you cannot force your clients to send trigger files as S. Lott suggests, so you have to live with an imperfect solution and the risk of incomplete file transfers: perhaps wait for some time and compare the file sizes before and after.

You can try renaming as suggested, but since you have zero control, you cannot be sure that the FTP server admin (or their successor) will not change platforms or FTP servers, or restrict your permissions.

Unfortunately.
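If you still want to try the rename probe from Update1, here is a minimal sketch. It assumes the server refuses to rename a file that is still open for writing and reports that as a 5xx reply, which ftplib raises as `error_perm`. That may hold on servers that lock files in use, but on others the rename succeeds mid-upload, so a successful probe does not prove the file is complete. The `.probe` name and the function name are hypothetical.

```python
from ftplib import error_perm


def try_rename_probe(ftp, name):
    """Return True if the server let us rename the file (and back), False if refused.

    `ftp` is a connected ftplib.FTP object. The temporary name below is a
    hypothetical choice; pick one that cannot clash with real files.
    """
    probe = name + ".probe"
    try:
        ftp.rename(name, probe)
    except error_perm:
        return False  # refused, possibly because an upload still holds the file
    ftp.rename(probe, name)  # put it back under its original name
    return True
```

Keep in mind that this briefly renames a file on a server you do not control, which the uploader's own process may not expect.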


If you are dealing with several files, you can get a list of all their sizes at once, wait ten seconds, and see which ones are unchanged. Those should be safe to download.
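A sketch of this batch check, assuming the server answers SIZE for each file. It takes one snapshot of every size, sleeps once for the whole batch rather than once per file (which addresses the time concern in the question), and keeps the files whose size did not change. The function names are illustrative.

```python
import time
from ftplib import error_perm


def snapshot_sizes(ftp, names):
    """Map each name to its SIZE, or None if the server will not report one."""
    ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
    sizes = {}
    for name in names:
        try:
            sizes[name] = ftp.size(name)
        except error_perm:
            sizes[name] = None  # e.g. a directory, or SIZE not supported
    return sizes


def unchanged_files(before, after):
    """Names whose known size is identical in both snapshots."""
    return sorted(n for n, size in before.items() if size is not None and after.get(n) == size)


def stable_files(ftp, wait_seconds=10):
    """One listing, two size snapshots, a single wait for the whole batch."""
    names = ftp.nlst()
    before = snapshot_sizes(ftp, names)
    time.sleep(wait_seconds)
    after = snapshot_sizes(ftp, names)
    return unchanged_files(before, after)
```

A server that preallocates the full size when the upload starts will report the same size in both snapshots, so this cannot catch that case.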

