Python: extending os.walk to an FTP server

How can I make os.walk traverse the directory tree of an FTP server (located on a remote machine)? Here is how my code is currently structured (with comments):

```python
import fnmatch, os, ftplib

def find(pattern, startdir=os.curdir):  # find function taking variables for both the desired file and the starting directory
    for (thisDir, subsHere, filesHere) in os.walk(startdir):  # each of the variables changes as the directory tree is walked
        for name in subsHere + filesHere:  # going through all of the files and subdirectories
            if fnmatch.fnmatch(name, pattern):  # if the name of one of the files or subdirectories matches the given pattern
                fullpath = os.path.join(thisDir, name)  # fullpath is the concatenation of the directory and the name
                yield fullpath  # return fullpath, but anew each time

def findlist(pattern, startdir=os.curdir, dosort=False):
    matches = list(find(pattern, startdir))  # call find with pattern and startdir and put the results into a list
    if dosort: matches.sort()  # isn't dosort automatically False? Is this statement any different from the same thing split over two lines?
    return matches

#def ftp(  #specifying where to search.

if __name__ == '__main__':
    import sys
    namepattern, startdir = sys.argv[1], sys.argv[2]
    for name in find(namepattern, startdir): print(name)
```

I think I need to define a new function (e.g. def ftp()) to add this functionality to the code above. However, I suspect that os.walk can only traverse directory trees on the local machine where the code runs.

Is there a way to extend the capabilities of os.walk to be able to navigate the remote directory tree (via FTP)?

3 answers

All you need to do is use Python's ftplib module. os.walk() lists the directory and file names in a directory and then recurses into each subdirectory in turn, so over FTP you need to do the same: fetch the directory and file names at each step, then continue recursively from the first subdirectory. I implemented this approach about two years ago in FTPwalker, a package for traversing extremely large directory trees over FTP.
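The traversal described above does not depend on FTP at all; it only needs a function that lists one directory. Here is a minimal sketch under that assumption, where `generic_walk` and the `listdir` callable are hypothetical names (not part of ftplib or FTPwalker). It descends depth-first, which is the order os.walk() uses with its default `topdown=True`.

```python
import posixpath
from typing import Callable, Iterator, List, Tuple

# A listdir-style function: takes a directory path, returns (dirnames, filenames)
ListDir = Callable[[str], Tuple[List[str], List[str]]]


def generic_walk(listdir: ListDir, top: str = "/") -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (dirpath, dirnames, filenames) top-down, in os.walk() order.

    As with os.walk(topdown=True), the caller may remove names from
    dirnames before the next iteration to skip those subtrees.
    """
    dirs, files = listdir(top)
    yield top, dirs, files  # report this directory before any of its children
    for name in dirs:  # iterated after the yield, so caller pruning takes effect
        # descend into each subdirectory in turn (depth-first, like os.walk)
        yield from generic_walk(listdir, posixpath.join(top, name))
```

Plugging an FTP-backed `listdir` into this gives an os.walk() equivalent for a remote server; the class below does exactly that with ftplib.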

```python
import posixpath  # FTP paths always use '/', so use posixpath rather than os.path


class FTPWalk:
    """
    This class contains the functions for traversing an FTP server's
    directory tree top-down, in the same order as os.walk().
    """

    def __init__(self, connection):
        self.connection = connection

    def listdir(self, _path):
        """
        Return the directory and non-directory names within a path (directory).
        """
        file_list, dirs, nondirs = [], [], []
        try:
            self.connection.cwd(_path)
        except Exception as exp:
            print("the current path is : ", self.connection.pwd(), exp.__str__(), _path)
            return [], []
        else:
            # Each LIST line looks like "drwxr-xr-x 2 ftp ftp 4096 Jan 1 2020 name"
            self.connection.retrlines('LIST', lambda x: file_list.append(x.split()))
            for info in file_list:
                ls_type, name = info[0], info[-1]
                if ls_type.startswith('d'):
                    dirs.append(name)
                else:
                    nondirs.append(name)
            return dirs, nondirs

    def walk(self, path='/'):
        """
        Walk through the FTP server's directory tree, yielding
        (path, dirs, nondirs) tuples like os.walk().
        """
        dirs, nondirs = self.listdir(path)
        yield path, dirs, nondirs
        for name in dirs:
            # Child paths are built from the (absolute) start path,
            # so listdir() can cwd() to them directly
            yield from self.walk(posixpath.join(path, name))
            # In Python 2 use:
            # for item in self.walk(posixpath.join(path, name)):
            #     yield item
```
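listdir() above classifies entries by splitting each LIST line on whitespace. LIST output isn't standardized, so this breaks for names containing spaces and for symbolic links, whose lines end in `name -> target` (so `info[-1]` is the link target). If the server supports the MLSD command (RFC 3659), ftplib's `FTP.mlsd()` returns each entry's name together with a dict of facts, including its `type`, which avoids parsing. A sketch under that assumption; `mlsd_listdir` and `mlsd_walk` are hypothetical names, not part of FTPwalker, and it assumes the server includes the `type` fact by default.

```python
import ftplib
import posixpath
from typing import Iterator, List, Tuple


def mlsd_listdir(ftp: ftplib.FTP, path: str) -> Tuple[List[str], List[str]]:
    """Split a directory's entries into (dirs, nondirs) using the MLSD command."""
    dirs, nondirs = [], []
    for name, facts in ftp.mlsd(path):
        kind = facts.get("type", "").lower()
        if kind in ("cdir", "pdir"):
            continue  # the "." and ".." entries: skip them to avoid looping
        if kind == "dir":
            dirs.append(name)
        else:
            nondirs.append(name)  # files, and anything else (e.g. symlinks)
    return dirs, nondirs


def mlsd_walk(ftp: ftplib.FTP, path: str = "/") -> Iterator[Tuple[str, List[str], List[str]]]:
    """os.walk()-style generator over an FTP server that supports MLSD."""
    dirs, nondirs = mlsd_listdir(ftp, path)
    yield path, dirs, nondirs
    for name in dirs:
        yield from mlsd_walk(ftp, posixpath.join(path, name))
```

Unlike listdir(), this never calls cwd(), so the connection's working directory is left unchanged.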

To use the FTPWalk class, create a connection object with the ftplib module, pass it to FTPWalk, and loop over the walk() generator:

```
In [2]: from test import FTPWalk

In [3]: import ftplib

In [4]: connection = ftplib.FTP("ftp.uniprot.org")

In [5]: connection.login()
Out[5]: '230 Login successful.'

In [6]: ftpwalk = FTPWalk(connection)

In [7]: for i in ftpwalk.walk():
   ...:     print(i)
   ...:
('/', ['pub'], [])
('/pub', ['databases'], ['robots.txt'])
('/pub/databases', ['uniprot'], [])
('/pub/databases/uniprot', ['current_release', 'previous_releases'], ['LICENSE', 'current_release/README', 'current_release/knowledgebase/complete', 'previous_releases/', 'current_release/relnotes.txt', 'current_release/uniref'])
('/pub/databases/uniprot/current_release', ['decoy', 'knowledgebase', 'rdf', 'uniparc', 'uniref'], ['README', 'RELEASE.metalink', 'changes.html', 'news.html', 'relnotes.txt'])
...
```
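Because walk() yields the same (dirpath, dirnames, filenames) tuples as os.walk(), the question's find() can be reused unchanged apart from letting the caller choose the walker. A minimal sketch, assuming we add two hypothetical keyword parameters, `walker` and `join`; posixpath.join is used for FTP because remote paths always use '/'.

```python
import fnmatch
import os
import posixpath
from typing import Callable, Iterable, Iterator, List, Tuple

# Any os.walk()-compatible generator: takes a start path, yields (dir, subdirs, files)
Walker = Callable[[str], Iterable[Tuple[str, List[str], List[str]]]]


def find(pattern: str, startdir: str = os.curdir, walker: Walker = os.walk,
         join: Callable[..., str] = os.path.join) -> Iterator[str]:
    """Yield paths under startdir whose base name matches pattern."""
    for this_dir, subs_here, files_here in walker(startdir):
        for name in subs_here + files_here:
            if fnmatch.fnmatch(name, pattern):
                yield join(this_dir, name)
```

Local searches keep working as before (`find("*.py", ".")`), while an FTP search would look like `find("*.txt", "/", walker=ftpwalk.walk, join=posixpath.join)`.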

I suspect this is what you want, although I'm not certain:

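```python
import paramiko

server, username, password = "my host address", "my username", "my password"  # placeholders

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accept unknown host keys (convenient, but less secure)
ssh.connect(server, username=username, password=password)
ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command("locate my_file.txt")
print(ssh_stdout.read().decode())  # read the command's output instead of printing the channel object
ssh.close()
```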

This requires SSH access to the remote server, and the server must have the mlocate package installed and its database built: `sudo apt-get install mlocate; sudo updatedb`.
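If the file name comes from user input, it is worth quoting it before building the shell command, and returning the output as a list of paths. A sketch, assuming `ssh` is an already-connected paramiko.SSHClient; `remote_locate` is a hypothetical helper name.

```python
import shlex
from typing import List


def remote_locate(ssh, filename: str) -> List[str]:
    """Run `locate` on the remote host and return the matching paths."""
    # shlex.quote stops a file name containing spaces or shell
    # metacharacters from being interpreted by the remote shell
    command = "locate " + shlex.quote(filename)
    _stdin, stdout, _stderr = ssh.exec_command(command)
    # stdout.read() returns bytes; an empty result gives an empty list
    return stdout.read().decode().splitlines()
```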


I needed an os.walk-like function for FTP, and since none existed, I thought it would be useful to write one. For future reference, you can find the latest version here.

Here is the code:

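```python
import os
import posixpath
from ftplib import error_perm

# Both functions use a global `ftp` connection (see "how to use" below).

def FTP_Walker(FTPpath, localpath):
    """Recursively yield the path of every file under FTPpath."""
    os.chdir(localpath)  # localpath isn't otherwise used yet
    try:
        items = ftp.nlst(FTPpath)
    except error_perm:
        return  # some servers answer NLST on an empty directory with a 550 error
    for item in items:
        if '/' not in item:
            item = posixpath.join(FTPpath, item)  # some servers return bare names
        if posixpath.basename(item.rstrip('/')) in ('.', '..'):
            continue  # skip '.' and '..' to avoid infinite recursion
        if is_file(item):
            yield item
        else:
            yield from FTP_Walker(item, localpath)


def is_file(filename):
    """Return True if we can't cwd() into filename, i.e. it is not a directory."""
    current = ftp.pwd()
    try:
        ftp.cwd(filename)
    except error_perm:
        return True
    ftp.cwd(current)  # it was a directory: go back to where we were
    return False
```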

How to use it:

Connect to your host first:

```python
from ftplib import FTP

host_address = "my host address"
user_name = "my username"
password = "my password"
ftp = FTP(host_address)
ftp.login(user=user_name, passwd=password)
```

Now you can call the function as follows:

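```python
ftpwalk = FTP_Walker("FTP root path", "path to local")
# I'm not using the local path yet, but I will in future versions, so for now you can just pass '/' to it.
# FTP_Walker is a generator: nothing (not even its os.chdir) runs until you iterate over it.
```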

Then, to print and download the files, you can do something like this:

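```python
import os

local_dir = os.path.abspath("path to local")  # resolve now, before FTP_Walker's os.chdir() runs
for item in ftpwalk:
    # download the file into local_dir, keeping only its base name
    with open(os.path.join(local_dir, item.split('/')[-1]), "wb") as f:
        ftp.retrbinary("RETR " + item, f.write)
    print(item)  # print the file's remote path
```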
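The loop above saves every file directly into one folder, so files with the same name in different remote directories overwrite each other. A sketch that recreates the remote directory structure instead, assuming `remote_root` is the same path passed to FTP_Walker and that the yielded paths start with it; `local_path_for` and `download_tree` are hypothetical helper names.

```python
import os


def local_path_for(remote_path: str, remote_root: str, local_root: str) -> str:
    """Map a remote file path under remote_root to the same relative path under local_root."""
    root = remote_root.rstrip("/") + "/"
    if not remote_path.startswith(root):
        raise ValueError(f"{remote_path!r} is not under {remote_root!r}")
    relative = remote_path[len(root):]  # e.g. "pub/robots.txt"
    return os.path.join(local_root, *relative.split("/"))


def download_tree(ftp, remote_paths, remote_root: str, local_root: str) -> None:
    """Download each remote file, recreating its directory structure under local_root."""
    for item in remote_paths:
        target = local_path_for(item, remote_root, local_root)
        os.makedirs(os.path.dirname(target), exist_ok=True)  # create missing parent folders
        with open(target, "wb") as f:  # the with-block closes each file after its download
            ftp.retrbinary("RETR " + item, f.write)
        print(item)
```

For example, `download_tree(ftp, ftpwalk, "/", "/absolute/local/dir")`; an absolute `local_root` avoids surprises, because FTP_Walker calls os.chdir() while it runs.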

(I will add more functions in the near future, so if you need anything specific or have ideas that may be useful to other users, I'd be glad to hear them.)

