How to open all files of a certain type in Python and process them?

I am trying to figure out how to get Python to go through a directory full of CSV files, process each file, and spit out a text file with a trimmed list of values.

In this example, I iterate through a CSV with many different types of columns, but all I really want is a first name, last name, and keyword. I have a folder full of these CSVs, each with different columns, except that they all contain the first name, last name, and keyword somewhere. What is the best way to open this folder, go through each CSV file, and write everything out to a single compiled file, as in the example below?

    import csv

    reader = csv.reader(open("keywords.csv"))
    rownum = 0
    headnum = 0
    F = open('compiled.txt', 'w')
    for row in reader:
        if rownum == 0:
            header = row
            for col in row:
                if header[headnum] == 'Keyword':
                    keywordnum = headnum
                elif header[headnum] == 'First Name':
                    firstnamenum = headnum
                elif header[headnum] == 'Last Name':
                    lastnamenum = headnum
                headnum += 1
        else:
            currentrow = row
            print(currentrow[keywordnum] + '\n' + currentrow[firstnamenum] + '\n' + currentrow[lastnamenum])
            F.write(currentrow[keywordnum] + '\n')
        rownum += 1
5 answers

The best way is probably to use the globbing ability of the shell or, alternatively, the Python glob module.

Shell (Linux, Unix)

Shell:

    python myapp.py folder/*.csv

myapp.py:

    import sys

    for filename in sys.argv[1:]:
        with open(filename) as f:
            pass  # do something with f

Windows (or without a shell):

    import glob

    for filename in glob.glob("folder/*.csv"):
        with open(filename) as f:
            pass  # do something with f

Note: Python 2.5 requires from __future__ import with_statement
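If you want one script that works both ways, here is a small sketch (the fallback pattern and the placeholder processing are my own illustration, not part of the answer):

    import glob
    import sys

    # Take filenames from the command line if the shell expanded them,
    # otherwise fall back to globbing ourselves (e.g. on Windows).
    filenames = sys.argv[1:] or glob.glob("folder/*.csv")
    for filename in filenames:
        with open(filename) as f:
            print(filename, len(f.readlines()))  # placeholder processing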


The “get all the CSV files” part of the question has been answered several times (including by the OP), but nobody has covered the “get the right named columns” part yet. csv.DictReader makes it trivial; the “process one CSV file” loop becomes simply:

    reader = csv.DictReader(open(thecsvfilename))
    for row in reader:
        print('\n'.join((row['Keyword'], row['First Name'], row['Last Name'])))
        F.write(row['Keyword'] + '\n')
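Combined with the globbing above, one possible complete version might look like this (the folder path and the compiled.txt output come from the question; the rest is an assumption):

    import csv
    import glob

    with open('compiled.txt', 'w') as out:
        for csvfilename in glob.glob('folder/*.csv'):
            with open(csvfilename) as f:
                for row in csv.DictReader(f):
                    # Every file shares these three columns, whatever else it has.
                    print('\n'.join((row['Keyword'], row['First Name'], row['Last Name'])))
                    out.write(row['Keyword'] + '\n')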

A few suggestions:

  • You can store the header indexes for Keyword, First Name and Last Name in a dict instead of using separate variables. This would make the script easier to modify later.

  • You can use the list's index() method instead of looping over the headers, for example:

      if rownum == 0:
          for header in ('Keyword', 'First Name', 'Last Name'):
              header_index[header] = row.index(header)
    
  • You can use the glob module to grab the file names, though gs is probably right that letting the shell do the globbing is the best way to handle this.

  • It might be better to use the csv module to write the output file; it handles escaping, so it will probably be more reliable. See the sketch after this list.
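Putting those suggestions together, a rough sketch (the column names and file names come from the question; the header_index dict and the output layout are my own illustration):

    import csv

    wanted = ('Keyword', 'First Name', 'Last Name')
    header_index = {}

    with open('keywords.csv') as infile, open('compiled.csv', 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)  # csv.writer handles quoting/escaping
        for rownum, row in enumerate(reader):
            if rownum == 0:
                # One dict instead of three separate index variables.
                for header in wanted:
                    header_index[header] = row.index(header)
            else:
                writer.writerow([row[header_index[h]] for h in wanted])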


I think the best way to handle a bunch of files in a directory is with os.walk (documented in the Python os module docs).

Here is an answer I wrote to another Python question that includes working, tested Python code for using os.walk to open a bunch of files. That version also visits all subdirectories, but it would be easy to change it to stay in a single directory.

Replace strings in files using Python
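A minimal sketch of that pattern, assuming a top-level directory named 'folder' (the processing body is a placeholder):

    import os

    for dirpath, dirnames, filenames in os.walk('folder'):
        for name in filenames:
            if name.endswith('.csv'):
                with open(os.path.join(dirpath, name)) as f:
                    pass  # process f here
        # To stay in the top directory only, stop os.walk from descending:
        # dirnames[:] = []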


And I've answered my own question again ... I imported the os and glob modules to build the path.
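For the record, the path-building part probably looked something like this (the folder name is my assumption):

    import glob
    import os

    # Build the search pattern from the current working directory,
    # then let glob expand it into the list of CSV paths.
    pattern = os.path.join(os.getcwd(), 'folder', '*.csv')
    for path in glob.glob(pattern):
        print(path)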

