Add file name as last column of CSV file

Question

I have a Python script that modifies a CSV file to add the file name as the last column:

    import sys
    import glob

    for filename in glob.glob(sys.argv[1]):
        file = open(filename)
        data = [line.rstrip() + "," + filename for line in file]
        file.close()

        file = open(filename, "w")
        file.write("\n".join(data))
        file.close()

Unfortunately, it also appends the file name to the header (first) line of the file. I would like the header to get the string "ID" instead. Can anyone suggest how I could do this?

python linux csv

Henry Levine Apr 18 '11 at 8:29

6 answers

Reto aebersold · Answer 1 · 2011-04-18T08:43:25+0000

Check out the official csv module.

Mark longair · Answer 2 · 2011-04-18T08:57:27+0000

Here are a few minor points about your current code:

It is a bad idea to use file as a variable name, as this shadows the built-in type.
You can close file objects automatically using the with syntax.
Wouldn't you rather add an extra column to the header row, called something like Filename, than just leave that column out of the first row?
If your file names contain commas (or, less likely but still possible, newlines), you need to make sure the file name is quoted; simply appending it will not work.
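
To illustrate the quoting point, here is a small sketch (Python 3; the file name a,b.csv and the sample row are made up for the example) that compares plain string concatenation with csv.writer:

```python
import csv
import io

filename = "a,b.csv"  # hypothetical file name that contains a comma
row = ["1", "2"]

# Plain concatenation: the comma in the name silently creates an extra column.
naive_line = ",".join(row) + "," + filename

# csv.writer quotes the field, so it is read back as a single column.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row + [filename])
quoted_line = buffer.getvalue().rstrip("\n")

# Parse both lines again to see how many fields a CSV reader finds.
naive_fields = next(csv.reader([naive_line]))
quoted_fields = next(csv.reader([quoted_line]))
```

With the naive line a reader sees four fields; with the quoted line it sees three, the last one being the complete file name.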

That last consideration would incline me to use the csv module instead, which will deal with quoting and unquoting for you. For example, you could try something like the following code:

    import glob
    import csv
    import sys

    for filename in glob.glob(sys.argv[1]):
        data = []
        with open(filename) as finput:
            for i, row in enumerate(csv.reader(finput)):
                to_append = "Filename" if i == 0 else filename
                data.append(row+[to_append])
        with open(filename,'wb') as foutput:
            writer = csv.writer(foutput)
            for row in data:
                writer.writerow(row)

This may quote the data slightly differently from your input file, so you might want to play with the quoting options for csv.reader and csv.writer described in the documentation for the csv module.
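
As a rough illustration of those quoting options, the sketch below (Python 3; the helper name render and the sample row are invented for the example) writes the same row with two of the quoting constants from the csv module:

```python
import csv
import io


def render(row, quoting):
    """Write one row with the given quoting mode and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator="\n").writerow(row)
    return buffer.getvalue()


row = ["1", "hello world", "data.csv"]
minimal = render(row, csv.QUOTE_MINIMAL)  # default: quote only when required
quote_all = render(row, csv.QUOTE_ALL)    # quote every field
```

If your input file quotes every field, passing quoting=csv.QUOTE_ALL to the writer is probably the closest match; otherwise the default is usually fine.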

As another point, you may have good reasons for accepting a glob pattern as a parameter rather than just a list of files on the command line, but it is a bit surprising: you have to invoke your script as ./whatever.py '*.csv' and not just ./whatever.py *.csv. Instead, you could simply do:

    for filename in sys.argv[1:]:

... and let the shell expand your glob before the script knows anything about it.

Last: your current approach is a bit dangerous, because if something fails while you are writing back to the same file name, you will lose data. The standard way to avoid this is to write to a temporary file instead and, if that succeeds, rename the temporary file over the original. So you could rewrite the whole thing as:

    import csv
    import sys
    import tempfile
    import shutil

    for filename in sys.argv[1:]:
        tmp = tempfile.NamedTemporaryFile(delete=False)
        with open(filename) as finput:
            with open(tmp.name,'wb') as ftmp:
                writer = csv.writer(ftmp)
                for i, row in enumerate(csv.reader(finput)):
                    to_append = "Filename" if i == 0 else filename
                    writer.writerow(row+[to_append])
        shutil.move(tmp.name,filename)
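
The code in this answer is written for Python 2 (hence the 'wb' mode). For Python 3, the same write-then-rename idea might look like the sketch below. The function name add_filename_column and its header parameter are made up for the example, and creating the temporary file in the same directory as the original is an assumption that keeps the final rename on one filesystem:

```python
import csv
import os
import tempfile


def add_filename_column(filename, header="Filename"):
    """Append `header` to the first row and `filename` to every other row."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file next to the original so that the final
    # rename does not have to cross filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # In Python 3 the csv module wants text files opened with newline="".
        with os.fdopen(fd, "w", newline="") as ftmp, \
                open(filename, newline="") as finput:
            writer = csv.writer(ftmp)
            for i, row in enumerate(csv.reader(finput)):
                writer.writerow(row + [header if i == 0 else filename])
        # Only replace the original once the new file is completely written.
        os.replace(tmp_name, filename)
    except BaseException:
        os.remove(tmp_name)  # discard the partial temporary file
        raise
```

You would call it in a loop such as for name in sys.argv[1:]: add_filename_column(name). Note that the replaced file gets the permissions of the temporary file, so copy the original mode first if that matters to you.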

Don · Answer 3 · 2011-04-18T08:47:14+0000

You can try:

    data = [file.readline().rstrip() + ",id"]
    data += [line.rstrip() + "," + filename for line in file]
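
The same idea as a self-contained sketch that works on a list of lines (Python 3; the function name append_column and the sample data are invented for the example). Like the snippet above, it does no CSV quoting, so it is only safe when the values contain no commas or newlines:

```python
def append_column(lines, header_value, row_value):
    """Return new lines with one extra comma-separated field appended.

    The first line gets `header_value`; every other line gets `row_value`.
    No CSV quoting is applied.
    """
    stripped = [line.rstrip("\r\n") for line in lines]
    if not stripped:
        return []
    head, body = stripped[0], stripped[1:]
    return [head + "," + header_value] + [line + "," + row_value for line in body]


result = append_column(["name,age\n", "alice,30\n", "bob,25\n"], "id", "people.csv")
```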

Blair · Answer 4 · 2011-04-18T08:56:46+0000

Use the CSV module that comes with Python.

    import csv
    import sys

    def process_file(filename):
        # Read the contents of the file into a list of lines.
        f = open(filename, 'r')
        contents = f.readlines()
        f.close()

        # Use a CSV reader to parse the contents.
        reader = csv.reader(contents)

        # Open the output and create a CSV writer for it.
        f = open(filename, 'wb')
        writer = csv.writer(f)

        # Process the header.
        header = reader.next()
        header.append('ID')
        writer.writerow(header)

        # Process each row of the body.
        for row in reader:
            row.append(filename)
            writer.writerow(row)

        # Close the file and we're done.
        f.close()

    # Run the function on all command-line arguments. Note that this does no
    # checking for things such as file existence or permissions.
    map(process_file, sys.argv[1:])

You can run it as follows:

    blair@blair-eeepc:~$ python csv_add_filename.py file1.csv file2.csv

lecodesportif · Answer 5 · 2011-04-18T08:57:15+0000

You could try to fix your existing code, but using the csv module is recommended. This should give you the result you want:

    import sys
    import glob
    import csv

    filename = glob.glob(sys.argv[1])[0]
    yourfile = csv.reader(open(filename, 'r'))
    csv_output = []
    for row in yourfile:
        if len(csv_output) != 0:  # skip the header
            row.append(filename)
        csv_output.append(row)

    yourfile = csv.writer(open(filename,'w'),delimiter=',')
    yourfile.writerows(csv_output)

kurumi · Answer 6 · 2011-04-18T09:15:57+0000

You can use fileinput for in-place editing:

    import sys
    import glob
    import fileinput

    for filename in glob.glob(sys.argv[1]):
        for line in fileinput.input(filename, inplace=1):
            if fileinput.lineno() == 1:
                print line.rstrip() + ",ID"
            else:
                print line.rstrip() + "," + filename
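
The snippet above uses the Python 2 print statement. A Python 3 version of the same in-place approach might look like the sketch below (the function name add_filename_inplace and its header parameter are made up for the example). As with the other plain-string approaches, nothing is quoted, so it assumes the file name contains no commas:

```python
import fileinput


def add_filename_inplace(filename, header="ID"):
    """Append `header` to the first line and `filename` to all other lines."""
    # With inplace=True, whatever is printed inside the loop is written
    # back to the file instead of going to the terminal.
    with fileinput.FileInput(files=[filename], inplace=True) as lines:
        for line in lines:
            suffix = header if lines.isfirstline() else filename
            print(line.rstrip("\r\n") + "," + suffix)
```

It would be called once per file, for example in a loop over sys.argv[1:].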
