Copying columns from multiple text files in Python

I have a large number of text files containing data arranged in a fixed number of rows and columns, with the columns separated by spaces (like a .csv, but with spaces as the delimiter). I want to extract a given column from each of these files and write it to a new text file.
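
For illustration (made-up values), if I take the last column from each of two files, this is the shape of what I am after:

    file 1:          file 2:          desired output:
    1 10 100         2 20 200         100 200
    1 11 110         2 21 210         110 210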

So far I have tried:

    results_combined = open('ResultsCombined.txt', 'wb')

    def combine_results():
        for num in range(2, 10):
            # all the text files have similar filename styles
            f = open("result_0." + str(num) + "_.txt", 'rb')
            lines = f.readlines()    # read in the data
            no_lines = len(lines)    # get the number of lines
            for i in range(0, no_lines):
                column = lines[i].strip().split(" ")
                results_combined.write(column[5] + " " + '\r\n')
            f.close()

    if __name__ == "__main__":
        combine_results()

This creates a text file containing the data I want from the separate files, but as a single column (i.e., I managed to "stack" the columns on top of each other rather than place them side by side as separate columns). I feel like I am missing something obvious.

In another attempt, I managed to write all the individual files to a single file, but without extracting the column that I want:

    import glob

    files = [open(f) for f in glob.glob("result_*.txt")]
    fout = open("ResultsCombined.txt", 'wb')
    for row in range(0, 488):
        for f in files:
            fout.write(f.readline().strip())
            fout.write(' ')
        fout.write('\n')
    fout.close()

Basically, I want to copy column 5 from each file (it is always the same column) and write the columns side by side to a single file.

3 answers

If you do not know the number of lines in the files in advance, and the files fit into memory, the following solution will work:

    import glob

    files = [open(f) for f in glob.glob("*.txt")]

    # Given a file, read the 6th column (index 5) from each line.
    def readcol5(f):
        return [line.split(' ')[5] for line in f]

    filecols = [readcol5(f) for f in files]
    maxrows = len(max(filecols, key=len))

    # Given a column list, pad it so it has maxrows elements.
    def extendmin(arr):
        diff = maxrows - len(arr)
        arr.extend([''] * diff)
        return arr

    filecols = map(extendmin, filecols)
    lines = zip(*filecols)
    lines = map(lambda x: ','.join(x), lines)
    lines = '\n'.join(lines)

    fout = open('output.csv', 'wb')
    fout.write(lines)
    fout.close()
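
On Python 3, the same idea can be written more compactly with itertools.zip_longest, which pads the shorter columns for you. A minimal sketch, with the file pattern and output name assumed from the question:

    import glob
    from itertools import zip_longest

    # Collect column index 5 from every matching file.
    filecols = []
    for name in glob.glob("result_*.txt"):
        with open(name) as f:
            filecols.append([line.split(' ')[5] for line in f])

    # zip_longest pads shorter columns with '' instead of truncating.
    with open('output.csv', 'w') as fout:
        for row in zip_longest(*filecols, fillvalue=''):
            fout.write(','.join(row) + '\n')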

Why not read the column 5 entries of each file into a list of lists, and only after reading all the files write them to the output file?

    data = [
        [],  # entries from first file
        [],  # entries from second file
        ...
    ]

    for i in range(number_of_rows):
        outputline = []
        for vals in data:
            outputline.append(vals[i])
        # newline added so each row lands on its own line
        outfile.write(" ".join(outputline) + "\n")
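
A fuller sketch of this approach, filling in the parts the outline leaves open; the file pattern, column index, and output name are assumed from the question, and the row count is taken from the shortest file:

    import glob

    # One inner list of column-5 entries per file.
    data = []
    for name in glob.glob("result_*.txt"):
        with open(name) as f:
            data.append([line.strip().split(' ')[5] for line in f])

    # Use the shortest file's length so vals[i] never goes out of range.
    number_of_rows = min(len(vals) for vals in data)

    with open("ResultsCombined.txt", "w") as outfile:
        for i in range(number_of_rows):
            outfile.write(" ".join(vals[i] for vals in data) + "\n")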

Or this variant, following your second approach:

    import glob

    files = [open(f) for f in glob.glob("result_*.txt")]
    fout = open("ResultsCombined.txt", 'w')
    for row in range(0, 488):
        for f in files:
            fout.write(f.readline().strip().split(' ')[5])
            fout.write(' ')
        fout.write('\n')
    fout.close()

... which assumes a fixed number of lines per file, but will also work for a very large number of lines, because it does not hold intermediate values in memory. For a moderate number of lines, I would expect the first answer's solution to be faster.
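
If the hard-coded 488 is a concern, one possible variant (a sketch, not tested against your data) is to zip the file iterators themselves on Python 3, where zip is lazy: it stops at the end of the shortest file and still keeps only one row per file in memory.

    import glob

    files = [open(f) for f in glob.glob("result_*.txt")]
    with open("ResultsCombined.txt", 'w') as fout:
        # zip(*files) yields one tuple per row, drawing a line from each file,
        # and stops as soon as any file is exhausted.
        for rows in zip(*files):
            fout.write(' '.join(line.strip().split(' ')[5] for line in rows))
            fout.write('\n')
    for f in files:
        f.close()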

