I have a folder with 5 text files in it related to different sites -
The header is formatted as follows:
Rockspring_18_SW.417712.WRFc36.ET.2000-2050.txt Rockspring_18_SW.417712.WRFc36.RAIN.2000-2050.txt WICA.399347.WRFc36.ET.2000-2050.txt WICA.399347.WRFc36.RAIN.2000-2050.txt
therefore, basically the file name follows the format, (site name). (site number). (WRFc36). (some variable). (2000-2050.txt
Each of these text files has a similar format to it without a title bar: Year Month Day Value (consisting of ~ 18500 lines in each text file)
I want Python to search for similar file names (with the site name and site number) and select the first or third columns of data from one of the files and insert them into a new txt file. I also want to copy and paste 4 columns from each variable for the site (rain, et, etc.) and paste them in a specific order in a new file.
I know how to capture data using the csv module (and define a new dialect for the spatial divider) from ALL files and print to a new text file, but I'm not sure how to automate the creation of a new file for each name / site number and make sure that mine variables are displayed in the correct order -
The output I want to use is one text file (not 5) for each site with the following format (year, month, day, variable1, variable2, variable3, variable4, variable5) for ~ 18500 lines ...
I'm sure I'm looking at something very simple here ... it looks like it will be rather rudimentary ... but ... any help would be greatly appreciated!
Update
========
I updated the code to reflect the comments below.
http://codepad.org/3mQEM75e
from the collection import defaultdict import glob import csv
#Create dictionary of lists-- [A] = [Afilename1, Afilename2, Afilename3...] # [B] = [Bfilename1, Bfilename2, Bfilename3...] def get_site_files(): sites = defaultdict(list) #to start, I have a bunch of files in this format --- #"site name(unique)"."site num(unique)"."WRFc36"."Variable(5 for each site name)"."2000-2050" for fname in glob.glob("*.txt"): #split name at every instance of "." parts = fname.split(".") #check to make sure i only use the proper files-- having 6 parts to name and having WRFc36 as 3rd part if len(parts)==6 and parts[2]=='WRFc36': #Make sure site name is the full unique identifier, the first and second "parts" sites[parts[0]+"."+parts[1]].append(fname) return sites #hardcode the variables for method 2, below Var=["TAVE","RAIN","SMOIS_INST","ET","SFROFF"] def main(): for site_name, files in get_site_files().iteritems(): print "Working on *****"+site_name+"*****" ####Method 1- I'd like to not hardcode in my variables (as in method 2), so I can use this script in other applications. for filename in files: reader = csv.reader(open(filename, "rb")) WriteFile = csv.writer(open("XX_"+site_name+"_combined.txt","wb")) for row in reader: row = reader.next() ####Method 2 works (mostly), but skips a LOT of random lines of first file, and doesn't utilize the functionality built into my dictionary of lists... ## reader0 = csv.reader(open(site_name+".WRFc36."+Var[0]+".2000-2050.txt", "rb")) #I'd like to copy ALL columns from the first file ## reader1 = csv.reader(open(site_name+".WRFc36."+Var[1]+".2000-2050.txt", "rb")) # and just the fourth column from all the rest of the files ## reader2 = csv.reader(open(site_name+".WRFc36."+Var[2]+".2000-2050.txt", "rb")) # (the columns 1-3 are the same for all files) ## reader3 = csv.reader(open(site_name+".WRFc36."+Var[3]+".2000-2050.txt", "rb")) ## reader4 = csv.reader(open(site_name+".WRFc36."+Var[4]+".2000-2050.txt", "rb")) ## WriteFile = csv.writer(open("XX_"+site_name+"_COMBINED.txt", "wb")) #creates new command to write a text file ## ## for row in reader0: ## row = reader0.next() ## row1 = reader1.next() ## row2 = reader2.next() ## row3 = reader3.next() ## row4 = reader4.next() ## WriteFile.writerow(row + row1 + row2 + row3 + row4) ## print "***finished with site***" if __name__=="__main__": main()