Pythonic way to import data from multiple files into an array

I'm relatively new to Python and am wondering what the best way is to import data from multiple files into one array. I have quite a few text files, each containing 50 rows of two comma-delimited data columns, such as:

Length=10.txt:

    1, 10
    2, 30
    3, 50
    #etc
    END OF FILE


Length=20.txt:

    1, 50.7
    2, 90.9
    3, 10.3
    #etc
    END OF FILE

Let's say I have 10 such text files to import into a variable called data.

I would like to create one three-dimensional array containing all the data, so that I can easily access and process it by referencing data[:,:,n], where n is the index of the text file.

I think this would give an array of shape (50, 2, 10), but I don't know the best way to create it in Python. I thought about using a loop to import each text file as a 2D array and then stacking them to create a 3D array, but I could not find the appropriate commands for this (I looked at vstack and column_stack in numpy, but these do not seem to add an extra dimension).

So far I have written the following import code:

    import glob
    import numpy as np

    file_list = glob.glob(source_dir + '/*.TXT')  # Get folder path containing text files
    for file_path in file_list:
        data = np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18)

But the problem with this code is that I can only process data in a for loop.

What I really want is an array of all the data imported from text files.

Any help would be greatly appreciated!

+7
5 answers

"But the problem with this code is that I can only process data in a for loop."

Assuming your code works:

    # Get folder path containing text files
    file_list = glob.glob(source_dir + '/*.TXT')

    data = []
    for file_path in file_list:
        data.append(
            np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18))

    # now you can access it outside the "for loop..."
    for d in data:
        print(d)
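If all the files really do have the same shape, that list can then be combined into the three-dimensional array asked for in the question, for example with np.dstack. A minimal sketch, assuming every file yields a (50, 2) array:

    # Stack the per-file (50, 2) arrays along a new third axis, giving
    # shape (50, 2, number_of_files), so data3d[:, :, n] is file n.
    data3d = np.dstack(data)
    print(data3d.shape)   # (50, 2, 10) for 10 files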
+6

Are you looking for an array of the form [txt column1, txt column2, file number]?

    file_list = glob.glob(source_dir + '/*.TXT')  # Get folder path containing text files
    for num, file_path in enumerate(file_list):
        data = np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18)
        data = np.vstack((data.T, np.ones(data.shape[0]) * num)).T
        if num == 0:
            Output = data
        else:
            Output = np.vstack((Output, data))

An alternative, if you do not want to transpose twice:

    data = np.hstack((data, (np.ones(data.shape[0]) * num).reshape(-1, 1)))
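With either version, the rows belonging to a single file can later be pulled back out with a boolean mask on the index column. A sketch, assuming Output has been built as above:

    # Select every row whose third column holds the wanted file index n.
    n = 2
    rows_for_file_n = Output[Output[:, 2] == n]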
+1

If all the data has the same shape, then simply append it to a list:

 all_data = [] 

and in your loop:

 all_data.append(data) 

finally you have

    np.asarray(all_data)

which is an array of shape (10, 50, 2) (transpose it if you want). If the shapes do not match, this does not work, but numpy cannot handle arrays of different shapes in one array anyway. In that case you may need another loop that creates arrays of the largest shape and copies your data into them.
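A minimal sketch of that last step, assuming all_data holds 10 arrays of shape (50, 2):

    # np.asarray stacks the list into shape (10, 50, 2); moving the file
    # axis to the end gives (50, 2, 10), so data[:, :, n] addresses file n.
    data = np.asarray(all_data).transpose(1, 2, 0)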

+1

Crude, but fast:

 listFiles=["1.txt","2.txt", ... ,"xxx.txt"] allData=[] for file in listFiles: lines = open(file,'r').readlines() filedata = {} filedata['name'] = file filedata['rawLines'] = lines col1Vals = [] col2Vals = [] mapValues = {} for line in lines: values = line.split(',') col1Vals.append(values[0]) col2Vals.append(values[1]) mapValues[values[0]] = values[1] filedata['col1'] = col1Vals filedata['col2'] = col2Vals filedata['map'] = mapValues allData.append(filedata) 


If you want to get a list of files from a specific directory, see os.walk.

Since it’s not clear how you need the data, I have shown many ways to store it.

allData is a list of dictionaries.

To get the second column of data from the 3rd file, you would do allData[2]['col2'].

If you need the name of the third file: allData[2]['name'].
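Note that everything is stored as raw strings here. A small sketch of converting one file's columns to numbers, assuming allData was built as above:

    # The stored values are strings, so convert the 3rd file's columns
    # to floats before doing any numeric work with them.
    col1 = [float(v) for v in allData[2]['col1']]
    col2 = [float(v) for v in allData[2]['col2']]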

+1

Maybe you can do it like this:

    file_list = glob.glob(source_dir + '/*.TXT')  # Get folder path containing text files
    data = [np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18)
            for file_path in file_list]
0
