Converting a string list to float32 efficiently

I have a 3000x300 matrix file of floats. When I read it and convert the values with float(), I get float64, because Python's float is double precision. I tried both NumPy and map() to convert the values to float32, but both seem very inefficient.

My code is:

    x = open(readFrom, 'r').readlines()
    y = [[float(i) for i in s.split()] for s in x]

time taken: 0:00:00.996000
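Since the plain float() version is by far the fastest of the three, one option is to keep it and convert the finished list of lists to float32 in a single bulk call, instead of creating one np.float32 scalar per element. A minimal sketch, assuming whitespace-separated rows of equal length; `read_matrix_float32` and `path` are hypothetical names. The intermediate Python list still costs memory, so this does not help with the huge-file case in the update.

```python
import numpy as np


def read_matrix_float32(path):
    """Parse a whitespace-separated text matrix into a float32 array."""
    with open(path) as f:
        # Parse every token with the built-in float(), the fastest option timed above
        rows = [[float(tok) for tok in line.split()] for line in f]
    # One bulk conversion from Python floats (float64) to a float32 array
    return np.array(rows, dtype=np.float32)
```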

NumPy implementation:

    x = open(readFrom, 'r').readlines()
    y = [[np.float32(i) for i in s.split()] for s in x]

time taken: 0:00:06.093000

map() implementation:

    x = open(readFrom, 'r').readlines()
    # list() is needed on Python 3, where map() returns a lazy iterator
    y = [list(map(np.float32, s.split())) for s in x]

time taken: 0:00:05.474000

How can I convert to float32 efficiently?

Thanks.

Update:

numpy.loadtxt() and numpy.genfromtxt() do not work on a huge file (they raise a MemoryError). I posted a related question, and the approach I proposed there works well for a huge matrix file (50,000x5000). Here is the question.
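Another option that avoids Python lists entirely is to let NumPy parse the text itself with `np.fromfile` in text mode (a non-empty `sep`) and then reshape. This is a sketch, not something timed in the original thread; it assumes every row has the same number of columns, and `load_flat_float32` and `n_cols` are hypothetical names.

```python
import numpy as np


def load_flat_float32(path, n_cols):
    """Read a whitespace-separated text matrix straight into float32.

    Assumes every row has exactly n_cols values.
    """
    # With sep=' ', the separator matches any run of whitespace (newlines included),
    # so the whole file is parsed as one flat sequence of numbers
    flat = np.fromfile(path, dtype=np.float32, sep=' ')
    # -1 lets NumPy infer the number of rows from the total value count
    return flat.reshape(-1, n_cols)
```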

Answer:

If memory is a problem and you know the size of the array ahead of time, you probably don't want to read the entire file into memory first. Something like this is probably more appropriate:

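    import numpy as np

    # Preallocate the array (np.empty would work too and be marginally faster,
    # but that is probably not worth worrying about).
    a = np.zeros((3000, 300), dtype=np.float32)
    with open(filename) as f:
        for i, line in enumerate(f):
            # list() is needed on Python 3, where map() returns a lazy iterator
            a[i, :] = list(map(np.float32, line.split()))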
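If the dimensions are not known ahead of time, one way to keep the preallocation trick is a cheap first pass that only counts rows and takes the column count from the first line. A sketch assuming the file has no blank or ragged lines; `count_shape` and `load_preallocated` are hypothetical helpers.

```python
import numpy as np


def count_shape(path):
    """First pass: count rows and take the column count from the first line."""
    n_rows, n_cols = 0, 0
    with open(path) as f:
        for line in f:
            if n_rows == 0:
                n_cols = len(line.split())
            n_rows += 1
    return n_rows, n_cols


def load_preallocated(path):
    """Second pass: fill a preallocated float32 array one line at a time."""
    n_rows, n_cols = count_shape(path)
    a = np.empty((n_rows, n_cols), dtype=np.float32)
    with open(path) as f:
        for i, line in enumerate(f):
            # Parse with float(); NumPy casts the row to float32 on assignment
            a[i, :] = list(map(float, line.split()))
    return a
```

The file is read twice, but only one line of text is held in memory at a time in either pass.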

From a couple of quick tests on my machine, it seems (unexpectedly) that map isn't even needed:

    import numpy as np

    a = np.zeros((3000, 300), dtype=np.float32)
    with open(filename) as f:
        for i, line in enumerate(f):
            a[i, :] = line.split()
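The map-free version works because NumPy converts the string tokens to the destination dtype (float32 here) during slice assignment. A tiny check that needs no file:

```python
import numpy as np

row = np.zeros(3, dtype=np.float32)
# Assigning a list of strings: NumPy parses each token while casting to float32
row[:] = "1.5 2 -3e2".split()
print(row)
```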

This line-by-line approach may not be the fastest, but it is about as memory-efficient as it gets: only the final float32 array and one line of text are held in memory at a time.
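To put rough numbers on the memory argument: a dense array costs rows × cols × itemsize bytes, so float32 halves the float64 footprint, while a list of lists of Python floats costs far more per element. A back-of-the-envelope sketch for the 50,000x5000 case from the question's update; the per-element list cost (a 24-byte float object plus an 8-byte list slot, typical of 64-bit CPython) is an assumption.

```python
import numpy as np

rows, cols = 50_000, 5_000
n = rows * cols

f32_bytes = n * np.dtype(np.float32).itemsize  # 4 bytes per value
f64_bytes = n * np.dtype(np.float64).itemsize  # 8 bytes per value
# Assumption: ~24 bytes per float object + 8 bytes per list slot (64-bit CPython)
pylist_bytes = n * (24 + 8)

for name, nbytes in [("float32 array", f32_bytes),
                     ("float64 array", f64_bytes),
                     ("list of Python floats", pylist_bytes)]:
    print(f"{name}: {nbytes / 1e9:.1f} GB")
# float32 array: 1.0 GB
# float64 array: 2.0 GB
# list of Python floats: 8.0 GB
```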

Some tests:

    import timeit

    import numpy as np

    def func1():
        # No map, and pretty speedy :-)
        a = np.zeros((3000, 300), dtype=np.float32)
        with open('junk.txt') as f:
            for i, line in enumerate(f):
                a[i, :] = line.split()

    def func2():
        a = np.zeros((3000, 300), dtype=np.float32)
        with open('junk.txt') as f:
            for i, line in enumerate(f):
                a[i, :] = list(map(np.float32, line.split()))

    def func3():
        a = np.zeros((3000, 300), dtype=np.float32)
        with open('junk.txt') as f:
            for i, line in enumerate(f):
                a[i, :] = list(map(float, line.split()))

    print(timeit.timeit('func1()', setup='from __main__ import func1', number=3))  # 1.36s
    print(timeit.timeit('func2()', setup='from __main__ import func2', number=3))  # 11.53s
    print(timeit.timeit('func3()', setup='from __main__ import func3', number=3))  # 1.72s
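The benchmark assumes a `junk.txt` already exists. One way to generate a comparable 3000x300 test file is `np.savetxt`; the random contents, the seed, and the `%.6f` format are arbitrary choices, not what the original test used.

```python
import numpy as np

# Arbitrary random data with the same shape as the question's matrix
rng = np.random.default_rng(0)
data = rng.random((3000, 300))
# Whitespace-separated text, one matrix row per line
np.savetxt('junk.txt', data, fmt='%.6f', delimiter=' ')
```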