I am trying to process data stored in CSV files, which may have missing values in an unknown number of columns (up to 30). I want to set these missing values to 0 using the filling_values argument of genfromtxt. Here is a minimal working example for NumPy 1.6.2 running on ActiveState ActivePython 2.7 (32-bit) on Windows 7.
import numpy

text = "a,b,c,d\n1,2,3,4\n5,,7,8"

# Write the sample data first so that genfromtxt has a file to read.
b = open('test.txt','w')
b.write(text)
b.close()

a = numpy.genfromtxt('test.txt',delimiter=',',names=True)
print "plain",a

a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values=0)
print "filling_values=0",a

a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={1:0})
print "filling_values={1:0}",a

a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={0:0})
print "filling_values={0:0}",a

a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={None:0})
print "filling_values={None:0}",a
And the result:
plain [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
filling_values=0 [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
filling_values={1:0} [(1.0, 2.0, 3.0, 4.0) (5.0, 0.0, 7.0, 8.0)]
filling_values={0:0} [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
Traceback (most recent call last):
  File "C:\Users\tolivo.EE\Documents\active\eng\python\sizer\testGenfromtxt.py", line 20, in <module>
    a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={None:0})
  File "C:\Users\tolivo.EE\AppData\Roaming\Python\Python27\site-packages\numpy\lib\npyio.py", line 1451, in genfromtxt
    filling_values[key] = val
TypeError: list indices must be integers, not NoneType
From the NumPy user guide, I would expect filling_values=0 and filling_values={None:0} to work, but the first has no effect and the second raises a TypeError. Specifying the affected column explicitly (filling_values={1:0}) works, but I have a large number of columns, and how many is not known until the user makes a selection, so I'm looking for a way to set the fill value for all columns automatically, as the user guide suggests is possible.
I assume I could count the columns in advance and build a dict to pass as filling_values for now, but is there a better way?
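For reference, a minimal sketch of that count-the-columns workaround. It assumes the first line of the file is the header and that a dict covering every column index behaves the same way filling_values={1:0} does above. The helper names count_columns, zero_fill_dict and load_with_zero_fill are made up for illustration and are not part of NumPy.

```python
import csv


def count_columns(path, delimiter=","):
    """Return the number of fields in the first (header) line of the file."""
    with open(path) as handle:
        header = next(csv.reader(handle, delimiter=delimiter))
    return len(header)


def zero_fill_dict(n_columns, fill=0):
    """Map every column index to the same fill value, e.g. {0: 0, 1: 0}."""
    return dict((index, fill) for index in range(n_columns))


def load_with_zero_fill(path, delimiter=","):
    """Read the CSV with genfromtxt, passing a fill value for every column."""
    # Imported here so the two helpers above can be used without NumPy.
    import numpy

    filling = zero_fill_dict(count_columns(path, delimiter))
    return numpy.genfromtxt(path, delimiter=delimiter, names=True,
                            filling_values=filling)
```

With the test.txt from the example, load_with_zero_fill('test.txt') would call genfromtxt with filling_values={0: 0, 1: 0, 2: 0, 3: 0}. This reads the header line twice, which is cheap but still feels like a workaround rather than the intended use of the argument.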