NumPy genfromtxt: using filling_values correctly

I am trying to process data stored in a CSV file that may have missing values in an unknown number of columns (up to 30). I want to set these missing values to 0 using the filling_values argument of genfromtxt. Here is a minimal working example for NumPy 1.6.2 running under ActiveState ActivePython 2.7 (32-bit) on Windows 7.

```python
import numpy

text = "a,b,c,d\n1,2,3,4\n5,,7,8"

# Write the sample CSV before trying to read it
b = open('test.txt', 'w')
b.write(text)
b.close()

a = numpy.genfromtxt('test.txt', delimiter=',', names=True)
print "plain", a

a = numpy.genfromtxt('test.txt', delimiter=',', names=True, filling_values=0)
print "filling_values=0", a

a = numpy.genfromtxt('test.txt', delimiter=',', names=True, filling_values={1: 0})
print "filling_values={1:0}", a

a = numpy.genfromtxt('test.txt', delimiter=',', names=True, filling_values={0: 0})
print "filling_values={0:0}", a

a = numpy.genfromtxt('test.txt', delimiter=',', names=True, filling_values={None: 0})
print "filling_values={None:0}", a
```

And the result:

```
plain [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
filling_values=0 [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
filling_values={1:0} [(1.0, 2.0, 3.0, 4.0) (5.0, 0.0, 7.0, 8.0)]
filling_values={0:0} [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
Traceback (most recent call last):
  File "C:\Users\tolivo.EE\Documents\active\eng\python\sizer\testGenfromtxt.py", line 20, in <module>
    a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={None:0})
  File "C:\Users\tolivo.EE\AppData\Roaming\Python\Python27\site-packages\numpy\lib\npyio.py", line 1451, in genfromtxt
    filling_values[key] = val
TypeError: list indices must be integers, not NoneType
```

From the NumPy user guide, I would expect filling_values=0 and filling_values={None:0} to fill every column, but neither does: filling_values=0 leaves the nan in place, and filling_values={None:0} raises a TypeError. When I specify the correct column index (filling_values={1:0}), it works, but since there are many columns and their number is not known until the user makes a selection, I'm looking for a way to set the fill value for all columns automatically, as the user guide suggests.

I assume I could count the columns in advance and build a dict to pass as the filling_values argument in the meantime, but is there a better way?
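A sketch of the count-the-columns workaround described in the question, written for Python 3 and a recent NumPy (both assumptions; the question itself uses Python 2.7 and NumPy 1.6.2). `load_with_zero_fill` is a hypothetical helper name: it counts the fields in the header line and builds a `{column_index: fill}` dict, so every column gets the same fill value regardless of how many columns there are.

```python
import io

import numpy as np


def load_with_zero_fill(text, fill=0, delimiter=","):
    """Hypothetical helper: load CSV text, replacing every empty field with `fill`.

    Assumes the first line is a header, one name per column.
    """
    header = text.splitlines()[0]
    ncols = len(header.split(delimiter))
    # One dict entry per column index, so no column keeps the default nan
    filling = {i: fill for i in range(ncols)}
    return np.genfromtxt(io.StringIO(text), delimiter=delimiter,
                         names=True, filling_values=filling)


text = "a,b,c,d\n1,2,3,4\n5,,7,8\n9,10,,12"
a = load_with_zero_fill(text)
print(a)
```

Counting fields with a plain `split` assumes the header contains no quoted delimiters; for messier headers, a `csv.reader` over the first line would be the safer way to count.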

python numpy csv genfromtxt
1 answer

This is not obvious from the documentation, but filling_values="0" works.

```
In [19]: !cat test.txt
a,b,c,d
1,2,3,4
5,,7,8
9,10,,12

In [20]: a = numpy.genfromtxt('test.txt', delimiter=',', names=True, filling_values="0")

In [21]: print a
[(1.0, 2.0, 3.0, 4.0) (5.0, 0.0, 7.0, 8.0) (9.0, 10.0, 0.0, 12.0)]
```
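For comparison, a sketch for Python 3 and a recent NumPy (an assumption; the question targets NumPy 1.6.2, where the numeric scalar was ignored). On current releases a numeric scalar passed as `filling_values` should be applied to every column, while a dict keyed by column name fills only the columns it lists. The data is the same four-row file as in the session above, read from an `io.StringIO` instead of `test.txt`.

```python
import io

import numpy as np

text = "a,b,c,d\n1,2,3,4\n5,,7,8\n9,10,,12"

# Scalar fill: assumed to apply to every column on recent NumPy versions
a = np.genfromtxt(io.StringIO(text), delimiter=",", names=True,
                  filling_values=0)

# Dict keyed by column name: only column 'b' is filled; 'c' keeps nan
b = np.genfromtxt(io.StringIO(text), delimiter=",", names=True,
                  filling_values={"b": 0})

print(a)
print(b)
```

If the code must also run on an old NumPy like 1.6.2, the string form from the answer or the per-column dict is the safer choice.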
