NumPy genfromtxt: using filling_values correctly

Question

I am trying to process data stored in a CSV file that may have missing values in an unknown number of columns (up to 30). I want to set these missing values to 0 using the genfromtxt filling_values argument. Here is a minimal working example for NumPy 1.6.2 running in ActiveState ActivePython 2.7 (32-bit) on Windows 7.

    import numpy

    text = "a,b,c,d\n1,2,3,4\n5,,7,8"

    # Write the test file before reading it
    b = open('test.txt','w')
    b.write(text)
    b.close()

    a = numpy.genfromtxt('test.txt',delimiter=',',names=True)
    print "plain",a

    a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values=0)
    print "filling_values=0",a

    a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={1:0})
    print "filling_values={1:0}",a

    a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={0:0})
    print "filling_values={0:0}",a

    a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={None:0})
    print "filling_values={None:0}",a

And the result:

    plain [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
    filling_values=0 [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
    filling_values={1:0} [(1.0, 2.0, 3.0, 4.0) (5.0, 0.0, 7.0, 8.0)]
    filling_values={0:0} [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
    Traceback (most recent call last):
      File "C:\Users\tolivo.EE\Documents\active\eng\python\sizer\testGenfromtxt.py", line 20, in <module>
        a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={None:0})
      File "C:\Users\tolivo.EE\AppData\Roaming\Python\Python27\site-packages\numpy\lib\npyio.py", line 1451, in genfromtxt
        filling_values[key] = val
    TypeError: list indices must be integers, not NoneType

Based on the NumPy user guide, I would expect filling_values=0 and filling_values={None:0} to work, but the first leaves the missing value as nan and the second raises a TypeError. Specifying the correct column (filling_values={1:0}) works, but I have a large number of columns, and how many is not known until the user makes a selection, so I'm looking for a way to set the fill value for all columns automatically, as the user guide suggests is possible.

I assume I could count the columns in advance and build a dict to pass as the filling_values argument in the meantime, but is there a better way?
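
The column-counting workaround mentioned above could look roughly like the sketch below. The helper name `build_filling_values` is hypothetical (it is not part of NumPy), and the sketch assumes the first line of the file is the header row and that every column should get the same fill value.

```python
import csv


def build_filling_values(header_line, fill=0, delimiter=","):
    """Map every column index found in the header line to the same fill value.

    Hypothetical helper for illustration; not a NumPy function.
    """
    # csv.reader accepts any iterable of lines and copes with quoted
    # field names that themselves contain the delimiter.
    names = next(csv.reader([header_line], delimiter=delimiter))
    return {index: fill for index in range(len(names))}


# Same header as the test file in the question.
filling = build_filling_values("a,b,c,d")
print(filling)  # {0: 0, 1: 0, 2: 0, 3: 0}
```

The resulting dict can then be passed as filling_values to numpy.genfromtxt, which is the form the output above shows working for an explicit column index (filling_values={1:0}). In practice the header line would come from reading the first line of the user's file.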

python numpy csv genfromtxt

Thav Feb 28 '13 at 19:48

1 answer

Warren Weckesser · Accepted Answer · 2013-02-28T21:31:07+0000

This is not obvious from the documentation, but filling_values="0" works.

    In [19]: !cat test.txt
    a,b,c,d
    1,2,3,4
    5,,7,8
    9,10,,12

    In [20]: a = numpy.genfromtxt('test.txt', delimiter=',', names=True, filling_values="0")

    In [21]: print a
    [(1.0, 2.0, 3.0, 4.0) (5.0, 0.0, 7.0, 8.0) (9.0, 10.0, 0.0, 12.0)]
