Reading a file with missing values in python pandas

Question

I am trying to read a .txt file with missing values using pandas.read_csv. My data has the following format:

 10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
 10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686

with thousands of samples of the same kind (point name, GPS position and other readings). I am using this code:

 from pandas import read_csv
 myData = read_csv('~/data.txt', sep=',', na_values='')

The code does not work: na_values does not produce NaN or any other marker for the missing fields. The columns should all be the same length, but I end up with different lengths.

I don't know what value to pass to na_values (I have tried all sorts of things). Thanks.

python pandas

tomasz74 20 sept '12 at 14:16

3 answers

What version of pandas are you using? Interpreting an empty string as NaN is the default behavior in pandas, and the empty strings in your data snippet are parsed as NaN both in v0.7.3 and in the current master, without using the na_values parameter at all.

 In [10]: data = """\
 10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
 10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686
 """

 In [11]: read_csv(StringIO(data), header=None).T
 Out[11]:
                    0           1
 X.1       10/08/2012  10/08/2012
 X.2         12:10:10    12:10:11
 X.3            name1       name2
 X.4             0.81         NaN
 X.5             4.02         NaN
 X.6   50;18.5701400N         NaN
 X.7    4;07.7693770E         NaN
 X.8             7.92       10.87
 X.9             10.5         1.4
 X.10          0.0106      0.0099
 X.11             4.3         9.7
 X.12          0.0301      0.0686
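The session above uses the pandas API of 2012 (the X.1, X.2, ... labels reflect that era; current versions number the columns 0, 1, 2, ...). A self-contained equivalent, assuming Python 3 and a reasonably recent pandas (io.StringIO replaces the old StringIO module), relying only on the default NA handling with no na_values argument:

```python
import io

import pandas as pd

# The two sample rows from the question; the second row has four empty fields.
data = """\
10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686
"""

# No na_values argument: empty fields are treated as missing by default.
df = pd.read_csv(io.StringIO(data), header=None)

print(df.shape)                         # (2, 12)
print(df.isna().sum(axis=1).tolist())   # [0, 4]
```

Both rows end up with the same 12 columns; the four gaps in the second row are NaN rather than being dropped.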

Chang She 20 sept '12 at 15:41

I have the same problem, but this solution does not fix it; I get the following error:

 ParserError: Error tokenizing data. C error: Expected 156 fields in line 10021, saw 273

PS: my CSV file has about 300,000 rows and 600 columns and looks something like this:

 P160, P230 ,,,
 P14, P0, P49, P41 ,,
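The comment does not show enough of the file to diagnose it, but the message is typical of a ragged file: with the default C engine, the parser settles on a column count from the first row(s) and raises as soon as a later row has more fields. A minimal reproduction with a hypothetical two-line input is below, followed by one commonly used workaround that is my assumption rather than something from this thread: pass names with as many entries as the widest row (4 here; for the file above presumably at least 273), so shorter rows are padded with NaN.

```python
import io

import pandas as pd

# Hypothetical ragged input: the first row has 2 fields, the second has 4.
ragged = "a,b\nc,d,e,f\n"

error_message = None
try:
    pd.read_csv(io.StringIO(ragged), header=None)
except pd.errors.ParserError as exc:
    # The C parser expected 2 fields (from the first row) and saw 4.
    error_message = str(exc)
    print("ParserError:", error_message)

# Workaround (assumption): declare the widest row's column count up front,
# so rows with fewer fields are padded with NaN instead of raising.
df = pd.read_csv(io.StringIO(ragged), header=None, names=list(range(4)))
print(df)
```

Declaring the columns explicitly only helps if the widest row is known or can be bounded; otherwise the file needs a pre-pass to count fields per line.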

Souha Jun 23 '19 at 0:42

Andy Hayden · Accepted Answer · 2012-09-20T14:22:29+0000

The na_values parameter should be "list-like" (see this answer).

A string is itself "list-like", so:

 na_values = 'abc'
 # would turn each of the letters 'a', 'b' and 'c' into NaN,
 # i.e. it is equivalent to
 na_values = ['a', 'b', 'c']

Similarly:

 na_values = ''
 # is equivalent to
 na_values = []
 # and this is not what you want!
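The reasoning rests on how Python itself treats a string: when code builds a list or set from its argument by iterating over it, a string is consumed character by character, and the empty string contributes nothing at all. A pure-Python illustration (no pandas involved):

```python
# Iterating over a string yields its characters one by one.
print(list("abc"))   # ['a', 'b', 'c']

# The empty string has no characters, so iterating it yields nothing.
print(list(""))      # []

# Wrapping the empty string in a list keeps it as a single element.
print(list([""]))    # ['']
```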

That means you need to use na_values=[''].
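A self-contained sketch of this fix applied to the question's sample rows. keep_default_na=False is my addition: it switches off pandas' built-in list of NA markers, so the NaN values below can only come from na_values=['']. As far as I know, in recent pandas the empty string is already in that built-in list, and a bare string passed to na_values is treated as a single value rather than split into characters, so on current versions the list form mainly makes the intent explicit.

```python
import io

import pandas as pd

# The two sample rows from the question; the second row has four empty fields.
data = (
    "10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,"
    "7.92,10.50,0.0106,4.30,0.0301\n"
    "10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686\n"
)

# keep_default_na=False disables the built-in NA markers, so only the empty
# string listed in na_values is turned into NaN here.
my_data = pd.read_csv(
    io.StringIO(data),
    sep=",",
    header=None,
    na_values=[""],
    keep_default_na=False,
)

# All 12 columns have one value per row; the gaps in row 2 are NaN.
print(my_data.shape)
print(my_data.isna().sum().tolist())
```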
