I would like to create an array from a list, but keep NaNs and infs

I am parsing a data file of space-delimited text that was generated by a C++ program. Some of the underlying computations underflow, overflow, or generate NaN. It seems that the strings "1.#INF00" and "1.#IND00" are not digested by numpy.array(), which raises the error "invalid literal for float()". I tried a replacement like this:

    line = line.replace('1.#INF00','inf')
    line = line.replace('1.#IND00','ind')
    vals = line.split(' ')
    myarray = array(vals)

but, alas, to no avail. I also tried "nan" and "NaN". Is there some string I could substitute that float() will interpret as nan, inf, etc.? Perhaps I need to escape some quotes?

As an aside, can you tell me how matplotlib will handle inf? My fallback would be to change the infs to NaN when they are detected. I have seen it demonstrated that matplotlib handles NaN gracefully, leaving gaps in the data. What would be an acceptable treatment for my "inf" and "ind" values?

2 answers

float('nan') should return NaN, and float('inf') should return infinity. At least that is how they work on my interpreter (CPython 2.7). It seems that on some platforms (especially Windows) this behaved differently in older versions, but I doubt that you are using such an old version of Python.

Perhaps the problem lies with numpy; in that case you can try:

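    from numpy import array

    # Map the C++ tokens to spellings that float() understands
    line = line.replace('1.#INF00','inf')
    line = line.replace('1.#IND00','nan')
    vals = line.split(' ')
    # Convert explicitly to float before building the array
    myarray = array([float(x) for x in vals])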
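The same idea can be wrapped in a small helper. This is only a sketch: `MSVC_TOKENS` and `parse_line` are names made up for illustration, and the negative variants in the table are an assumption about what the file might also contain. It returns a plain list of floats, which can then be passed to numpy's array().

```python
import math

# Assumed mapping from the C++ tokens to spellings that float() accepts.
# The negative variants are a guess; the question only shows the positive ones.
MSVC_TOKENS = {
    "1.#INF00": "inf",
    "-1.#INF00": "-inf",
    "1.#IND00": "nan",
    "-1.#IND00": "nan",
}

def parse_line(line):
    """Convert one whitespace-delimited line into a list of floats."""
    # split() with no argument collapses repeated spaces and drops the newline,
    # so no empty strings reach float().
    return [float(MSVC_TOKENS.get(tok, tok)) for tok in line.split()]

values = parse_line("1 0.2 0.3 1.#INF00 1.#IND00\n")
print(math.isinf(values[3]), math.isnan(values[4]))
```

Looking tokens up one by one, rather than calling replace() on the whole line, avoids accidentally rewriting part of a longer token.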

Alternatively, you can simply call numpy.genfromtxt and use the missing_values keyword argument.

E.g., with this data saved as data.txt:

    1 0.2 0.3 1.#INF00
    2 0.5 0.6 0.7
    3 1.#IND00 0.1 0.2
    4 0.4 0.4 0.5
    5 0.5 0.5 0.7

You can just do something like this (we need to set the comment identifier to something other than the standard "#" in this particular case, because the tokens themselves contain "#"):

    import numpy as np

    data = np.genfromtxt('data.txt',
                         missing_values=['1.#INF00', '1.#IND00'],
                         comments='somethingelse')

This gives:

    array([[ 1. ,  0.2,  0.3,  nan],
           [ 2. ,  0.5,  0.6,  0.7],
           [ 3. ,  nan,  0.1,  0.2],
           [ 4. ,  0.4,  0.4,  0.5],
           [ 5. ,  0.5,  0.5,  0.7]])
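Note that in this result the 1.#INF00 entry has also become NaN, which happens to match the fallback mentioned in the question. If the infinities are kept instead (for example by parsing the tokens yourself), a minimal sketch of converting them to NaN before plotting could look like this. `inf_to_nan` is a hypothetical helper name, and the claim that matplotlib leaves a gap at NaN points comes from the question rather than from anything tested here.

```python
import math

def inf_to_nan(values):
    """Return a copy of values with +inf and -inf replaced by NaN."""
    return [math.nan if math.isinf(v) else v for v in values]

cleaned = inf_to_nan([1.0, math.inf, -math.inf, 0.5])
print(cleaned)
```

For a numpy array `a`, the equivalent would be np.where(np.isinf(a), np.nan, a).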