Numpy.genfromtxt with datetime.strptime converter

I have data similar to what is observed in this gist , and I'm trying to extract data using numpy. I am new to python, so I tried to do this with the following code

import numpy as np from datetime import datetime convertfunc = lambda x: datetime.strptime(x, '%H:%M:%S:.%f') col_headers = ["Mass", "Thermocouple", "T O2 Sensor",\ "Igniter", "Lamps", "O2", "Time"] data = np.genfromtxt(files[1], skip_header=22,\ names=col_headers,\ converters={"Time": convertfunc}) 

Where, as you can see in essence, there are 22 lines of header material. In Ipython, when I run the following code, I get an error message that ends with the following:

 TypeError: float() argument must be a string or a number 

A complete ipython error trace can be seen here .

I can extract six columns of numeric data just fine using the genfromtxt argument like usecols = range (0.6), but when I try to use the converter to try to tackle the last column, I'm puzzled. Any comments would be appreciated.

+7
source share
2 answers

This is because np.genfromtxt trying to create a floating-point array that fails, because convertfunc returns a datetime object that cannot be distinguished as a float. The simplest solution is to simply pass the dtype='object' argument to np.genfromtxt , ensuring that you create an array of objects and prevent conversion to float. However, this will mean that the remaining columns will be saved as rows. To properly store them as floats, you need to specify the dtype each to get a structured array . Here I set them all to double, except for the last column, which will be the dtype object:

 dd = [(a, 'd') for a in col_headers[:-1]] + [(col_headers[-1], 'object')] data = np.genfromtxt(files[1], skip_header=22, dtype=dd, names=col_headers, converters={'Time': convertfunc}) 

This will give you a structured array that you can access with the names you provided:

 In [74]: data['Mass'] Out[74]: array([ 0.262 , 0.2618, 0.2616, 0.2614]) In [75]: data['Time'] Out[75]: array([1900-01-01 15:49:24.546000, 1900-01-01 15:49:25.171000, 1900-01-01 15:49:25.405000, 1900-01-01 15:49:25.624000], dtype=object) 
+6
source

You can use pandas read_table:

  import pandas as pd frame=pd.read_table('/tmp/gist', header=None, skiprows=22,delimiter='\s+') 

worked for me. You need to handle the header separately, as they are a variable number of separated spaces.

+4
source

All Articles