Numpy.genfromtxt with datetime.strptime converter

Question

I have data similar to what is shown in this gist, and I'm trying to extract it using numpy. I'm new to Python, so I tried to do this with the following code:

    import numpy as np
    from datetime import datetime

    convertfunc = lambda x: datetime.strptime(x, '%H:%M:%S:.%f')
    col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
                   "Igniter", "Lamps", "O2", "Time"]
    data = np.genfromtxt(files[1], skip_header=22,
                         names=col_headers,
                         converters={"Time": convertfunc})

As you can see, there are 22 lines of header material. When I run this code in IPython, I get an error message that ends with:

    TypeError: float() argument must be a string or a number

The complete IPython traceback can be seen here.

I can extract the six numeric columns just fine with the genfromtxt argument usecols=range(0, 6), but when I try to use the converter to handle the last column, I'm stuck. Any comments would be appreciated.
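A minimal, self-contained reproduction of the failure may help. It assumes the Time column is written like `15:49:24:.546` (the shape the `'%H:%M:%S:.%f'` format expects); the two sample rows and the `encoding=None` argument are assumptions, not part of the original code. `encoding=None` makes recent NumPy versions pass `str` rather than `bytes` to the converter, so the `TypeError` here comes from storing a datetime in a float field rather than from `strptime`.

```python
import io
from datetime import datetime

import numpy as np

# Hypothetical sample rows: six numeric columns followed by a time stamp.
SAMPLE = io.StringIO(
    "0.262 25.1 24.9 0 1 20.9 15:49:24:.546\n"
    "0.2618 25.2 24.9 0 1 20.8 15:49:25:.171\n"
)

convertfunc = lambda x: datetime.strptime(x, '%H:%M:%S:.%f')
col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]

error = None
try:
    # No dtype is given, so genfromtxt uses its default (float) for every
    # named field; the datetime returned for "Time" cannot be stored there.
    np.genfromtxt(SAMPLE, names=col_headers,
                  converters={"Time": convertfunc}, encoding=None)
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```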

python numpy ipython

not link Dec 13 '12 at 10:54

2 answers

You can use pandas read_table:

    import pandas as pd
    frame = pd.read_table('/tmp/gist', header=None, skiprows=22, delimiter=r'\s+')

This worked for me. You will need to handle the header separately, because its column names are separated by a variable number of spaces.
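One way to combine this with the question's converter is sketched below, assuming pandas is installed. Instead of parsing the irregular header, the column names are supplied by hand through `names`, and `converters` turns the Time column into datetime objects. The sample text (two header lines instead of 22) is a hypothetical stand-in for `/tmp/gist`.

```python
import io
from datetime import datetime

import pandas as pd

col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]

# Hypothetical stand-in for /tmp/gist: two header lines, then data rows.
SAMPLE = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "0.262 25.1 24.9 0 1 20.9 15:49:24:.546\n"
    "0.2618 25.2 24.9 0 1 20.8 15:49:25:.171\n"
)

frame = pd.read_table(
    SAMPLE,
    sep=r"\s+",          # any run of whitespace separates fields
    header=None,         # the header block is skipped, not parsed
    skiprows=2,
    names=col_headers,   # column names supplied by hand instead
    converters={"Time": lambda s: datetime.strptime(s, "%H:%M:%S:.%f")},
)
print(frame)
```

Depending on the pandas version, the Time column may come back as Python datetimes (object dtype) or as pandas Timestamps; either compares equal to the corresponding `datetime`.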

nom-mon-ir Dec 14 '12 at 1:27

tiago · Accepted Answer · 2012-12-14T03:23:22+0000

This happens because np.genfromtxt is trying to create a floating-point array, which fails because convertfunc returns a datetime object that cannot be converted to a float. The simplest solution is to pass the argument dtype='object' to np.genfromtxt, so that it creates an array of objects and does not attempt the conversion to float. However, this means the other columns will be stored as strings rather than numbers. To store them properly as floats, you need to specify the dtype of each column so that you get a structured array. Here I set them all to double, except for the last column, which gets dtype object:

    dd = [(a, 'd') for a in col_headers[:-1]] + [(col_headers[-1], 'object')]
    data = np.genfromtxt(files[1], skip_header=22, dtype=dd,
                         names=col_headers, converters={'Time': convertfunc})

This will give you a structured array that you can access with the names you provided:

    In [74]: data['Mass']
    Out[74]: array([ 0.262 ,  0.2618,  0.2616,  0.2614])

    In [75]: data['Time']
    Out[75]:
    array([1900-01-01 15:49:24.546000, 1900-01-01 15:49:25.171000,
           1900-01-01 15:49:25.405000, 1900-01-01 15:49:25.624000], dtype=object)
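A self-contained version of the structured-dtype approach, for trying it without the original file, is sketched below. The sample text (two header lines instead of 22) is a hypothetical stand-in for `files[1]`. Two assumptions go beyond the answer: `encoding=None` is passed, and the converter decodes `bytes` defensively, because on Python 3 some NumPy versions hand converters `bytes` unless an encoding is set, and `strptime` only accepts `str`.

```python
import io
from datetime import datetime

import numpy as np


def convertfunc(value):
    # Depending on the NumPy version and the `encoding` argument,
    # genfromtxt may pass bytes instead of str, so decode first.
    if isinstance(value, bytes):
        value = value.decode("latin1")
    return datetime.strptime(value, "%H:%M:%S:.%f")


col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]

# Hypothetical stand-in for files[1]: two header lines, then data rows.
SAMPLE = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "0.262 25.1 24.9 0 1 20.9 15:49:24:.546\n"
    "0.2618 25.2 24.9 0 1 20.8 15:49:25:.171\n"
)

# Every column is a double ('d') except Time, which holds datetime objects.
dd = [(name, "d") for name in col_headers[:-1]] + [(col_headers[-1], "object")]

data = np.genfromtxt(SAMPLE, skip_header=2, dtype=dd, names=col_headers,
                     converters={"Time": convertfunc}, encoding=None)

print(data["Mass"])
print(data["Time"])
```

Note that genfromtxt sanitizes field names, so a header such as "T O2 Sensor" may end up with underscores in place of the spaces; "Mass" and "Time" are unaffected.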
