Column type assertion in Pandas

I am looking for a better way to assert the data types of columns in a given Pandas data frame.

For instance:

import pandas as pd
t = pd.DataFrame({'a':[1,2,3], 'b':[2,6,0.75], 'c':['foo','bar','beer']})

I would like to assert that specific columns in a data frame are numeric. Here is what I have:

numeric_cols = ['a', 'b']  # These will be given
# all() is needed: asserting on a non-empty list always passes
assert all(x in ['int64', 'float64'] for x in [t[y].dtype for y in numeric_cols])

This last line does not feel very Pythonic. Maybe it is fine, and I am just cramming everything into one hard-to-read line. Is there a better way? I would like to write something like:

assert t[numeric_cols].dtype.isnumeric()

I cannot find anything like this.

+4
2 answers

You could use ptypes.is_numeric_dtype to identify numeric columns, ptypes.is_string_dtype to identify string-like columns, and ptypes.is_datetime64_any_dtype to identify datetime64 columns:

import pandas as pd
import pandas.api.types as ptypes

t = pd.DataFrame({'a':[1,2,3], 'b':[2,6,0.75], 'c':['foo','bar','beer'],
              'd':pd.date_range('2000-1-1', periods=3)})
cols_to_check = ['a', 'b']

assert all(ptypes.is_numeric_dtype(t[col]) for col in cols_to_check)
# passes: 'a' and 'b' are numeric
assert ptypes.is_string_dtype(t['c'])
# passes: 'c' holds strings
assert ptypes.is_datetime64_any_dtype(t['d'])
# passes: 'd' is datetime64

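The pandas.api.types module (aliased here to ptypes) has both an is_datetime64_any_dtype and an is_datetime64_dtype function. They differ in how they treat timezone-aware data, as this example shows: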

In [239]: ptypes.is_datetime64_any_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
Out[239]: True

In [240]: ptypes.is_datetime64_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
Out[240]: False
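
If the assertion fails, it can help to know which columns caused it. The following is only a sketch built on the same ptypes.is_numeric_dtype check; the helper name non_numeric_columns is made up for illustration and is not part of pandas:

```python
import pandas as pd
import pandas.api.types as ptypes


def non_numeric_columns(df, cols):
    """Hypothetical helper: return the columns in `cols` whose dtype is not numeric."""
    return [col for col in cols if not ptypes.is_numeric_dtype(df[col])]


t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer']})

# An empty list means every requested column is numeric.
bad = non_numeric_columns(t, ['a', 'b'])
assert not bad, f"non-numeric columns: {bad}"
```

Putting the offending names into the assertion message makes a failure easier to diagnose than a bare AssertionError.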
+5

import numpy as np
numeric_dtypes = [np.dtype('int64'), np.dtype('float64')]
# or whatever types you want

assert t[numeric_cols].apply(lambda c: c.dtype).isin(numeric_dtypes).all()
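
A variation on the same idea, given here as a sketch rather than a recommendation, is to let DataFrame.select_dtypes pick out the numeric columns, so no explicit list of dtypes is needed. The variable name numeric_found is only illustrative:

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer']})
numeric_cols = ['a', 'b']

# Keep only the numeric columns among those requested, then compare the labels.
numeric_found = t[numeric_cols].select_dtypes(include='number').columns
assert list(numeric_found) == numeric_cols
```

If any requested column were non-numeric, it would be missing from numeric_found and the comparison would fail.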
+1
