I am reading JSON files into DataFrames. A DataFrame may contain string (object) columns, numeric columns (int64 and/or float64), and datetime columns. When the data is read, the dtype is often wrong (for example, datetime, int, and float values are often stored as object). I want to report these cases, i.e., a column that is stored in the DataFrame as object (string) but actually holds datetime values.
The problem is that pd.to_numeric and pd.to_datetime will both try to convert the column, and the result often ends up depending on which of the two I call last. (I was going to use convert_objects(), which works, but it is deprecated, so I need a better option.)
The code I use to evaluate a DataFrame column (I realize that much of the following is redundant, but I wrote it this way for readability):
    try:
        inferred_type = pd.to_datetime(df[Field_Name]).dtype
        if inferred_type == "datetime64[ns]":
            inferred_type = "DateTime"
    except:
        pass
    try:
        inferred_type = pd.to_numeric(df[Field_Name]).dtype
        if inferred_type == int:
            inferred_type = "Integer"
        if inferred_type == float:
            inferred_type = "Float"
    except:
        pass
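One way to make the result independent of call order is to fix the order explicitly and return as soon as a conversion succeeds. The following is only a minimal sketch, not a definitive solution: the helper name `infer_column_type`, the sample column names, and the "String" fallback label are assumptions made for illustration. It tries pd.to_numeric first, because pd.to_datetime also accepts plain numbers and would otherwise claim numeric columns.

```python
import warnings

import pandas as pd
from pandas.api import types as ptypes


def infer_column_type(series):
    """Return a label for the type a column could be converted to.

    Hypothetical helper: the labels mirror those used in the question,
    plus "String" as a fallback when no conversion succeeds.
    """
    # Columns that are already datetime need no conversion attempt.
    if ptypes.is_datetime64_any_dtype(series):
        return "DateTime"

    # Numeric check first: pd.to_datetime would also accept plain
    # numbers, so trying it first would mislabel numeric columns.
    try:
        converted = pd.to_numeric(series)
    except (ValueError, TypeError):
        converted = None
    if converted is not None:
        if ptypes.is_integer_dtype(converted):
            return "Integer"
        if ptypes.is_float_dtype(converted):
            return "Float"

    # Only now try datetime parsing; format-inference warnings are
    # silenced because this is a probe, not a real conversion.
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            pd.to_datetime(series)
        return "DateTime"
    except (ValueError, TypeError):
        return "String"


# Sample data (made up for illustration): every column is read as object.
df = pd.DataFrame({
    "when": ["2020-01-01", "2021-06-15"],
    "count": ["1", "2"],
    "price": ["1.5", "2"],
    "name": ["a", "b"],
})
report = {col: infer_column_type(df[col]) for col in df.columns}
print(report)
```

Because each branch returns immediately, a column gets exactly one label and a later check cannot overwrite an earlier one. Columns with missing values may be reported as "Float", since pd.to_numeric represents missing entries as NaN.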
python profiling pandas
Calamari