Merge date column and time column into index in pandas data frame

I have an intraday 30 second interval time series of data in a CSV file with the following format:

    20120105, 080000, 1
    20120105, 080030, 2
    20120105, 080100, 3
    20120105, 080130, 4
    20120105, 080200, 5

How can I read it in a pandas data frame with these two different indexing schemes:

1. Combine date and time into one datetime index

2. Use date as the primary index and time as the secondary index in a multi-index data frame

What are the pros and cons of these two schemes? Is one generally preferable to the other? In my case, I would like to do time-of-day analysis, but I'm not sure which scheme will be more convenient for that purpose. Thanks in advance.

1 answer
  • Combine date and time into a single datetime index

     df = pd.read_csv(io.BytesIO(text), parse_dates=[[0, 1]], header=None, index_col=0)
     print(df)
     #                      2
     # 0_1
     # 2012-01-05 08:00:00  1
     # 2012-01-05 08:00:30  2
     # 2012-01-05 08:01:00  3
     # 2012-01-05 08:01:30  4
     # 2012-01-05 08:02:00  5

  • Use date as primary index and time as secondary index in multiindex dataframe

     df2 = pd.read_csv(io.BytesIO(text), parse_dates=True, header=None, index_col=[0, 1])
     print(df2)
     #                   2
     # 0          1
     # 2012-01-05 80000  1
     #            80030  2
     #            80100  3
     #            80130  4
     #            80200  5
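
If the nested-list form of `parse_dates` is not available in your pandas version (I believe recent releases deprecate it), the same two indexes can be built explicitly after reading the columns as strings. This is only a sketch under my own assumptions: the column names `date`, `time` and `value` are made up for illustration, and it reads from a `str` via `io.StringIO` rather than a bytes buffer.

```python
import io
import pandas as pd

text = """\
20120105, 080000, 1
20120105, 080030, 2
20120105, 080100, 3
20120105, 080130, 4
20120105, 080200, 5"""

# Read date and time as strings so the leading zero in "080000" is preserved.
raw = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=["date", "time", "value"],  # hypothetical column names
    dtype={"date": str, "time": str},
    skipinitialspace=True,            # drop the space after each comma
)

# Scheme 1: a single DatetimeIndex built from the two string columns.
stamp = pd.to_datetime(raw["date"] + raw["time"], format="%Y%m%d%H%M%S")
single = raw.set_index(stamp)[["value"]]
single.index.name = "datetime"

# Scheme 2: a (date, time) MultiIndex derived from the same timestamps.
multi = single.copy()
multi.index = pd.MultiIndex.from_arrays(
    [stamp.dt.date, stamp.dt.time], names=["date", "time"]
)

print(single)
print(multi)
```

Unlike the `parse_dates=True` version, the second level here holds `datetime.time` objects instead of integers such as `80000`, which is usually easier to work with.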

My naive inclination is to prefer a single index over a multi-index.

  • According to the Zen of Python, "Flat is better than nested."
  • A datetime is conceptually one object, so treat it as such. (It is better to have one datetime object than separate columns for year, month, day, hour, minute, etc. Likewise, it is better to have one index, not two.)

However, I am not very experienced with pandas, and a multi-index may have some advantage for time-of-day analysis.

I would try coding some typical calculations both ways and then see which one I prefer, based on ease of coding, readability, and performance.
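
As a starting point for that comparison, here is a rough sketch of one time-of-day calculation (the mean per time of day across dates) written against both schemes. The data is synthetic and the column name `value` is my own assumption; the point is only to show what the code looks like each way.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): two days of 30-second bars.
idx = pd.date_range("2012-01-05 08:00:00", periods=4, freq="30s").append(
    pd.date_range("2012-01-06 08:00:00", periods=4, freq="30s")
)
single = pd.DataFrame({"value": np.arange(1.0, 9.0)}, index=idx)

# Single DatetimeIndex: group on the time-of-day component of the index.
by_time_single = single.groupby(single.index.time)["value"].mean()

# Single DatetimeIndex: select a window of the day across all dates.
window = single.between_time("08:00:30", "08:01:00")

# MultiIndex (date, time): group on the "time" level instead.
multi = single.copy()
multi.index = pd.MultiIndex.from_arrays(
    [idx.date, idx.time], names=["date", "time"]
)
by_time_multi = multi.groupby(level="time")["value"].mean()

print(by_time_single)
print(window)
print(by_time_multi)
```

Both groupings give the same numbers. The single index also gets `between_time` for free, while the multi-index makes `groupby(level=...)` slightly shorter, so in this small case neither scheme is obviously harder to use.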


This is the setup I used to produce the results above:

    import io
    import pandas as pd

    # Bytes literal, so that io.BytesIO(text) works on Python 3 as well as Python 2.
    text = b'''\
    20120105, 080000, 1
    20120105, 080030, 2
    20120105, 080100, 3
    20120105, 080130, 4
    20120105, 080200, 5'''

Of course, with a real file you can use

    pd.read_csv(filename, ...)

instead of

    pd.read_csv(io.BytesIO(text), ...)