Merge date column and time column into index in pandas data frame

Question

I have an intraday time series at 30-second intervals in a CSV file with the following format:

    20120105, 080000, 1
    20120105, 080030, 2
    20120105, 080100, 3
    20120105, 080130, 4
    20120105, 080200, 5

How can I read it into a pandas data frame with each of these two indexing schemes?

1. Combine date and time into one datetime index

2. Use date as the primary index and time as the secondary index in a multi-index data frame

What are the pros and cons of these two schemes? Is one generally preferable to the other? In my case, I would like to do time-of-day analysis, but I'm not sure which scheme will be more convenient for that purpose. Thanks in advance.

python pandas

ezbentley Jan 12 '13 at 10:10

1 answer

unutbu · Accepted Answer · 2013-01-12T22:54:41+0000

Combine date and time into a single datetime index

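    # text is a str, so wrap it in StringIO (see the setup below)
    # the nested list asks pandas to parse columns 0 and 1 together as one datetime
    df = pd.read_csv(io.StringIO(text), parse_dates=[[0, 1]], header=None, index_col=0)
    print(df)
    #                      2
    # 0_1
    # 2012-01-05 08:00:00  1
    # 2012-01-05 08:00:30  2
    # 2012-01-05 08:01:00  3
    # 2012-01-05 08:01:30  4
    # 2012-01-05 08:02:00  5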
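Recent pandas releases have deprecated the nested-list form of `parse_dates`, so on a newer version it may be safer to combine the two columns explicitly. The following is only a sketch of that alternative, not the approach used above: the column names `date`, `time`, and `value` are made up for illustration, and the format string assumes the `YYYYMMDD` / `HHMMSS` layout shown in the question.

```python
import io
import pandas as pd

text = """\
20120105, 080000, 1
20120105, 080030, 2
20120105, 080100, 3
20120105, 080130, 4
20120105, 080200, 5"""

# Read date and time as strings so the leading zero in "080000" survives.
raw = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=["date", "time", "value"],    # hypothetical column names
    dtype={"date": str, "time": str},
    skipinitialspace=True,              # drop the blank after each comma
)

# Concatenate the two strings and parse them with an explicit format.
stamp = pd.to_datetime(raw["date"] + raw["time"], format="%Y%m%d%H%M%S")

# Use the combined timestamps as the index and keep only the data column.
df = raw.set_index(stamp.rename("datetime"))[["value"]]
print(df)
```

Reading the columns as strings first also avoids the time column being interpreted as the integer 80000.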

Use date as primary index and time as secondary index in multiindex dataframe

    # text is a str, so wrap it in StringIO (see the setup below)
    df2 = pd.read_csv(io.StringIO(text), parse_dates=True, header=None, index_col=[0, 1])
    print(df2)
    #                   2
    # 0          1
    # 2012-01-05 80000  1
    #            80030  2
    #            80100  3
    #            80130  4
    #            80200  5

My naive inclination would be to prefer a single index over a multi-index.

According to the Zen of Python, "Flat is better than nested."
A datetime is one conceptual object. Treat it as such. (It is better to have one datetime object than several columns for year, month, day, hour, minute, etc. Likewise, it is better to have one index, not two.)

However, I am not very experienced with pandas, and there may be some advantage to using a multi-index when analyzing by time of day.

I would try coding some typical calculations both ways, and then see which one I like better based on ease of coding, readability, and performance.
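As a starting point for that comparison, here is a rough sketch of two time-of-day calculations written against each scheme. The sample data (two days, three rows per day) and the names `flat` and `nested` are invented for illustration. The two-level index is built from the datetime index rather than read from a file, so its levels hold `datetime.date` and `datetime.time` objects instead of the integers shown above.

```python
import datetime as dt
import pandas as pd

# Made-up sample: two days of 30-second data, three rows per day.
index = pd.to_datetime([
    "2012-01-05 08:00:00", "2012-01-05 08:00:30", "2012-01-05 08:01:00",
    "2012-01-06 08:00:00", "2012-01-06 08:00:30", "2012-01-06 08:01:00",
])
flat = pd.DataFrame({"value": [1, 2, 3, 5, 6, 7]}, index=index)

# Single DatetimeIndex: average per time of day across all dates.
by_time_flat = flat.groupby(flat.index.time)["value"].mean()

# Single DatetimeIndex: rows inside a clock-time window, on every date.
window = flat.between_time("08:00:30", "08:01:00")

# The same data with (date, time) as a two-level index.
nested = flat.copy()
nested.index = pd.MultiIndex.from_arrays(
    [flat.index.date, flat.index.time], names=["date", "time"]
)

# MultiIndex: the same average, grouped on the "time" level.
by_time_nested = nested.groupby(level="time")["value"].mean()

# MultiIndex: every date's value at one clock time.
at_open = nested.xs(dt.time(8, 0), level="time")

print(by_time_flat)
print(by_time_nested)
```

Both schemes give the same per-time averages here. The single index also gets time-aware helpers such as `between_time`, while the two-level index makes cross-sections at a fixed clock time a one-liner with `xs`.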

This is the setup I used to get the results above:

    import io
    import pandas as pd

    text = '''\
    20120105, 080000, 1
    20120105, 080030, 2
    20120105, 080100, 3
    20120105, 080130, 4
    20120105, 080200, 5'''

Of course, with a real file you can use

    pd.read_csv(filename, ...)

instead of reading from the in-memory buffer built from `text`.
