Combine date and time into a single datetime index
    df = pd.read_csv(io.BytesIO(text), parse_dates=[[0, 1]], header=None, index_col=0)
    print(df)
    #                      2
    # 0_1
    # 2012-01-05 08:00:00  1
    # 2012-01-05 08:00:30  2
    # 2012-01-05 08:01:00  3
    # 2012-01-05 08:01:30  4
    # 2012-01-05 08:02:00  5
Use date as primary index and time as secondary index in multiindex dataframe
    df2 = pd.read_csv(io.BytesIO(text), parse_dates=True, header=None, index_col=[0, 1])
    print(df2)
My naive inclination is to prefer a single index over a multi-index.
- According to the Zen of Python, "Flat is better than nested."
- A datetime is one conceptual object, so treat it as such. (It is better to have one datetime object than separate columns for year, month, day, hour, minute, etc. Likewise, it is better to have one index, not two.)
However, I am not very experienced with Pandas, and a multi-index may have some advantage when you analyze by time of day.
I would code a few typical calculations both ways and then see which one I like best, based on ease of coding, readability, and performance.
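As a rough sketch of what such a comparison could look like, here is one calculation (the sum of the values per minute) done both ways on the sample data. The column names `date`, `time`, and `value`, and the choice of calculation, are my own assumptions for illustration. I also build the two frames with `pd.to_datetime` and `set_index` rather than with the `read_csv` options above, so the sketch does not depend on how a particular pandas version handles `parse_dates`.

```python
import io
import pandas as pd

text = """\
20120105, 080000, 1
20120105, 080030, 2
20120105, 080100, 3
20120105, 080130, 4
20120105, 080200, 5"""

# Read every column as a string so the leading zero in the time is kept.
# The column names are illustrative assumptions, not part of the data.
raw = pd.read_csv(io.StringIO(text), header=None,
                  names=["date", "time", "value"],
                  dtype=str, skipinitialspace=True)
raw["value"] = raw["value"].astype(int)

# Variant 1: one DatetimeIndex built from the combined date and time strings.
stamp = pd.to_datetime(raw["date"] + raw["time"], format="%Y%m%d%H%M%S")
single = raw.set_index(stamp)[["value"]]

# Variant 2: a two-level (date, time) index, with both levels kept as strings.
multi = raw.set_index(["date", "time"])[["value"]]

# Sum per minute with the single index: resample does the bucketing.
per_minute_single = single["value"].resample("1min").sum()

# Sum per minute with the multi-index: group on the HHMM prefix of the time level.
minute = multi.index.get_level_values("time").str[:4]
per_minute_multi = multi["value"].groupby(minute).sum()

# Time-of-day selection is built in for a DatetimeIndex (both ends inclusive).
window = single.between_time("08:00:30", "08:01:30")

print(per_minute_single)
print(per_minute_multi)
print(window)
```

For this particular calculation the single index gets `resample` and `between_time` for free, while the multi-index version needs manual string handling. Other calculations may well favour the multi-index, so this is only one data point.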
This was the setup I used to get the results above:
    import io
    import pandas as pd

    # bytes literal, so that io.BytesIO(text) also works on Python 3
    text = b'''\
    20120105, 080000, 1
    20120105, 080030, 2
    20120105, 080100, 3
    20120105, 080130, 4
    20120105, 080200, 5'''
Of course, you can use

    pd.read_csv(filename, ...)

instead of

    pd.read_csv(io.BytesIO(text), ...)