My goal with this script is to:
1. Read time series data from an Excel file (more than 100,000 rows), along with the headers (labels and units)
2. Convert the Excel numeric dates to the best datetime object for a pandas DataFrame
3. Be able to use timestamps to refer to rows and series labels to refer to columns
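As an illustration of goal 2, here is a minimal sketch of the conversion using only the standard library. It assumes the workbook uses Excel's default 1900 date system, where serial numbers count days from 1899-12-30 (this holds for dates after February 1900). The helper name `excel_serial_to_datetime` and the sample serial values are made up for the example.

```python
from datetime import datetime, timedelta

# Assumption: the workbook uses Excel's 1900 date system, so serial numbers
# count days from 1899-12-30 (valid for dates after February 1900).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days, with a fractional time part) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


# Hypothetical sample values, standing in for the time column read by xlrd
serials = [41275.0, 41275.5, 41276.25]
timestamps = [excel_serial_to_datetime(s) for s in serials]

for serial, ts in zip(serials, timestamps):
    print(serial, "->", ts)
```

The fractional part of the serial number carries the time of day, so 0.5 corresponds to noon.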
So far, I have used xlrd to read the Excel data into lists, made a pandas Series from each list with the time list as the index, combined the Series with their headers into a Python dictionary, and passed the dictionary to a pandas DataFrame. Despite my efforts, df.index seems to be set to the column headers, and I'm not sure at what point I should convert the dates to datetime objects.
I started using Python only three days ago, so any advice would be great! Here is my code:
    import xlrd
    import pandas as pd
    from pandas import Series

    # Open Excel workbook and first sheet (raw string keeps the backslashes literal)
    wb = xlrd.open_workbook(r"C:\GreenCSV\Calgary\CWater.xlsx")
    sh = wb.sheet_by_index(0)

    # Read rows containing labels and units
    Labels = sh.row_values(1, start_colx=0, end_colx=None)
    Units = sh.row_values(2, start_colx=0, end_colx=None)

    # Initialize list to hold data
    Data = [None] * (sh.ncols)

    # Read column by column and store in list
    for colnum in range(sh.ncols):
        Data[colnum] = sh.col_values(colnum, start_rowx=5, end_rowx=None)

    # Delete unnecessary rows and columns
    del Labels[3], Labels[0:2], Units[3], Units[0:2], Data[3], Data[0:2]

    # Create pandas Series
    s = [None] * (sh.ncols - 4)
    for colnum in range(sh.ncols - 4):
        s[colnum] = Series(Data[colnum+1], index=Data[0])

    # Create dictionary of Series
    dictionary = {}
    for i in range(sh.ncols-4):
        dictionary[i] = {Labels[i] : s[i]}

    # Pass dictionary to pandas DataFrame
    df = pd.DataFrame.from_dict(dictionary)
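One likely reason df.index ends up holding the headers: the dictionary above is nested (integer key, then a one-entry dict of label to Series), and pandas treats the outer keys of a dict of dicts as columns and the inner keys as the index. A minimal sketch of a flat alternative follows. The names `labels`, `serial_dates`, and `columns` are hypothetical stand-ins for what the xlrd calls return, and the date conversion assumes Excel's 1900 date system (origin 1899-12-30).

```python
import pandas as pd

# Hypothetical stand-ins for the values read from the sheet with xlrd
labels = ["Flow", "Level"]                      # one label per data column
serial_dates = [41275.0, 41275.5, 41276.0]      # Excel serial dates (time column)
columns = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # one list per data column

# Convert the serial dates once, up front, into a DatetimeIndex.
# Assumption: 1900 date system, so days are counted from 1899-12-30.
index = pd.to_datetime(serial_dates, unit="D", origin="1899-12-30")

# A flat dict (label -> list of values) makes the labels the columns,
# while the shared DatetimeIndex supplies the row labels.
df = pd.DataFrame(dict(zip(labels, columns)), index=index)

print(df)
print(df.loc[pd.Timestamp("2013-01-01 12:00"), "Flow"])
```

Converting the dates before building the DataFrame means every column shares one DatetimeIndex, so rows can be selected by timestamp and columns by label.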
python pandas excel
pbreach