In pandas, what is the equivalent of read_csv()'s "nrows" for read_excel()?

I want to import only a certain range of data from an Excel spreadsheet (.xlsm format, as it contains macros) into a pandas DataFrame. I tried this:

data = pd.read_excel(filepath, header=0, skiprows=4, nrows= 20, parse_cols = "A:D") 

But it looks like nrows only works with read_csv(). What would be the equivalent for read_excel()?

+7
python pandas
2 answers

If you know the number of lines in your Excel worksheet, you can use the skip_footer parameter to read only the first n - skip_footer lines of the file, where n is the total number of lines.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Using:

 data = pd.read_excel(filepath, header=0, parse_cols = "A:D", skip_footer=80) 

Assuming your Excel worksheet contains 100 rows, this line will parse the first 20 rows.
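To make the arithmetic concrete, here is a minimal sketch in plain Python that mimics the effect of skip_footer on a list of lines. The helper name keep_first_lines and the sample data are made up for illustration; this is not how pandas implements the parameter.

```python
def keep_first_lines(lines, skip_footer):
    """Drop the last `skip_footer` entries, like skip_footer does for a sheet."""
    if skip_footer < 0:
        raise ValueError("skip_footer must be non-negative")
    # Clamp at 0 so that skipping more lines than exist yields an empty list
    keep = max(len(lines) - skip_footer, 0)
    return lines[:keep]


# A stand-in for a worksheet with 100 lines (hypothetical sample data)
sheet_lines = [f"line {i}" for i in range(1, 101)]

# n = 100 and skip_footer = 80, so the first n - skip_footer = 20 lines remain
first_20 = keep_first_lines(sheet_lines, skip_footer=80)
print(len(first_20))
```

With header=0, the first of those remaining lines is used as the column header.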

+9

I would like to extend @Erol's answer to make it a bit more flexible.

Assuming we DO NOT know the total number of rows in an excel sheet:

 xl = pd.ExcelFile(filepath)

 # parsing first (index: 0) sheet
 total_rows = xl.book.sheet_by_index(0).nrows

 skiprows = 4
 nrows = 20

 # calc number of footer rows
 # (-1) - for the header row
 skip_footer = total_rows - nrows - skiprows - 1

 df = xl.parse(0, skiprows=skiprows, skip_footer=skip_footer, parse_cols="A:D") \
        .dropna(axis=1, how='all')

.dropna(axis=1, how='all') drops all columns that contain only NaN values.
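The footer calculation can also be wrapped in a small helper. The sketch below is only an illustration of the arithmetic: the function name footer_rows_to_skip and the header_rows parameter are my own, not part of pandas, and I clamp the result at zero so that a sheet shorter than expected does not produce a negative value.

```python
def footer_rows_to_skip(total_rows, skiprows, nrows, header_rows=1):
    """Return how many trailing rows to skip so that only `nrows` data rows are read.

    total_rows  -- number of rows in the sheet
    skiprows    -- rows skipped at the top of the sheet
    nrows       -- data rows to keep
    header_rows -- rows consumed by the header (assumed to be 1, as with header=0)
    """
    remaining = total_rows - skiprows - header_rows - nrows
    # A short sheet would give a negative count; skip nothing in that case
    return max(remaining, 0)


# Same numbers as above, assuming the sheet has 100 rows in total
skip_footer = footer_rows_to_skip(total_rows=100, skiprows=4, nrows=20)
print(skip_footer)  # 100 - 4 - 1 - 20 = 75
```

The result can be passed as skip_footer to xl.parse() exactly as in the snippet above. Newer pandas releases accept nrows directly in read_excel() and spell the other two parameters usecols and skipfooter, so check the documentation for your installed version before relying on this workaround.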

+3
