Is there a faster way to parse an Excel document using PowerShell?

I am working with an MS Excel document through PowerShell. Each Excel document may contain about 1000 rows of data.

Currently, the script reads the Excel file and writes a value to the screen at a rate of 1 record every 0.6 seconds. At first glance that seems very slow.

This is the first time I have read an Excel file with PowerShell. Is this speed normal? Is there a faster way to read and parse Excel data?

Here is the script output (trimmed for readability):

    PS P:\Powershell\ExcelInterfaceTest> .\WRIRMPTruckInterface.ps1 test.xlsx
    3/20/2013 4:46:01 PM
    ---------------------------
    2 078110
    3 078108
    4 078107
    5 078109
    <SNIP>
    242 078338
    243 078344
    244 078347
    245 078350
    3/20/2013 4:48:33 PM
    ---------------------------
    PS P:\Powershell\ExcelInterfaceTest>

Here is the Powershell script:

    ########################################################################################################
    # This is a common function I am using which will release excel objects
    ########################################################################################################
    function Release-Ref ($ref) {
        ([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
        [System.GC]::Collect()
        [System.GC]::WaitForPendingFinalizers()
    }

    ########################################################################################################
    # Variables
    ########################################################################################################

    ########################################################################################################
    # Creating excel object
    ########################################################################################################
    $objExcel = new-object -comobject excel.application

    # Set to false to not open the app on screen.
    $objExcel.Visible = $False

    ########################################################################################################
    # Directory location where we have our excel files
    ########################################################################################################
    $ExcelFilesLocation = "C:/ShippingInterface/" + $args[0]

    ########################################################################################################
    # Open our excel file
    ########################################################################################################
    $UserWorkBook = $objExcel.Workbooks.Open($ExcelFilesLocation)

    ########################################################################################################
    # Item(n) refers to sheet n of the workbook (here sheet 2).
    # If we want to access sheet 10, we have to modify the code to Item(10)
    ########################################################################################################
    $UserWorksheet = $UserWorkBook.Worksheets.Item(2)

    ########################################################################################################
    # This is a counter which will help to iterate through the loop. This is simply a row counter.
    # I am starting the row count at 2, because the first row in my case is the header,
    # so we don't need to read the header data.
    ########################################################################################################
    $intRow = 2

    $a = Get-Date
    write-host $a
    write-host "---------------------------"

    Do {
        # Reading the first column of the current row
        $TicketNumber = $UserWorksheet.Cells.Item($intRow, 1).Value()
        write-host $intRow " " $TicketNumber
        $intRow++
    } While ($UserWorksheet.Cells.Item($intRow,1).Value() -ne $null)

    $a = Get-Date
    write-host $a
    write-host "---------------------------"

    ########################################################################################################
    # Exiting the excel object
    ########################################################################################################
    $objExcel.Quit()

    ########################################################################################################
    # Release all the objects used above
    ########################################################################################################
    $a = Release-Ref($UserWorksheet)
    $a = Release-Ref($UserWorkBook)
    $a = Release-Ref($objExcel)
2 answers

If the data is static (no formulas, only values in cells), you can access the spreadsheet as an ODBC data source and run SQL (or at least SQL-like) queries against it. See this link for setting up your connection string (each worksheet in the workbook is a “table” for this purpose), and use System.Data to query it just as you would a regular database. Don Jones wrote a wrapper function for this, which may help.

This should be faster than starting Excel and reading the data cell by cell.
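As a rough sketch of that approach, the following uses System.Data.Odbc to load one worksheet into a DataTable. It assumes the Microsoft Access Database Engine (which provides the "Microsoft Excel Driver") is installed with the same bitness as the PowerShell session. The function names Get-ExcelOdbcConnectionString and Invoke-ExcelQuery and the sheet name Sheet2 are made up for illustration; they are not the wrapper Don Jones wrote.

```powershell
# Hypothetical helper: builds an ODBC connection string for a workbook.
# Assumption: the "Microsoft Excel Driver" from the Access Database Engine
# is installed and matches the bitness (32/64-bit) of this PowerShell session.
function Get-ExcelOdbcConnectionString {
    param([Parameter(Mandatory = $true)][string]$FilePath)
    'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=' + $FilePath + ';'
}

# Hypothetical helper: runs a query against the workbook and emits DataRow objects.
function Invoke-ExcelQuery {
    param(
        [Parameter(Mandatory = $true)][string]$FilePath,
        [Parameter(Mandatory = $true)][string]$Query
    )
    $connection = New-Object System.Data.Odbc.OdbcConnection (Get-ExcelOdbcConnectionString $FilePath)
    try {
        $connection.Open()
        $adapter = New-Object System.Data.Odbc.OdbcDataAdapter ($Query, $connection)
        $table = New-Object System.Data.DataTable
        # Fill pulls the whole result set in one call instead of cell by cell.
        [void]$adapter.Fill($table)
        # Emit the rows so they can be piped to other cmdlets.
        $table.Rows
    }
    finally {
        $connection.Dispose()
    }
}

# Intended use (left commented out because it needs the driver and a real file).
# A worksheet is addressed as [SheetName$]; "Sheet2" is an assumed name.
# Invoke-ExcelQuery -FilePath 'C:\ShippingInterface\test.xlsx' -Query 'SELECT * FROM [Sheet2$]'
```

With this driver the first row of the sheet normally supplies the column names, so each returned row exposes the header text as property names.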


In his blog post Accelerating Excel File Reading in PowerShell, Robert M. Toups Jr. explains that while loading a workbook into PowerShell is fast, actually reading the Excel cells is very slow. PowerShell can, on the other hand, read a text file very quickly, so his solution is to open the spreadsheet from PowerShell, use Excel's native CSV export to save it as a CSV file, and then process the data with PowerShell's standard Import-Csv cmdlet. He reports that this made his import process 20 times faster!
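To illustrate only the second half of that idea (the Import-Csv step), here is a small sketch that reads a CSV file the way an exported worksheet would be read. The file name tickets-sample.csv and the column headers TicketNumber and Status are invented for the example; the two ticket values are taken from the question's output.

```powershell
# Write a tiny stand-in for the CSV that Excel would export.
# The header names are assumptions made for this example.
$csvFile = Join-Path ([System.IO.Path]::GetTempPath()) 'tickets-sample.csv'
Set-Content -Path $csvFile -Value @(
    'TicketNumber,Status'
    '078110,Open'
    '078108,Closed'
)

# Import-Csv turns every data line into an object whose property names
# come from the header line; all values arrive as strings.
$rows = Import-Csv -Path $csvFile

# Print the first column of each row, as the question's loop does.
$rows | ForEach-Object { $_.TicketNumber }

# Remove the temporary file again.
Remove-Item -Path $csvFile
```

Because Import-Csv returns every value as a string, ticket numbers with leading zeros such as 078110 come through unchanged as long as the CSV file itself contains them.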

Using Toups's code, I created an Import-Excel function that lets you import spreadsheet data easily. My code adds the ability to select a specific worksheet in an Excel workbook, rather than only using the default worksheet (i.e. the worksheet that was active when the file was saved). If you omit the -SheetName parameter, it uses the default worksheet.

    function Import-Excel([string]$FilePath, [string]$SheetName = "")
    {
        $csvFile = Join-Path $env:temp ("{0}.csv" -f (Get-Item -path $FilePath).BaseName)
        if (Test-Path -path $csvFile) { Remove-Item -path $csvFile }

        # convert Excel file to CSV file
        $xlCSVType = 6 # SEE: http://msdn.microsoft.com/en-us/library/bb241279.aspx
        $excelObject = New-Object -ComObject Excel.Application
        $excelObject.Visible = $false
        $workbookObject = $excelObject.Workbooks.Open($FilePath)
        SetActiveSheet $workbookObject $SheetName | Out-Null
        $workbookObject.SaveAs($csvFile,$xlCSVType)
        $workbookObject.Saved = $true
        $workbookObject.Close()

        # cleanup
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbookObject) | Out-Null
        $excelObject.Quit()
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelObject) | Out-Null
        [System.GC]::Collect()
        [System.GC]::WaitForPendingFinalizers()

        # now import and return the data
        Import-Csv -path $csvFile
    }

These helper functions are used by Import-Excel:

    function FindSheet([Object]$workbook, [string]$name)
    {
        $sheetNumber = 0
        for ($i=1; $i -le $workbook.Sheets.Count; $i++) {
            if ($name -eq $workbook.Sheets.Item($i).Name) { $sheetNumber = $i; break }
        }
        return $sheetNumber
    }

    function SetActiveSheet([Object]$workbook, [string]$name)
    {
        if (!$name) { return }
        $sheetNumber = FindSheet $workbook $name
        if ($sheetNumber -gt 0) { $workbook.Worksheets.Item($sheetNumber).Activate() }
        return ($sheetNumber -gt 0)
    }
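As a usage sketch, the Do/While loop from the question could be replaced by one Import-Excel call followed by ordinary pipeline processing. The helper Format-TicketLine and the column name TicketNumber are hypothetical: Import-Csv takes its property names from the header row of the exported worksheet, so use whatever text the first row actually contains.

```powershell
# Hypothetical helper: numbers each imported row the way the question's loop
# printed it (row 2 is the first data row because row 1 is the header).
function Format-TicketLine {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Row,
        # Assumed header text of the first column; adjust to the real sheet.
        [string]$Column = 'TicketNumber'
    )
    begin   { $rowNumber = 2 }
    process {
        '{0}  {1}' -f $rowNumber, $Row.$Column
        $rowNumber++
    }
}

# Intended use (needs Excel installed, so it is left commented out here):
# Import-Excel -FilePath 'C:\ShippingInterface\test.xlsx' | Format-TicketLine

# Stand-in rows shaped like Import-Csv output, to show the result without Excel.
$sampleRows = @(
    [pscustomobject]@{ TicketNumber = '078110' }
    [pscustomobject]@{ TicketNumber = '078108' }
)
$lines = $sampleRows | Format-TicketLine
$lines
```

Passing a full path as -FilePath is the safer choice, since Excel resolves relative paths against its own working directory rather than the PowerShell location.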