I am having a problem with PHPExcel memory usage while processing an XLS file. I need to work with fairly large files (between 50k and 200k rows, 9-10 columns), so I had to use read filters to work around the memory problem.
However, although it worked pretty well with XLSX files using a combination of background processes and some simple calculations for block size, I cannot get it to work with XLS files.
This is a piece of code where everything explodes:
    class ExcelReadFilter implements PHPExcel_Reader_IReadFilter {
        private $startRow = 0;
        private $endRow   = 0;

        public function setRows($startRow, $chunkSize) {
            $this->startRow = $startRow;
            $this->endRow   = $startRow + $chunkSize;
        }

        public function readCell($column, $row, $worksheetName = '') {
            return ($row >= $this->startRow && $row < $this->endRow);
        }
    }

    PHPExcel_Settings::setCacheStorageMethod(
        PHPExcel_CachedObjectStorageFactory::cache_in_memory_serialized
    );

    ....

    $filter = new ExcelReadFilter();
    $filter->setRows($desde, $cuantas);

    $reader = PHPExcel_IOFactory::createReader($this->file_type);
    $reader->setLoadSheetsOnly($sheet_name);
    $reader->setReadDataOnly(false);
    $reader->setReadFilter($filter);

    $chunk = $reader->load($this->file);
    $chunk->setActiveSheetIndexByName($sheet_name);

    $active_sheet = $chunk->getActiveSheet();
    $rowIterator  = $active_sheet->getRowIterator($desde);
    $this->num_filas = $active_sheet->getHighestRow();
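For what it's worth, the windowing logic of the filter can be checked in isolation. This is a minimal sketch that stubs the `PHPExcel_Reader_IReadFilter` interface (a hypothetical stand-in, so it runs without PHPExcel installed) to show which rows a given chunk accepts:

```php
<?php
// Stub of the PHPExcel interface so the filter logic runs standalone
// (hypothetical stand-in; the real interface ships with PHPExcel).
interface PHPExcel_Reader_IReadFilter {
    public function readCell($column, $row, $worksheetName = '');
}

class ExcelReadFilter implements PHPExcel_Reader_IReadFilter {
    private $startRow = 0;
    private $endRow   = 0;

    public function setRows($startRow, $chunkSize) {
        $this->startRow = $startRow;
        $this->endRow   = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        // Accept rows in the half-open window [startRow, startRow + chunkSize).
        return $row >= $this->startRow && $row < $this->endRow;
    }
}

$filter = new ExcelReadFilter();
$filter->setRows(5001, 5000);
var_dump($filter->readCell('A', 5001));  // bool(true)  - first row of the chunk
var_dump($filter->readCell('A', 10000)); // bool(true)  - last row of the chunk
var_dump($filter->readCell('A', 10001)); // bool(false) - first row of the next chunk
```

So consecutive chunks tile the sheet without overlap as long as each new `$startRow` is the previous `$startRow + $chunkSize`.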
Right after that, I included the following lines to better understand what was going on:
    ob_start();
    var_dump($rowIterator);
    $f = fopen("excel-info.txt", "w");
    fwrite($f, ob_get_clean());
    fclose($f);
    ob_end_clean();
    die;
And I think that pointed out where the memory problem is. With the original XLS source file, excel-info.txt came out at 13M. Then I opened the XLS file, saved it as XLSX and repeated the process, after which excel-info.txt was only 285k.
Is there any way to change these filters to work with XLS files?
Oh, and setting the PHP memory limit to a larger value is not an option, but the runtime is not critical.
ADDED
By trying different cell caching options, I was able to reduce the amount of memory used enough to make it work, and in most cases to keep it at an acceptable level.
I am currently using PHPExcel_CachedObjectStorageFactory::cache_to_sqlite, and that seems to be enough to make it work.
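For reference, this is how I am now selecting the cache method. `setCacheStorageMethod()` returns false when the chosen backend is not available (cache_to_sqlite needs the sqlite extension), so a fallback can be added; the fallback choice here is just one possible option:

```php
// Enable SQLite-backed cell caching before the first load().
// setCacheStorageMethod() returns false if the backend is unavailable.
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_sqlite;
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
    // Fall back to serialized in-memory caching (assumption: an acceptable
    // compromise when SQLite is not compiled in).
    PHPExcel_Settings::setCacheStorageMethod(
        PHPExcel_CachedObjectStorageFactory::cache_in_memory_serialized
    );
}
```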
I would like to note that the calculation I did on the serialized information dumped to the file was misleading. For Excel5 files, the reader builds an array with as many entries as the worksheet has rows, with every value that does not match the filter condition set to NULL. Of course, when I dump that to a text file, having something like ...
    array(10) {
      ["A"]=> NULL
      ["B"]=> NULL
      ["C"]=> NULL
      ["D"]=> NULL
      ["E"]=> NULL
      ["F"]=> NULL
      ["G"]=> NULL
      ["H"]=> NULL
      ["I"]=> NULL
      ["J"]=> NULL
    }
... takes up a lot of space in the file, but should not take up nearly as much in PHP's memory, so that was my mistake.
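A quick standalone sketch of why the dump file ballooned while actual memory did not: a row of all-NULL cells serializes to a verbose string, but PHP stores it (and copies of it) very compactly:

```php
<?php
// A row of 10 all-NULL cells, like the filtered-out rows in the dump.
$row = array_fill_keys(range('A', 'J'), null);

$before = memory_get_usage();
$rows = array();
for ($i = 0; $i < 1000; $i++) {
    $rows[$i] = $row; // copy-on-write: the NULL rows share storage
}
$after = memory_get_usage();

echo "Serialized size of one row: " . strlen(serialize($row)) . " bytes\n";
echo "Memory for 1000 such rows:  " . ($after - $before) . " bytes\n";
```

On my understanding, the serialized text grows linearly with the row count while the in-memory cost stays small, which matches the 13M dump file not translating into 13M of PHP memory.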
Now I use this code to track memory usage:
    for ($i = 1; $i < 20000; $i += 5000) {
        $filter->setRows($i, 5000);
        echo "\n1- Usage: " . (memory_get_usage() / 1024) / 1024;

        $objPHPExcel = $reader->load($this->file);
        echo "\n2- Usage: " . (memory_get_usage() / 1024) / 1024;

        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
        unset($sheetData);
        unset($objPHPExcel);
        echo "\n3- Usage: " . (memory_get_usage() / 1024) / 1024;
    }
With a specific xls file, it shows:
First iteration:

    1- Usage: 4.3859634399414
    2- Usage: 34.292671203613
    3- Usage: 34.68034362793

Second iteration:

    1- Usage: 34.68034362793
    2- Usage: 34.68293762207
    3- Usage: 34.684982299805
And the same file after saving as XLSX:
First iteration:

    1- Usage: 4.2780990600586
    2- Usage: 6.9042129516602
    3- Usage: 7.2916641235352

Second iteration:

    1- Usage: 7.2916641235352
    2- Usage: 7.5115432739258
    3- Usage: 7.2813568115234
I must say, however, that after saving as XLSX the file size is reduced by about half, so I can't say whether this is a bug or expected behavior.
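One more observation: in the loop above, usage never drops back down between iterations. That may be because PHPExcel's workbook object holds circular references that plain unset() cannot collect; the teardown PHPExcel itself provides for this is disconnectWorksheets() (I have not verified whether it fully explains the numbers above):

```php
// Break PHPExcel's internal circular references before unsetting,
// so the workbook's memory can actually be reclaimed.
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);
```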