PHPExcel exceeds memory usage with XLS files only

I am having a problem with PHPExcel memory while processing an XLS file. I need to work with fairly large files (between 50k and 200k lines and 9-10 cols), so I had to use ReadFilters to solve the memory problem.

However, although it worked pretty well with XLSX files using a combination of background processes and some simple calculations for block size, I cannot get it to work with XLS files.

This is a piece of code where everything explodes:

    class ExcelReadFilter implements PHPExcel_Reader_IReadFilter
    {
        private $startRow = 0;
        private $endRow = 0;

        public function setRows($startRow, $chunkSize)
        {
            $this->startRow = $startRow;
            $this->endRow   = $startRow + $chunkSize;
        }

        public function readCell($column, $row, $worksheetName = '')
        {
            if ($row >= $this->startRow && $row < $this->endRow) {
                return true;
            }
            return false;
        }
    }

    PHPExcel_Settings::setCacheStorageMethod(
        PHPExcel_CachedObjectStorageFactory::cache_in_memory_serialized
    );

    ....

    $filter = new ExcelReadFilter();
    $filter->setRows($desde, $cuantas);

    $reader = PHPExcel_IOFactory::createReader($this->file_type);
    $reader->setLoadSheetsOnly($sheet_name);
    $reader->setReadDataOnly(false);
    $reader->setReadFilter($filter);

    $chunk = $reader->load($this->file);
    $chunk->setActiveSheetIndexByName($sheet_name);

    $active_sheet = $chunk->getActiveSheet();
    $rowIterator = $active_sheet->getRowIterator($desde);
    $this->num_filas = $active_sheet->getHighestRow();
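For reference, the filter's row-range logic can be sanity-checked on its own, outside PHPExcel; the interface stub below just stands in for the real `PHPExcel_Reader_IReadFilter` so the snippet runs by itself:

```php
<?php
// Stand-in for PHPExcel's interface, so the filter can be tested in isolation.
interface PHPExcel_Reader_IReadFilter {
    public function readCell($column, $row, $worksheetName = '');
}

class ExcelReadFilter implements PHPExcel_Reader_IReadFilter {
    private $startRow = 0;
    private $endRow   = 0;

    public function setRows($startRow, $chunkSize) {
        $this->startRow = $startRow;
        $this->endRow   = $startRow + $chunkSize;
    }

    // Accept rows in [startRow, startRow + chunkSize), any column.
    public function readCell($column, $row, $worksheetName = '') {
        return $row >= $this->startRow && $row < $this->endRow;
    }
}

$filter = new ExcelReadFilter();
$filter->setRows(5001, 5000);
var_dump($filter->readCell('A', 5000));  // bool(false): below the chunk
var_dump($filter->readCell('A', 5001));  // bool(true):  first row of the chunk
var_dump($filter->readCell('A', 10000)); // bool(true):  last row of the chunk
var_dump($filter->readCell('A', 10001)); // bool(false): past the chunk
```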

Right after that, I included the following lines to better understand what was going on:

    ob_start();
    var_dump($rowIterator);
    $f = fopen("excel-info.txt", "w");
    fwrite($f, ob_get_clean());
    fclose($f);
    ob_end_clean();
    die;

And I think this pointed out where the memory problem is. With the original XLS source file, excel-info.txt was 13 MB in size. Then I opened the XLS file, saved it as XLSX, and repeated the process, after which excel-info.txt was only 285 KB.

Is there any way to change these filters to work with XLS files?

Oh, and setting the PHP memory limit to a larger value is not an option, but the runtime is not critical.

ADDED

By trying the different memory caching options I was able to reduce the amount of memory used enough to make it work, bringing it down to an acceptable level in most cases.

I am currently using PHPExcel_CachedObjectStorageFactory::cache_to_sqlite, and it seems that is enough to make it work.

I would like to note that the conclusion I drew from the serialized information written to the file was incorrect. For Excel5 files, the reader creates an array with as many entries as the file has rows, with every value that does not match the filter condition set to NULL. Of course, when I dump that to a text file, having something like...

    array(10) {
      ["A"]=> NULL
      ["B"]=> NULL
      ["C"]=> NULL
      ["D"]=> NULL
      ["E"]=> NULL
      ["F"]=> NULL
      ["G"]=> NULL
      ["H"]=> NULL
      ["I"]=> NULL
      ["J"]=> NULL
    }

... takes up a lot of space in the text file, but it should not take up much actual PHP memory, so that was my mistake.
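A quick way to see that discrepancy: a row of NULLs is long when dumped as text, but many such rows cost comparatively little real memory (self-contained sketch; the variable names are mine):

```php
<?php
// A filtered-out Excel5 row: all ten columns present, every value NULL.
$row = array_fill_keys(range('A', 'J'), null);

// Dumped as text it is fairly long...
$dumped = var_export($row, true);
echo "text size of one row: " . strlen($dumped) . " bytes\n";

// ...but holding thousands of such rows costs relatively little memory,
// since the structure is mostly repeated keys and NULLs (and PHP's
// copy-on-write means the identical rows share storage until modified).
$before = memory_get_usage();
$rows = array_fill(1, 10000, $row);
$after = memory_get_usage();
echo "memory for 10000 rows: " . round(($after - $before) / 1024) . " KB\n";
```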

Now I use this code to track memory usage:

    for ($i = 1; $i < 20000; $i += 5000) {
        $filter->setRows($i, 5000);
        echo "\n1- Usage: " . (memory_get_usage() / 1024) / 1024;
        $objPHPExcel = $reader->load($this->file);
        echo "\n2- Usage: " . (memory_get_usage() / 1024) / 1024;
        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
        unset($sheetData);
        unset($objPHPExcel);
        echo "\n3- Usage: " . (memory_get_usage() / 1024) / 1024;
    }

With one specific XLS file, it shows:

1st iteration:

    1- Usage: 4.3859634399414
    2- Usage: 34.292671203613
    3- Usage: 34.68034362793

2nd iteration:

    1- Usage: 34.68034362793
    2- Usage: 34.68293762207
    3- Usage: 34.684982299805

And the same file after saving it as XLSX:

1st iteration:

    1- Usage: 4.2780990600586
    2- Usage: 6.9042129516602
    3- Usage: 7.2916641235352

2nd iteration:

    1- Usage: 7.2916641235352
    2- Usage: 7.5115432739258
    3- Usage: 7.2813568115234

I must say, however, that after saving as XLSX the file size is reduced by about half, so I cannot tell whether this is a bug or expected behavior.
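As an aside, memory_get_usage() only reports the usage at the moment of the call; a transient spike inside load() that is freed again would not show up in the numbers above. memory_get_peak_usage() captures it. A small self-contained helper (the function name is my own, not a PHPExcel API):

```php
<?php
// Print current and peak PHP memory usage in MB; returns the current value.
// memory_get_peak_usage() remembers transient spikes (e.g. inside load())
// that memory_get_usage() misses once the memory has been freed again.
function log_mem($label) {
    $cur  = memory_get_usage() / 1024 / 1024;
    $peak = memory_get_peak_usage() / 1024 / 1024;
    printf("%s- Usage: %.2f MB (peak %.2f MB)\n", $label, $cur, $peak);
    return $cur;
}

$before = log_mem("1");
$buf = str_repeat('x', 5 * 1024 * 1024); // simulate a transient 5 MB allocation
unset($buf);
$after = log_mem("2");
// Current usage drops back after unset(), but the peak remembers the spike.
```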

2 answers

PHPExcel is a memory hog. I have used it for several clients and found that you need to experiment with PHP's memory limit to find a sweet spot where it can load the files the client needs to import. In some projects I had to use up to 8 GB. Of course, you do this in the routine that loads the XLS file, using ini_set('memory_limit', '16M') for example, and not in the php.ini file.

Have you tried setReadDataOnly(true)?

I think the reason is that XLS files are not just CSV-style data; they also carry a lot of other information (e.g. fonts and macros). When you load a file, PHPExcel tries to load all of those parts into memory, creating a huge structure.


Just add this line (for example):

    ini_set('memory_limit', '254M');
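If you go this route, it is worth reading the value back to confirm the change actually took effect, since ini_set() can fail silently when the directive is locked down by the hosting configuration:

```php
<?php
// Raise the limit, then read it back to confirm the change stuck.
// ini_set() returns the old value on success and false on failure.
$old = ini_set('memory_limit', '254M');
if ($old === false) {
    echo "could not change memory_limit\n";
} else {
    echo "memory_limit is now " . ini_get('memory_limit') . "\n";
}
```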

this will solve the memory problem.

Feel free to change the memory limit to make it suitable for your case.

