Forever to parse 150,000 row file #585
I also made a custom function bypassing my array-building logic. Same slow times.

```php
// Imports needed for the Spout v2 API below (would sit at the top of the class/file)
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

ini_set('max_execution_time', '400');

// Time how long the script takes to run
$executionStartTime = microtime(true);

$reader = ReaderFactory::create(Type::XLSX); // for XLSX files
$reader->open(storage_path('spreadsheets/smaller.xlsx'));

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row => $values) {
        // intentionally empty: measuring pure read time
    }
}

$reader->close();

$executionEndTime = microtime(true);
$data = round($executionEndTime - $executionStartTime, 2);

return json_encode(['runtime_in_seconds' => $data]);
```

Smaller: 4.55 seconds
I would love to figure out why this library takes so long, but while I was looking for solutions I found another library - https://github.com/akeneo-labs/spreadsheet-parser - and with it I was able to parse my larger file in 60-80 seconds. Any idea what that library is doing differently? It is dramatically faster.
Hi @Kryptonit3, this is indeed strange behavior. It should definitely not take that long to read the large file. I'll try to investigate why it takes so long.
Thanks. I was looking at your code and theirs, and it looks like they use
Spout uses a combination of
Looks like #617.
Any update on this?
With #763 being merged, you may now get better results. |
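For anyone re-testing after that merge, here is a minimal timing sketch; it assumes the same Spout v2 `ReaderFactory` API used in the snippet above, and the file path is only a placeholder.

```php
// Re-run of the benchmark from earlier in this thread after updating the library.
// Assumes the Spout v2 ReaderFactory API; replace the path with your own file.
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

$start = microtime(true);
$rowCount = 0;

$reader = ReaderFactory::create(Type::XLSX);
$reader->open('/path/to/large.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $rowCount++; // count rows only; no processing, so we measure pure read time
    }
}

$reader->close();

printf("%d rows read in %.2f seconds\n", $rowCount, microtime(true) - $start);
```

Comparing this number before and after updating should show whether the change helps on your specific file.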
With a file of ~38,000 rows and 79 columns, it takes ~400s just to run a loop like the one @Kryptonit3-zz posted above.
Is there any way I can optimize reading this file?
Btw, this is a Laravel application.
data_fill() is a Laravel-specific helper; it's just my PHP array-building logic. https://github.com/laravel/framework/blob/3414dcfcbe27cf0f4deee0670f022983e8016392/src/Illuminate/Support/helpers.php#L427
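For context, a tiny illustration of what data_fill() does; the keys and values below are made up for the example and have nothing to do with the spreadsheet in this issue.

```php
// data_fill() sets a value at a "dot"-notation path only if nothing is there yet.
// Requires Laravel's global helpers (illuminate/support).
$rows = [];

data_fill($rows, '0.name', 'Alice');   // $rows[0]['name'] = 'Alice'
data_fill($rows, '0.name', 'Bob');     // no-op: the key is already filled
data_fill($rows, '0.email', 'a@b.co'); // $rows[0]['email'] = 'a@b.co'

// $rows === [['name' => 'Alice', 'email' => 'a@b.co']]
```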
I took the 150,000-row, 16-column file and chopped it down to a much smaller sample of 4,261 rows, and that takes about 10-15 seconds. The complete file takes minutes (I had to heavily modify nginx and PHP to allow for this).
Here is a blackfire.io report of the smaller file - https://blackfire.io/profiles/c4087f40-dd5c-42ed-9258-3c6d5a1ace51/graph
Looks like it is reading the smaller file's 68,176 cells (4,261 rows × 16 columns) multiple times: 349,380 reads in total, roughly five per cell (the darker red boxes).
Spreadsheets attached.
spreadsheets.zip
Here is the code I am using to process the file(s). I have hard-coded some parameters for testing; normally these would be request variables to allow for different column selection based on the header row.
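As a rough illustration of that kind of processing (not the author's actual code), here is a hypothetical sketch that selects columns by header row and builds the array with data_fill(), using the Spout v2 API from the first snippet; the column names, file path, and defaults are invented.

```php
// Hypothetical sketch only: column names, file path, and defaults are made up.
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

$wantedColumns = ['first_name', 'last_name', 'email']; // normally from the request
$filePath = storage_path('spreadsheets/smaller.xlsx');

$reader = ReaderFactory::create(Type::XLSX);
$reader->open($filePath);

$data = [];
$outRow = 0;

foreach ($reader->getSheetIterator() as $sheet) {
    $columnIndexes = null;

    foreach ($sheet->getRowIterator() as $values) {
        if ($columnIndexes === null) {
            // Header row: remember the positions of the columns we want to keep.
            $columnIndexes = [];
            foreach ($values as $position => $header) {
                if (in_array($header, $wantedColumns, true)) {
                    $columnIndexes[$position] = $header;
                }
            }
            continue;
        }

        // Data row: copy only the selected columns into the result array.
        foreach ($columnIndexes as $position => $header) {
            data_fill($data, $outRow . '.' . $header, $values[$position] ?? null);
        }
        $outRow++;
    }
}

$reader->close();

return json_encode($data);
```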
Screenshot of smaller file output
Thanks for any help I receive.