I am writing a simple site parser in PHP 5.2.10.
When using the default internal encoding (which is ISO-8859-1), I always get an error with the same function call:
$start = mb_strpos($index, '<a name=gr1>');
Fatal error: Allowed memory size of 50331648 bytes exhausted (tried to allocate 11924760 bytes)
The length of the $index string in this case was 2981190 bytes, exactly one quarter of what PHP was trying to allocate (2981190 × 4 = 11924760).
Now if I use
mb_internal_encoding('UTF-8')
the error disappears. Does this mean that PHP uses more memory for single-byte strings than for multi-byte ones? How is this possible? Any ideas?
UPD: Overall memory usage does not seem to depend on the encoding: the average memory_get_usage() reading is almost the same with UTF-8 and ISO-8859-1. I think the problem may be in mb_strpos itself. The string $index is actually encoded in Windows-1251 (Cyrillic), so it contains byte sequences that are not valid UTF-8. Perhaps this makes mb_strpos attempt a conversion, or simply allocate extra memory for some other purpose. I will try to find the answer in the mb_strpos source.
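
To narrow this down, something like the following rough sketch might help. It passes the encoding to mb_strpos explicitly instead of relying on mb_internal_encoding(), and uses memory_get_peak_usage() to see how far the peak moved during the call. The helper name probe_mb_strpos and the stand-in $index data are made up for illustration, and the idea that the extra allocation is a 4-byte-per-character copy of the string is only a guess based on the 4x ratio above, not something confirmed from the mbstring source.

```php
<?php
// Hypothetical helper (not part of PHP): run mb_strpos with an explicit
// encoding and report how far the peak-memory high-water mark moved.
function probe_mb_strpos($haystack, $needle, $encoding)
{
    $before = memory_get_peak_usage();
    $pos    = mb_strpos($haystack, $needle, 0, $encoding);
    $after  = memory_get_peak_usage();

    return array('pos' => $pos, 'peak_growth' => $after - $before);
}

// Stand-in data (assumption): ASCII filler followed by the anchor from the
// question. The real $index is a ~3 MB Windows-1251 page.
$index = str_repeat('x', 1000000) . '<a name=gr1>';

$result = probe_mb_strpos($index, '<a name=gr1>', 'Windows-1251');
printf("mb_strpos: pos=%d, peak growth=%d bytes\n", $result['pos'], $result['peak_growth']);

// Byte-oriented search for comparison. In a single-byte encoding such as
// Windows-1251 every character is one byte, so byte and character offsets agree.
$bytePos = strpos($index, '<a name=gr1>');
printf("strpos:    pos=%d\n", $bytePos);
```

Because the peak is a high-water mark, it seems safer to test one encoding per run rather than looping over several in the same script. If the plain strpos result is all that is needed, it sidesteps mbstring entirely for this search, since the needle is pure ASCII and the haystack is single-byte.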