PHP7 UTF-8 file names on a Windows server, a new phenomenon caused by ZipArchive

Update:

While preparing a bug report for the great people who make PHP 7 possible, I reviewed my research once more and tried to boil it down to a few simple lines of code. In doing so, I found that PHP itself is not the cause of the problem. I will share my results here when I am done. Just so you know and don't waste your time :)


Synopsis: PHP7 now seems to be able to write UTF-8 file names but cannot access them?

Preamble: I read about 10-15 articles that touch on the topic, but none of them helped me solve the problem, and all of them predate the release of PHP 7. It seems to me that this is a new problem, and I wonder whether it could be a bug. I spent a lot of time experimenting with encoding and decoding the strings and trying to find a way to make it work, to no avail.

Good day to all and greetings from Germany (insert a shy not-my-native-language remark here). I hope you can help me with this new phenomenon I encountered. It seems “new” in the sense that it came with PHP 7.

I think most people working with PHP on a Windows system are very familiar with the problem of file names and PHP's transparent wrapper that manages access to files whose names are not plain ASCII (or Windows-1252, or whatever the system code page is).

I’m not quite sure how to approach the topic, and, as you can see, I’m not very experienced at writing questions, so please don't rip my head off right away. And yes, I will try to keep it short. Here we go:

The first symptom: after upgrading to PHP7, I sometimes encountered problems accessing the files generated by my software. Sometimes it worked as usual, sometimes not. I found out that the difference is that PHP7 now seems to be able to write UTF-8 file names, but it cannot access files with these names.

After creating the files in question on two separate "identical" systems (differing only in the PHP version), this is how the files are named on the hard drive:

PHP 5.5: Lokaltest_KG_æ¼ ¢ å-_æ ± ‰ å-_Krümhold-DEZ1604-140081-complete.zip

PHP 7: Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip

Splendid: PHP 7 is able to write Unicode file names to the disk (as far as I know, Windows uses UTF-16 for them). The downside is that when I try to access these files, for example with is_file(), PHP 5.5 works but PHP 7 does not.
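The garbled PHP 5.5 name is what one would expect if the UTF-8 bytes of the name were shown one byte per character in a single-byte code page. The following is a minimal sketch of that effect, not a diagnosis of the actual systems: it assumes the system code page is Windows-1252 (the post does not say so) and that the mbstring extension is loaded. The variable names are illustrative.

```php
<?php
// Assumption: system code page is Windows-1252, mbstring is available.
// "\u{6F22}\u{5B57}" is 漢字, written with escapes (PHP 7.0+ syntax) so the
// encoding of this script file does not matter.
$utf8Name = "\u{6F22}\u{5B57}";

// Treat every UTF-8 byte as one Windows-1252 character and re-encode the
// result as UTF-8 so that it can be printed.
$asShown = mb_convert_encoding($utf8Name, 'UTF-8', 'Windows-1252');

echo bin2hex($utf8Name), "\n"; // the six UTF-8 bytes behind the two characters
echo $asShown, "\n";           // six Latin-looking characters instead of two CJK ones
```

The second line of output starts with the same "æ¼¢å" sequence that appears in the PHP 5.5 file name.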

Consider this piece of code (note: I "hacked" this function because it was the easiest way; it was not written for this purpose). The function is called after the zip file has been created. The file name is built from the client name and other values, which come out of the database. The database and PHP's internal encoding are both UTF-8. clearstatcache() is not strictly necessary, but I included it to make things clearer. Important: everything that happens is done by PHP 7; no other component is involved in creating the zip file. To be precise, it is done with the ZipArchive class. In fact, it does not even matter that this is a zip archive; the point is that the file name and file contents are created by PHP 7, successfully.

    public static function downloadFileAsStream( $file ) {
        clearstatcache();
        print $file . "<br/>";
        var_dump(is_file($file));
        die();
    }

Output:

    D:/htdocs/otm/.data/_tmp/Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
    bool(false)

So PHP 7 is able to create the files (they really exist on the hard drive and are valid, accessible and all) but is incapable of accessing them. is_file() is not the only function that fails; file_exists() does too, for example.
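One way to narrow down such a mismatch is to look at the raw bytes of the names PHP reports for a directory and compare them with the bytes of the string passed to is_file(). A minimal sketch; `dumpNames()` is a hypothetical helper that is not part of the original code, and the directory used here is just the script's own:

```php
<?php
// Hypothetical helper: map each directory entry to the hex dump of its name,
// so the bytes PHP sees can be compared with bin2hex($file).
function dumpNames(string $dir): array
{
    $result = [];
    $entries = scandir($dir) ?: []; // scandir() returns false on failure
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $result[$entry] = bin2hex($entry);
    }
    return $result;
}

foreach (dumpNames(__DIR__) as $name => $hex) {
    echo $name, ' => ', $hex, "\n";
}
```

If the hex dump of a listed name differs from the hex dump of the name being tested, the two strings are in different encodings even if they look alike on screen.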

A little experiment with encoding conversion to give you a taste of the things I tried:

    public static function downloadFileAsStream( $file ) {
        clearstatcache();
        print $file . "<br/>";
        print mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', false) . "<br/>";
        print mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', true) . "<br/>";
        if (($detectedEncoding = mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', true)) != 'windows-1252') {
            $file = mb_convert_encoding($file, 'UTF-16', $detectedEncoding);
        }
        print $file . "<br/>";
        var_dump(is_file($file));
        die();
    }

Output:

    D:/htdocs/otm/.data/_tmp/Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
    UTF-8
    UTF-8
    D:/htdocs/otm/.data/_tmp/Lokaltest_KG_o"[W_lI[W_Kr mhold-DEZ1604-140081-complete.zip
    NULL

Therefore, the conversion from UTF-8 (database / internal encoding) to UTF-16 (Windows file system) does not work either.
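A likely reason the UTF-16 attempt cannot work: PHP strings are byte strings, and UTF-16 encodes every ASCII character as two bytes, one of which is zero. PHP's file functions do not accept paths that contain NUL bytes, which is consistent with the NULL shown above instead of a boolean. A minimal sketch, assuming mbstring is loaded and using UTF-16LE (the byte order Windows uses; the code above used the generic 'UTF-16' label):

```php
<?php
// Assumption: mbstring is available. "Kr\u{00FC}mhold.zip" is "Krümhold.zip",
// written with an escape so the encoding of this script does not matter.
$utf8  = "Kr\u{00FC}mhold.zip";
$utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// Check both byte strings for embedded NUL bytes.
$hasNulUtf8  = strpos($utf8, "\0") !== false;
$hasNulUtf16 = strpos($utf16, "\0") !== false;

var_dump(strlen($utf8), strlen($utf16)); // byte lengths of the two encodings
var_dump($hasNulUtf8, $hasNulUtf16);     // only the UTF-16 string contains NUL bytes
```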

I am at the end of my rope here, and unfortunately the problem is very important for us, because we cannot upgrade our systems with this problem lurking in the background. I hope someone can shed some light on this. Sorry for the long post; I'm not sure how well I got my point across.


Addition:

    $file = utf8_decode($file);
    var_dump(is_file($file));
    die();

This gives false for the file name with the Japanese characters. When I change the input used to build the file name so that the name is Lokaltest_KG_Krümhold-DEZ1604-140081-complete.zip, the code above delivers true. So utf8_decode() helps, but only for a small part of Unicode, such as German umlauts?
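That matches what utf8_decode() does: it converts UTF-8 to ISO-8859-1, which has room for the umlauts but not for CJK characters, so those are replaced and the original name is lost. A minimal sketch of the difference; `fitsLatin1()` is a hypothetical helper, and mb_convert_encoding() is used here as a stand-in for utf8_decode() (mbstring is assumed to be loaded):

```php
<?php
// Hypothetical helper: does a UTF-8 string survive a round trip through
// ISO-8859-1? If not, converting it to Latin-1 destroys information.
function fitsLatin1(string $utf8): bool
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') === $utf8;
}

var_dump(fitsLatin1("Kr\u{00FC}mhold"));  // bool(true): ü exists in ISO-8859-1
var_dump(fitsLatin1("\u{6F22}\u{5B57}")); // bool(false): 漢字 does not
```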

1 answer

Answering my own question: the real culprit was the ZipArchive component, which created files with incorrectly encoded file names. I wrote a hopefully helpful bug report: https://bugs.php.net/bug.php?id=72200

Consider this short script:

    print "php default_charset: ".ini_get('default_charset')."\n"; // just 4 info (UTF-8)

    $filename = "bugtest_müller-lüdenscheid.zip"; // just an example
    $filename = utf8_encode($filename); // simulating my database delivering utf8-string

    $zip = new ZipArchive();
    if( $zip->open($filename, ZipArchive::CREATE | ZipArchive::OVERWRITE) === true ) {
        $zip->addFile('bugtest.php', 'bugtest.php'); // copy of script file itself
        $zip->close();
    }

    var_dump( is_file($filename) ); // delivers ?

Output:

    output PHP 5.5.35:
    php default_charset: UTF-8
    bool(true)

    output PHP 7.0.6:
    php default_charset: UTF-8
    bool(false)