PHP readdir with European characters

I get image files with Czech characters in their names (for example, ěščřžááíé), and I want to rename them without the accents so that they are more web-friendly. I thought a simple str_replace would do, but it doesn't seem to work on the array of file names the way it does on a string literal.

I read files with readdir after checking the extension.

    function readFiles($dir, $ext = false) {
        if (is_dir($dir)) {
            if ($dh = opendir($dir)) {
                while (($file = readdir($dh)) !== false) {
                    if ($ext) {
                        if (end(explode('.', $file)) == $ext) {
                            $f[] = $file;
                        }
                    } else {
                        $f[] = $file;
                    }
                }
                closedir($dh);
                return $f;
            } else {
                return false;
            }
        } else {
            return false;
        }
    }

    $files = readFiles(".", "jpg");

    $search  = array('š','á','ž','í','ě','é','ř','ň','ý','č',' ');
    $replace = array('s','a','z','i','e','e','r','n','y','c','-');

    $string = "čšěáýísdjksnalci sášěééalskcnkkjy+ěéší";
    $safe_string = str_replace($search, $replace, $string);

    echo '<pre>';
    foreach ($files as $fl) {
        $safe_files[] = str_replace($search, $replace, $fl);
    }
    var_dump($files);
    var_dump($safe_files);
    var_dump($string);
    var_dump($safe_string);
    echo '</pre>';

Output:

    array(6) {
      [0]=>
      string(21) "Hl vka s listem01.jpg"
      [1]=>
      string(23) "Hl vky v atelieru02.jpg"
      [2]=>
      string(17) "Jarn  v hon03.jpg"
      [3]=>
      string(17) "Mlad  chmel04.jpg"
      [4]=>
      string(23) "Stavba chmelnice 05.jpg"
      [5]=>
      string(21) "Zimni chmelnice06.jpg"
    }
    array(6) {
      [0]=>
      string(21) "Hl vka-s-listem01.jpg"
      [1]=>
      string(23) "Hl vky-v-atelieru02.jpg"
      [2]=>
      string(17) "Jarn -v hon03.jpg"
      [3]=>
      string(17) "Mlad -chmel04.jpg"
      [4]=>
      string(23) "Stavba-chmelnice-05.jpg"
      [5]=>
      string(21) "Zimni-chmelnice06.jpg"
    }
    string(53) "čšěáýísdjksnalci sášěééalskcnkkjy+ěéší"
    string(38) "cseayisdjksnalci-saseeealskcnkkjy+eesi"

I am currently working on WAMP, but answers that work across platforms would be even better :)

+4
10 answers

Judging by the 0xFFFD marks (which Firefox shows as diamonds with a question mark inside), you are not reading the names with the correct encoding (which would be Unicode / UTF-8). As far as I know, there is a bug that looks related.

Here's another topic about this: problem with php readdir with japanese name

You could wait until PHP 6 is out and then use that.

Unrelated to the problem: Normalizer is the best tool for getting rid of diacritics.
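A minimal sketch of the Normalizer approach, assuming the intl extension is loaded and the input is already valid UTF-8. The function name `stripDiacritics` is made up for illustration. The idea is to decompose each accented letter into a base letter plus a combining mark, then drop the marks:

```php
<?php
// Hypothetical helper; requires ext/intl for the Normalizer class.
function stripDiacritics(string $utf8): string
{
    // NFD splits "č" into "c" + U+030C (combining caron), and so on.
    $decomposed = Normalizer::normalize($utf8, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Input was not valid UTF-8: return it unchanged rather than guess.
        return $utf8;
    }
    // \p{Mn} matches non-spacing combining marks; remove them.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}
```

This only helps once the names are UTF-8. File names that readdir returns in a single-byte encoding would have to be converted first.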

+5

If it works with strings but not with arrays, simply apply it to the strings one at a time :-)

    $search  = array('š','á','ž','í','ě','é','ř','ň','ý','č',' ');
    $replace = array('s','a','z','i','e','e','r','n','y','c','-');

    $safe_files = $files; // assumes $files holds the names returned by readFiles()
    $len = count($safe_files);
    for ($i = 0; $i < $len; $i++) {
        $safe_files[$i] = str_replace($search, $replace, $safe_files[$i]);
    }

I think str_replace accepts arrays only for the first two parameters and not for the last. Maybe I'm wrong, but this should work either way.
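For what it's worth, the PHP manual documents that the third argument (the subject) may also be an array, in which case the replacement is applied to every element and an array is returned. A small sketch with made-up UTF-8 file names:

```php
<?php
$search  = array('š', 'á', 'ž', 'í', 'ě', 'é', 'ř', 'ň', 'ý', 'č', ' ');
$replace = array('s', 'a', 'z', 'i', 'e', 'e', 'r', 'n', 'y', 'c', '-');

// Hypothetical UTF-8 file names, standing in for the readdir() results.
$files = array('Hlávka s listem01.jpg', 'Zimni chmelnice06.jpg');

// An array subject is processed element by element.
$safe_files = str_replace($search, $replace, $files);
```

So looping by hand is probably not what makes the difference here. In the question's output the spaces were replaced but the accents were not, which suggests the bytes coming back from readdir do not match the UTF-8 search strings.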

If you do have a real encoding problem, it may be that your OS uses a single-byte encoding while your source file uses another one, possibly UTF-8.

In this case, do something like:

    $search  = array('š','á','ž','í','ě','é','ř','ň','ý','č',' ');
    $replace = array('s','a','z','i','e','e','r','n','y','c','-');

    $code_encoding = "UTF-8";  // this is my guess, but put whatever is yours
    $os_encoding   = "CP1250"; // this is my guess, but put whatever is yours

    $safe_files = $files; // assumes $files holds the names returned by readFiles()
    $len = count($safe_files);
    for ($i = 0; $i < $len; $i++) {
        // convert before replace
        $safe_files[$i] = iconv($os_encoding, $code_encoding, $safe_files[$i]);
        /* Alternatively:
           $safe_files[$i] = mb_convert_encoding($safe_files[$i], $code_encoding, $os_encoding);
        */
        $safe_files[$i] = str_replace($search, $replace, $safe_files[$i]);
    }

mb_convert_encoding() requires the ext/mbstring extension, and iconv() requires ext/iconv.

+1

This may not answer your question directly, but have a look at PHP's iconv() function, and in particular at the //TRANSLIT option, which you can append to the second argument. I have used it several times to turn French and Eastern European strings into their a-z, URL-friendly counterparts.

From PHP.net ( http://www.php.net/manual/en/function.iconv.php ):

If you append the string //TRANSLIT to out_charset, transliteration is activated. This means that when a character cannot be represented in the target charset, it can be approximated through one or several similar-looking characters.
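A hedged sketch of how //TRANSLIT could be used to build a web-safe name from a UTF-8 string. The helper name `toAsciiName` and the set of allowed characters are assumptions for illustration. What the transliteration actually produces varies between iconv implementations, and as far as I know also with the current locale, so the sketch cleans up whatever is left afterwards:

```php
<?php
// Hypothetical helper: best-effort ASCII version of a UTF-8 string.
function toAsciiName(string $utf8): string
{
    // Ask iconv to approximate characters that ASCII cannot represent.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);
    if ($ascii === false) {
        // Conversion failed; fall through to the cleanup below.
        $ascii = $utf8;
    }
    // Replace every run of characters outside a conservative set with "-".
    $ascii = preg_replace('/[^A-Za-z0-9._-]+/', '-', $ascii);
    return trim($ascii, '-');
}
```

Calling `toAsciiName('Hlávka s listem01.jpg')` always yields a name made of letters, digits, dots, underscores and hyphens. Whether "á" comes out as "a" or as something else depends on the platform.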

+1

Your source code (and the test string) is encoded as UTF-8, while the file names appear to use a single-byte encoding. I suggest using the same encoding for your search strings. To avoid problems with the encoding of the source file, it is better to write the accented characters in your code as hexadecimal escapes (for example, \xE8 for "č").
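As an illustration of the hex-escape idea, here is a sketch that assumes the file names come back as Windows-1250 bytes (that encoding is a guess; the map must match whatever your system really uses). The sample name is hypothetical:

```php
<?php
// Keys are Windows-1250 byte values (an assumption about the OS encoding).
// Written as escapes, they survive any re-encoding of this source file.
$map = array(
    "\x9A" => 's', // š
    "\xE1" => 'a', // á
    "\x9E" => 'z', // ž
    "\xED" => 'i', // í
    "\xEC" => 'e', // ě
    "\xE9" => 'e', // é
    "\xF8" => 'r', // ř
    "\xF2" => 'n', // ň
    "\xFD" => 'y', // ý
    "\xE8" => 'c', // č
    ' '    => '-',
);

// Hypothetical name as readdir() might return it on such a system.
$raw  = "Hl\xE1vka s listem01.jpg";
$safe = strtr($raw, $map);
```

With the array form of strtr, each key is matched as a raw byte sequence, so the encoding in which the script itself is saved no longer matters.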

0

So, I got it working on my Windows XP system with this:

    $search  = array('š','á','ž','í','e','é','r','n','ý','c',' ');
    $replace = array('s','a','z','i','e','e','r','n','y','c','-');

    $files = readFiles(".", "jpg");
    $len = count($files);
    for ($i = 0; $i < $len; $i++) {
        if (mb_check_encoding($files[$i], 'ASCII')) {
            $safe_files[$i] = $files[$i];
        } else {
            $safe_files[$i] = str_replace(
                $search,
                $replace,
                iconv("iso-8859-1", "utf-8//TRANSLIT", $files[$i])
            );
        }
        if ($files[$i] != $safe_files[$i]) {
            rename($files[$i], $safe_files[$i]);
        }
    }

I don't know if this was a coincidence or not, but calling mb_get_info() shows

[internal_encoding] => ISO-8859-1

0

Here is another function I found useful, from the strtr page of the PHP manual:

    <?php
    // Windows-1250 to ASCII
    // This function replaces all Windows-1250 accented characters with
    // their non-accented equivalents. Useful for Czech and Slovak.
    function win2ascii($str) {
        $str = strtr($str,
            "\xE1\xE8\xEF\xEC\xE9\xED\xF2",
            "\x61\x63\x64\x65\x65\x69\x6E");
        $str = strtr($str,
            "\xF3\xF8\x9A\x9D\xF9\xFA\xFD\x9E\xF4\xBC\xBE",
            "\x6F\x72\x73\x74\x75\x75\x79\x7A\x6F\x4C\x6C");
        $str = strtr($str,
            "\xC1\xC8\xCF\xCC\xC9\xCD\xC2\xD3\xD8",
            "\x41\x43\x44\x45\x45\x49\x4E\x4F\x52");
        $str = strtr($str,
            "\x8A\x8D\xDA\xDD\x8E\xD2\xD9\xEF\xCF",
            "\x53\x54\x55\x59\x5A\x4E\x55\x64\x44");
        return $str;
    }
    ?>

Converting European characters to their ASCII equivalents was not really the problem, though. What I could not find was a reliable way to rename the files (i.e. to reference files whose names contain non-ASCII characters).

0

To get UTF-8, use PHP's utf8_encode function. On Windows, readdir returns names in a single-byte encoding (ISO-8859-1 or a similar code page, depending on the system locale), so a conversion is needed in this case.

Example - listing files in a directory:

    <?php
    $dir_handle = opendir(".");
    while (false !== ($file = readdir($dir_handle))) {
        echo utf8_encode($file) . "<br>";
    }
    ?>

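utf8_encode only converts from ISO-8859-1, and it is deprecated as of PHP 8.2. A sketch of the same listing using mb_convert_encoding (which requires ext/mbstring), with the source encoding passed explicitly. The helper name `listDirUtf8` and its default encoding are assumptions for illustration:

```php
<?php
// Hypothetical helper: list a directory with the names converted to UTF-8.
// $fsEncoding is an assumption; set it to whatever your file system returns.
function listDirUtf8(string $dir, string $fsEncoding = 'ISO-8859-1'): array
{
    $names = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $names; // directory could not be opened
    }
    while (false !== ($file = readdir($handle))) {
        $names[] = mb_convert_encoding($file, 'UTF-8', $fsEncoding);
    }
    closedir($handle);
    return $names;
}
```

Returning the names instead of echoing them keeps the conversion in one place, so the rest of the script only ever sees UTF-8.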
0

Area5one has this right: it is a problem of mismatched encodings.

When I upgraded my machine from XP to Win7, I also upgraded my versions of MySQL and PHP. Somewhere along the way, PHP programs that previously worked stopped working. In particular, scandir, readdir and UTF-8 used to live happily together, but no longer did.

So I changed my code. Variables holding data read from the hard drive now end in "_iso", to reflect the ISO-8859-1 Windows encoding, while data from the MySQL database goes into variables ending in "_utf". Thus, Area5one's code would look like:

    $dir_handle_iso = opendir(".");
    while (false !== ($file_iso = readdir($dir_handle_iso))) {
        $file_utf = utf8_encode($file_iso);
        ...
    }

0

This works for me 100%:

    setlocale(LC_ALL, "cs_CZ");
    $new_str = iconv("UTF-8", "ASCII//TRANSLIT", $orig_str);

0

This worked for me (Windows, Danish characters):

    $file = mb_convert_encoding($file, 'UTF-8', "iso-8859-1");

0
