Unicode Support System.Directory.getDirectoryContents

The following code prints something like °Ð½Ð´Ð¸Ñ-ÐÑпаниÑ

 getDirectoryContents "path/to/directory/that/contains/files/with/nonASCII/names" >>= mapM_ putStrLn 

This seems to be a ghc bug and has already been fixed in the repository. But what to do until everyone updates ghc?

The last time I encountered such a problem (this was a few years ago, by the way), I used the utf8-string package to convert strings, but I don’t remember how I did it, and ghc unicode support has changed markedly over the years.

So what is the best (or at least working) way to get the contents of a directory with full Unicode support?

ghc version 7.0.4 locale en_US.UTF-8

+4
source share
2 answers

Here's the simplest workaround using decodeString and encodeString from utf8-string .

 import System.Directory import qualified Codec.Binary.UTF8.String as UTF8 main = do getDirectoryContents "." >>= mapM_ (putStrLn . UTF8.decodeString) putStrLn "------------" readFile (UTF8.encodeString "brøken-file-nåme.txt") >>= putStrLn 

Output:

 . .. brøken-file-nåme.txt Broken.hs ------------ hello 
+5
source

I would recommend looking at system-filepath , which provides an abstract data type for representing file paths. I used it for internal code and it works great.

+3
source

All Articles