Create a lazy I/O list from a non-IO list

Question

I have a lazy list of file names created by find. I would like to load the metadata of these files lazily as well. That is, if I take 10 elements from metadata, it should only fetch the metadata of those ten files. find will happily hand you 10 files if you ask for them, without thrashing your disk, whereas my script fetches the metadata of all files.

    main = do
        files <- find always always "/"
        metadata <- loadMetaList files
        -- ... then consume only part of it, e.g. take 10 metadata

    loadMetaList :: [String] -> IO [Metadata]
    loadMetaList [] = return []
    loadMetaList (file:files) = do
        first <- loadMeta file
        rest <- loadMetaList files
        return (first:rest)

    loadMeta :: String -> IO Metadata

As you can see, loadMetaList is not lazy. To be lazy, it would have to use tail recursion, something like return (first:loadMetaList rest).

How can I make loadMetaList lazy?

+4

file-io haskell lazy-evaluation lazy-sequences lazy-loading

Andras Gyomrey Apr 26 '13 at 19:26

2 answers

Here is how you would do it with pipes. I don't know how you implemented loadMeta and find, so I just made something up:

    import Pipes
    import qualified Pipes.Prelude as Pipes

    -- Stand-in for a real directory search
    find :: Producer FilePath IO ()
    find = each ["heavy.mp3", "metal.mp3"]

    type MetaData = String

    -- Stand-in for a real metadata loader
    loadMeta :: String -> IO MetaData
    loadMeta file = return $ "This song is " ++ takeWhile (/= '.') file

    -- Runs loadMeta on every file name that flows through the pipe
    loadMetaList :: Pipe FilePath MetaData IO r
    loadMetaList = Pipes.mapM loadMeta

To run it, we simply compose the processing stages, like a pipeline, and run the pipeline using runEffect:

    >>> runEffect $ find >-> loadMetaList >-> Pipes.stdoutLn
    This song is heavy
    This song is metal

There are several key points to note:

You can make find a Producer so that it also searches the directory tree lazily. I know you do not need this feature right now because your set of files is small, but it is very easy to add later when your directory grows.
This is lazy, but without unsafeInterleaveIO. It produces each output immediately and does not wait to collect the entire list of results first.

For example, it will work even if we use an infinite list of files:

    >>> import qualified Pipes.Prelude as Pipes
    >>> runEffect $ each (cycle ["heavy.mp3", "metal.mp3"]) >-> loadMetaList >-> Pipes.stdoutLn
    This song is heavy
    This song is metal
    This song is heavy
    This song is metal
    This song is heavy
    This song is metal
    ...

It only computes as much as necessary. If we indicate that we only need three results, it does the minimum amount of loading required to return those three results, even if we provide an infinite list of files.

For example, we can limit the number of results using take :

    >>> runEffect $ each (cycle ["heavy.mp3", "metal.mp3"]) >-> loadMetaList >-> Pipes.take 3 >-> Pipes.stdoutLn
    This song is heavy
    This song is metal
    This song is heavy
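To check the "only as much as necessary" claim rather than take it on faith, one option is to count how often the loader actually runs. The following is a minimal sketch, assuming the pipes package is installed; countedLoads and its IORef counter are hypothetical names introduced here for illustration, and the loader is the same made-up stand-in as above.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Pipes
import qualified Pipes.Prelude as Pipes

type MetaData = String

-- Hypothetical helper: run the pipeline over an infinite list of file
-- names, keep only the first n results, and report how many times the
-- loader was invoked.
countedLoads :: Int -> IO (Int, [MetaData])
countedLoads n = do
  counter <- newIORef (0 :: Int)
  -- Stand-in loader that bumps the counter on every call.
  let loadMeta file = do
        modifyIORef' counter (+ 1)
        return ("This song is " ++ takeWhile (/= '.') file)
  results <- Pipes.toListM $
    each (cycle ["heavy.mp3", "metal.mp3"]) >-> Pipes.mapM loadMeta >-> Pipes.take n
  loads <- readIORef counter
  return (loads, results)

main :: IO ()
main = countedLoads 3 >>= print
-- prints (3,["This song is heavy","This song is metal","This song is heavy"])
```

The counter ends at 3 because Pipes.take stops pulling from upstream after its third element, so the loader is never asked for a fourth file.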

So you might ask what is wrong with unsafeInterleaveIO. The main limitation of unsafeInterleaveIO is that you cannot guarantee when the IO actions are actually executed, which leads to the following common pitfalls:

Handles accidentally closed before the file has been read
IO actions that happen late or never
Pure code having side effects and throwing IOExceptions

The biggest advantage of the Haskell IO system over other languages is that Haskell completely decouples the evaluation model from the order of side effects. When you use lazy IO, you lose this decoupling, and the order of side effects becomes tightly coupled to Haskell's evaluation model, which is a huge step backward.

This is why it is generally unwise to use lazy IO, especially now that there are simple and elegant alternatives.
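The "late or never" pitfall can be made visible with a counter. This is a small sketch using only base; deferredDemo is a hypothetical name, and the counter stands in for any side effect you care about.

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Returns the counter value before and after the deferred result is forced.
deferredDemo :: IO (Int, Int)
deferredDemo = do
  counter <- newIORef (0 :: Int)
  -- The action is set up here, but its side effect has not run yet.
  value <- unsafeInterleaveIO $ do
    modifyIORef' counter (+ 1)
    return "loaded"
  before <- readIORef counter
  -- Evaluating the pure-looking value is what finally triggers the effect.
  _ <- evaluate (length value)
  after <- readIORef counter
  return (before, after)

main :: IO ()
main = deferredDemo >>= print  -- prints (0,1)
```

If nothing ever forced value, the counter would stay at 0, which is the "never" case: the timing of the side effect depends entirely on where evaluation happens to occur.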

If you want to learn more about how to use pipes to safely replace lazy IO, you can read the extensive pipes tutorial.

+8

Gabriel Gonzalez Apr 26 '13 at 21:59

Daniel Fischer · Accepted Answer · 2013-04-26T19:32:45+0000

The (>>=) of the IO monad is such that in

    loadMetaList :: [String] -> IO [Metadata]
    loadMetaList [] = return []
    loadMetaList (file:files) = do
        first <- loadMeta file
        rest <- loadMetaList files
        return (first:rest)

the loadMetaList files action must be executed before return (first:rest) is executed.
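The effect of this sequencing can be measured with a counting loader. The sketch below uses only base and is an illustration, not the asker's code: the loader is passed in as a parameter so it can be instrumented, and loadMetaListStrict, strictLoads, and the "meta:" strings are hypothetical.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

type Metadata = String

-- Same shape as the strict loadMetaList above, with the loader passed in.
loadMetaListStrict :: (FilePath -> IO Metadata) -> [FilePath] -> IO [Metadata]
loadMetaListStrict _ [] = return []
loadMetaListStrict load (file : files) = do
  first <- load file
  rest <- loadMetaListStrict load files
  return (first : rest)

-- Load metadata for `total` made-up files, consume only `wanted` results,
-- and report how many loads actually ran.
strictLoads :: Int -> Int -> IO Int
strictLoads total wanted = do
  counter <- newIORef (0 :: Int)
  let load file = modifyIORef' counter (+ 1) >> return ("meta:" ++ file)
  metas <- loadMetaListStrict load (map show [1 .. total])
  _ <- return $! length (take wanted metas)
  readIORef counter

main :: IO ()
main = strictLoads 1000 10 >>= print  -- prints 1000
```

Even though only 10 results are consumed, all 1000 loads have already happened by the time loadMetaListStrict returns.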

You can avoid this by delaying the execution of loadMetaList files with unsafeInterleaveIO (which is also used by find):

    import System.IO.Unsafe

    loadMetaList :: [String] -> IO [Metadata]
    loadMetaList [] = return []
    loadMetaList (file:files) = do
        first <- loadMeta file
        -- defer the recursive call until rest is actually demanded
        rest <- unsafeInterleaveIO $ loadMetaList files
        return (first:rest)

This way, loadMetaList files is not executed until its result is needed, and if you only need the metadata of 10 files, only that much is loaded.

It is not as unsafe as its cousin unsafePerformIO, but it too should be handled with care.
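To see that the deferred version really loads only what is consumed, the same counting trick can be applied. This is a sketch using only base; loadMetaListLazy and lazyLoads are hypothetical names, the loader is passed in so it can be instrumented, and the file names are made up.

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

type Metadata = String

-- Same recursion as above, but the tail is deferred with unsafeInterleaveIO.
loadMetaListLazy :: (FilePath -> IO Metadata) -> [FilePath] -> IO [Metadata]
loadMetaListLazy _ [] = return []
loadMetaListLazy load (file : files) = do
  first <- load file
  rest <- unsafeInterleaveIO (loadMetaListLazy load files)
  return (first : rest)

-- Report the number of loads right after the call and again after
-- consuming `wanted` results from an infinite list of file names.
lazyLoads :: Int -> IO (Int, Int)
lazyLoads wanted = do
  counter <- newIORef (0 :: Int)
  let load file = modifyIORef' counter (+ 1) >> return ("meta:" ++ file)
  metas <- loadMetaListLazy load (map show [1 :: Int ..])
  afterCall <- readIORef counter
  _ <- evaluate (length (take wanted metas))
  afterTake <- readIORef counter
  return (afterCall, afterTake)

main :: IO ()
main = lazyLoads 10 >>= print  -- prints (1,10)
```

Note that the first file is still loaded eagerly when the function is called; only the rest of the list is deferred. Wrapping the whole call in unsafeInterleaveIO would defer that first load as well.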
