Parallel file reads and writes in Haskell?

I have large files that store binary data. Several threads read and write these files, and my current project synchronizes them with a single Lock. Thus, I have only one Handle in ReadWriteMode per file, and all threads contend for this single lock whenever they want to do I/O.

I would like to improve this by allowing multiple readers to work simultaneously. I tried using an RWLock and opening several handles. The RWLock ensures that only one thread modifies the file at a time, while many threads (as many as I have open handles, a compile-time constant) can read concurrently. When I tried to run this, I found that the runtime allows only one Handle in ReadWriteMode for a given file at any time.

How can I resolve this? I assume acquiring and releasing a Handle is an expensive operation, so simply opening the file in the appropriate mode after acquiring the RWLock is not really an option. Or is there a package that offers an API similar to the read and write methods of Java's FileChannel?

PS: I would like to support 32-bit architectures, so memory-mapped I/O is not possible for files > 4 GiB, right?

2 answers

So your problem is that you do not want to use a stateful Handle (where the state is the current position in the file)? In that case, you need pread and pwrite.

man pread

For a Haskell binding, see: http://hackage.haskell.org/package/unix-bytestring-0.3.7.2/docs/System-Posix-IO-ByteString.html

For an example of use, you can see here: https://github.com/errge/PrefetchFS/blob/master/PrefetchHandle.hs
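To make the pread/pwrite suggestion concrete, here is a minimal sketch of positional reads and writes on one shared descriptor. It assumes the fdPread and fdPwrite functions documented in the unix-bytestring 0.3.7.2 page linked above; the helper names readAt, writeAt and withFd are hypothetical, and the package-qualified import is an assumption made to avoid a clash with the same-named module in the unix package. Other versions of the package may expose the module under a different name, so check the docs for the version you install.

```haskell
{-# LANGUAGE PackageImports #-}
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as BC
import System.IO (IOMode (ReadWriteMode), openBinaryFile)
import System.Posix.IO (closeFd, handleToFd)
import System.Posix.Types (ByteCount, Fd, FileOffset)
-- Assumption: unix-bytestring 0.3.7.x, as in the documentation linked above.
import "unix-bytestring" System.Posix.IO.ByteString (fdPread, fdPwrite)

-- | Hypothetical helper: read up to n bytes at an absolute offset.
-- The offset is passed explicitly, so the descriptor's own position
-- is neither used nor moved.
readAt :: Fd -> FileOffset -> ByteCount -> IO BC.ByteString
readAt fd off n = fdPread fd n off

-- | Hypothetical helper: write bytes at an absolute offset.
-- Returns the number of bytes actually written, which may be short.
writeAt :: Fd -> FileOffset -> BC.ByteString -> IO ByteCount
writeAt fd off bytes = fdPwrite fd bytes off

-- | Hypothetical helper: open a file read/write, run an action on the raw
-- descriptor, and close it afterwards. handleToFd closes the Handle and
-- leaves the underlying descriptor open.
withFd :: FilePath -> (Fd -> IO a) -> IO a
withFd path = bracket (openBinaryFile path ReadWriteMode >>= handleToFd) closeFd
```

Because every call carries its own offset, several threads can call readAt on the same Fd without coordinating a seek position. Whether writers still need a lock (for example the RWLock from the question) depends on whether their byte ranges can overlap with concurrent reads or writes.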


You can build a type around the file handle and a mutex lock. Here is a simple implementation that I think will work for your purposes.

    module SharedHandle (SharedHandle, newSharedHandle, withSharedHandle) where

    import Control.Concurrent.MVar
    import System.IO

    -- A handle paired with the lock that guards it.
    data SharedHandle = SharedHandle Handle (MVar ())

    -- Run the handle-creating action and attach a fresh lock to the result.
    newSharedHandle :: IO Handle -> IO SharedHandle
    newSharedHandle makeHandle = do
      handle <- makeHandle
      lock <- newMVar ()
      return $ SharedHandle handle lock

    -- Take the lock, run the operation on the handle, then release the lock.
    -- withMVar also releases the lock if the operation throws an exception.
    withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
    withSharedHandle (SharedHandle handle lock) operation =
      withMVar lock $ \() -> operation handle

What I have done here is create a new data type that is essentially just a file handle. The only difference is that it also comes with its own mutex lock, implemented with an MVar. I have provided two functions to operate on this new type. newSharedHandle takes an action that would create a normal handle and creates a shared handle with a fresh lock. withSharedHandle takes an operation that works on handles, locks the shared handle, performs the operation, and then unlocks the handle. Note that neither the constructor nor any accessors are exported from the module, so we can be sure that no thread forgets to release the lock, and we never get deadlocks on one specific access.
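As an illustration of how withSharedHandle might be used, the sketch below seeks and reads inside a single locked operation, so no other thread can move the file position between the two steps. The type and its two functions are restated compactly so the block compiles on its own, using withMVar as an assumed equivalent of the take/put pair that also releases the lock on exceptions. The helpers readCharsAt and readAllAt are hypothetical names, not part of the answer.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM, replicateM)
import System.IO

-- Restated from the answer so this sketch is self-contained.
data SharedHandle = SharedHandle Handle (MVar ())

newSharedHandle :: IO Handle -> IO SharedHandle
newSharedHandle makeHandle = SharedHandle <$> makeHandle <*> newMVar ()

withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
withSharedHandle (SharedHandle h lock) op = withMVar lock (\_ -> op h)

-- | Hypothetical helper: seek and read n characters as one locked step.
-- Doing both inside withSharedHandle keeps other threads from moving
-- the handle's position between the seek and the read.
readCharsAt :: SharedHandle -> Integer -> Int -> IO String
readCharsAt sh off n = withSharedHandle sh $ \h -> do
  hSeek h AbsoluteSeek off
  replicateM n (hGetChar h)

-- | Hypothetical helper: start one thread per offset and collect the
-- results in the order of the input offsets. The reads themselves are
-- serialized by the lock; only the surrounding work runs concurrently.
readAllAt :: SharedHandle -> Int -> [Integer] -> IO [String]
readAllAt sh n offsets = do
  boxes <- forM offsets $ \off -> do
    box <- newEmptyMVar
    _ <- forkIO (readCharsAt sh off n >>= putMVar box)
    return box
  mapM takeMVar boxes
```

This design keeps one Handle per file, which is compatible with the runtime's restriction on ReadWriteMode handles, but it does not give readers true parallelism: every operation still waits for the single lock.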

Replacing all file handles in your program with this new type may solve your problem.
