Reading a long data structure in Haskell

Question

I need to read a data structure from a space-separated text file, one record per line. My first attempt would be:

    data Person = Person {name :: String, surname :: String, age :: Int, ... dozens of other fields} deriving (Show,...)

    main = do
      string <- readFile "filename.txt"
      let people = readPeople string
      do_something people

    readPeople s = map (readPerson.words) (lines s)

    readPerson row = Person (read(row!!0)) (read(row!!1)) (read(row!!2)) (read(row!!3)) ... (read(row!!dozens))

This code works, but the code for readPerson is terrible: I have to copy-paste read(row!!n) for every field in my data structure!

So, as a second attempt, I thought I could exploit currying of the Person constructor and pass arguments to it one at a time.

Hmm, there must be something for this on Hoogle, but I can't work out the type signature ... Never mind, it looks simple enough that I can write it myself:

    readPerson row = readFields Person row

    readFields f [x] = (f x)
    readFields f (x:xs) = readFields (f (read x)) xs

Ahh, this looks much better style-wise!

But it does not compile! Occurs check: cannot construct the infinite type: t ~ String -> t

Indeed, the function f passed to readFields has a different type in each recursive call; that is why I could not work out its type signature ...

So my question is: what is the easiest and most elegant way to read a data structure with many fields?


haskell

Archangel Jul 11 '16 at 9:38


2 answers

Petr Pudlák · Answer 1 · 2016-07-11T12:20:08+0000

First, it is always advisable to include type signatures for all top-level declarations. This makes the code more structured and more readable.

One simple way to achieve this is to use applicative functors. Parsing is an "effectful" computation: the effect is consuming part of the input, and the result is one parsed piece. We can use the State monad to track the remaining input and write a polymorphic function that consumes one input element and reads it:

    import Control.Applicative
    import Control.Monad.State

    data Person = Person { name :: String, surname :: String, age :: Int }
        deriving (Eq, Ord, Show, Read)

    readField :: (Read a) => State [String] a
    readField = state $ \(x : xs) -> (read x, xs)

To parse many such fields, we use the combinators <$> and <*>, which let us run such operations in sequence:

    readPerson :: [String] -> Person
    readPerson = evalState $ Person <$> readField <*> readField <*> readField

The expression Person <$> ... has type State [String] Person, and we run evalState on the input to start the stateful computation and extract the result. We still have to write readField as many times as there are fields, but without using indices or explicit types.
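To make the state threading concrete, here is a small sketch that runs a single readField step with runState and then the whole readPerson on one tokenised row. The names oneStep and example are made up for this illustration. One thing worth noting: read at type String expects Haskell string syntax, so with this definition the name fields would have to appear quoted in the input.

```haskell
import Control.Monad.State (State, state, runState, evalState)

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Eq, Show)

-- Same definition as in the answer: take the head of the list and `read` it.
readField :: Read a => State [String] a
readField = state $ \(x : xs) -> (read x, xs)

readPerson :: [String] -> Person
readPerson = evalState $ Person <$> readField <*> readField <*> readField

-- One step (hypothetical name): parse the first field as an Int and
-- return it together with the remaining, still unparsed input.
oneStep :: (Int, [String])
oneStep = runState readField ["42", "6.05"]

-- A whole row (hypothetical name). The String fields carry quotes because
-- `read` at type String only accepts Haskell string literals.
example :: Person
example = readPerson ["\"John\"", "\"Smith\"", "42"]
```

Here oneStep evaluates to (42, ["6.05"]), and example is the record with name "John", surname "Smith" and age 42.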

For a real program, you would probably add some error handling, since read fails with an exception on malformed input, and so does the pattern (x : xs) if the input list is too short. Using a full-fledged parser library such as parsec or attoparsec lets you keep the same notation while getting proper error handling, customized parsing of individual fields, etc.
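As a rough sketch of what minimal error handling could look like without bringing in a parser library, the State monad can be layered over Maybe and read swapped for readMaybe. This is only one possible arrangement, not the answer's own code; the names Fields and stringField are made up for this illustration, and stringField takes a word verbatim so that names need no quotes.

```haskell
import Control.Monad.State (StateT (..), evalStateT)
import Text.Read (readMaybe)

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Eq, Show)

-- Hypothetical alias: a parser over the remaining words, failing with Nothing.
type Fields = StateT [String] Maybe

-- Consume one word and parse it via Read; fails on empty input or a bad parse.
readField :: Read a => Fields a
readField = StateT $ \fields -> case fields of
  []       -> Nothing
  (x : xs) -> (\v -> (v, xs)) <$> readMaybe x

-- Consume one word verbatim (hypothetical helper for String fields).
stringField :: Fields String
stringField = StateT $ \fields -> case fields of
  []       -> Nothing
  (x : xs) -> Just (x, xs)

-- Note: leftover words after the last field are silently ignored here.
readPerson :: [String] -> Maybe Person
readPerson = evalStateT (Person <$> stringField <*> stringField <*> readField)
```

With this version, a row that is too short or has a non-numeric age yields Nothing instead of a runtime exception.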

An even more general approach is to automate the conversion between record fields and lists using generics. Then you just derive Generic. If you are interested, I can give an example.

Or you can use an existing serialization package, either binary, such as cereal or binary, or text-based, such as aeson or yaml, which usually let you do both (either derive the (de)serialization automatically from Generic, or provide your own).
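For readers curious what the generics route might look like, below is a minimal sketch built on GHC.Generics. It is not taken from the answer: the classes Field and GFields and the function fromFields are hypothetical names, and only single-constructor records are covered (no sum types, no constructors without fields).

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, TypeOperators #-}
import GHC.Generics
import Text.Read (readMaybe)

-- Hypothetical class: how to parse one field from one word.
class Field a where
  parseField :: String -> Maybe a

instance Field Int    where parseField = readMaybe
instance Field Double where parseField = readMaybe
instance Field String where parseField = Just  -- taken verbatim, no quotes

-- Hypothetical class: parse a generic representation from a list of words,
-- returning the parsed value and the words that were not consumed.
class GFields f where
  gparse :: [String] -> Maybe (f p, [String])

-- A single field consumes exactly one word.
instance Field a => GFields (K1 i a) where
  gparse (x : xs) = (\v -> (K1 v, xs)) <$> parseField x
  gparse []       = Nothing

-- A product of fields parses the left part, then the right part.
instance (GFields f, GFields g) => GFields (f :*: g) where
  gparse xs = do
    (l, rest)  <- gparse xs
    (r, rest') <- gparse rest
    pure (l :*: r, rest')

-- Metadata wrappers (datatype, constructor, selector) are passed through.
instance GFields f => GFields (M1 i c f) where
  gparse xs = do
    (v, rest) <- gparse xs
    pure (M1 v, rest)

-- Succeeds only if every word is consumed.
fromFields :: (Generic a, GFields (Rep a)) => [String] -> Maybe a
fromFields xs = case gparse xs of
  Just (rep, []) -> Just (to rep)
  _              -> Nothing

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Eq, Show, Generic)
```

With these definitions, fromFields (words "John Smith 42") :: Maybe Person gives Just a Person, and adding a field to the record requires no change to the parsing code as long as its type has a Field instance.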

behzad.nouri · Answer 2 · 2016-07-11T10:31:21+0000

EDIT: A simpler solution if you are reading from strings:

    {-# LANGUAGE FlexibleInstances #-}

    data Person = Person { name :: String, age :: Int, height :: Double }
        deriving Show

    class Person' a where
        person :: a -> [String] -> Maybe Person

    instance Person' Person where
        person c [] = Just c
        person _ _  = Nothing

    instance (Read a, Person' b) => Person' (a -> b) where
        person f (x:xs) = person (f $ read x) xs
        person _ _      = Nothing

    instance {-# OVERLAPPING #-} Person' a => Person' (String -> a) where
        person f (x:xs) = person (f x) xs
        person _ _      = Nothing

Then, if the list has the right length:

    > person Person $ words "John 42 6.05"
    Just (Person {name = "John", age = 42, height = 6.05})

and if not, you get Nothing:

    > person Person $ words "John 42"
    Nothing
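The instances above still call read, which throws on a malformed field even though the result type is Maybe. A possible variation, sketched here on the assumption that returning Nothing is preferable in that case, swaps read for readMaybe. The readPeople helper is a hypothetical addition that applies the parser to every line of a file's contents, as in the question.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Read (readMaybe)

data Person = Person { name :: String, age :: Int, height :: Double }
  deriving (Show, Eq)

class Person' a where
  person :: a -> [String] -> Maybe Person

-- All arguments supplied: succeed only if no words are left over.
instance Person' Person where
  person c [] = Just c
  person _ _  = Nothing

-- Generic field: parse one word with readMaybe, then continue.
instance (Read a, Person' b) => Person' (a -> b) where
  person f (x:xs) = readMaybe x >>= \v -> person (f v) xs
  person _ _      = Nothing

-- String field: take the word verbatim instead of going through Read.
instance {-# OVERLAPPING #-} Person' a => Person' (String -> a) where
  person f (x:xs) = person (f x) xs
  person _ _      = Nothing

-- Hypothetical helper: parse every line, failing if any line fails.
readPeople :: String -> Maybe [Person]
readPeople = traverse (person Person . words) . lines
```

Because readPeople uses traverse, a single bad line makes the whole result Nothing; mapping person Person . words over the lines instead would keep a separate Maybe per record.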

The question "Building a Haskell data type with many fields" provides a solution for the case where all record fields are of the same type. If they are not, a slightly more polymorphic solution would be:

    {-# LANGUAGE FlexibleInstances, CPP #-}

    data Person = Person { name :: String, age :: Int, height :: Double }
        deriving Show

    data Val = IVal Int | DVal Double | SVal String

    class Person' a where
        person :: a -> [Val] -> Maybe Person

    instance Person' Person where
        person c [] = Just c
        person _ _  = Nothing

    #define PERSON(t, n)                                 \
        instance (Person' a) => Person' (t -> a) where { \
            person f ((n i):xs) = person (f i) xs;       \
            person _ _ = Nothing; }

    PERSON(Int, IVal)
    PERSON(Double, DVal)
    PERSON(String, SVal)

then

    > person Person [SVal "John", IVal 42, DVal 6.05]
    Just (Person {name = "John", age = 42, height = 6.05})

To construct Val values, you can define another type class and provide the necessary instances:

    class Cast a where
        cast :: a -> Val

    instance Cast Int where cast = IVal
    instance Cast Double where cast = DVal
    instance Cast String where cast = SVal

then it becomes a little simpler:

    > person Person [cast "John", cast (42 :: Int), cast 6.05]
    Just (Person {name = "John", age = 42, height = 6.05})
