Overload a data type or use a similar one?

This is more a question about programming style and common practices. But I feel like it doesn't fit into the code review forum ...

My program parses regular expressions and processes them. A regular expression can have regular elements (Kleene closure, concatenation, etc.), as well as links to other regular expressions by their names, for example macros:

    data Regex a
      = Epsilon
      | Literal a
      | Ranges [(a, a)]
      | Ref String
      | Then (Regex a) (Regex a)
      | Or (Regex a) (Regex a)
      | Star (Regex a)

After processing the regular expression, resolving all macro references and converting Literal elements to Ranges elements (this is necessary for my purposes), I end up with a value that cannot and should not contain Ref or Literal. So in the functions that work with it, I do something like:

    foo (Literal _) = error "unexpected literal"
    foo (Ref _)     = error "unexpected reference"
    foo Epsilon     = ...
    foo (Star x)    = ...
    ...

This looks ugly to me because it relies on runtime checks instead of compile-time checks. Not a good approach.

So maybe I could introduce another data type that is very similar to the original one and use that instead?

    data RegexSimple a
      = Epsilon2
      | Ranges2 [(a, a)]
      | Then2 (RegexSimple a) (RegexSimple a)
      | Or2 (RegexSimple a) (RegexSimple a)
      | Star2 (RegexSimple a)

It would work, but it involves a lot of duplication, and the nice descriptive constructor names are already taken, so I would need to invent new ones ...

What would experts do? I'd like to know :)

+6
2 answers

I don't know what the rest of your code looks like, so this solution may require you to rethink some aspects, but the most "Haskell-ish" solution to this problem is probably to use GADTs and phantom types. Together, they basically let you create arbitrary subtypes, which gives you more flexible type safety. You would redefine your type as:

    {-# LANGUAGE GADTs #-}

    -- Phantom tags: empty types used only as type-level markers.
    data Literal
    data Ref
    data Rangeable

    data Regex t a where
      Epsilon :: Regex Rangeable a
      Literal :: a -> Regex Literal a
      Ranges  :: [(a, a)] -> Regex Rangeable a
      Ref     :: String -> Regex Ref a
      Then    :: Regex t' a -> Regex t' a -> Regex Rangeable a
      Or      :: Regex t' a -> Regex t' a -> Regex Rangeable a
      Star    :: Regex t' a -> Regex Rangeable a

Then you can define

    foo :: Regex Rangeable a -> ...
    foo Epsilon    = ...
    foo s@(Star a) = ...

Now expressions like foo $ Literal 'c' will fail to type-check at compile time.
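For anyone who wants to try this, here is a minimal sketch of the idea as a complete module. The consumer `describe` and the value `ok` are hypothetical names made up for illustration; the type definition follows the one given above.

```haskell
{-# LANGUAGE GADTs #-}

-- Phantom tags: empty types used only as type-level markers.
data Literal
data Ref
data Rangeable

data Regex t a where
  Epsilon :: Regex Rangeable a
  Literal :: a -> Regex Literal a
  Ranges  :: [(a, a)] -> Regex Rangeable a
  Ref     :: String -> Regex Ref a
  Then    :: Regex t' a -> Regex t' a -> Regex Rangeable a
  Or      :: Regex t' a -> Regex t' a -> Regex Rangeable a
  Star    :: Regex t' a -> Regex Rangeable a

-- Hypothetical consumer: names the outermost constructor.
-- No cases for Literal or Ref are needed, because neither can
-- have the type Regex Rangeable a.
describe :: Regex Rangeable a -> String
describe Epsilon    = "epsilon"
describe (Ranges _) = "ranges"
describe (Then _ _) = "then"
describe (Or _ _)   = "or"
describe (Star _)   = "star"

-- Accepted: the outermost constructor is tagged Rangeable.
ok :: String
ok = describe (Star (Ranges [('a', 'z')]))

-- Rejected by the type checker if uncommented, since
-- Literal 'c' has type Regex Literal Char:
-- rejected :: String
-- rejected = describe (Literal 'c')
```

Note that with this definition the tag describes the outermost constructor only: Then, Or and Star accept children with any tag t', so a function that recurses into the children would need a signature that is polymorphic in the tag.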

+5

I am not an expert, but I have run into this problem too (although more with product types than with sum types).

The obvious solution is to reuse RegexSimple in Regex, so:

    data Regex a = Ref a | Literal a | SimpleR (RegexSimple a)

Another way is to parameterize Regex with a functor

    data Regex f a = Literal (f a) | Ref (f a) | Epsilon a ...

and use either Regex Id or Regex Void .
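A rough sketch of how the functor idea could look in full. Everything here is an assumption made for illustration: `Identity` stands in for Id, `Const Void` stands in for Void (so that the wrapper has the kind a functor needs), Ref carries a String as in the question, and the names `RawRegex`, `SimpleRegex`, `simplify` and `nullable` are made up.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Data.Void (Void, absurd)

-- Literal and Ref wrap their payload in f; the other constructors do not.
data Regex f a
  = Epsilon
  | Literal (f a)
  | Ref (f String)
  | Ranges [(a, a)]
  | Then (Regex f a) (Regex f a)
  | Or (Regex f a) (Regex f a)
  | Star (Regex f a)

-- Parsed form: Literal and Ref hold ordinary values.
type RawRegex = Regex Identity

-- Processed form: Literal and Ref would need a value of type Void,
-- so they cannot be built (ignoring bottom).
type SimpleRegex = Regex (Const Void)

-- Turns literals into single-element ranges. Macro resolution is left
-- out of this sketch, so any remaining Ref makes the result Nothing.
simplify :: RawRegex a -> Maybe (SimpleRegex a)
simplify Epsilon                = Just Epsilon
simplify (Literal (Identity c)) = Just (Ranges [(c, c)])
simplify (Ref _)                = Nothing
simplify (Ranges rs)            = Just (Ranges rs)
simplify (Then l r)             = Then <$> simplify l <*> simplify r
simplify (Or l r)               = Or <$> simplify l <*> simplify r
simplify (Star r)               = Star <$> simplify r

-- Does the expression match the empty string? The Literal and Ref
-- cases are discharged with absurd instead of a runtime error.
nullable :: SimpleRegex a -> Bool
nullable Epsilon     = True
nullable (Literal v) = absurd (getConst v)
nullable (Ref v)     = absurd (getConst v)
nullable (Ranges _)  = False
nullable (Then l r)  = nullable l && nullable r
nullable (Or l r)    = nullable l || nullable r
nullable (Star _)    = True
```

The constructor names are shared between both forms, so there is no duplication, and a function such as nullable can demand the processed form in its signature. The price is the two absurd cases in every function over SimpleRegex.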

Another way is using Maybe

 data Regex a = Literal (Maybe a) | Epsilon a ... 

But this is less clean because you cannot force a function to accept only simple regular expressions.

+2