Is there a name for ADT with explicit subtyping?

I am looking for a suitable name for a data type that combines ADT with explicit subtyping.

In one of my applications, I use a structure similar to ADT to represent parsing trees on which I do recursive pattern matching. I find this pretty convenient if I could combine ADT with subtyping, as shown in the example below:

Note: the example is written in Haskell syntax, but this is not Haskell code.

data Empty = Empty data Expr = Int Int | Add Expr AddOp Expr data OptionalExpr = | Empty // I want to make Empty a subtype of OptionalExpr | Expr // I want to make Expr a subtype of OptionalExpr 

In the above example, I first define 2 types: Empty and Expr. Then I make these 2 types a subtype of OptionalExpr. I understand that this type of data is unusual. Neither Haskell nor OCaml support it. But I do not know other functional languages.

I am looking for something that combines ADT with explicit subtyping, rather than structurally implied subtyping, as in the polymorphic version. For this, several justifications are needed:

  • First, we want to subtype all-or-not. Say we want A to be a subtype of B, then we will never want to include only some of the options A under B. Either A is a subtype of B, and in this case B includes all the options of A, or is not a subtype of B, and in this case B does not include any of the options A. We do not allow the gray area between them.
  • Secondly, we do not want B to be open in any sense. We mean a very specific set of subtypes of B. We do not want something to become an instance of B, simply by implementing a type class or the like.
  • Thirdly, let's say type A has a large number of options. We want to make type B a supertype A. Copying all variants to B, as required in the polymorphic version, is simply too cumbersome and error prone.
  • Fourth, we don’t want to introduce new value constructors when all we want to express is a subtype. In the above example, we could write OptionalExpr as an ADT with 2 value constructors, for example: data OptionalExpr = EmptyExpr | NonEmptyExpr Expr data OptionalExpr = EmptyExpr | NonEmptyExpr Expr , or we could use Maybe , but in my application this is unacceptable, since the implementation level can be quite deep, and it would be a nightmare to deconstruct a deeply embedded value, for example (L1 (L2 (L3 (L4 (L5 value_wanted)))))

To give you an idea of ​​why such requirements exist, I show a more specific example below:

 PrimaryExpr = ID | LeftParen Expr RightParen UnaryExpr = PrimaryExpr | NegateOp PrimaryExpr // - MultExpr = UnaryExpr | MultExpr MultOp UnaryExpr // * AddExpr = MultExpr | AddExpr AddOp MultExpr // + CompExpr = AddExpr | AddExpr CompOp AddExpr Expr = CompExpr 

The above example expresses a hierarchy of subtypes and expresses ideas such as AddExpr, is CompExpr, but CompExpr is not AddExpr. For this specific example, some people suggested to me that I can replace UnaryExpr, MultExpr, AddExpr, etc. Only with Expr. That is, I can define all types as one type. This loses type constraints such as CompExpr, is not AddExpr, and since I am doing recursive pattern matching on these types, I need the constraints of this hierarchy to be statically exposed.

Is there a name for this type of data that I am looking for in the literature? Or am I looking for something that doesn't even make sense? If you think this is so, why am I looking for something meaningless? Thanks for any pointers.

EDIT: even though I wrote the above code snippets in Haskell syntax, I am not writing my application in Haskell. I use my own language and my own data types, so I'm not limited to Haskell semantics. I'm looking for a pointer to similar concepts in the literature, so when I write a report for my project, I don't seem to be inventing anything new. I tried all the Google keywords that I can think of and nothing was returned, so I ask here.

+4
functional-programming haskell ocaml
source share
4 answers

In the comment you say:

I am not sure how to encode the subtype hierarchy using GADT. If you think this is doable, could you give an answer to an example on how the type hierarchy shown in my example can be encoded?

Therefore, I give an answer to this question here. The main idea is to provide a type-level function (in the host language, here Haskell) for calculating the subtyping relationship (system type of the target language, here is your custom EDSL). For simplicity, I will fully explain the subtyping relationship, but standard level-level programming can be used to reduce repetition and increase the level of abstraction. Firstly, extensions are needed:

 {-# LANGUAGE GADTs #-} {-# LANGUAGE DataKinds #-} {-# LANGUAGE TypeFamilies #-} 

Now defining the subtyping relationship:

 data Level = Primary | Unary | Mult | Add | Comp type family Subtype ab where Subtype Primary a = True Subtype Unary Primary = False Subtype Unary a = True Subtype Mult Primary = False Subtype Mult Unary = False Subtype Mult a = True Subtype Add Add = True Subtype Add Comp = True Subtype Add a = False Subtype Comp Comp = True Subtype Comp a = False 

A closed type family is used to ensure that the subtyping relation cannot be extended by clients (your second property). Finally, GADT for target language terms can use the subtyping relation as a constraint on its constructors.

 data Expr a where ID :: Subtype Primary a ~ True => Expr a Paren :: Subtype Primary a ~ True => Expr b -> Expr a Negate :: Subtype Unary a ~ True => Expr Unary -> Expr a Times :: Subtype Add a ~ True => Expr Mult -> Expr Mult -> Expr a Plus :: Subtype Add a ~ True => Expr Add -> Expr Add -> Expr a Compose :: Subtype Comp a ~ True => Expr Comp -> Expr Comp -> Expr a 

Note that since the Paren argument is polymorphic, you will need a type annotation in the term to express what “level” of the subtype hierarchy you want it to be treated as. I would expect you to need to do this in any language you are developing. In ghci we can specify the type of the sampled term:

 :t Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add))) Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add))) :: (Subtype 'Comp a ~ 'True) => Expr a 

This is more or less the type that you expect for this term, I think. You can also see that the hierarchy of expressions is strictly adhered to, although I dare say that the error message is not 100% understandable (since it is written in terms of the host language and not in the language of terms):

 :t Negate (Plus ID ID) <interactive>:1:9: Couldn't match type ''False' with ''True' Expected type: 'True Actual type: Subtype 'Add 'Unary In the first argument of 'Negate', namely '(Plus ID ID)' In the expression: Negate (Plus ID ID) 
+3
source share

There are two things that come to my mind, both for “real” subtyping systems (therefore not available in Haskell), although I'm not entirely sure that any of them meets all your requirements:

  • Explicit unmarked union types , like in Ceylon, which let you name type A | B A | B , which is a supertype of both A and B That way, you can just do the normal ADT Empty and Expr , and then declare a synonym for type OptionalExpr = Empty | Expr type OptionalExpr = Empty | Expr .

  • The ADT mode is modeled in Scala as a hierarchy of sealed features and case classes:

     sealed trait OptionalExpr case object Empty extends OptionalExpr sealed trait Expr extends OptionalExpr case class IntExpr(i: Int) extends OptionaExpr case class AddExpr(lhs: Expr, op: AddOp, rhs: Expr) extends OptionalExpr 

    Thus, OptionalExpr and Expr not extensible (since traits are sealed) and behave basically like ADT in Haskell, but you can still access the “intermediate” types, for example, in the usual inheritance hierarchy (unlike Haskell, where you only have constructors that are not types in themselves).

Both cases require that the pattern matching form matches the access values, of course, since you need to restore in which you are “part of the union”.

+2
source share

Haskell is particularly capable of modeling your domain, perhaps because it can be described using a fairly simple mathematical model. Essentially, your first point means that the subtype relation is well-order . It makes your life very simple - this model will probably easily go into any language whose type system is at least as strong as that of Haskell.

Start by defining the type (which will be filmed as) to present your options:

 data Variant = Primary | Unary | Mult | Add | Comp | Expr 

The following is a non-recursive data type for representing nodes in a language of terms:

 data ExprF (k :: Variant -> *) (x :: Variant) where ID_F :: ExprF k 'Primary Paren_F :: k 'Expr -> ExprF k 'Primary Negate_F :: k 'Primary -> ExprF k 'Unary Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult Add_F :: k 'Add -> k 'Mult -> ExprF k 'Add Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp 

Recursive occurrences of terms are represented by an additional parameter. Essentially, this is just a typical representation of the polynomial functional (i.e., Fix ), but with an index.

Then your expression type:

 data Expr' (x :: Variant) where Expr' :: (x <= y) => Expr x -> Expr' y data Expr (x :: Variant) where MkExpr :: ExprF Expr' x -> Expr x 

The class <= has not yet been entered, but it represents your subtype relationship.

As mentioned earlier, your subtype relationship is a good order, and by this virtue, each element in the ordering can be assigned a unique natural number, such that a typical order in natural relations corresponds to your subtype. Or, in other words, there is an injection f : Variant -> Nat such that x is a subtype of y iff fx <= fy (or a strict subtype of iff fx < fy - this representation gives you a lot of commonality).

The necessary injection is given only by your grammar. Please note that each statement is only a "subtype" (i.e., has the right part, which the constructor should not introduce) over production over it.

 data Nat = Z | S Nat infixr 0 $ type ($) fa = fa type family VariantIx (x :: Variant) :: Nat where VariantIx 'Primary = 'Z VariantIx 'Unary = 'Z VariantIx 'Mult = $ 'Z VariantIx 'Add = $ $ 'Z VariantIx 'Comp = $ $ $ 'Z VariantIx 'Expr = $ $ $ $ 'Z 

You need an implicit subtype relation (which is <= ), but it is often much easier to work with explicit proof of the relationship, so it is typical that the implicit version simply generates an explicit proof. For this purpose, you write two announcements:

 data family (:<=:) (x :: k) (y :: k) class (<=) (x :: k) (y :: k) where isLTEQ :: x :<=: y 

Instances for naturals should be fairly obvious:

 data instance (:<=:) (x :: Nat) y where LT_Z :: 'Z :<=: n LT_S :: n :<=: m -> n :<=: m instance 'Z <= n where isLTEQ = LT_Z instance (n <= m) => n <= m where isLTEQ = LT_S isLTEQ 

and instances for Variant determine the order caused by VariantIx :

 newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y) instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ 

You probably need smart constructors. If you use the recent GHC, you will have access to template synonyms, but this is optional:

 id_ = MkExpr ID_F pattern Id = MkExpr ID_F pattern Paren e = MkExpr (Paren_F (Expr' e)) pattern Neg e = MkExpr (Negate_F (Expr' e)) infixl 6 :+ pattern (:+) ab = MkExpr (Add_F (Expr' a) (Expr' b)) infixl 7 :* pattern (:*) ab = MkExpr (Mult_F (Expr' a) (Expr' b)) pattern Cmp ab = MkExpr (Comp_F (Expr' a) (Expr' b)) 

and some simple examples:

 >Id :+ Id :+ Neg Id :* Id Add_F (Add_F ID_F ID_F) (Mult_F (Negate_F ID_F) ID_F) >Id :+ Id :* Neg (Id :* Id) <interactive>:6:13: No instance for (( $ 'Z) <= 'Z) arising from a use of `Neg' 

Note that you can also write your type of expression in a slightly different way:

 data ExprFlip (x :: Variant) where MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y 

This differs from the original in that the most external type of expression has a subtype relation applied to it - for example,

 pattern Id' = MkExprFlip ID_F 

is of type ExprFlip t , and Id :: Expr 'Primary . I do not see another way in which they differ, and I believe that this would simply be a matter of preference or the most common use cases. The initial representation has the advantage that the type of inference is always monomorphic, which in some cases can make type inference better, but does not affect the construction of expressions.


To answer your four questions:

  • This model relies on the semantics of the relationship of the subtype of design.
  • VariantIx and type Variant are closed. Any additional instances for :<=: or <= for Variant or Nat will overlap with existing ones (as common as possible), therefore, in principle, they can be defined, an attempt to use them will create type errors.
  • Essentially, you have a reflexive and transitive relation, and these properties are fixed in the instance <= for Nat once and for all. Changing the relationship of a subtype comes down only to changing Variant and VariantIx .
  • The proofs of the subtype relation are constructed according to the type of inference - the class <= . Since all indexes in the ExprF type are monomorphic, the type checker can always calculate the subtype ratio for the indexes.

Full code:

 {-# LANGUAGE StandaloneDeriving, UndecidableInstances, PatternSynonyms , TypeOperators, KindSignatures, PolyKinds, DataKinds, GADTs, TypeFamilies , MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-} data Variant = Primary | Unary | Mult | Add | Comp | Expr data ExprF (k :: Variant -> *) (x :: Variant) where ID_F :: ExprF k 'Primary Paren_F :: k 'Expr -> ExprF k 'Primary Negate_F :: k 'Primary -> ExprF k 'Unary Mult_F :: k 'Mult -> k 'Unary -> ExprF k 'Mult Add_F :: k 'Add -> k 'Mult -> ExprF k 'Add Comp_F :: k 'Add -> k 'Add -> ExprF k 'Comp data Expr' (x :: Variant) where Expr' :: (x <= y) => Expr x -> Expr' y data Expr (x :: Variant) where MkExpr :: ExprF Expr' x -> Expr x data ExprFlip (x :: Variant) where MkExprFlip :: (x <= y) => ExprF ExprFlip x -> ExprFlip y pattern Id' = MkExprFlip ID_F data Nat = Z | S Nat infixr 0 $ type ($) fa = fa type family VariantIx (x :: Variant) :: Nat where VariantIx 'Primary = 'Z VariantIx 'Unary = 'Z VariantIx 'Mult = $ 'Z VariantIx 'Add = $ $ 'Z VariantIx 'Comp = $ $ $ 'Z VariantIx 'Expr = $ $ $ $ 'Z data family (:<=:) (x :: k) (y :: k) class (<=) (x :: k) (y :: k) where isLTEQ :: x :<=: y data instance (:<=:) (x :: Nat) y where LT_Z :: 'Z :<=: n LT_S :: n :<=: m -> n :<=: m instance 'Z <= n where isLTEQ = LT_Z instance (n <= m) => n <= m where isLTEQ = LT_S isLTEQ newtype instance (:<=:) (x :: Variant) y = IsSubtype (VariantIx x :<=: VariantIx y) instance (VariantIx x <= VariantIx y) => x <= y where isLTEQ = IsSubtype isLTEQ id_ = MkExpr ID_F pattern Id = MkExpr ID_F pattern Paren e = MkExpr (Paren_F (Expr' e)) pattern Neg e = MkExpr (Negate_F (Expr' e)) infixl 6 :+ pattern (:+) ab = MkExpr (Add_F (Expr' a) (Expr' b)) infixl 7 :* pattern (:*) ab = MkExpr (Mult_F (Expr' a) (Expr' b)) pattern Cmp ab = MkExpr (Comp_F (Expr' a) (Expr' b)) instance Show (Expr' x) where showsPrec k (Expr' x) = showsPrec kx instance Show (Expr x) where showsPrec k (MkExpr x) = showsPrec kx deriving instance (Show (k 'Mult), Show (k 'Add), Show (k 'Expr), Show (k 'Primary), Show (k 'Unary)) => Show (ExprF kx) 
+2
source share

If I don’t understand, polymorphic variants can do pretty much that. However, “unlabeled union” is not an excellent term to use (I think most people would think that you are asking for C-style unions).

An example would look like this:

 type empty = [`Empty] type bin_op = Add | Sub type expr = [`Int of int | `Add of expr * bin_op * expr] type optional_expr = [empty | expr] type weird_expr = [expr | `Wierd of expr | `Zonk of string] 

Note that with OCaml polymorphic variants, the subtype relationship is determined structurally, and not between named types.

+1
source share

All Articles