What are the drawbacks of an ADT encoding that gives each data constructor its own type (as in Scala)?

In Scala, algebraic data types are encoded as hierarchies of sealed sibling types. Example:

 -- Haskell
 data Positioning a
   = Append
   | AppendIf (a -> Bool)
   | Explicit ([a] -> [a])

 // Scala
 sealed trait Positioning[A]
 case object Append extends Positioning[Nothing]
 case class AppendIf[A](condition: A => Boolean) extends Positioning[A]
 case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

With case classes and case objects, Scala generates a bunch of things like equals , hashCode , unapply (used by pattern matching), etc., which give us many of the key properties and features of traditional ADTs.
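For instance, with the Positioning encoding above, the generated unapply and exhaustive pattern matching work out of the box. A small self-contained sketch (the helper describe is invented for illustration):

```scala
sealed trait Positioning[A]
case object Append extends Positioning[Nothing]
case class AppendIf[A](condition: A => Boolean) extends Positioning[A]
case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

// unapply (generated for case classes) enables exhaustive pattern matching:
def describe[A](p: Positioning[A]): String = p match {
  case Append      => "append at the end"
  case AppendIf(c) => "append if the condition holds"
  case Explicit(_) => "explicit placement"
}

assert(describe(AppendIf[Int](_ > 0)) == "append if the condition holds")
assert(describe(Explicit[Int](identity)) == "explicit placement")
```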

There is one key difference: in Scala, the data constructors have their own types. Compare the following two examples (copied from the respective REPL sessions).

 // Scala
 scala> :t Append
 Append.type

 scala> :t AppendIf[Int](Function const true)
 AppendIf[Int]

 -- Haskell
 haskell> :t Append
 Append :: Positioning a

 haskell> :t AppendIf (const True)
 AppendIf (const True) :: Positioning a



I have always considered the Scala variation to be on the advantageous side.

After all, there is no loss of type information . AppendIf[Int] , for example, is a subtype of Positioning[Int] .

 scala> val subtypeProof = implicitly[AppendIf[Int] <:< Positioning[Int]]
 subtypeProof: <:<[AppendIf[Int],Positioning[Int]] = <function1>

In fact, you get an additional compile-time invariant about the value. (Could we call this a limited form of dependent typing?)

This can be put to good use. Once you know which data constructor was used to create a value, the corresponding type can be propagated through the rest of the flow to add more type safety. For example, Play JSON, which uses this Scala encoding, will only let you extract fields from a JsObject , not from an arbitrary JsValue .

 scala> import play.api.libs.json._
 import play.api.libs.json._

 scala> val obj = Json.obj("key" -> 3)
 obj: play.api.libs.json.JsObject = {"key":3}

 scala> obj.fields
 res0: Seq[(String, play.api.libs.json.JsValue)] = ArrayBuffer((key,3))

 scala> val arr = Json.arr(3, 4)
 arr: play.api.libs.json.JsArray = [3,4]

 scala> arr.fields
 <console>:15: error: value fields is not a member of play.api.libs.json.JsArray
               arr.fields
                   ^

 scala> val jsons = Set(obj, arr)
 jsons: scala.collection.immutable.Set[Product with Serializable with play.api.libs.json.JsValue] = Set({"key":3}, [3,4])

In Haskell, fields would probably have the type JsValue -> Set (String, JsValue) . That means it could fail at runtime for a JsArray , etc. This problem also manifests itself in the form of the well-known partial record accessors.
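To make the contrast concrete, here is a minimal Scala sketch of such an accessor over a hypothetical mini-ADT (the types and the name fieldsOf are invented for illustration; this is not the real Play API):

```scala
// Hypothetical mini-ADT standing in for Play's JsValue hierarchy.
sealed trait JsValue
case class JsObject(fields: Seq[(String, JsValue)]) extends JsValue
case class JsArray(items: Seq[JsValue]) extends JsValue
case class JsNumber(n: Int) extends JsValue

// A Haskell-style accessor: total in its argument type, partial in practice.
// It accepts every JsValue but fails at runtime for anything that is not
// a JsObject -- exactly the partiality described above.
def fieldsOf(v: JsValue): Seq[(String, JsValue)] = v match {
  case JsObject(fs) => fs
  case other        => sys.error(s"not an object: $other")
}

assert(fieldsOf(JsObject(Seq("key" -> JsNumber(3)))).nonEmpty)
// fieldsOf(JsArray(Seq(JsNumber(3)))) would throw at runtime.
```

With the Scala encoding, the same mistake (calling fields on an array) is rejected at compile time instead.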

The view that Scala's treatment of data constructors is wrong has been expressed on various occasions — on Twitter, mailing lists, IRC, SO, etc. Unfortunately, I don't have links to any of those, except for a couple: this answer by Travis Brown, and Argonaut , a purely functional JSON library for Scala.

Argonaut deliberately takes the Haskell approach (it uses private case classes and provides hand-written data constructors). You can see that the problem I mentioned with the Haskell encoding also exists with Argonaut. (Aside: Option is used here to indicate partiality.)

 scala> import argonaut._, Argonaut._
 import argonaut._
 import Argonaut._

 scala> val obj = Json.obj("k" := 3)
 obj: argonaut.Json = {"k":3}

 scala> obj.obj.map(_.toList)
 res6: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = Some(List((k,3)))

 scala> val arr = Json.array(jNumber(3), jNumber(4))
 arr: argonaut.Json = [3,4]

 scala> arr.obj.map(_.toList)
 res7: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = None
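For reference, a minimal sketch of what this private-case-class style of encoding looks like (names are hypothetical, not Argonaut's actual internals):

```scala
// Sketch of the private-case-class encoding. The cases are hidden, so
// no case-specific type ever leaks into user code.
object MiniJson {
  sealed abstract class Json {
    // The only way to get at an object's fields is a total function
    // returning Option (partiality indicated in the result type).
    def obj: Option[List[(String, Json)]] = this match {
      case JObject(fs) => Some(fs)
      case _           => None
    }
  }
  private case class JObject(fields: List[(String, Json)]) extends Json
  private case class JArray(items: List[Json]) extends Json
  private case class JNumber(n: Int) extends Json

  // Hand-written data constructors, all returning the abstract type.
  def obj(fields: (String, Json)*): Json = JObject(fields.toList)
  def array(items: Json*): Json = JArray(items.toList)
  def number(n: Int): Json = JNumber(n)
}

// Both values have static type Json; the compiler cannot tell them apart.
assert(MiniJson.obj("k" -> MiniJson.number(3)).obj.isDefined)
assert(MiniJson.array(MiniJson.number(3)).obj.isEmpty)
```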

I have been pondering this for quite some time, but still do not understand what makes the Scala encoding wrong. I'm sure it trips up type inference at times, but that doesn't seem like a strong enough reason to declare it wrong. What am I missing?

+67
scala functional-programming haskell playframework argonaut
Aug 15 '14 at 16:30
1 answer

As far as I know, there are two reasons why Scala's idiomatic encoding of case classes can be bad: type inference and type specificity. The former is a matter of syntactic convenience, while the latter is an increased burden of reasoning.

The type-inference problem is easy to illustrate:

 val x = Some(42) 

The type of x turns out to be Some[Int] , which is probably not what you wanted. You can generate similar problems in other, more problematic areas:

 sealed trait ADT
 case class Case1(x: Int) extends ADT
 case class Case2(x: String) extends ADT

 val xs = List(Case1(42), Case1(12))

The type of xs is List[Case1] , which is almost certainly not what you want. To work around this problem, containers such as List must be covariant in their type parameter. Unfortunately, covariance introduces a raft of issues and in fact degrades the soundness of certain constructs (e.g. Scalaz compromises on its Monad type and several monad transformers by allowing covariant containers, despite the fact that doing so is unsound).
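A self-contained sketch of the usual workaround, which is to annotate the intended ADT type by hand (the definitions are repeated here so the example stands alone):

```scala
sealed trait ADT
case class Case1(x: Int) extends ADT
case class Case2(x: String) extends ADT

// Without help, the element type is inferred as Case1, not ADT.
val tooNarrow = List(Case1(42), Case1(12))

// Common workarounds: annotate the desired type, or supply it explicitly.
val annotated: List[ADT] = List(Case1(42), Case1(12))
val widened = List[ADT](Case1(42), Case2("hi"))

assert(annotated.size == 2)
assert(widened.size == 2)
```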

Thus, encoding your ADTs in this way has a somewhat viral effect on your code. Not only do you have to deal with the subtypes within the ADT itself, but every container you ever write must account for the fact that you may land on subtypes of your ADT at inopportune moments.

The second reason not to encode your ADTs using publicly visible case classes is to avoid cluttering your type space with types that aren't really types. From a certain point of view, ADT cases are not types at all: they are data. If you reason about ADTs in this way (which is not wrong!), then having first-class types for each of your ADT cases increases the set of things you need to carry in your head to reason about your code.

For example, consider the ADT algebra from above. If you want to reason about code that uses this ADT, you need to be constantly thinking "well, what if this type is Case1 ?" That's simply not a question one should ever have to ask, since Case1 is data. It is a label for a particular case of the coproduct. That's all.

Personally, I don't care much about either of the above. I mean, the soundness issues with covariance are real, but I generally just prefer my containers invariant and instruct my users to "suck it up and annotate your types". It's inconvenient and it's awkward, but I find it preferable to the alternative, which involves a lot of boilerplate folds and explicit data constructors.
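For completeness, here is a sketch of that alternative applied to the question's Positioning type: the cases are hidden and a fold (catamorphism) plus explicit constructors are exposed instead. The fold signature and constructor names are invented for illustration:

```scala
// The "boilerplate fold + explicit data constructors" alternative.
sealed abstract class Positioning[A] {
  // One fold to rule out case-specific types in user code.
  def fold[B](append: => B,
              appendIf: (A => Boolean) => B,
              explicit: (Seq[A] => Seq[A]) => B): B = this match {
    case Positioning.Append      => append
    case Positioning.AppendIf(c) => appendIf(c)
    case Positioning.Explicit(f) => explicit(f)
  }
}

object Positioning {
  private case object Append extends Positioning[Nothing]
  private case class AppendIf[A](c: A => Boolean) extends Positioning[A]
  private case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

  // Explicit data constructors, all returning the abstract type.
  def append[A]: Positioning[A] =
    Append.asInstanceOf[Positioning[A]] // safe: Append carries no A values
  def appendIf[A](c: A => Boolean): Positioning[A] = AppendIf(c)
  def explicit[A](f: Seq[A] => Seq[A]): Positioning[A] = Explicit(f)
}

assert(Positioning.appendIf[Int](_ > 0).fold(false, c => c(1), _ => false))
```

This buys invariance-friendly inference (every constructor returns Positioning[A]) at the cost of exactly the boilerplate mentioned above.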

As a wildcard, a third potential disadvantage of this sort of type specificity is that it encourages (or rather, allows) a more "object-oriented" style, where you put case-specific functions on the individual ADT types. I think there is very little question that mixing your metaphors (case classes vs subtype polymorphism) in this way is a recipe for bad. However, whether that outcome is the fault of typed cases is somewhat of an open question.

+31
Aug 15 '14 at 21:18


