Is testing equality between two infinite data structures possible in Haskell?

Question

Is testing equality between two infinite data structures possible in Haskell?

In a project I'm working on, data of a certain type can sometimes contain in itself. For example,

data Example = Apple Int | Pear Int Example a = Pear 10 a b = Pear 10 b

As a programmer, I know that a and b are equal, but when I actually check the equality between them, it will be an infinite loop, because their values must be evaluated for comparison.

Is there any other way to check for equality between data like this? Or is there a way to avoid such problems?

+7

haskell lazy-evaluation infinite

Enzo Jan 30 '15 at 19:34

source share

2 answers

I would like to talk a little about how to do it right, if this is what you want to do. This is harder than you might have guessed at the beginning - and easier than you could have guessed for a second! For fun, I'm going to make the problem a little more complicated; we will ultimately represent the values

 a = Pear 10 a b = Pear 10 (Pear 10 b) c = Apple 10

and the calculation that a and b are "equal" - for the exact meaning of equality we will discuss below. This is not what observable sharing will definitely give you for free.

The exact meaning of equality, which we will use in the future, is monotonous. As a rule, ambiguity - and its close relative, bisimulation - is presented as the relationship between two marked graphs. For our purposes, you should present here the nodes in the graph that contain some data in the current constructor of our data type, and edges that indicate sub-terms. So for the value of Pear 10 (Pear 20 (Apple 30)) we could have a graph of Pear 10 -> Pear 20 -> Apple 30 ; and the value of b above would be a cyclic graph

 Pear 10 -> Pear 10 ^ | \_______/

The observed exchange will allow you to bring it to the end, but it will not bring you to the extent that it is obvious that these two graphs are equivalent:

 Pear 10 -. Pear 10 -> Pear 10 ^ | ~= ^ | \____/ \_______/

If you are familiar with the algorithm used to minimize DFA, you can probably stop here; such algorithms are easily adaptable in order to verify the equality of regular languages and what we will do below.

The key insight is that all three nodes in the two graphs above behave essentially the same: any path you can make, starting with node in the left graph, has an "identical" path in the right graph. That is, suppose that we have an R relation between nodes in the left and right graphs that satisfy this property:

 if nl R nr and there is an edge (nl, e, nl') in the graph, then there is an edge (nr, e, nr') in the graph and nl' R nr'

We will call R bisimulation. The largest ratio R will be called bipolar. If the “root” nodes in the two graphs are similar, then the corresponding Haskell values are equal! At the moment, I hope that you have reached such an extent that the problem seems more complicated than you expected; possibly impossible. In the end, how should we take the greatest of such an attitude?

One answer is to start with a complete relationship and cut out any pairs of nodes that violate the above restrictions. Keep repeating this process until nothing changes, and see what we have left. It turns out you can prove that this process actually generates duplicity? We realize this below in a naive way; you can use Google for efficient bisimality algorithms if you want to increase speed.

First preamble. We will use the fgl package for our presentation of the graph.

 import Control.Monad.Reader import Data.Graph.Inductive hiding (ap) import Data.Map (Map) import Data.Set (Set) import qualified Data.Map as M import qualified Data.Set as S

The fgl package defines a Node type for Node identities. We will represent our attitude simply as

 type Relation = Set (Node, Node)

First, we want to get the full ratio for a pair of graphs. While we are doing this, we could also cut out any pairs whose labels of the node do not match immediately. (Note on the naming conventions that I have chosen: in fgl, each node and edge has a label that can be of any type you like, and an identifier that should be of type Node or Edge Our names reflect this when possible: prefix n for nodes, e for edges, i for identity and v for label / value. We will use l and r as a suffix for our left and right graphs.)

 labeledPairs :: (Eq n, Graph gr) => gr ne -> gr ne' -> Relation labeledPairs lr = S.fromList [ (nil, nir) | (nil, nvl) <- labNodes l , (nir, nvr) <- labNodes r , nvl == nvr ]

Now, the next fragment should check if the two nodes are satisfied with the “one-step proximity” condition described above. That is, for each edge from one of the nodes, we look for the edge from the other with the same label and leading to another node, which, as we now say, is connected. Transliterating this search into Haskell:

 -- assumption: nil and nir are actual nodes in graphs l and r, respectively ssRelated :: (Ord e, Graph gr) => gr ne -> gr ne -> Relation -> Node -> Node -> Bool ssRelated lr rel nil nir = rell && relr where el = out l nil er = out r nil mel = M.fromListWith (++) [(evl, [nil]) | (_, nil, evl) <- el] mer = M.fromListWith (++) [(evr, [nir]) | (_, nir, evr) <- er] rell = and [ or [(nil, nir) `S.member` rel | nir <- M.findWithDefault [] evl mer] | (_, nil, evl) <- el ] relr = and [ or [(nil, nir) `S.member` rel | nil <- M.findWithDefault [] evr mel] | (_, nir, evr) <- er ]

Now we can write a function that checks each pair of nodes for a single-stage suitability:

 prune :: (Ord e, Graph gr) => gr ne -> gr ne -> Relation -> Relation prune lr rel = S.filter (uncurry (ssRelated lr rel)) rel

To calculate the double-valuedness, as mentioned above, we will start with a full relationship and repeatedly delete nodes that do not meet the criteria.

 bisimilarity :: (Eq n, Ord e, Graph gr) => gr ne -> gr ne -> Relation bisimilarity lr = fst . head . dropWhile (uncurry (/=)) . ap zip tail . iterate (prune lr) $ labeledPairs lr

Now we can check whether the two graphs have the same infinite spread, selecting the root nodes in each graph and checking them for one-sided:

 -- assumption: the root of the graph is node 0 bisimilar :: (Eq n, Ord e, Graph gr) => gr ne -> gr ne -> Bool bisimilar lr = (0, 0) `S.member` bisimilarity lr

Now look at it in action! We will make analogues a , b and c from earlier in the answer in a graphical representation. Since our data type only ever has one possible recursive occurrence, we do not need interesting border labels. The mkGraph function takes a list of marked nodes and a list of marked edges and displays a graph from them.

 data NodeLabel = Apple Int | Pear Int deriving (Eq, Ord, Read, Show) type EdgeLabel = () a, b, c :: Gr NodeLabel EdgeLabel a = mkGraph [(0, Pear 10)] [(0, 0, ())] b = mkGraph [(0, Pear 10), (1, Pear 10)] [(0, 1, ()), (1, 0, ())] c = mkGraph [(0, Apple 10)] []

In ghci:

 *Main> bisimilar ab True *Main> bisimilar ac False *Main> bisimilar bc False *Main> bisimilar aa True *Main> bisimilar bb True *Main> bisimilar cc True

Looks nice! A quick and quick connection to a library with an observed section is left as an exercise for the reader. And keep in mind that although this method can handle graphs with infinite expansions, of course, there will probably be a problem with processing infinite graphs this way!

+10

Daniel Wagner Jan 31 '15 at 4:44

source share

Tikhon jelvis · Accepted Answer · 2015-01-30T19:48:37+0000

In general: no. Such testing of equality comes down to the problem of stopping. One way to see this is that we could express the execution of a Turing machine at some input as an infinite data structure like this. Another way to look at this is that lazy data structures can represent arbitrary paused calculations.

The only real way to avoid such problems is to either set some additional restrictions on your data structure, or check for something more limited than equality.

An example of the first approach would be to make the loops in your data type explicit using some kind of reference, allowing you to detect them as you go. This will limit what you can express with your data structure, but it will also allow you to safely define equality predicate cycles. I think you could do it with observable separation ; look at data-reify for a package that does this. This approach should make checking a very recursive structure, as your example, simple.

For the second approach, you may have a function that returns Maybe Bool : if it cannot determine whether the two structures are equal or not in steps X, it returns Nothing . Depending on how your data types are created, you can probably make sure that any types outside a certain size with the same prefix are almost always equal and just rely on it.

Is testing equality between two infinite data structures possible in Haskell?

More articles: