Creating and comparing sets in Coq

I had problems understanding whether it is possible to prove that the two sets (in this case, ordinary languages) are identical and, therefore, interchangeable. From what I understand, sets can be equivalent, even if they are not structurally equal.

Regular languages โ€‹โ€‹are sets of lines, but I donโ€™t see how to say that r1 = r2, so that in the proof one could use something like symmetry.

Here is my RegularLanguage expression:

Inductive RegularLanguage (A : Set) : Set := | EmptyLang : RegularLanguage A | EmptyStr : RegularLanguage A | Unit : A -> RegularLanguage A | Union : RegularLanguage A -> RegularLanguage A -> RegularLanguage A | Concat : RegularLanguage A -> RegularLanguage A -> RegularLanguage A | KleeneStar : RegularLanguage A -> RegularLanguage A . 

Although I say the type is Set, this does not create a set of strings as far as I can see. I need some function like RegularLanguage -> Set of strings ?

Thanks for the help.

+5
source share
2 answers

There are a few misunderstandings regarding some Coq concepts that I will try to clarify.

First, in Coq, you should not consider Set as what we call "set" in traditional math. Instead, you should look at it as a type. Coq also has Type , but for the purpose of this post, you can view both Set and Type as interchangeable. To avoid confusion, from now on I will use โ€œsetโ€ to refer to the usual concept of typing in traditional mathematics and โ€œtypeโ€ to refer to Set and Type elements in Coq.

So what is the difference between set and type? Well, in ordinary math, it makes sense to ask yourself if something is a member of any given set. Thus, if we developed the theory of regular expressions in normal mathematics, and each regular expression was considered as a set, it would be advisable to ask questions such as 0 \in EmptyLang , because although 0 is a natural number, it can be an element of any set a priori. As a less far-fetched example, an empty string is both a member of the full language (i.e., the one that contains all the lines) and the closure of Kleene of any base language.

In Coq, on the other hand, every valid element of a language has exactly one type. For example, an empty string is of type list A for some A , which is written [] : list A If we try to ask if [] belongs to some regular language using the syntax "has type", we get an error; try typing for example.

 Check ([] : EmptyLang). 

Both sets and types can be considered as collections of elements, but types are in a sense more restrictive: for example, you can take the intersection of two sets, but there is no way to take the intersection of two types.

Secondly when you write

 Inductive RegularLanguage (A : Set) : Set := (* ... *) 

this does not mean that the elements you list below the header define sets and types. This means that for each type A (part (A : Set) ) you define a new type marked RegularLanguage A (part RegularLanguage (A : Set) : Set ), the elements of which are freely generated by the list of specified constructors. Thus, we have

 EmptyLang : RegularLanguage nat RegularLanguage nat : Set 

but we do not have

 EmptyLang : Set 

(once again, you can try entering all of the above judgments into Coq to see which ones are accepted and which are not).

Being โ€œfreely generatedโ€ means, in particular, that the constructors you cited are injective and disjoint. As Larsre noted earlier, in particular, this is not the case when Union l1 l2 = Union l2 l1 ; in fact, you can usually prove Union l1 l2 <> Union l2 l1 . The problem is that there is a mismatch between the concept of equality that you get for inductively defined types in Coq (which you cannot change) and your supposed concept of equality for ordinary languages.

Although there are several ways around this, I find the easiest way to use the function .

+6
source

Two different elements in a RegularLanguage A set RegularLanguage A not equal if they are built from different constructors. Consider L1 = ( a | b ) and L2 = ( b | a) . Here L1 <> L2 , because the arguments to the constructors are different.

In your designations, L1 can be Union nat (Unit nat 1) (Unit nat 2)) , and L2 can be Union nat (Unit nat 2) (Unit nat 1)) .

Instead, you want to say that there is a regexp_match : A -> RegularLanguage A -> list A -> bool function regexp_match : A -> RegularLanguage A -> list A -> bool , which corresponds to lines in a given language. You will need to perform this function.

Two regular expressions are equal if they match and reject the same lines. For example, for L1 and L2 :

 Lemma L1_eq_L2 : forall S, regexp_match L1 S = regexp_match L2 S. 

If you can prove it, you can use it to replace regexp_match L1 S with regexp_match L2 S and vice versa wherever you want.

+1
source

All Articles