Nullability (regular expressions)

In Brzozowski's “Derivatives of Regular Expressions” and elsewhere, the function δ (R) returns λ if R is valid, and ∅ otherwise includes the following:

δ(R1 + R2) = δ(R1) + δ(R2) δ(R1 · R2) = δ(R1) ∧ δ(R2) 

It is clear that if both R1 and R2 are nullable, then (R1 · R2) is NULL, and if R1 or R2 is NULL, then (R1 + R2) is null. However, it is not clear to me what the above provisions should indicate. My first thought, juxtaposing (+), (·) or Boolean operations with regular sets is pointless, since in the basic case

 δ(a) = ∅ (for all a ∈ Σ) δ(λ) = λ δ(∅) = ∅ 

and λ is not a set (and is not a return type of δ, which is a regular expression). In addition, this display is not indicated, and there is a separate entry for it. I understand the uncertainty, but I am lost in determining the sum, product, and Boolean operations in the definition of δ: how does λ or ∅ return from δ (R1) ∧ δ (R2), for example, in the definition of off δ (R1 · R2)?

+6
regex nullable derivative
source share
3 answers

I think you find yourself in the notational freedoms taken by the author. The inverse type δ (R) is, of course, a collection, or rather, a language. If you look at the definition:

alt text

you can see that there is inconsistency in the return type, formally λ is an element, but ∅ is an empty language ... What should he say:

alt text

The fact that the author uses λ both for an empty string and for a language containing only an empty string is once again confirmed by his definition of the Klein star operator:

alt text

Obviously, the last part should be alt text if we want to be pedantic.

Given that the return type δ (R) is a set or, rather, a language, the equations that you give make sense and express what you described.

+2
source share

I think you were right to display + and ^ in boolean or and and respectively. It looks like the two lines that you specified relate to rotation (sum) and concatenation (product):

 δ(R1 + R2) = δ(R1) + δ(R2) 

Changing R1 and R2 is valid if R1 is valid, R2 is NULL, or both R1 and R2 are NULL.

 δ(R1 · R2) = δ(R1) ∧ δ(R2) 

The concatenation of R1 and R2 is only NULL if the values ​​of R1 and R2 are null.

See here for the implementation of these rules in Haskell.

+3
source share

(I can’t look into Brzhozovsky’s article to better understand what is there), but I can offer two ways to interpret this notation (apart from the fact that with the notation, I see no doubt: the meaning of this definition is well understood):

1) On the left side of the definition, we have only “syntactic” patterns for regular expressions. On the right we produce sets; remember that a regular expression is a way to denote a language (set), and therefore this way of writing a definition becomes clear: on the right, we just use some (simple) regular expressions as a shortcut to refer to sets. Ie, ∅ means an empty language (empty set), and λ (if it is interpreted as a regular expression) means a language containing only an empty word (a set with this element).

Operations are simply operations on sets: union and intersection are possible.

If the notation is interpreted in this way, there is no contradiction with the notation used to define the underlying case: again, “a” is a regular expression, which means that it means a language with the word “a”.

2) First, we create the correct expressions on the right side, but the author expanded the operations that build regular expressions with a wedge that has the semantics of language intersection.

+2
source share

All Articles