I already gave a +1 to Byers' answer, but as far as I remember, that article doesn't really explain how regular expression matching works, beyond explaining why one algorithm is bad and another is much better. Maybe there's something in the links?
I'll focus on one good approach: compiling the expression to a finite state machine. If you limit yourself to deterministic automata, without minimization, this isn't too complicated.
What I describe (very quickly) below is the approach taken in Modern Compiler Design.
Imagine you have the following regular expression ...
a (bc)* d
The letters represent literal characters to match, and * is the usual zero-or-more repetition. The basic idea is to derive states based on dotted rules, where the dot marks how much of the expression has been matched so far. We take state zero to be the state where nothing has been matched, so the dot goes at the front ...
0 : .a (bc)* d
The only possible match is 'a', so the next state we get is ...
1 : a.(bc)* d
Now we have two possibilities - match "b" (if there is at least one repetition of "bc") or match "d" otherwise. Note - basically we're doing a digraph search here (depth-first, breadth-first, or whatever), but we discover the digraph as we search it. Assuming a breadth-first strategy, we'd need to queue up one of our cases for later consideration, but I'll ignore that issue here. Either way, we've discovered two new states ...
2 : a (b.c)* d
3 : a (bc)* d.
State 3 is an accepting (final) state (there may be several). For state 2, we can only match "c", but afterwards we have to be careful with the dot - it would sit at the end of the starred group, which loops back to the front. We get "a.(bc)* d" - the same as state 1, so we don't need a new state.
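To make that concrete, here's a minimal sketch (in Python) of the machine we just derived, written as a plain transition table plus a matching loop. The table is read straight off the states above; the names are mine, purely for illustration.

    # The DFA derived above for a(bc)*d; state numbers match the
    # dotted rules, and state 3 is the accepting state.
    TRANSITIONS = {
        (0, 'a'): 1,
        (1, 'b'): 2,
        (1, 'd'): 3,
        (2, 'c'): 1,
    }
    ACCEPTING = {3}

    def matches(s):
        state = 0
        for ch in s:
            state = TRANSITIONS.get((state, ch))
            if state is None:      # no transition for this character
                return False
        return state in ACCEPTING

    assert matches("ad") and matches("abcd") and matches("abcbcd")
    assert not matches("abd") and not matches("a")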
IIRC, the approach in Modern Compiler Design is to rewrite the rule as you hit each operator, to simplify handling of the dot. State 1 would be converted to ...
1 : a.bc (bc)* d
    a.d
So the next character you match either starts the first repetition or skips the repetitions entirely. The states that follow from this are equivalent to states 2 and 3. The advantage of this approach is that you can discard everything you've already matched (everything up to the "."), since you only care about future matches. This usually gives a smaller state model (though not necessarily a minimal one).
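As a tiny sketch of the payoff: if a state is just the set of its dotted rules with the matched prefix dropped, equal futures collapse into one state automatically. The rule strings below are written by hand for the a(bc)*d example, not computed:

    # State 1: either match the first repetition or skip the reps.
    state_1 = frozenset({".bc(bc)*d", ".d"})

    # After matching "b" then "c" we're back in front of (bc)*d,
    # so the set of futures is identical ...
    state_after_bc = frozenset({".bc(bc)*d", ".d"})

    assert state_1 == state_after_bc   # same futures, same state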
EDIT - if you discard the already-matched prefix, your state description is a representation of the set of strings that can occur from that point on.
In abstract algebra terms, this is a kind of closure of a set. An algebra is basically a set with one or more operators. Our set contains state descriptions, and our operators are the transitions (character matches). A closed set is one where applying any operator to any member of the set always yields another member of the set. The closure of a set is the smallest superset that is closed. So basically, starting from the obvious start state, we're building the minimal set of states that is closed with respect to our set of transition operators - the minimal set of reachable states.
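In code, the closure is just a worklist loop. A minimal sketch, assuming a hypothetical successors(state) that yields (symbol, next_state) pairs derived from the dotted rules (state descriptions need to be hashable, e.g. frozensets as above):

    from collections import deque

    def build_dfa(start, successors):
        states = {start}               # the set we're closing
        transitions = {}
        queue = deque([start])         # breadth-first, as above
        while queue:
            state = queue.popleft()
            for symbol, nxt in successors(state):
                transitions[(state, symbol)] = nxt
                if nxt not in states:  # a genuinely new state
                    states.add(nxt)
                    queue.append(nxt)
        return states, transitions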
Minimal here refers to the closure process - there may be a smaller equivalent automaton, and that's what is usually meant by "minimal".
Given this basic idea, it's not that hard to handle "if I have two state machines representing two sets of strings, how do I derive a third representing the union" (or the intersection, or the set difference ...). Instead of dotted rules, your state representations are a current state (or set of current states) from each input automaton, possibly with additional details.
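Here's a sketch of that product construction, under some simplifying assumptions of mine: both inputs are deterministic, start in state 0, and have total transition tables (dicts keyed by (state, symbol); add an explicit dead state to make a partial table total):

    def product_dfa(delta1, delta2, alphabet):
        start = (0, 0)                 # pair of start states
        states, delta = {start}, {}
        queue = [start]
        while queue:
            state = queue.pop()
            s1, s2 = state
            for ch in alphabet:
                nxt = (delta1[(s1, ch)], delta2[(s2, ch)])
                delta[(state, ch)] = nxt
                if nxt not in states:
                    states.add(nxt)
                    queue.append(nxt)
        return states, delta

    # Union: accept (s1, s2) if either side accepts.
    # Intersection: accept only if both sides accept.
    # Difference: accept if the first does and the second doesn't.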
If your regular grammar gets complicated, you may want to minimise. The basic idea here is relatively simple. You group all your states into one equivalence class, or "block" (in practice, the first split separates accepting from non-accepting states). Then you repeatedly check whether blocks need splitting (the states aren't really equivalent) with respect to some type of transition. The states in a given block are equivalent if, for each symbol they can match, they all reach the same next block.
Hopcroft's algorithm is an efficient way to handle this basic idea.
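Here's a sketch of the block-splitting idea itself - the straightforward refinement loop, not Hopcroft's optimised version. States are 0..n-1, and delta is a total transition table as above:

    def minimise_blocks(n_states, alphabet, delta, accepting):
        # Start from the accepting / non-accepting split.
        block_of = {s: int(s in accepting) for s in range(n_states)}
        while True:
            # Two states stay together only if, for every symbol,
            # their successors land in the same current block.
            sig = {s: (block_of[s],
                       tuple(block_of[delta[(s, ch)]]
                             for ch in sorted(alphabet)))
                   for s in range(n_states)}
            ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
            new_block_of = {s: ids[sig[s]] for s in range(n_states)}
            if new_block_of == block_of:   # stable - no block split
                return block_of            # state -> minimal-state id
            block_of = new_block_of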
What's particularly interesting about minimization is that every deterministic finite automaton has exactly one minimal form, up to state numbering. Furthermore, Hopcroft's algorithm will yield the same representation of that minimal form no matter which representation of which larger case it started from, given a canonical numbering of the states (e.g. by a fixed-order traversal from the start state). That is, it's a "canonical" representation that can be used to derive a hash, or for arbitrary-but-consistent orderings. Which means you can use minimal automata as keys into containers.
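For instance, one way to get such a canonical key is to renumber the minimal automaton's states in breadth-first order from the start state, visiting symbols in a fixed order - a sketch of mine, assuming every state is reachable from the start:

    def canonical_key(start, alphabet, delta, accepting):
        # Breadth-first numbering: isomorphic automata get the
        # same numbering, hence identical, hashable keys.
        order = {start: 0}
        queue = [start]
        while queue:
            s = queue.pop(0)
            for ch in sorted(alphabet):
                nxt = delta.get((s, ch))
                if nxt is not None and nxt not in order:
                    order[nxt] = len(order)
                    queue.append(nxt)
        edges = tuple(sorted((order[s], ch, order[t])
                             for (s, ch), t in delta.items()
                             if s in order))
        finals = tuple(sorted(order[s] for s in accepting))
        return edges, finals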
The above is probably a bit sloppy WRT definitions, so make sure you look up any terms before using them yourself, but with a little luck it gives a fair quick introduction to the main ideas.
BTW - take a look at the rest of Dick Grune's website - he has a free PDF book on parsing techniques. The first edition of Modern Compiler Design is pretty good IMO, but as you'll see, a second edition is on the way.