Some regex engines (PCRE, for example) have the construct (?|...). It works like a non-capturing group, but with the nice feature that group numbering restarts from the same initial value in each alternative. That would probably solve your problem right away, so if switching languages for this task is an option for you, this should do the trick.
[edit: Actually, this will still cause problems with conflicting named capture groups. In fact, the pattern will not even compile, since group names cannot be reused.]
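The same restriction on reused names applies in Java's own java.util.regex, so naively joining patterns that share a group name fails there as well. A minimal sketch to see this (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, only meant to show the compile-time failure.
final class DuplicateNameDemo {

    /** Returns true if the regex compiles with java.util.regex. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One use of the name is fine.
        System.out.println(compiles("(?<word>foo)"));
        // The same name in two alternatives is rejected.
        System.out.println(compiles("(?<word>foo)|(?<word>bar)"));
    }
}
```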
Otherwise, you will have to manipulate the input patterns. hyde suggested renumbering the backreferences, but I think there is a simpler option: make all groups named groups. Then you can make sure the names are unique.
So, for each input pattern, you create a unique identifier (for example, by incrementing an ID). The trickiest part is then finding the capture groups in the pattern. You cannot do this with a regular expression; you will have to parse the pattern yourself. Here are some thoughts on what to look out for if you simply iterate through the pattern string:
- Keep track of when you enter and leave a character class, since parentheses inside character classes are literal characters.
- Perhaps the trickiest part: ignore all opening parentheses that are followed by ?: , ?= , ?! , ?<= , ?<! or ?> . In addition, there are the option-setting parentheses, (?idmsuxU-idmsuxU) and (?idmsux-idmsux:somePatternHere) , which also do not capture anything (of course, any subset of these options can appear, in any order, and the - is optional, too).
- Now you should be left with only the opening parentheses that start either a regular capture group or a named one: (?<name> . It is probably easiest to treat them all the same way, that is, to give each group both a number and a name (where the name equals the number if none was set). Then you rewrite all of them to something like (?<uniqueIdentifier-md5hashOfName> (the hyphen cannot actually be part of the name; you would just have your incremented number followed by the hash. Since the hash has a fixed length, there will be no duplicates, or at least practically none). Be sure to remember which number and name each group originally had.
- Whenever you encounter a backslash, there are three options:
  - The next character is a digit. You have a numbered backreference. Replace all of these numbers with k<name> , where name is the new name you generated for that group.
  - The next characters are k<...> . Again, replace this with the corresponding new name.
  - The next character is anything else. Skip it. This takes care of escaped parentheses and escaped backslashes at the same time.
- I think Java may allow forward references. In that case, you need two passes: first take care of renaming all the groups, then change all the references.
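The steps in the list above can be sketched in Java roughly as follows. This is a minimal sketch under several assumptions, not a complete parser: the class GroupRenamer, the method rename and the prefix parameter are hypothetical names; instead of a hash of the old name it uses prefix + "g" + group number as the new name (the prefix must start with a letter and be alphanumeric); the input is assumed to compile on its own; and edge cases such as \Q...\E quoting, comments in (?x) mode, or a literal ] at the start of a character class are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class GroupRenamer {

    /** Gives every capturing group in regex a unique name: prefix + "g" + group number. */
    static String rename(String regex, String prefix) {
        Map<String, String> names = new HashMap<>(); // old number or old name -> new name
        scan(regex, prefix, names);        // pass 1: only collects the groups
        return scan(regex, prefix, names); // pass 2: forward references resolve too
    }

    private static String scan(String re, String prefix, Map<String, String> names) {
        StringBuilder out = new StringBuilder();
        int classDepth = 0; // > 0 while inside [...]
        int groupNo = 0;    // capturing groups opened so far
        int i = 0;
        while (i < re.length()) {
            char c = re.charAt(i);
            if (c == '\\' && i + 1 < re.length()) {
                char next = re.charAt(i + 1);
                if (classDepth == 0 && next >= '1' && next <= '9') {
                    // Numbered backreference. Assumption: further digits belong to
                    // the number only while such a group has already been opened.
                    int num = next - '0';
                    int end = i + 2;
                    while (end < re.length() && re.charAt(end) >= '0' && re.charAt(end) <= '9') {
                        int wider = num * 10 + (re.charAt(end) - '0');
                        if (wider > groupNo) break;
                        num = wider;
                        end++;
                    }
                    out.append(ref(String.valueOf(num), re.substring(i, end), names));
                    i = end;
                } else if (classDepth == 0 && re.startsWith("k<", i + 1) && re.indexOf('>', i) > 0) {
                    // Named backreference \k<name>.
                    int close = re.indexOf('>', i);
                    out.append(ref(re.substring(i + 3, close), re.substring(i, close + 1), names));
                    i = close + 1;
                } else {
                    // Any other escape: copy both characters untouched.
                    out.append(c).append(next);
                    i += 2;
                }
            } else if (c == '(' && classDepth == 0) {
                boolean special = re.startsWith("?", i + 1);
                boolean named = re.startsWith("?<", i + 1)
                        && !re.startsWith("?<=", i + 1) && !re.startsWith("?<!", i + 1);
                if (special && !named) {
                    // Non-capturing group, lookaround or option flags: leave as is.
                    out.append(c);
                    i++;
                } else {
                    groupNo++;
                    String fresh = prefix + "g" + groupNo;
                    names.put(String.valueOf(groupNo), fresh);
                    if (named) {
                        int close = re.indexOf('>', i);
                        names.put(re.substring(i + 3, close), fresh); // remember the old name
                        i = close + 1;
                    } else {
                        i++;
                    }
                    out.append("(?<").append(fresh).append('>');
                }
            } else {
                if (c == '[') classDepth++;
                else if (c == ']' && classDepth > 0) classDepth--;
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Returns \k<newName> for a known group, or the original text otherwise. */
    private static String ref(String key, String original, Map<String, String> names) {
        String name = names.get(key);
        return name == null ? original : "\\k<" + name + ">";
    }
}
```

For instance, GroupRenamer.rename("(a)\\1", "p1") turns the pattern (a)\1 into (?<p1g1>a)\k<p1g1> . Using the group number rather than a hash keeps the names short and valid in Java, at the cost of no longer reflecting the original name.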
Once you have done this for each input pattern, you can safely combine all of them with | . No feature other than backreferences should cause problems with this approach, at least as long as your patterns are valid. Of course, if you have the inputs a(b and c)d , then you have a problem. But you will always have that problem unless you verify that the patterns compile on their own.
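A small sketch of that final step, assuming the patterns have already been rewritten so their group names do not clash. PatternCombiner and combine are hypothetical names; the point is only that each input is compiled by itself before joining, so a pair like a(b and c)d is rejected instead of silently forming the valid pattern a(b|c)d .

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class PatternCombiner {

    /** Joins the patterns with '|' after checking that each one compiles by itself. */
    static Pattern combine(List<String> patterns) {
        for (String p : patterns) {
            // Throws PatternSyntaxException for a broken input such as "a(b".
            Pattern.compile(p);
        }
        return Pattern.compile(String.join("|", patterns));
    }
}
```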
Hope this gave you a pointer in the right direction.