One approach that comes to mind is to organize the patterns into a tree-like structure.
For example, http://* would contain all the patterns (listed above), and http://*.site1.com/* would contain all the site1.com patterns. This can significantly reduce the number of patterns that need to be checked.
In addition, you can determine which patterns are mutually exclusive to prune the list of candidates even further.
So, first take all the patterns and build trees from them. Find all the roots to determine which branches and nodes need to be examined.
Then improve the algorithm by determining which branches are mutually exclusive, so that as soon as you get a hit on one branch, you know which branches/nodes do not need to be visited.
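A minimal sketch of the pruned lookup, in Java. It assumes that every node's children are contained in the node (so a miss on a node rules out its whole subtree) and that the caller has marked which nodes have mutually exclusive children. The Node class, the exclusiveChildren flag and the glob-to-regex conversion are hypothetical names made up for illustration. Note that http://*.site1.com/* and http://*.site2.com/* are not actually exclusive as plain globs (http://a.site1.com/b.site2.com/ matches both), so only the http://* / https://* split is marked exclusive here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PrunedSearch {
    // Hypothetical tree node: a glob pattern plus the (more specific) patterns it contains.
    static final class Node {
        final String glob;
        final Pattern regex;
        final boolean exclusiveChildren; // assumption: set by the caller only when no URL can match two children
        final List<Node> children = new ArrayList<>();

        Node(String glob, boolean exclusiveChildren) {
            this.glob = glob;
            this.regex = Pattern.compile(globToRegex(glob));
            this.exclusiveChildren = exclusiveChildren;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // '*' matches any run of characters (including none); everything else is literal.
    static String globToRegex(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(".*");
            sb.append(Pattern.quote(parts[i]));
        }
        return sb.toString();
    }

    // Collects every pattern in the tree that matches url, skipping subtrees that cannot match.
    static void search(Node node, String url, List<String> hits) {
        if (!node.regex.matcher(url).matches()) return; // children are contained in node, so they miss too
        hits.add(node.glob);
        for (Node child : node.children) {
            int before = hits.size();
            search(child, url, hits);
            if (node.exclusiveChildren && hits.size() > before) break; // no sibling can match as well
        }
    }

    public static void main(String[] args) {
        Node root = new Node("*", true)
                .add(new Node("http://*", false)
                        .add(new Node("http://*.site1.com/*", false))
                        .add(new Node("http://*.site2.com/*", false)))
                .add(new Node("https://*", false));

        List<String> hits = new ArrayList<>();
        search(root, "http://www.site1.com/index.html", hits);
        System.out.println(hits); // [*, http://*, http://*.site1.com/*]
    }
}
```

Because the root's children are marked exclusive, the https://* branch is never visited once http://* has produced a hit.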
To get started, you can be lazy: your first pass could simply sort the patterns and use basic "does the next pattern contain this one?" logic to determine whether "this" is contained in the next. E.g.: if ("http://*.site1.com/*".startsWith("http://*"))
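A minimal sketch of that lazy first pass, assuming the patterns are globs where '*' is the only wildcard. The class and method names are hypothetical. Containment is the startsWith idea: a pattern ending in '*' is taken to contain any pattern that starts with its text up to that trailing '*'. That test is sound (it never claims containment that isn't there) but incomplete. The sort here mainly gives a stable order; each pattern's parent is the longest other pattern that strictly contains it, and patterns with no parent are the roots. The parent search is quadratic, which should be fine for a first pass.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class PatternForest {
    // Lazy containment test: a pattern ending in '*' contains any pattern that
    // starts with its text up to that trailing '*'. Sound but incomplete.
    static boolean contains(String outer, String inner) {
        if (outer.equals(inner)) return true;
        return outer.endsWith("*") && inner.startsWith(outer.substring(0, outer.length() - 1));
    }

    // Returns pattern -> direct children; patterns with no container are added to roots.
    static Map<String, List<String>> build(Collection<String> patterns, List<String> roots) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(patterns)); // dedupe, then sort
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (String p : sorted) children.put(p, new ArrayList<>());

        for (String p : sorted) {
            String parent = null;
            for (String q : sorted) {
                boolean strictlyContains = contains(q, p) && !contains(p, q);
                // Under this prefix test the longest container is the most specific one.
                if (strictlyContains && (parent == null || q.length() > parent.length())) parent = q;
            }
            if (parent == null) roots.add(p);
            else children.get(parent).add(p);
        }
        return children;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("http://*.site1.com/blog/*", "http://*",
                "http://*.site2.com/*", "http://*.site1.com/*");
        List<String> roots = new ArrayList<>();
        Map<String, List<String>> children = build(patterns, roots);
        System.out.println("roots: " + roots);
        children.forEach((p, kids) -> {
            if (!kids.isEmpty()) System.out.println(p + " -> " + kids);
        });
    }
}
```

For the sample patterns this prints http://* as the only root, with http://*.site1.com/* and http://*.site2.com/* under it and http://*.site1.com/blog/* under the site1 pattern.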
You can make the check for whether one pattern really contains another more sophisticated later, but this will get you started.
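One way the check could be made more precise, sketched in Java under the assumption that '*' (any run of characters, including none) is the only wildcard; the names are hypothetical. It tries to match the outer pattern against the inner pattern's text, where a '*' in the inner pattern is an opaque symbol that only a '*' in the outer pattern may absorb. Memoization keeps it polynomial. Unlike the startsWith test, it also sees that http://*.com/* contains http://*.site1.com/*.

```java
public class GlobContainment {
    // True if every string matched by `inner` is also matched by `outer`.
    // Assumption: '*' (any run of characters, including none) is the only wildcard.
    static boolean contains(String outer, String inner) {
        return match(outer, 0, inner, 0, new Boolean[outer.length() + 1][inner.length() + 1]);
    }

    // Can outer[i..] cover inner[j..]? A '*' in inner is treated as an opaque symbol
    // that only a '*' in outer is allowed to absorb.
    private static boolean match(String outer, int i, String inner, int j, Boolean[][] memo) {
        if (memo[i][j] != null) return memo[i][j];
        boolean result;
        if (i == outer.length()) {
            result = j == inner.length();
        } else if (outer.charAt(i) == '*') {
            // Outer's '*' either stops here or swallows one more symbol of inner.
            result = match(outer, i + 1, inner, j, memo)
                    || (j < inner.length() && match(outer, i, inner, j + 1, memo));
        } else {
            // A literal in outer must line up with the same literal in inner;
            // it can never stand in for inner's '*'.
            result = j < inner.length() && inner.charAt(j) == outer.charAt(i)
                    && match(outer, i + 1, inner, j + 1, memo);
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(contains("http://*", "http://*.site1.com/*"));             // true
        System.out.println(contains("http://*.com/*", "http://*.site1.com/*"));       // true (startsWith misses this)
        System.out.println(contains("http://*.site1.com/*", "http://*.site2.com/*")); // false
        System.out.println(contains("http://*.site1.com/*", "http://*"));             // false
    }
}
```

As far as I can tell this is exact for '*'-only patterns: replace each '*' of the inner pattern with a character that does not occur in the outer one, and any match of the outer pattern must cover those characters with its own stars. It does not handle full regex syntax.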
To properly answer the real question, "Does this pattern contain that pattern?", I believe you will need to parse the regex... This article looks like a good place to start figuring out how to do that: Parsing regular expressions with recursive descent
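To give a rough idea of what recursive descent parsing of a regex looks like, here is a minimal Java sketch for a tiny subset (literal characters, '.', backslash escapes, grouping, '|' and '*'). The grammar, the class name and the S-expression output format are my own assumptions for illustration and are not taken from the linked article. Each grammar rule becomes one method, and lower-precedence operators call the higher-precedence ones.

```java
import java.util.ArrayList;
import java.util.List;

public class RegexParser {
    private final String src;
    private int pos = 0;

    private RegexParser(String src) { this.src = src; }

    // Parses the whole pattern into an S-expression string, or throws IllegalArgumentException.
    static String parse(String regex) {
        RegexParser p = new RegexParser(regex);
        String tree = p.alternation();
        if (p.more()) throw new IllegalArgumentException("unexpected '" + p.peek() + "' at " + p.pos);
        return tree;
    }

    private boolean more() { return pos < src.length(); }
    private char peek() { return src.charAt(pos); }

    // alternation := concat ('|' concat)*
    private String alternation() {
        String left = concat();
        while (more() && peek() == '|') {
            pos++; // consume '|'
            left = "(alt " + left + " " + concat() + ")";
        }
        return left;
    }

    // concat := repeat*   (stops before '|', ')' or end of input)
    private String concat() {
        List<String> items = new ArrayList<>();
        while (more() && peek() != '|' && peek() != ')') items.add(repeat());
        if (items.isEmpty()) return "(empty)";
        if (items.size() == 1) return items.get(0);
        return "(cat " + String.join(" ", items) + ")";
    }

    // repeat := atom '*'*
    private String repeat() {
        String node = atom();
        while (more() && peek() == '*') {
            pos++; // consume '*'
            node = "(star " + node + ")";
        }
        return node;
    }

    // atom := '(' alternation ')' | '.' | '\' char | any other literal char
    private String atom() {
        char c = src.charAt(pos++);
        switch (c) {
            case '(': {
                String inner = alternation();
                if (!more() || peek() != ')') throw new IllegalArgumentException("missing ')' at " + pos);
                pos++; // consume ')'
                return inner;
            }
            case '.':
                return "(any)";
            case '\\':
                if (!more()) throw new IllegalArgumentException("dangling '\\' at end of pattern");
                return "'" + src.charAt(pos++) + "'";
            case '*':
                throw new IllegalArgumentException("nothing to repeat at " + (pos - 1));
            default:
                return "'" + c + "'";
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("a(b|c)*")); // (cat 'a' (star (alt 'b' 'c')))
        System.out.println(parse("x\\.y|.")); // (alt (cat 'x' '.' 'y') (any))
    }
}
```

Real regex dialects add character classes, anchors, '+', '?' and more. Once both patterns are parsed, deciding containment for general regexes is usually done by converting them to finite automata and checking language inclusion, which goes beyond this sketch.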