Based on your comment above, I'm going to reinterpret the question: instead of writing a regular expression that matches them, we'll write a function that matches them and use it to filter a list of domain names so that it includes only first-class domains, e.g. google.com, amazon.co.uk.
First we need a list of TLDs. As Greg noted, the public suffix list is a great place to start. Let's assume you have parsed that list into a Python list called suffixes. If you're not comfortable doing that, leave a comment and I can add code that does it.
    suffixes = parse_suffix_list("suffix_list.txt")
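For reference, here is a minimal sketch of what parse_suffix_list might look like. It assumes the file follows the Public Suffix List layout: one rule per line, `//` for comments, and each rule ending at the first whitespace. The helper parse_suffix_lines is a name made up for this sketch, and wildcard (`*.`) and exception (`!`) rules are simply skipped, so treat it as a starting point rather than a complete parser.

```python
def parse_suffix_lines(lines):
    """Return the plain suffix rules found in an iterable of text lines."""
    suffixes = []
    for line in lines:
        # A rule is the text up to the first whitespace on the line.
        parts = line.split()
        if not parts:
            continue  # blank line
        rule = parts[0]
        # Skip comments; wildcard and exception rules are ignored in this sketch.
        if rule.startswith(("//", "*", "!")):
            continue
        suffixes.append(rule.lower())
    return suffixes


def parse_suffix_list(path):
    """Read a suffix list file and return its plain rules as a list."""
    with open(path, encoding="utf-8") as f:
        return parse_suffix_lines(f)
```

Splitting the work this way keeps the parsing logic usable on any iterable of lines, not only on a file.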
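Now we need some code that identifies whether a given domain name matches the pattern some-name.suffix:
    def is_domain(d):
        for suffix in suffixes:
            if d.endswith("." + suffix):
                # Assumed completion: accept if a single label precedes the suffix.
                base = d[: -(len(suffix) + 1)]
                if base and "." not in base:
                    return True
        return False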
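To show how the filtering step could fit together, here is a small self-contained sketch. The suffixes list is a hard-coded stand-in for the parsed file, the candidate names are made up, and the body of is_domain is my assumption about the intended check (exactly one label in front of a known suffix). The extra test that rejects names which are themselves suffixes, such as co.uk, is an addition of this sketch.

```python
# Stand-in for the parsed list; real data would come from the suffix list file.
suffixes = ["com", "uk", "co.uk"]


def is_domain(d):
    """True if d is exactly one label followed by a known suffix."""
    if d in suffixes:
        # A bare suffix such as "co.uk" is not a first-class domain.
        return False
    for suffix in suffixes:
        if d.endswith("." + suffix):
            # Strip the suffix and the dot in front of it.
            base = d[: -(len(suffix) + 1)]
            # Any remaining dot means d is a subdomain for this suffix.
            if base and "." not in base:
                return True
    return False


candidates = ["google.com", "www.google.com", "amazon.co.uk", "co.uk", "example.invalid"]
first_class = [d for d in candidates if is_domain(d)]
print(first_class)  # ['google.com', 'amazon.co.uk']
```

Note that the loop keeps going after a failed match instead of returning early: amazon.co.uk fails against uk (the remainder amazon.co still contains a dot) but succeeds against co.uk.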
Benson, Jul 08 '10 at 21:41