You can resolve each name with DNS and check whether the names point to the same IP address. Stripping the extra characters first (the scheme, trailing slashes, any path) takes a little string processing.
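    from socket import gethostbyname_ex

    urls = ['http://google.com', 'google.com/', 'www.google.com/', 'news.google.com']
    data = []
    for originalName in urls:
        print('url:', originalName)
        # Reduce the URL to a bare host name.
        name = originalName.strip()
        name = name.replace('http://', '')
        name = name.replace('http:', '')
        if name.find('/') > 0:
            name = name[:name.find('/')]
        if name.find('\\') > 0:
            name = name[:name.find('\\')]
        print('dns lookup:', name)
        if name:
            try:
                # Returns (hostname, aliaslist, ipaddrlist).
                result = gethostbyname_ex(name)
            except (OSError, UnicodeError):
                # Lookup failed or the name is malformed; skip it.
                continue
            # The snippet is cut off here; the remaining lines are
            # inferred from the output shown in the result.
            print('ip:', result[2][0])
            data.append((result[2][0], originalName))
    print(data)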
Result:

    url: http://google.com
    dns lookup: google.com
    ip: 66.102.11.104
    url: google.com/
    dns lookup: google.com
    ip: 66.102.11.104
    url: www.google.com/
    dns lookup: www.google.com
    ip: 66.102.11.104
    url: news.google.com
    dns lookup: news.google.com
    ip: 66.102.11.104
    [('66.102.11.104', 'http://google.com'), ('66.102.11.104', 'google.com/'), ('66.102.11.104', 'www.google.com/'), ('66.102.11.104', 'news.google.com')]
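As a rough sketch of how the two steps could be tidied up: the host name can be pulled out with the standard library's `urlsplit` instead of manual `replace`/`find` calls, and the resulting `(ip, url)` pairs can be grouped by address. The helper names `extract_hostname` and `group_by_ip` are hypothetical and not part of the answer above. The sketch assumes inputs are either bare host names or `scheme://host/...` URLs, and it does not handle the `http:` form without slashes.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def extract_hostname(url):
    """Return the host part of a URL or bare host name (hypothetical helper)."""
    url = url.strip().replace('\\', '/')  # treat backslashes like slashes
    if '://' not in url:
        # Without a leading '//' urlsplit would treat the host as a path.
        url = '//' + url
    return urlsplit(url).hostname or ''


def group_by_ip(pairs):
    """Group (ip, url) pairs into {ip: [url, ...]} (hypothetical helper)."""
    groups = defaultdict(list)
    for ip, url in pairs:
        groups[ip].append(url)
    return dict(groups)
```

Feeding the `data` list from the result into `group_by_ip` would give one key, `'66.102.11.104'`, holding all four URLs. Note that a shared IP address is only a hint: unrelated sites can sit behind one address, and large sites may return different addresses on each lookup.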
Martlark