I would add to senderle's answer that it might make sense to normalize your names somehow (for example, lower-case them and remove all special characters) and then apply the same normalization to webpage_text and to your list of strings.
```python
def normalize_str(some_str):
    some_str = some_str.lower()
    for c in """-?'"/{}[]()&!,.`""":
        some_str = some_str.replace(c, "")
    return some_str
```
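A minimal sketch of "normalize both sides, then search", assuming the page is one string and the names are a list of strings. The helper `find_exact_mentions` and the sample data are hypothetical, for illustration only. A plain substring test can give false positives (e.g. "acme" inside "acmeville"), so treat this as a first pass.

```python
def normalize_str(some_str):
    # Same normalization as above: lower-case, then drop punctuation.
    some_str = some_str.lower()
    for c in """-?'"/{}[]()&!,.`""":
        some_str = some_str.replace(c, "")
    return some_str


def find_exact_mentions(client_names, webpage_text):
    """Hypothetical helper: names whose normalized form occurs in the normalized page."""
    page = normalize_str(webpage_text)
    return [name for name in client_names if normalize_str(name) in page]


if __name__ == "__main__":
    names = ["O'Reilly Media", "Acme, Inc.", "Globex"]
    text = "Our partners include OReilly Media and ACME Inc!"
    print(find_exact_mentions(names, text))
```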
If that is not enough, you can turn to difflib and do something like:
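```python
import difflib
# get_close_matches expects a list of candidate strings, not one long string
webpage_words = normalize_str(webpage_text).split()
for client_name in normalized_client_names:
    closest_client = difflib.get_close_matches(client_name, webpage_words, 1, 0.8)
    if len(closest_client) > 0:
        print(client_name, "found as", closest_client[0])
```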
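Splitting the page into single words means a multi-word name like "acme incorporated" never has a matching candidate. One possible workaround, sketched here under that assumption, is to compare each name against runs of the same number of consecutive words. The helpers `word_ngrams` and `fuzzy_mentions` are hypothetical names, not part of difflib.

```python
import difflib


def normalize_str(some_str):
    # Same normalization as above: lower-case, then drop punctuation.
    some_str = some_str.lower()
    for c in """-?'"/{}[]()&!,.`""":
        some_str = some_str.replace(c, "")
    return some_str


def word_ngrams(text, n):
    """All runs of n consecutive words in text, joined by single spaces."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def fuzzy_mentions(client_names, webpage_text, cutoff=0.8):
    """Map each client name to its closest same-length word run on the page, if any."""
    page = normalize_str(webpage_text)
    found = {}
    for name in client_names:
        norm = normalize_str(name)
        # Candidates have as many words as the (normalized) name.
        candidates = word_ngrams(page, len(norm.split()))
        match = difflib.get_close_matches(norm, candidates, 1, cutoff)
        if match:
            found[name] = match[0]
    return found


if __name__ == "__main__":
    text = "We work with Acme Incorprated and Initech."
    print(fuzzy_mentions(["Acme Incorporated", "Globex"], text))
```

The number of candidates grows linearly with page length per name, which is fine for a page but may be slow for a large corpus.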
The cutoff of 0.8 that I picked arbitrarily (for the Ratcliff/Obershelp-style similarity ratio that difflib uses) may be too loose or too strict; experiment with it a bit.
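One way to experiment is to look at the raw scores before choosing a cutoff. `get_close_matches` filters on `difflib.SequenceMatcher(...).ratio()`, so printing that ratio for a few candidates shows where a threshold would fall. The helper `similarity_scores` and the sample strings are hypothetical.

```python
import difflib


def similarity_scores(target, candidates):
    """SequenceMatcher.ratio() of target against each candidate, best first."""
    scores = [(difflib.SequenceMatcher(None, target, c).ratio(), c) for c in candidates]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    target = "acme incorporated"
    for score, cand in similarity_scores(target, ["acme incorprated", "acme inc", "acme corp"]):
        # Any cutoff at or below this score would accept the candidate.
        print(f"{score:.2f}  {cand}")
```

Running this on a handful of real names and page snippets gives a feel for how far apart true matches and near-misses sit.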