This code is full of linear searches. It's no wonder it runs slowly. Without knowing more about the input, I can't tell you how to fix these problems, but I can at least point them out. I'll note the major problems and a couple of minor ones.
udict = {}
for line in infile1:
    line = line.strip()
    linelist = line.split('\t')
    udict1 = {linelist[0]:linelist[1]}
    udict.update(udict1)
Do not use update here; just add an item to the dictionary:
udict[linelist[0]] = linelist[1]
This will be faster than creating a new dictionary for every entry. (In fact, Sven Marnach's generator-based approach to building this dictionary is better still.) It's a fairly minor point, though.
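For illustration, a generator-based construction might look like the sketch below. The function name `build_udict` and the sample data are made up here, and the sketch assumes every line holds exactly two tab-separated fields; it is not necessarily the exact form used in the other answer.

```python
import io

def build_udict(lines):
    """Build the replacement dictionary in a single pass.

    Assumes each line holds exactly two tab-separated fields: key, value.
    """
    # rstrip('\n') removes only the newline, so whitespace inside fields survives
    return dict(line.rstrip('\n').split('\t') for line in lines)

# Hypothetical stand-in for infile1
infile1 = io.StringIO("foo\tbar\nspam\teggs\n")
udict = build_udict(infile1)
print(udict)
```

A line with more or fewer than two fields makes `dict()` raise a `ValueError`, which is arguably preferable to silently dropping data.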
mult10K = []
for x in range(100):
    mult10K.append(x * 10000)
This is completely unnecessary. Delete it; I will show you one way to print at intervals without this.
linecounter = 0
for line in infile2:
    for key, value in udict.items():
This is your first big problem. For every line, you do a linear search through the whole dictionary, looking for each key in that line. If the dictionary is very large, this requires a huge number of operations: 100,000,000 * len(udict).
matches = line.count(key)
This is another problem. You look for matches with a linear search through the line, and then you call replace, which does the same linear search again! You don't need to check for a match first; replace simply returns the same string if there isn't one. This won't make a huge difference either, but it will gain you something.
line = line.replace(key, value)
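A quick check of that behaviour, using made-up strings:

```python
line = "the quick brown fox"

# Key absent: replace() returns an equal string, so no prior count() is needed.
unchanged = line.replace("cat", "dog")

# Key present: every occurrence is replaced.
changed = line.replace("quick", "slow")

print(unchanged)
print(changed)
```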
Just do the replacements unconditionally, and write the line once all of them are done:
outfile.write(line + '\n')
And finally
linecounter += 1
if linecounter in mult10K:
Forgive me, but this is a ridiculous way to do this! You do a linear search through mult10K to decide when to print a progress line. That alone adds almost 100,000,000 * 100 operations. You should at least search in a set; but the best approach (if you really must do this) is to do a modulo operation and test the result.
if not linecounter % 10000:
    print(linecounter)
    print(datetime.now() - startTime)
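Put together, a progress report based on the modulo test might be sketched as follows. The function name `report_progress` and its `interval` parameter are hypothetical, and the list of reported counts is returned only so the behaviour can be checked.

```python
from datetime import datetime

def report_progress(lines, interval=10000):
    """Walk over lines, printing the count and elapsed time every `interval` lines."""
    start_time = datetime.now()
    reported = []
    # enumerate(..., 1) replaces the manual linecounter += 1
    for linecounter, line in enumerate(lines, 1):
        # ... per-line work would go here ...
        if linecounter % interval == 0:
            print(linecounter, datetime.now() - start_time)
            reported.append(linecounter)
    return reported

# Small hypothetical input: 25 lines, reporting every 10
counts = report_progress(["x\n"] * 25, interval=10)
```

The test costs one integer operation per line, regardless of how many checkpoints there are.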
To make this code efficient, you need to get rid of these linear searches. Sven Marnach's answer suggests one way that might work, but I think it depends on the data in your file, since the replacement keys may not correspond to obvious word boundaries. (The regex approach he added addresses that, though.)
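As a rough sketch of what a regex-based replacement could look like (not necessarily the form used in the other answer), all keys can be combined into one compiled pattern so that each line is scanned once rather than once per key. The names `make_replacer` and `replace_all` are made up for illustration, and the sketch assumes `udict` is non-empty. Note that it is not exactly equivalent to chained `replace` calls: replaced text is never rescanned, and where keys overlap the longest one wins.

```python
import re

def make_replacer(udict):
    """Return a function that applies every replacement in udict in one pass.

    Assumes udict is non-empty; an empty pattern would match everywhere.
    """
    # Longest keys first, so a key that is a prefix of another cannot shadow it
    keys = sorted(udict, key=len, reverse=True)
    # re.escape makes each key match literally, whatever characters it contains
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def replace_all(line):
        # Each match is looked up in the dictionary: a constant-time operation
        return pattern.sub(lambda m: udict[m.group(0)], line)

    return replace_all

# Hypothetical data
replace_all = make_replacer({"foo": "bar", "foobar": "baz"})
print(replace_all("foo and foobar"))  # bar and baz
```

Whether this actually beats the nested loop depends on the number of keys and on the data, so it is worth timing on a sample of the real files.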