Search and replace in a pandas DataFrame for a large dataset

I have a dataset of 1 million rows, stored as a pandas DataFrame:

Id  description
1   bc single phase acr
2   conditioning accum
3   dsply value ac
and a dictionary with 2,927 entries, which looks like this:

Key     Value
accum   accumulator
bb      baseboard
dsply   display

I executed the following code to replace each dictionary key found in the DataFrame with its corresponding value:

dataset = dataset.replace(replacement_dict, regex=True)

but it takes too long to execute: 104.08 seconds for 2,000 rows on a machine with 8 GB of RAM, and I need to apply it to millions of rows. Can anyone tell me how to shorten the execution time, or suggest an alternative way to accomplish this task?


Precompiling a single regex and doing all substitutions with one pattern.sub call (an approach adapted from @unutbu's multiple-replace recipe) is about 15% faster than df.replace in the benchmark below:

import pandas as pd
import re

rep_dict = {'accum': 'accumulator', 'bb': 'baseboard', 'dsply': 'display'}

# Build one regex that matches any key; escape the keys so any regex
# metacharacters in them are treated literally.
pattern = re.compile("|".join(re.escape(k) for k in rep_dict), re.M)

def multiple_replace(string):
    # Look up the replacement for whichever key actually matched.
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

df = pd.DataFrame({'description': ['bc single phase acr', 'conditioning accum', 'dsply value ac']})
df = pd.concat([df] * 10000)

%timeit df['description'].map(multiple_replace)          # 72.8 ms per loop
%timeit df['description'].replace(rep_dict, regex=True)  # 88.6 ms per loop
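To apply this to your data, map the function over the column (a sketch; the dataset name and the description column come from your question):

dataset['description'] = dataset['description'].map(multiple_replace)

One caveat for a 2,927-key dictionary: Python's regex alternation tries alternatives left to right and takes the first that matches, so if some keys are prefixes of others (e.g. 'ac' and 'accum'), sort the keys longest-first when building the pattern.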

You could also parallelize the work (in MapReduce style): split the data into chunks, apply the replacement to each chunk, and combine the results.
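A minimal sketch of that chunked approach using the standard library's multiprocessing; the chunk count of 8 and the reuse of multiple_replace and df from the answer above are assumptions, not a benchmarked recipe:

import numpy as np
import pandas as pd
from multiprocessing import Pool

def process_chunk(chunk):
    # "Map" step: run the precompiled-regex replacement over one chunk.
    # multiple_replace is the function defined in the answer above.
    return chunk.map(multiple_replace)

if __name__ == '__main__':
    chunks = np.array_split(df['description'], 8)  # split into 8 chunks
    with Pool(processes=8) as pool:
        mapped = pool.map(process_chunk, chunks)   # process chunks in parallel
    df['description'] = pd.concat(mapped)          # combine the results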

