Python pandas: map values from a lookup table by substring match

I have a dataframe with a column of user agents sent by applications. I need to determine the specific application from this column. For example,

NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15 Darwin/14.0.0 will be categorized as Words With Friends.

iPhone3,1; iPhone OS 7.1.2; com.fingerarts.sudoku2; 143441-1,24 will be Sudoku by FingerArts, and so on.

I will have another dataframe with the strings that I need to map. For example,

Keyword                 Game 
NewWordsWithFriends     Words With Friends
com.fingerarts.sudoku   Sudoku by FingerArts

How can I do a lookup like this on a pandas dataframe? For example, the dataframe looks like this:

user    date                 user-agent
 A      2015-09-02 13:45:56  NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15 Darwin/14.0.0
 B      2015-08-31 23:04:21  iPhone3,1; iPhone OS 7.1.2; com.fingerarts.sudoku2; 143441-1,24

After the lookup, I want to create a new column, GameName.

2 answers

One possible way to achieve this could be:

import pandas as pd                                                              

# some example data
qry = pd.DataFrame.from_dict({"Keyword": ["NewWordsWithFriends",                 
                                          "com.fingerarts.sudoku"],              
                              "Game": ["Words With Friends",                     
                                       "Sudoku by FingerArts"]})                 

df = pd.DataFrame.from_dict({"user-agent" : ["NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15 Darwin/14.0.0",     
                                             "iPhone3,1; iPhone OS 7.1.2; com.fingerarts.sudoku2; 143441-1,24"]})

keywords = qry.Keyword.tolist()                                                  
games = qry.Game.tolist()                                                        

def select(x):                                                                   
    for key, game in zip(keywords, games):                                       
        if key in x:                                                             
            return game                                                          

df["GameName"] = df["user-agent"].apply(select)  

This will give:

In [41]: df
Out[41]: 
                                          user-agent              GameName
0  NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15...    Words With Friends
1  iPhone3,1; iPhone OS 7.1.2; com.fingerarts.sud...  Sudoku by FingerArts
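
One detail worth checking is what select() does when no keyword matches: the loop falls through and the function returns None, so that row ends up with a missing GameName. A minimal sketch of this, assuming a made-up third user agent and a hypothetical "Unknown" placeholder label, neither of which comes from the question:

```python
import pandas as pd

# Lookup table from the question
qry = pd.DataFrame({"Keyword": ["NewWordsWithFriends", "com.fingerarts.sudoku"],
                    "Game": ["Words With Friends", "Sudoku by FingerArts"]})

# The third user agent is made up for illustration; it matches no keyword
df = pd.DataFrame({"user-agent": [
    "NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15 Darwin/14.0.0",
    "iPhone3,1; iPhone OS 7.1.2; com.fingerarts.sudoku2; 143441-1,24",
    "SomeOtherApp/1.0 CFNetwork/672.1.15 Darwin/14.0.0"]})

def select(user_agent):
    # Return the game for the first keyword contained in the user agent,
    # or None (implicitly) when nothing matches
    for key, game in zip(qry["Keyword"], qry["Game"]):
        if key in user_agent:
            return game

df["GameName"] = df["user-agent"].apply(select)

# "Unknown" is a hypothetical placeholder label, not part of the question
df["GameNameFilled"] = df["GameName"].fillna("Unknown")
```

Whether to keep the missing value or replace it with a label depends on how the column is used later; the fillna step is optional.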

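Note that .apply calls select() once per row, and select() loops over every keyword, so this can be slow on large dataframes.

To see where the time goes, you can profile the function with line_profiler.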

import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2015-09-02 13:45:56', '2015-08-31 23:04:21'],
                   'user-agent': ['NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15 Darwin/14.0.0',
                                  'iPhone3,1; iPhone OS 7.1.2; com.fingerarts.sudoku2; 143441-1,24']})

map_df = pd.DataFrame({'Keyword' :  ['NewWordsWithFriends' , 'com.fingerarts.sudoku'], 'Game' : [ 'Words With Friends', 'Sudoku by FingerArts'] })

# Build keyword -> game by column name, so the result does not depend on column order
mapping = dict(zip(map_df['Keyword'], map_df['Game']))


# Escape regex metacharacters (such as '.') in each keyword before joining
regex = '|'.join(re.escape(keyword) for keyword in map_df['Keyword'])

def get_keyword(user_agent):
    matches = re.findall(regex ,user_agent)
    return matches[0] if len(matches) > 0 else np.nan


df['GameName'] = df['user-agent'].apply(get_keyword)

df['GameName'] = df['GameName'].map(mapping)
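
The same regex idea can also be written without a Python-level function, using pandas string methods. The following is a sketch rather than the answerer's code: it relies on Series.str.extract with a single capture group, and it sorts the keywords longest-first as an added precaution, since regex alternation tries alternatives from left to right and would otherwise stop at a shorter keyword that is a prefix of a longer one.

```python
import re

import pandas as pd

df = pd.DataFrame({'user-agent': [
    'NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15 Darwin/14.0.0',
    'iPhone3,1; iPhone OS 7.1.2; com.fingerarts.sudoku2; 143441-1,24']})

map_df = pd.DataFrame({'Keyword': ['NewWordsWithFriends', 'com.fingerarts.sudoku'],
                       'Game': ['Words With Friends', 'Sudoku by FingerArts']})

mapping = dict(zip(map_df['Keyword'], map_df['Game']))

# Longest keywords first, so a longer keyword is tried before any prefix of it
keywords = sorted(map_df['Keyword'], key=len, reverse=True)

# One capture group wrapping all escaped keywords
pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'

# expand=False returns a Series with the first match per row (NaN if none)
matched = df['user-agent'].str.extract(pattern, expand=False)
df['GameName'] = matched.map(mapping)
```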

Alternatively, get_keyword can be written without a regex:

def get_keyword(user_agent):
    for keyword in map_df['Keyword']:
        if keyword in user_agent:
            return keyword

The mapping can also be built as a Series instead of a dict:

mapping = pd.Series(map_df['Game'].values , index = map_df.Keyword )
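
As a sketch of how a Series-based mapping plugs into the rest of the answer: Series.map accepts a Series and looks each value up in its index. The keywords in `found` are written out by hand here (a name introduced only for this example) to stand in for what get_keyword would return, with NaN meaning "no match".

```python
import numpy as np
import pandas as pd

map_df = pd.DataFrame({'Keyword': ['NewWordsWithFriends', 'com.fingerarts.sudoku'],
                       'Game': ['Words With Friends', 'Sudoku by FingerArts']})

# Index = keyword, value = game name
mapping = pd.Series(map_df['Game'].values, index=map_df['Keyword'])

# Hand-written stand-in for the keywords found per row; NaN = no match
found = pd.Series(['NewWordsWithFriends', 'com.fingerarts.sudoku', np.nan])

# Values not present in the mapping's index (including NaN) become NaN
game_names = found.map(mapping)
```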