Given df, the next step is to group by the host level only and aggregate with idxmax. This gives you, for each host, the index label of the service with the highest count. You can then use df.loc[...] to select the rows of df that match those largest counts:
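    import pandas as pd

    df_logfile = pd.DataFrame({
        'host': ['this.com', 'this.com', 'this.com', 'that.com',
                 'other.net', 'other.net', 'other.net'],
        'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web'],
    })

    # Count rows per (host, service). Named aggregation is used here because
    # the dict-renaming form .agg({'no': 'count'}) was removed in pandas 1.0.
    df = df_logfile.groupby(['host', 'service'])['service'].agg(no='count')

    # For each host (index level 0), find the index label of the largest count.
    mask = df.groupby(level=0).agg('idxmax')

    # Select those rows and turn the index levels back into columns.
    df_count = df.loc[mask['no']]
    df_count = df_count.reset_index()
    print("\nOutput\n{}".format(df_count))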
This prints the following DataFrame:
            host service  no
    0  other.net     web   2
    1   that.com    mail   1
    2   this.com    mail   2
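To see what idxmax actually hands to .loc, it can help to work on a plain Series of counts and look at the intermediate labels. The following is only a sketch of the same idea, not the code above: it swaps in size() for the count aggregation, and the names counts, top_labels and top_services are made up for illustration.

```python
import pandas as pd

df_logfile = pd.DataFrame({
    'host': ['this.com', 'this.com', 'this.com', 'that.com',
             'other.net', 'other.net', 'other.net'],
    'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web'],
})

# Count rows per (host, service); size() returns a Series with a
# (host, service) MultiIndex. (Assumption: used instead of agg/count.)
counts = df_logfile.groupby(['host', 'service']).size()

# For each host, idxmax returns the full (host, service) index label
# of the largest count, so the values here are tuples.
top_labels = counts.groupby(level='host').idxmax()

# Select exactly those labels, name the count column 'no', and move
# the index levels back into ordinary columns.
top_services = counts.loc[top_labels.tolist()].rename('no').reset_index()

print(top_labels)
print(top_services)
```

Note that idxmax keeps only the first label when two services tie for a host, so a tie is resolved by index order rather than reported.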
unutbu