What does get_fscore() of the xgboost ML model do?

Does anyone know how the numbers are calculated? The documentation says that this function will "Get feature importance of each feature", but there is no explanation of how to interpret the results.

python xgboost feature-selection
1 answer

This is a metric that simply counts how many times each feature is split on. It is similar to the Frequency metric in the R version of the package: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf
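If, as the R documentation describes, Frequency is the relative number of times a feature occurs in the trees, it should correspond to the raw `get_fscore()` counts divided by their total. The sketch below assumes exactly that normalization; the function name `relative_frequency` and the input counts are hypothetical, not taken from either package.

```python
def relative_frequency(split_counts):
    """Turn raw split counts into fractions that sum to 1 (assumed R-style Frequency)."""
    total = sum(split_counts.values())
    return {feature: count / total for feature, count in split_counts.items()}


# Hypothetical output of get_fscore(): f0 was split on 3 times, f2 once.
counts = {"f0": 3, "f2": 1}
print(relative_frequency(counts))  # {'f0': 0.75, 'f2': 0.25}
```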

It's about the most basic importance metric you can get.

i.e., how many times was this variable split on?
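Note that every split node counts, so a feature used twice inside the same tree contributes 2, not 1. The toy model below is purely hypothetical: each tree is reduced to the list of features used at its split nodes (leaves are left out because they do not split on anything).

```python
from collections import Counter
from itertools import chain

# Hypothetical model: for each tree, the feature used at each split node.
split_features_per_tree = [
    ["f0", "f1", "f0"],  # tree 0 splits on f0 twice and f1 once
    ["f2"],              # tree 1 has a single split
    ["f0"],              # tree 2 has a single split
]

# Split-count importance: every split node counts once, across all trees.
weight = Counter(chain.from_iterable(split_features_per_tree))
print(dict(weight))  # {'f0': 3, 'f1': 1, 'f2': 1}

# For contrast: the number of trees that use f0 at all is smaller.
trees_using_f0 = sum("f0" in tree for tree in split_features_per_tree)
print(trees_using_f0)  # 2
```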

The code for this method shows that it simply counts every occurrence of the feature in a split condition, across all trees.

Here is the method in the xgboost source: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L953

```python
def get_fscore(self, fmap=''):
    """Get feature importance of each feature.

    Parameters
    ----------
    fmap: str (optional)
       The name of feature map file
    """
    trees = self.get_dump(fmap)  # dump all the trees to text
    fmap = {}
    for tree in trees:  # loop through the trees
        for line in tree.split('\n'):  # text processing
            arr = line.split('[')
            if len(arr) == 1:  # leaf lines have no '[...]' split condition
                continue
            fid = arr[1].split(']')[0]  # the split condition, e.g. "f0<0.5"
            fid = fid.split('<')[0]  # split on '<' to keep only the variable name
            if fid not in fmap:  # if the feature id hasn't been seen yet
                fmap[fid] = 1  # add it
            else:
                fmap[fid] += 1  # else increment it
    return fmap  # the counts of each time a variable was split on
```
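To see what the loop returns without training a model, the sketch below copies the same parsing logic into a standalone function (the name `fscore_from_dump` is hypothetical) and feeds it two hand-written trees in the text format that `get_dump()` produces. The trees are made up, not dumped from a real model. One consequence is visible in the output: features that are never used in a split do not appear in the result at all. In newer xgboost releases, `get_fscore` appears to be a thin wrapper around `Booster.get_score(importance_type='weight')`, which counts the same thing.

```python
def fscore_from_dump(trees):
    """Standalone copy of the counting loop in get_fscore(), for text dumps."""
    counts = {}
    for tree in trees:
        for line in tree.split("\n"):
            arr = line.split("[")
            if len(arr) == 1:  # leaf lines have no '[...]' split condition
                continue
            fid = arr[1].split("]")[0]  # e.g. "f0<0.5"
            fid = fid.split("<")[0]     # keep only the feature name, e.g. "f0"
            counts[fid] = counts.get(fid, 0) + 1
    return counts


# Two hand-written toy trees in get_dump()-style text format (not from a real model).
toy_dump = [
    "0:[f0<0.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.1\n"
    "\t2:[f1<1.5] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=-0.2\n"
    "\t\t4:leaf=0.3\n",
    "0:[f0<2.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.05\n"
    "\t2:leaf=-0.05\n",
]

print(fscore_from_dump(toy_dump))  # {'f0': 2, 'f1': 1}
```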