Cover is defined in xgboost as:
the sum of the second-order gradients of the training data classified to the leaf; if it is square loss, this simply corresponds to the number of instances in that branch. The deeper in the tree a node is, the lower this metric will be.
https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd Not particularly well documented ....
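In other words, for a node containing a set of training instances I:

    Cover(node) = Σ_{i ∈ I} h_i,   where h_i = ∂²ℓ(y_i, ŷ_i) / ∂ŷ_i²

i.e. the second derivative (hessian) of the loss for instance i, evaluated at the current prediction.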
To calculate the cover, we need to know the predictions at that point in the tree, and the second derivative of the loss function with respect to the prediction.
Lucky for us, the prediction for every data point (all 6513 of them) in node 0-0 of your example is 0.5. That is the global default setting, whereby your initial prediction at t = 0 is 0.5.
base_score [default=0.5] the initial prediction score of all instances, global bias
http://xgboost.readthedocs.org/en/latest/parameter.html
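A minimal sketch of where that 0.5 enters when training (objective and base_score are real xgboost parameter names; dtrain is just a placeholder here):

    # base_score is the score every instance starts from before any trees are
    # added; 0.5 is the default, which is why node 0-0 sees p = 0.5 everywhere.
    params <- list(objective = "binary:logistic", base_score = 0.5)
    # bst <- xgb.train(params = params, data = dtrain, nrounds = 2)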
The gradient of binary logistic (which is your objective function) is p - y, where p = your prediction and y = the true label.
So the hessian (which is what we need here) is p * (1 - p). Note that the hessian can be computed without y, the true labels.
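As a small R sketch (the helper names logistic_grad and logistic_hess are mine, not part of the xgboost API):

    # Gradient and hessian of the binary logistic loss, with p the predicted
    # probability and y the true label; note the hessian never touches y.
    logistic_grad <- function(p, y) p - y
    logistic_hess <- function(p) p * (1 - p)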
So (bringing it home):
6513 * 0.5 * (1 - 0.5) = 1628.25
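Or, using the hypothetical helper above:

    6513 * logistic_hess(0.5)
    # [1] 1628.25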
In the second tree the predictions at that point are no longer all 0.5, so let's get the predictions after one tree:

    p = predict(bst, newdata = train$data, ntreelimit = 1)
    head(p)
    [1] 0.8471184 0.1544077 0.1544077 0.8471184 0.1255700 0.1544077
    sum(p * (1 - p))
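To check this against what xgboost itself reports, a sketch (assuming bst is the fitted booster from your question, and that xgb.model.dt.tree returns a data.table with Tree, Node and Cover columns, as in recent versions of the R package):

    library(xgboost)
    library(data.table)
    # Dump the trees as a table; the root of the second tree is Tree == 1,
    # Node == 0, and its Cover should match sum(p * (1 - p)) computed above.
    tree_table <- xgb.model.dt.tree(model = bst)
    tree_table[Tree == 1 & Node == 0, Cover]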
Note that for linear (squared error) regression the hessian is always 1, so the cover simply indicates how many examples are in that leaf.
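A sketch of why: writing squared error in the form xgboost's custom-objective interface expects (the name sq_err_obj is mine), the hessian comes out constant, so summing it over a node just counts instances.

    library(xgboost)
    # Custom objective: grad and hess of 1/2 * (pred - label)^2.
    # The hessian is identically 1, so Cover = number of instances in the node.
    sq_err_obj <- function(preds, dtrain) {
      labels <- getinfo(dtrain, "label")
      grad <- preds - labels
      hess <- rep(1, length(preds))
      list(grad = grad, hess = hess)
    }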
The big takeaway is that cover is defined by the hessian of the objective function. There is lots of information out there on getting to the gradient and hessian of the binary logistic function.
These slides are also helpful for seeing why he uses hessians as a weighting, and they explain how xgboost splits differently from standard trees. https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf