An "outlier" in the terminology of box-and-whisker plots is any point in the data set that falls farther than a specified distance beyond the box, typically 1.5 times the box length, i.e. 1.5 times the difference between the 0.25 (lower) and 0.75 (upper) quantiles. To get there, see ?boxplot.stats: first, look at the definition of out in the output:
out: the values of any data points which lie beyond the extremes of the whiskers (if(do.out)).
These are the "outliers".
Secondly, look at the definition of whiskers, which are based on the coef parameter, which is 1.5 by default:
the whiskers extend to the most extreme data point which is no more than coef times the length of the box away from the box.
Finally, look at the definition of the "hinges", which are the ends of the box:
The two 'hinges' are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4).
Put these together and you get outliers defined (approximately) as points that lie more than 1.5 times the interquartile range beyond the nearer quartile; for a roughly symmetric box, that puts them about 4 times as far from the median as the relevant quartile is. The reasons for these somewhat convoluted definitions are (I think) partly historical and partly the desire to have the components of the plot reflect actual values that are present in the data (rather than, say, a value halfway between two data points) as much as possible. (You will probably need to go back to the original literature referenced on the help page for the full justifications and explanations.)
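As a rough check of how the three definitions fit together, the sketch below rebuilds the outlier rule by hand and compares it with what boxplot.stats() reports. The vector x is made-up illustration data, and the names hinges, box_len, lower, upper and manual_out are my own, not part of the R API.

```r
# Hypothetical sample: nine ordinary values and one extreme value
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 30)

s <- boxplot.stats(x)         # coef = 1.5 by default
hinges  <- s$stats[c(2, 4)]   # lower and upper hinge (the ends of the box)
box_len <- diff(hinges)       # length of the box

# Fences: coef times the box length beyond each hinge
lower <- hinges[1] - 1.5 * box_len
upper <- hinges[2] + 1.5 * box_len

# Points outside the fences, computed by hand
manual_out <- x[x < lower | x > upper]

print(manual_out)   # 30
print(s$out)        # 30, the same point that boxplot() would draw separately
```

Here the hinges are 4 and 9, so the fences sit at -3.5 and 16.5, and only the value 30 is flagged. The whiskers themselves stop at the most extreme data points inside the fences (2 and 10), not at the fences.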
The important thing to keep in mind is that points defined as "outliers" by this algorithm are not necessarily outliers in the usual statistical sense (e.g., points that are surprisingly extreme under a particular statistical model of the data). In particular, if you have a large data set, you are sure to see a lot of "outliers" (one sign that you might want to switch to a more flexible graphical summary, such as a violin plot or beanplot).
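To get a feel for the large-sample point, here is a small simulation sketch. It assumes perfectly well-behaved standard normal data, with an arbitrary seed and sample size chosen only for illustration; the variable names are my own.

```r
set.seed(1)           # arbitrary seed, only for reproducibility
x <- rnorm(1e5)       # simulated data with no "real" outliers at all

# Number and fraction of points flagged by the default boxplot rule
n_out <- length(boxplot.stats(x)$out)
frac  <- n_out / length(x)

# Fraction expected beyond the fences for an exactly normal distribution
iqr_norm <- qnorm(0.75) - qnorm(0.25)
theo <- 2 * pnorm(qnorm(0.25) - 1.5 * iqr_norm)

print(n_out)
print(frac)
print(theo)           # roughly 0.007
```

Under normality the rule flags about 0.7% of points, so a sample of 100,000 values should show several hundred "outliers" even though every one of them came from the same distribution.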