Explain the quantile() function in R

I've been mystified by the R quantile function all day.

I have an intuitive notion of how quantiles work, and an M.S. in statistics, but boy oh boy, the documentation for it is confusing to me.

From the docs:

Q[i](p) = (1 - gamma) x[j] + gamma x[j+1],

I'm with it so far. For a type i quantile, it is an interpolation between x[j] and x[j+1], based on some mysterious constant gamma

where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/n, x[j] is the jth order statistic, n is the sample size, and m is a constant determined by the sample quantile type. Here gamma depends on the fractional part of g = np+m-j.

So, how do I calculate j? And m?

For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the kth order statistic and p(k):

p(k) = (k - alpha) / (n - alpha - beta + 1), where α and β are constants determined by the type. Further, m = alpha + p(1 - alpha - beta), and gamma = g.

Now I'm really lost. p, which was a constant before, is now apparently a function.

So for type 7 quantiles, the default...

Type 7

p(k) = (k - 1) / (n - 1). In this case, p(k) = mode[F(x[k])]. This is used by S.

Anyone want to help me out? In particular, I'm confused by the notation of p being both a function and a constant, what the heck m is, and how to calculate j for some particular p.

I hope that, based on the answers here, we can submit some revised documentation that better explains what is going on.

For the source code, see quantile.R, or type quantile.default at the R prompt.

math r statistics
Sep 18 '08 at 17:59
2 answers

You're understandably confused. That documentation is terrible. I had to go back to the paper it's based on (Hyndman, R.J.; Fan, Y. (November 1996). "Sample Quantiles in Statistical Packages". American Statistician 50 (4): 361-365. doi:10.2307/2684934) to get an understanding. Let's start with the first problem.

where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/n, x[j] is the jth order statistic, n is the sample size, and m is a constant determined by the sample quantile type. Here gamma depends on the fractional part of g = np+m-j.

The first part comes straight from the paper, but what the documentation writers omitted was that j = int(pn+m). This means Q[i](p) depends only on the two order statistics closest to being a fraction p of the way through the (sorted) observations. (For those, like me, who are unfamiliar with the term, the "order statistics" of a series of observations is the sorted series.)
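To make the omitted step concrete, here is a minimal sketch for type 7 (the default), using the m = alpha + p(1 - alpha - beta) formula quoted in the question with alpha = beta = 1. The data vector and the variable names are made up for illustration; this is a hand check, not how quantile() is implemented internally.

```r
# Hand-rolled type 7 quantile following the documentation's formulas.
x <- c(10, 20, 40, 80, 160)  # illustrative data, already sorted
n <- length(x)
p <- 0.3

alpha <- 1
beta  <- 1                            # type 7 constants
m <- alpha + p * (1 - alpha - beta)   # reduces to 1 - p for type 7
j <- floor(n * p + m)                 # the omitted step: j = int(pn + m)
g <- n * p + m - j                    # fractional part; gamma = g for types 4-9
q <- (1 - g) * x[j] + g * x[j + 1]    # interpolate between x[j] and x[j+1]

# Compare with R's own answer (TRUE if they agree up to rounding).
isTRUE(all.equal(q, unname(quantile(x, probs = p, type = 7))))
```

Here np + m is about 2.2, so j = 2 and g is about 0.2, which places the result one fifth of the way from x[2] to x[3]. The sketch does not handle the edge cases j = 0 or j = n.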

Also, that last sentence is just wrong. It should read

Here gamma depends on the fractional part of np+m, g = np+m-j

As for m, that's straightforward. m depends on which of the 9 algorithms was chosen. So just as Q[i] is the quantile function, m should be considered m[i]. For algorithms 1 and 2, m is 0; for 3, m is -1/2; and for the others, that's covered in the next part.
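For the three discontinuous algorithms, a rough sketch might look like the following. The m values are the ones given above; the gamma rules are not stated in this answer and are taken from the ?quantile help page as I read it, so treat them as an assumption. The function name quantile_discrete is hypothetical, and the sketch ignores the floating-point fuzz that R applies.

```r
# Sketch of types 1-3 for a single probability p.
# Gamma rules are assumed from ?quantile, not from the answer above.
quantile_discrete <- function(x, p, type) {
  x <- sort(x)
  n <- length(x)
  m <- if (type == 3) -0.5 else 0
  j <- floor(n * p + m)
  g <- n * p + m - j
  gamma <- switch(type,
    if (g == 0) 0 else 1,                   # type 1: inverse of the empirical CDF
    if (g == 0) 0.5 else 1,                 # type 2: average at discontinuities
    if (g == 0 && j %% 2 == 0) 0 else 1     # type 3: nearest even order statistic
  )
  lo <- x[min(max(j, 1), n)]        # treat x[0] as x[1]
  hi <- x[min(max(j + 1, 1), n)]    # treat x[n+1] as x[n]
  (1 - gamma) * lo + gamma * hi
}

x <- c(1, 3, 6, 10, 15, 21, 28, 36)  # illustrative data
sapply(1:3, function(t) quantile_discrete(x, 0.25, t))
sapply(1:3, function(t) unname(quantile(x, 0.25, type = t)))
```

The last two lines should print the same three values if the assumed gamma rules match what R does.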

For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the kth order statistic and p(k):

p(k) = (k - alpha) / (n - alpha - beta + 1), where α and β are constants determined by the type. Further, m = alpha + p(1 - alpha - beta), and gamma = g.

This is really confusing. What the documentation calls p(k) is not the same as the p from before. p(k) is the plotting position. In the paper, the authors write it as p_k, which helps. Especially since, in the expression for m, the p is the original p, and m = alpha + p * (1 - alpha - beta). Conceptually, for algorithms 4-9, the points (p_k, x[k]) are interpolated to get the solution (p, Q[i](p)). Each algorithm differs only in its formula for the p_k.
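The interpolation picture can be checked with a short sketch built on approx(). The alpha and beta constants per type are taken from the ?quantile help page rather than from this answer, and the function name quantile_by_plotting_position is hypothetical. Outside the range of the p_k, the sketch assumes the result is clamped to the smallest or largest observation (rule = 2).

```r
# Types 4-9 as linear interpolation through the points (p_k, x[k]).
# alpha/beta per type are assumed from ?quantile.
ab <- list(`4` = c(0, 1),     `5` = c(1/2, 1/2), `6` = c(0, 0),
           `7` = c(1, 1),     `8` = c(1/3, 1/3), `9` = c(3/8, 3/8))

quantile_by_plotting_position <- function(x, p, type) {
  x <- sort(x)
  n <- length(x)
  a <- ab[[as.character(type)]]
  pk <- (seq_len(n) - a[1]) / (n - a[1] - a[2] + 1)  # plotting positions
  # rule = 2 clamps to x[1] / x[n] outside [p_1, p_n]
  approx(pk, x, xout = p, rule = 2)$y
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)       # illustrative data
probs <- c(0.05, 0.25, 0.5, 0.9)
# One TRUE per type if the sketch agrees with quantile()
sapply(4:9, function(t) {
  isTRUE(all.equal(quantile_by_plotting_position(x, probs, t),
                   unname(quantile(x, probs, type = t))))
})
```

Note that p enters only as the point at which the interpolated curve is read off, while p_k depends only on k, n, and the type.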

As for the last bit, R is just stating what S uses.

The original paper gives a list of 6 "desirable properties for a sample quantile" function, and states a preference for #8, which satisfies all but one. #5 satisfies all of them, but they don't like it on other grounds (it's more phenomenological than derived from principles). #2 is what non-stat geeks like myself would consider the quantiles, and is what's described on Wikipedia.

By the way, in response to dreeves's answer, Mathematica does things significantly differently. I think I understand the mapping. While Mathematica's approach is easier to understand, (a) it's easier to shoot yourself in the foot with nonsensical parameters, and (b) it can't do R's algorithm #2. (Here's MathWorld's Quantile page, which states that Mathematica can't do #2, but gives a simpler generalization of all the other algorithms in terms of four parameters.)

Sep 22 '09 at 23:58

There are various ways of computing quantiles when you give it a vector and don't have a known CDF.

Consider the question of what to do when your observations don't fall exactly on the quantiles.

The "types" just determine how to do that. So the methods say, "use a linear interpolation between the kth order statistic and p(k)".

So, what's p(k)? One guy says, "Well, I like to use k/n." Another guy says, "I like to use (k-1)/(n-1)," etc. Each of these methods has different properties that are better suited to one problem or another.

The \alpha's and \beta's are just ways to parameterize the functions p. In one case, they're 1 and 1. In another case, they're 3/8 and -1/4. I don't think the p's are ever really constants in the documentation; they just don't always show the dependence explicitly.
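As a small illustration of that parameterization, the two choices mentioned above both fall out of the general formula p(k) = (k - alpha) / (n - alpha - beta + 1). The function name plotting_position is made up for this sketch; alpha = 0, beta = 1 is simply what you get by solving for k/n, and alpha = 1, beta = 1 gives (k-1)/(n-1).

```r
# General plotting position from the documentation's formula.
plotting_position <- function(k, n, alpha, beta) {
  (k - alpha) / (n - alpha - beta + 1)
}

n <- 5
k <- seq_len(n)

pp_kn <- plotting_position(k, n, alpha = 0, beta = 1)  # reduces to k / n
pp_k1 <- plotting_position(k, n, alpha = 1, beta = 1)  # reduces to (k - 1) / (n - 1)

rbind(`k/n` = pp_kn, `(k-1)/(n-1)` = pp_k1)
```

Each row gives the probabilities at which the sorted observations x[1], ..., x[5] are placed before interpolating.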

See what happens with the different types when you put in vectors like 1:5 and 1:6.

(Also note that even if your observations fall exactly on the quantiles, certain types will still use linear interpolation.)
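One way to run that experiment is sketched below. The helper name compare_types and the choice of p = 0.25 are arbitrary; only quantile() itself is from R.

```r
# Evaluate one probability under all nine types for a given vector.
compare_types <- function(x, p) {
  res <- sapply(1:9, function(t) unname(quantile(x, probs = p, type = t)))
  names(res) <- paste0("type", 1:9)
  res
}

res5 <- compare_types(1:5, 0.25)
res6 <- compare_types(1:6, 0.25)

# One row per input vector, one column per type
rbind(`1:5` = res5, `1:6` = res6)
```

Trying other probabilities (for example 0.5 or 0.9) shows where the types agree and where they drift apart.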

Sep 18 '08 at 18:49