John Tukey "median median" (or "resistant line") statistical test for R and linear regression

I am looking for John Tukey's algorithm that computes a "resistant line" or "median-median line" for my linear regression in R.

A student on a mailing list explains this algorithm in these terms:

"The way it is calculated is to divide the data into three groups, find the x-median and y-median values (called the summary point) for each group, and then use those three summary points to determine the line. The outer two summary points determine the slope, and an average of all three determines the intercept."

An article about John Tukey's median-median, for the curious: http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/

Do you have any idea where I can find this algorithm or an R function for it? Which package is it in? Thank you very much!

3 answers

There is a description of how to calculate the median-median line here. An R implementation is:

    median_median_line <- function(x, y, data)
    {
      if(!missing(data))
      {
        x <- eval(substitute(x), data)
        y <- eval(substitute(y), data)
      }

      stopifnot(length(x) == length(y))

      # Step 1: group sizes. A remainder of 1 goes to the middle group;
      # a remainder of 2 is split between the two outer groups.
      one_third_length <- floor(length(x) / 3)
      groups <- rep(1:3, times = switch((length(x) %% 3) + 1,
        rep(one_third_length, 3),
        c(one_third_length, one_third_length + 1, one_third_length),
        c(one_third_length + 1, one_third_length, one_third_length + 1)
      ))

      # Step 2: order the points by x, keeping each y paired with its x
      ord <- order(x)
      x <- x[ord]
      y <- y[ord]

      # Step 3: summary point (median x, median y) of each group
      median_x <- tapply(x, groups, median)
      median_y <- tapply(y, groups, median)

      # Step 4: slope and intercept from the two outer summary points
      slope <- (median_y[3] - median_y[1]) / (median_x[3] - median_x[1])
      intercept <- median_y[1] - slope * median_x[1]

      # Step 5: shift the line one third of the way towards the middle summary point
      middle_prediction <- intercept + slope * median_x[2]
      intercept <- intercept + (median_y[2] - middle_prediction) / 3
      c(intercept = unname(intercept), slope = unname(slope))
    }

To test this, here is the second example from this page:

    dfr <- data.frame(
      time = c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61,
               .68, .72, .72, .83, .88, .89),
      distance = c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7,
                   150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1))
    median_median_line(time, distance, dfr)
    #intercept     slope
    #   -113.6     520.0

Note the slightly odd way of specifying the groups. The instructions are quite picky about how you define the group sizes, so the more obvious method of cut(x, quantile(x, seq.int(0, 1, 1/3))) doesn't work.
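To make the group-size point concrete, here is a small sketch that compares the sizes required by the rule in the function above with what a quantile-based cut() produces on the same time values. The helper name mm_group_sizes is made up for this illustration, and include.lowest = TRUE is added so that the minimum is not dropped as NA. Tied x values can never be separated by cut(), which is one way the two groupings can drift apart.

```r
# Hypothetical helper (not part of the answer's function): group sizes
# under the rule "remainder 1 -> middle group, remainder 2 -> outer groups".
mm_group_sizes <- function(n) {
  k <- n %/% 3
  switch(n %% 3 + 1,
    c(k, k, k),         # n divisible by 3
    c(k, k + 1, k),     # one extra point goes to the middle group
    c(k + 1, k, k + 1)  # two extra points go to the outer groups
  )
}

# Same time values as in the example data above
x <- c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61,
       .68, .72, .72, .83, .88, .89)

rule_sizes <- mm_group_sizes(length(x))

# Quantile-based grouping; include.lowest = TRUE keeps min(x) in the first bin
cut_groups <- cut(x, quantile(x, seq.int(0, 1, 1/3)), include.lowest = TRUE)
cut_sizes <- as.vector(table(cut_groups))

print(rule_sizes)
print(cut_sizes)
```

Printing both vectors side by side shows whether the tertile bins match the prescribed sizes for a given data set; for this one they do not.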


I'm a little late to the party, but have you tried line() from the stats package?

From the help file:

Value

Object of class "tukeyline".

References

Tukey, J. W. (1977). Exploratory Data Analysis, Reading Massachusetts: Addison-Wesley.
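A minimal sketch of calling it, using the time/distance data from the other answer (repeated here so the snippet runs on its own). No expected numbers are shown, because line() may split the groups differently from a hand-written median-median implementation, so the coefficients need not match exactly.

```r
# Example data copied from the other answer
dfr <- data.frame(
  time = c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61,
           .68, .72, .72, .83, .88, .89),
  distance = c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7,
               150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1))

# Fit Tukey's resistant line; stats is attached by default in R
fit <- line(dfr$time, dfr$distance)

class(fit)            # the help file says this is "tukeyline"
coef(fit)             # intercept and slope
head(residuals(fit))  # residuals are available too
```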


As a member of the R Core team, I have now dug up the source code and also studied its history.

Conclusion: the original C source code, added in 1996/1997 when R was still called alpha (around version 0.14alpha), already computed the quantiles not quite correctly for some sample sizes.

More about this will follow on the R mailing lists (but not just yet).
