Why is the “curve” so different from the “lines” and “dots” in R?

Question


I would like to fit frequency data with discrete generalized beta distribution ( DGBD ).

The data is as follows:

    freq = c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
             3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2,
             1801, 882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715, 937,
             20, 6922, 2028, 23, 3045, 16, 334, 31, 2)
    Rank = rank(-freq, ties.method = c("first"))
    p = freq/sum(freq)

get the log forms

    log.f = log(freq)
    log.p = log(p)
    log.rank = log(Rank)
    log.inverse.rank = log(length(Rank)+1-Rank)

linear regression of the discrete generalized beta distribution

    co = coef(lm(log.p ~ log.inverse.rank + log.rank))
    zmf = function(x) exp(co[[1]] + co[[2]]*log(length(x)+1-x) + co[[3]]*log(x))
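For reference, the regression above appears to fit the usual two-parameter form of the DGBD on a log scale. The symbols A, a, b and N below are labels introduced here for illustration (N is the number of ranks); they are not names used in the code:

```latex
p(r) = A \, \frac{(N + 1 - r)^{b}}{r^{a}}, \qquad r = 1, \dots, N

\log p(r) = \log A + b \, \log(N + 1 - r) - a \, \log r
```

Under that reading, co[[1]] estimates log A, co[[2]] estimates b, and co[[3]] estimates -a. Note that zmf reproduces this formula only when length(x) equals N.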

plot

    plot(p ~ Rank, xlim = c(1, 80), log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
    curve(zmf, col = "blue", add = TRUE)
    xx = c(1:length(Rank))
    lines(zmf(xx) ~ xx, col = "red")
    points(zmf(xx) ~ xx, col = "purple")


Figure 1. The plot looks like this:

My question is: which is the right way to show the fitted result, lines (or points) or curve?

Update:

Although I do not yet understand the underlying logic, I have found a solution:

@Frank reminded me that curve takes an argument n, the number of x values at which the function is evaluated. Setting n to the length of the data solves the problem. So n has to be specified in curve when the curve is supposed to match the raw data, even though it can be left at its default in many other situations.

    plot(p ~ Rank, log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
    curve(zmf, col = "blue", add = TRUE, n = length(Rank))  # set the number of x values at which to evaluate

Figure 2. The correct way to use curve: specify 'n'
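To see how many x values curve actually evaluates, one can inspect the list of points it returns invisibly. This is a minimal sketch that uses sqrt as a stand-in function and draws to a null device; the names default_pts and matched_pts are made up for illustration:

```r
pdf(NULL)  # null graphics device, so no plot file is written

# curve() invisibly returns a list with the x and y values it drew
default_pts <- curve(sqrt, from = 1, to = 47)          # n left at its default
matched_pts <- curve(sqrt, from = 1, to = 47, n = 47)  # one x value per rank

length(default_pts$x)  # 101
length(matched_pts$x)  # 47

invisible(dev.off())
```

A function that reads length(x) therefore sees 101 in the first call and 47 in the second.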


r plot lines curve points

Frank wang Mar 17 '14 at 2:36


1 answer

plannapus · Answer 1 · 2014-03-28T09:22:58+0000

You need to specify n here because your function depends on length(x)!

    zmf = function(x) exp(co[[1]]+ co[[2]]*log(length(x)+1-x) + co[[3]]*log(x))
                                               ^^^^^^^^^

Here, the length of the x vector that curve passes to your function is n!

Here is your plot if you keep the default n = 101 but also give lines and points an xx vector of length 101:

    plot(p ~ Rank, xlim = c(1, 80), log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
    curve(zmf, col = "blue", add = TRUE)
    xx = seq(1, length(Rank), length.out = 101)
    lines(zmf(xx) ~ xx, col = "red")
    points(zmf(xx) ~ xx, col = "purple")

Neither voodoo nor error! :)
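One possible way to remove the dependence on n altogether is to fix the number of ranks when the function is defined, instead of reading it from length(x). The sketch below is only an illustration: the coefficients in co_demo are made up (the fitted values from lm would replace them), and zmf_len, zmf_fixed and n_ranks are hypothetical names that do not appear in the question.

```r
# Illustrative coefficients only; in the question these come from coef(lm(...))
co_demo <- c(-2, 1.5, -1.2)
n_ranks <- 47  # number of values in the question's freq vector

# Original form: the "N" term silently follows the length of the input vector
zmf_len <- function(x) {
  exp(co_demo[[1]] + co_demo[[2]] * log(length(x) + 1 - x) + co_demo[[3]] * log(x))
}

# Hypothetical variant: N is fixed, so the result no longer depends on length(x)
zmf_fixed <- function(x, N = n_ranks) {
  exp(co_demo[[1]] + co_demo[[2]] * log(N + 1 - x) + co_demo[[3]] * log(x))
}

x_data  <- 1:n_ranks                          # what lines() and points() receive
x_curve <- seq(1, n_ranks, length.out = 101)  # 101 values, as with curve()'s default n

# The two forms agree when the input has length 47 ...
isTRUE(all.equal(zmf_len(x_data), zmf_fixed(x_data)))
# ... but differ at rank 1 when the input has length 101
zmf_len(x_curve)[1] == zmf_fixed(x_curve)[1]
```

With a definition along these lines, curve(zmf_fixed, add = TRUE) should trace the same values as lines and points whatever n is.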
