Why is the “curve” so different from the “lines” and “dots” in R?

I would like to fit frequency data with a discrete generalized beta distribution (DGBD).

The data is as follows:

    freq = c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
             3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
             882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715, 937, 20,
             6922, 2028, 23, 3045, 16, 334, 31, 2)
    Rank = rank(-freq, ties.method = "first")
    p = freq / sum(freq)

Get the log forms:

    log.f = log(freq)
    log.p = log(p)
    log.rank = log(Rank)
    log.inverse.rank = log(length(Rank) + 1 - Rank)

linear regression of the discrete generalized beta distribution

    co = coef(lm(log.p ~ log.inverse.rank + log.rank))
    zmf = function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))

plot

    plot(p ~ Rank, xlim = c(1, 80), log = "xy",
         xlab = "Rank (log)", ylab = "Probability (log)")
    curve(zmf, col = "blue", add = TRUE)
    xx = 1:length(Rank)
    lines(zmf(xx) ~ xx, col = "red")
    points(zmf(xx) ~ xx, col = "purple")

Figure 1. The plot looks like this:

My question is: which is the right way to show the result, lines (points) or the curve?

Update:

Although I do not yet understand the underlying logic, I have found a solution:

@Frank reminded me of the trick of setting n in curve(), and this solves the problem. So n has to be set in curve() when the curve should match the raw data, although in many situations it can be left at its default.

    plot(p ~ Rank, log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
    # n sets the number of x values at which zmf is evaluated
    curve(zmf, col = "blue", add = TRUE, n = length(Rank))

Figure 2. The correct way to use curve(): specify n.

1 answer

You need to specify n here because your function depends on length(x)!

    zmf = function(x) exp(co[[1]]+ co[[2]]*log(length(x)+1-x) + co[[3]]*log(x))
                                               ^^^^^^^^^

Here, the length of the x vector that curve() passes to your function is n!
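To make the dependence concrete, here is a minimal sketch that skips plotting and just calls the function on two vectors covering the same range. The coefficients in co are made-up values for illustration only (not the fitted ones), and x47 and x101 are hypothetical names.

```r
# Hypothetical coefficients, for illustration only (not the values fitted by lm).
co <- c(-3, 1.5, -2)

# Same form as the question's zmf: the result depends on length(x).
zmf <- function(x) exp(co[[1]] + co[[2]] * log(length(x) + 1 - x) + co[[3]] * log(x))

x47  <- seq(1, 47, length.out = 47)   # 47 points, like the xx given to lines()/points()
x101 <- seq(1, 47, length.out = 101)  # same range, but 101 points, like curve()'s default

# Both vectors start at x = 1, yet the function values there differ,
# because length(x) is 47 in one call and 101 in the other.
y47_at_1  <- zmf(x47)[1]
y101_at_1 <- zmf(x101)[1]
print(c(y47_at_1, y101_at_1))
```

The same x gives two different values, which is why the blue curve and the red line do not coincide.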

Here is your plot if you keep the default n = 101 but feed lines and points an xx vector of length 101:

    plot(p ~ Rank, xlim = c(1, 80), log = "xy",
         xlab = "Rank (log)", ylab = "Probability (log)")
    curve(zmf, col = "blue", add = TRUE)
    xx = seq(1, length(Rank), length.out = 101)
    lines(zmf(xx) ~ xx, col = "red")
    points(zmf(xx) ~ xx, col = "purple")

Neither voodoo nor error! :)
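One possible alternative, sketched here as an assumption rather than as part of the original code, is to fix the number of ranks inside the function so that it no longer depends on how many points it is evaluated at. The names zmf_fixed, N and fit are hypothetical; the data and the model are the ones from the question.

```r
freq <- c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
          3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2, 1801,
          882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715, 937, 20,
          6922, 2028, 23, 3045, 16, 334, 31, 2)
Rank <- rank(-freq, ties.method = "first")
p <- freq / sum(freq)
N <- length(Rank)

# Same regression as in the question, written with inline logs.
fit <- lm(log(p) ~ log(N + 1 - Rank) + log(Rank))
co <- coef(fit)

# Hypothetical variant of zmf: N is fixed at the number of observations
# instead of being taken from length(x).
zmf_fixed <- function(x, N = length(Rank)) {
  exp(co[[1]] + co[[2]] * log(N + 1 - x) + co[[3]] * log(x))
}

# The value at x = 1 is now the same however many points are evaluated.
print(zmf_fixed(seq(1, N, length.out = 101))[1])
print(zmf_fixed(1:N)[1])
```

With a function like this, a call such as curve(zmf_fixed, from = 1, to = N, add = TRUE) should agree with lines and points at the default n, because the number of evaluation points no longer enters the formula.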
