Why does optimx in R not provide the correct solution to this simple nonparametric likelihood maximization?

Is optimx() providing the wrong solution, or am I missing something simple? Thanks!

I am trying to maximize a very simple likelihood. It is a nonparametric likelihood in the sense that the distribution F is not parametrically specified. Rather, for each observation xi, f(xi) = pi, and therefore log(Likelihood) = Sum(log(f(xi))) = Sum(log(pi)).

I am trying to maximize the function sum(log(pi)) + lambda*(sum(pi) - 1), subject to sum(pi) = 1 (i.e., this is a constrained maximization problem that can be solved with a Lagrange multiplier).

The answer to this problem is pi = 1/n, where n is the number of data points. However, optimx does not seem to find this solution. Does anybody know why? For n = 2, the function to maximize is log(p1) + log(p2) + lambda*(p1 + p2 - 1).
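Just to spell out the algebra behind that claimed answer (the standard first-order conditions for the Lagrangian, nothing beyond what is stated above):

    \frac{\partial}{\partial p_i}\left[\sum_j \log p_j + \lambda\Big(\sum_j p_j - 1\Big)\right]
    = \frac{1}{p_i} + \lambda = 0
    \quad\Rightarrow\quad p_i = -\frac{1}{\lambda},

    \sum_i p_i = -\frac{n}{\lambda} = 1
    \quad\Rightarrow\quad \lambda = -n, \qquad p_i = \frac{1}{n}.

For n = 2 this gives p1 = p2 = 1/2 and lambda = -2.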

Here is my code and output from R:

    n = 2
    log.like = function(p) {
      lamda = p[n + 1]
      ll = 0
      for (i in 1:n) {
        temp = log(p[i]) + lamda * p[i] - lamda / n
        ll = ll + temp
      }
      return(-ll)
    }

    mle = optimx(c(0.48, .52, -1.5), log.like,
                 lower = c(rep(0.1, 2), -3), upper = c(rep(.9, 2), -1),
                 method = "L-BFGS-B")

    > mle
                 par  fvalues   method fns grs itns conv  KKT1 KKT2 xtimes
    1 0.9, 0.9, -1.0 1.010721 L-BFGS-B   8   8 NULL    0 FALSE   NA      0

The solution for n = 2 is p1 = p2 = 1/2 and lambda = -2. However, that is not what I get from optimx. Any idea?

optimization r
1 answer

Nothing is wrong with optimx. Take a step back and look at the function you are asking it to maximize: log(p1) + log(p2) + lambda*(p1 + p2 - 1). It should be intuitively clear that the best you can do is make all the variables as large as possible, no? Note that optimx rightly returned the upper bounds you specified.
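You can check this directly by evaluating the Lagrangian (the same expression, just written out for n = 2) at the corner optimx returned and at the saddle point you expected:

    # Lagrangian from the question, written directly for n = 2
    lagr <- function(p1, p2, lambda) log(p1) + log(p2) + lambda * (p1 + p2 - 1)

    lagr(0.9, 0.9, -1)   # at the upper bounds: about -1.0107
    lagr(0.5, 0.5, -2)   # at the saddle point:  about -1.3863

    # The corner gives the larger value, so a box-constrained maximizer of the
    # Lagrangian ends up at the bounds, not at the saddle point. (-1.0107 also
    # matches fvalues = 1.010721 above, since log.like returns the negated value.)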

So what is wrong with your approach? With Lagrange multipliers, the critical points you are after are saddle points of the function above, not the local minima that optimx will find for you. You therefore need to reformulate the problem so that those saddle points become local minima. This can be done by minimizing the squared norm of the gradient of the Lagrangian, which is easy to compute analytically for your problem. Here is a great example (with pictures):

http://en.wikipedia.org/wiki/Lagrange_multiplier#Example:_numerical_optimization .

For your problem:

    grad.norm <- function(x) {
      lambda <- tail(x, 1)
      p <- head(x, -1)
      # squared norm of the gradient of the Lagrangian:
      # d/dp_i = 1/p_i + lambda,  d/dlambda = sum(p) - 1
      sum((1/p + lambda)^2) + (sum(p) - 1)^2
    }

    optimx(c(.48, .52, -1.5), grad.norm,
           lower = c(rep(.1, 2), -3), upper = c(rep(.9, 2), -1),
           method = "L-BFGS-B")
    #                                par      fvalues   method fns grs [...]
    # 1 0.5000161, 0.5000161, -1.9999356 1.038786e-09 L-BFGS-B  13  13 [...]

Follow-up: if you do not want to, or cannot, compute the gradient analytically yourself, you can let R compute a numerical approximation instead, for example:

    log.like <- function(x) {
      lambda <- tail(x, 1)
      p <- head(x, -1)
      return(sum(log(p)) + lambda*(sum(p) - 1))
    }

    grad.norm <- function(x) {
      require(numDeriv)
      return(sum(grad(log.like, x)^2))
    }

    optimx(c(.48, .52, -1.5), grad.norm,
           lower = c(rep(.1, 2), -3), upper = c(rep(.9, 2), -1),
           method = "L-BFGS-B")
    #                                par      fvalues   method fns grs [...]
    # 1 0.5000161, 0.5000161, -1.9999356 1.038784e-09 L-BFGS-B  13  13 [...]
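The same setup carries over to general n unchanged, since grad.norm only uses head() and tail(). As a sketch (the n = 5, starting values and bounds below are just illustrative, not part of the original problem), you would expect it to recover p_i = 1/5 and lambda = -5:

    # hypothetical larger problem: expect p_i = 1/n = 0.2 and lambda = -n = -5
    n <- 5
    optimx(c(rep(0.3, n), -4), grad.norm,
           lower = c(rep(0.01, n), -10), upper = c(rep(0.9, n), -1),
           method = "L-BFGS-B")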
