I think the most likely cause of this error is negative values or zeros in the data, since the default link in glm.nb is "log". It would be easy enough to test by changing to link="identity". I also think you need to try smaller models, maybe a quarter of these variables to start. That would also let you add related variables in bundles, since from the names it looks like you may have serious potential for collinearity among the categorical variables.
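A rough sketch of both checks, in case it helps. It uses the quine data from MASS as a stand-in because the original data are not shown; the response name Days and the predictors Sex and Age belong to that example, not to your model, so substitute your own.

```r
library(MASS)  # provides glm.nb() and the quine example data

# Stand-in response; replace quine$Days with your own response column.
y <- quine$Days

# Count values that could upset the fit.
n_negative <- sum(y < 0, na.rm = TRUE)
n_zero     <- sum(y == 0, na.rm = TRUE)
n_missing  <- sum(is.na(y))
cat("negative:", n_negative, " zero:", n_zero, " NA:", n_missing, "\n")

# Start with a deliberately small model ...
fit_small <- glm.nb(Days ~ Sex, data = quine)

# ... then add related variables one bundle at a time and compare.
fit_more <- update(fit_small, . ~ . + Age)
anova(fit_small, fit_more)
```

If the small model fits and a particular bundle makes it fail, that bundle is the place to look for collinearity.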
We really need a description of the data. I wondered about Dirty.Industry + Clean.Industry. This kind of dichotomy is better handled by a single factor variable that has those two levels. That prevents collinearity if Clean = not-Dirty. The same may apply to your heterogeneity variables. (I'm not sure @BenBolker's comment is correct. I think it is very likely that you will need statistical consultation first, before addressing coding problems.)
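As an illustration of the factor idea, here is a minimal sketch. The column names come from your formula, but the values are made up, and it assumes both columns are 0/1 indicators, which I cannot confirm without seeing the data.

```r
# Hypothetical toy data: column names from the question, values invented.
d <- data.frame(
  Dirty.Industry = c(1, 0, 0, 1, 0),
  Clean.Industry = c(0, 1, 1, 0, 1)
)

# Are the two indicators exact complements (Clean = not-Dirty)?
complementary <- all(d$Dirty.Industry + d$Clean.Industry == 1)

# If so, collapse them into one two-level factor and use that in the formula.
if (complementary) {
  d$Industry <- factor(ifelse(d$Dirty.Industry == 1, "Dirty", "Clean"),
                       levels = c("Clean", "Dirty"))
}
table(d$Industry)
```

If the check comes back FALSE, some rows are neither or both, and you would need a factor with more than two levels instead.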
require(MASS)
data(quine)  # following example in ?glm.nb page
> quine$Days[1] <- -2
> quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine, link = "identity")
Error in eval(expr, envir, enclos) :
  negative values not allowed for the 'Poisson' family
> quine$Days[1] <- 0
> quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine, link = "identity")
Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In log(y/mu) : NaNs produced