GLM police stops data in Gelman / Hill book

Has anyone worked with the New York police stops data mentioned in the Gelman and Hill book Data Analysis Using Regression and Multilevel/Hierarchical Models (ARM)? The data are under

http://www.stat.columbia.edu/~gelman/arm/examples/police/

in the file frisk_with_noise.dat. I deleted the description part of the file, renamed past.arrests to arrests, and saved the result as frisk.dat. I then called glm from R as follows:

    library("foreign")
    frisk <- read.table("frisk.dat", header=TRUE)
    attach(frisk)
    glm(formula = stops ~ 1, family=poisson, offset=log(arrests))

The glm call is taken directly from the ARM book. However, I get an error message:

 Error: NA/NaN/Inf in foreign function call (arg 4) 

Any ideas? Gelman has a script in the same directory called polic_setup.R, which is supposed to contain some cleanup code, but that doesn't work either.

3 answers

I haven't gone back to see exactly what Gelman does in this chapter (my copy of the book is in storage ...), but the specific problem with this example is that "arrests" is zero in some rows, so using log(arrests) as an offset causes problems: log(0) is -Inf, and glm cannot fit with a non-finite offset. (You don't need library(foreign), and using the data argument of glm is usually safer and better than using attach().)

    X <- read.table("frisk_with_noise.dat", skip=6, header=TRUE)
    names(X)[3] <- "arrests"
    glm(stops~1, family=poisson, offset=log(arrests), data=X, subset=arrests>0)
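
The zero-arrests problem can be reproduced on a small made-up data set. This is only a sketch: the data frame `toy` and its numbers are invented for illustration and have nothing to do with the real frisk data, and the exact wording of the error depends on the R version.

```r
# Hypothetical toy data (NOT the frisk data): the third row has zero arrests.
toy <- data.frame(stops = c(12, 7, 3, 20), arrests = c(30, 15, 0, 45))

# log(0) is -Inf, so the offset contains a non-finite value.
toy_offset <- log(toy$arrests)

# Fitting on all rows fails because of the -Inf offset;
# try() captures the error instead of stopping the script.
res <- try(glm(stops ~ 1, family = poisson, offset = log(arrests), data = toy),
           silent = TRUE)
failed <- inherits(res, "try-error")

# Dropping the zero-arrest rows leaves a finite offset, and the fit succeeds.
fit <- glm(stops ~ 1, family = poisson, offset = log(arrests),
           data = toy, subset = arrests > 0)

# For an intercept-only Poisson model with a log offset, the fitted rate
# equals sum(stops) / sum(arrests) over the rows that were kept.
rate <- unname(exp(coef(fit)))
```

Note that `subset = arrests > 0` silently discards the stops recorded in zero-arrest rows, which is one more reason the fit will not match the book exactly.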

The above code works, but the results differ from those in the book. According to this blog post, the authors had to alter the data manually because of privacy concerns.


For the example in Gelman chapter 6, I believe you first need to aggregate over crime type. The frisk_with_noise.dat file contains 900 observations, one record for each combination of precinct, ethnic group, and crime type (75 * 3 * 4 = 900). But the example in chapter 6 shows n = 225 (75 * 3). So extending Ben's code with something like this brings you a little closer to replicating the output:

    library(arm)  # for display() function
    X <- read.table("frisk_with_noise.dat", skip=6, header=TRUE)
    names(X)[3] <- "arrests"
    X <- aggregate(cbind(stops, arrests) ~ precinct + eth, data=X, sum)
    fit.1 <- glm(stops~1, family=poisson, offset=log(arrests), data=X, subset=arrests>0)
    display(fit.1)

But, as the note at the top of the frisk_with_noise.dat file says, noise has been added to the data, so the results cannot be reproduced exactly.
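
To see what the aggregation step does, here is a minimal sketch on made-up data. The data frame `toy` and its `crime` column name are assumptions for illustration only; the real file has 75 precincts, 3 ethnic groups, and 4 crime types.

```r
# Hypothetical toy data: 2 precincts x 2 ethnic groups x 2 crime types = 8 rows.
toy <- expand.grid(precinct = 1:2, eth = 1:2, crime = 1:2)
toy$stops   <- c(5, 3, 8, 2, 4, 6, 1, 9)
toy$arrests <- c(10, 6, 12, 4, 9, 11, 3, 8)

# Sum stops and arrests over crime type within each precinct/eth cell,
# using the same aggregate() call as in the answer above.
agg <- aggregate(cbind(stops, arrests) ~ precinct + eth, data = toy, sum)

n_before <- nrow(toy)  # one row per precinct, eth, and crime type
n_after  <- nrow(agg)  # one row per precinct and eth
```

The aggregation only collapses rows: the totals of `stops` and `arrests` are unchanged, and the row count drops by a factor equal to the number of crime types, just as 900 drops to 225 in the real data.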
