aggregate: sum not meaningful for factors

I am trying to do something that should be simple; any hint about what is happening is very welcome.

I have a large data frame of the countries that a set of municipalities import from. For some countries I have two entries. I want to sum the imports of each municipality so that I get a single row per country. I am using the aggregate function. For example (this is a small part of the data frame):

    municipalities<-c("country",1100056, 1100106,1100205,1100304,1200104,1200252)
    c1<-c("Afghanistan",2,34,23.4,5,0,0)
    c2<-c("Afghanistan",0,20,11.1,5.4,2,0)
    c3<-c("Albania",12,120,11.4,5.1,12,10)
    c4<-c("Albania",0,40,61.1,65.4,652,2)
    df<-as.data.frame(rbind(municipalities,c1,c2,c3,c4))

What I try is:

    df<-df[-1,]
    aggregate(df[,2:7],list(df[,1]),sum)

but I get a message:

 Error in Summary.factor(c(4L, 1L), na.rm = FALSE) : sum not meaningful for factors 

I tried making df numeric, declaring the character columns as characters, etc., but nothing helps.

1 answer

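This is because of how you create your data frame. For example, `c1` is a character vector, because a vector can hold only one type: the string "Afghanistan" forces all the numbers in it to become strings too. `rbind()` stacks these into a character matrix, and when `as.data.frame()` turns that into a data frame, the character columns are coerced to factors (the default before R 4.0.0, when `stringsAsFactors` switched to `FALSE`). So you are trying to run `sum` on factors. You already understood this, but then you tried to convert the factors to numeric, which probably gave you meaningless results: `as.numeric()` on a factor returns its internal level codes, not the values printed in the column.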
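A minimal sketch of that coercion chain, using two of the question's rows. It passes `stringsAsFactors = TRUE` explicitly (an assumption added here) so the factor behaviour is the same in any R version:

```r
# Two of the question's rows, built the same way (row by row)
c1 <- c("Afghanistan", 2, 34, 23.4, 5, 0, 0)
c2 <- c("Afghanistan", 0, 20, 11.1, 5.4, 2, 0)
class(c1)   # "character": the country name turns every element into a string

# stringsAsFactors = TRUE mimics the pre-4.0.0 default explicitly
df <- as.data.frame(rbind(c1, c2), stringsAsFactors = TRUE)
class(df$V4)   # "factor"

# Summing a factor reproduces the error from the question
try(sum(df$V4))

# Converting a factor straight to numeric returns its level codes...
as.numeric(df$V4)                # 2 1  (levels are sorted as strings: "11.1", "23.4")
# ...so go through character first to recover the actual values
as.numeric(as.character(df$V4))  # 23.4 11.1
```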

The easy answer is to build your data frame column by column rather than row by row, so that each column keeps its own type and you avoid these coercion issues.
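A sketch of the column-by-column approach with the question's sample data. Using the municipality codes as column names (instead of a header row) and the name `country` for the first column are illustrative choices, not part of the original answer:

```r
# Each column comes from a vector of a single type, so nothing is coerced
df <- data.frame(
  country   = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  "1100056" = c(2, 0, 12, 0),
  "1100106" = c(34, 20, 120, 40),
  "1100205" = c(23.4, 11.1, 11.4, 61.1),
  "1100304" = c(5, 5.4, 5.1, 65.4),
  "1200104" = c(0, 2, 12, 652),
  "1200252" = c(0, 0, 10, 2),
  check.names = FALSE,      # keep the municipality codes as column names
  stringsAsFactors = FALSE  # keep country as character in every R version
)
str(df)   # country is chr, every municipality column is num

# Sum each municipality column per country
totals <- aggregate(df[, -1], by = list(country = df$country), FUN = sum)
totals
```

Because the numeric columns are never stored as strings, no `type.convert()` step is needed before aggregating.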

Given the data you already have, this will solve your problem:

    df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))
    aggregate(. ~ V1, df, sum)

(Thanks to @AnandaMahto for a much cleaner way to do this conversion than what I originally had.)

Result:

               V1 V2  V3   V4   V5  V6 V7
    1 Afghanistan  2  54 34.5 10.4   2  0
    2     Albania 12 160 72.5 70.5 664 12
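Once the columns are converted, the question's original non-formula call works as well. A self-contained sketch (the explicit `stringsAsFactors = TRUE` is an added assumption that mimics the pre-4.0.0 default):

```r
# The data frame built exactly as in the question (row by row)
municipalities <- c("country", 1100056, 1100106, 1100205, 1100304, 1200104, 1200252)
c1 <- c("Afghanistan", 2, 34, 23.4, 5, 0, 0)
c2 <- c("Afghanistan", 0, 20, 11.1, 5.4, 2, 0)
c3 <- c("Albania", 12, 120, 11.4, 5.1, 12, 10)
c4 <- c("Albania", 0, 40, 61.1, 65.4, 652, 2)
df <- as.data.frame(rbind(municipalities, c1, c2, c3, c4), stringsAsFactors = TRUE)
df <- df[-1, ]   # drop the header-like "municipalities" row

# Convert each column; as.is = TRUE keeps V1 as character instead of a factor
df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))

# The question's original call now succeeds because columns 2:7 are numeric
res <- aggregate(df[, 2:7], list(df[, 1]), sum)
res
```

The grouping column comes back as `Group.1`, since `list(df[, 1])` is unnamed; the formula interface keeps the name `V1`.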