I have a strange problem here. I use data.table for a very routine task, but there is something I cannot explain. I found a workaround, but I think it's still important for me to understand what is going wrong.
This code loads the data into the workspace:
library(XML)
library(data.table)
theurl <- "http://goo.gl/hOKW3a"
tables <- readHTMLTable(theurl)
# keep columns 4 and 5 of the second table and drop its first two rows
new.Res <- data.table(tables[[2]][4:5][-(1:2),])
suppressWarnings(names(new.Res) <- c("Party","Cases"))
There are two columns, Party and Cases. Both of them are of class factor by default, although Cases should be numeric. Ultimately, I just want the total number of Cases for each Party, so something like this should work:
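Because the scraped page may not be reachable, here is a minimal offline stand-in, assuming hypothetical party names and counts: a data frame whose two columns are text stored as factors, as the question reports for Party and Cases. Note that the factor levels are sorted as strings, not as numbers.

```r
# Hypothetical stand-in for new.Res: both columns stored as factors
res <- data.frame(
  Party = c("A", "B", "A", "B"),
  Cases = c("10", "5", "20", "100"),
  stringsAsFactors = TRUE
)

str(res)            # both columns are factors
levels(res$Cases)   # "10" "100" "20" "5": sorted as text, not numerically
```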
new.Res[,sum(Cases), by=Party]
But this does not give the correct answer. I thought it would work if I changed the class of Cases from factor to numeric, so I tried the following:
new.Res[,Cases := as.numeric(Cases)]
new.Res[,sum(Cases), by=Party]
But I get the same wrong answer. I realized that the problem lies in how Cases is converted from factor to numeric, so I tried another method, and it worked:
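For reference, base R stores a factor as integer codes that index its levels, and as.numeric() on a factor returns those codes rather than the printed labels; the ?factor help page recommends as.numeric(levels(f))[f] instead. A minimal illustration with a hypothetical vector:

```r
f <- factor(c("10", "5", "20", "100"))

levels(f)                    # "10" "100" "20" "5"
as.numeric(f)                # 1 4 3 2: the internal level codes, not the values
as.numeric(as.character(f))  # 10 5 20 100: convert the labels first
as.numeric(levels(f))[f]     # 10 5 20 100: same result, parsing each level once
```

With codes in place of values, any sum by Party would differ from the totals of the printed numbers.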
Step 1: Re-initialize the data:
theurl <- "http://goo.gl/hOKW3a"
tables <- readHTMLTable(theurl)
new.Res <- data.table(tables[[2]][4:5][-(1:2),])
suppressWarnings(names(new.Res) <- c("Party","Cases"))
Step 2: Use a different method to convert Cases from factor to numeric:
new.Res[,Cases := strtoi(Cases)]
new.Res[,sum(Cases), by=Party]
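A sketch of why Step 2 behaves differently, assuming the data.table package and the same hypothetical stand-in values: strtoi() converts its argument with as.character() before parsing, so it sees the labels rather than the codes. as.numeric(as.character(x)) takes the same route and also accepts non-integer text, whereas strtoi() returns NA for anything that is not a plain integer in the given base.

```r
library(data.table)

# Hypothetical stand-in for new.Res, with both columns as factors
dt <- data.table(
  Party = factor(c("A", "B", "A", "B")),
  Cases = factor(c("10", "5", "20", "100"))
)

# strtoi() parses the labels, because it calls as.character() internally
via_strtoi <- strtoi(dt$Cases)   # 10 5 20 100 (integer)

# Equivalent route that also handles decimals: labels first, then numbers
dt[, Cases := as.numeric(as.character(Cases))]

# Total cases per party, grouped in order of first appearance
totals <- dt[, .(Cases = sum(Cases)), by = Party]
print(totals)  # A = 30, B = 105
```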
That works! However, I'm not sure why the first two methods failed. What am I missing?