When using merge from data.table, I get encoding warnings. My process is as follows:
- I create a first data.table.
- I update this data.table using merge.
But when I call merge, I get this warning:
Please ensure that character columns have identical encodings for joins.
How can I determine the encoding used in my data.table? I know that I can silence the warning using suppressWarnings, but I would prefer to fix the underlying problem.
Reproducible example:
library(data.table)
options(stringsAsFactors=FALSE)
dt = data.table(text=c('é','à','s'), title='agstudy', hrefs='a')
setkeyv(dt, names(dt))
dt.new = data.table(text=c('é','à','h','a'),
                    hrefs=c(rep('a',2),rep('aa',2)),
                    title=c(rep('agstudy',2),rep('new',2)))
setkeyv(dt.new, names(dt.new))
merge(dt.new, dt, all=TRUE)

Warning messages:
1: In `[.data.table`(y, xkey, nomatch = ifelse(all.x, NA, 0), allow.cartesian = allow.cartesian) :
  Encoding of character column 'text' in X is different from column 'text' in Y in join X[Y]. Joins are not implemented yet for non-identical character encodings and therefore likely to contain unexpected results for those entries. Please ensure that character columns have identical encodings for joins.
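One way to see which encodings each table actually carries is to look at the encoding marks R stores on the strings, via base R's Encoding(). The sketch below is only an illustration: the helper name column_encodings and the demo data are hypothetical, and it uses a plain data.frame so it runs without data.table (as.list() should make it work on a data.table as well). Note that pure-ASCII strings always report "unknown".

```r
# Hypothetical helper: list the distinct encoding marks found in each
# character column of a data.frame or data.table.
column_encodings <- function(d) {
  cols <- as.list(d)                               # plain list of columns
  is_chr <- vapply(cols, is.character, logical(1)) # keep character columns only
  lapply(cols[is_chr], function(col) unique(Encoding(col)))
}

# Demo data (assumption): one string explicitly marked as latin1, one ASCII string.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
demo <- data.frame(text = c(x, "s"), n = 1:2, stringsAsFactors = FALSE)

encs <- column_encodings(demo)
print(encs)
```

Running the same helper on both tables before the merge would show whether the 'text' column is marked differently in each.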
EDIT: adding session information:
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
[1] data.table_1.8.11
EDIT 2: adding context
My data.table is created after some scraping, where I set the encoding to UTF-8 using htmlParse(..., encoding = 'UTF-8'), then I create the data.table from the cleaned text.
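Given that context, a possible cause is that the scraped text is marked UTF-8 while text from another source carries a different mark (for example latin1 or the native encoding). A minimal sketch of one way to align them, assuming it is acceptable to convert every character column to UTF-8 with base R's enc2utf8() before calling setkeyv: the helper name to_utf8 and the demo data are hypothetical, and whether this removes the warning in data.table 1.8.11 is an assumption, not something verified here.

```r
# Hypothetical helper: convert every character column to UTF-8.
# Written against a plain data.frame; apply it before setting keys.
to_utf8 <- function(d) {
  for (nm in names(d)) {
    if (is.character(d[[nm]])) {
      d[[nm]] <- enc2utf8(d[[nm]])  # re-encodes and marks non-ASCII strings as UTF-8
    }
  }
  d
}

# Demo data (assumption): a latin1-marked string next to an ASCII string.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
raw_df <- data.frame(text = c(x, "s"), n = 1:2, stringsAsFactors = FALSE)

fixed <- to_utf8(raw_df)
print(Encoding(fixed$text))
```

With data.table, the equivalent per-column update would be something like dt[, text := enc2utf8(text)], done on both tables before setkeyv and merge.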