data.table::merge: how to avoid encoding warnings?

When I use data.table's merge, I get encoding warnings. My process is as follows:

  • I create a first data table.
  • I update this data table with merge.

But when I call merge, I get this warning:

 Please ensure that character columns have identical encodings for joins. 

How can I determine the encoding used in a data table? I know I can silence the warning with suppressWarnings, but I would prefer to fix the underlying cause.

Reproducible example:

    library(data.table)
    options(stringsAsFactors=FALSE)
    dt = data.table(text=c('é','à','s'), title='agstudy', hrefs='a')
    setkeyv(dt, names(dt))
    dt.new = data.table(text=c('é','à','h','a'),
                        hrefs=c(rep('a',2), rep('aa',2)),
                        title=c(rep('agstudy',2), rep('new',2)))
    setkeyv(dt.new, names(dt.new))
    merge(dt.new, dt, all=TRUE)

    Warning messages:
    1: In `[.data.table`(y, xkey, nomatch = ifelse(all.x, NA, 0), allow.cartesian = allow.cartesian) :
      Encoding of character column 'text' in X is different from column 'text' in Y in join X[Y]. Joins are not implemented yet for non-identical character encodings and therefore likely to contain unexpected results for those entries. Please ensure that character columns have identical encodings for joins.

EDIT: adding session information:

    sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    [1] data.table_1.8.11

EDIT 2: adding context:

My data.table is created after some web scraping, where I set the encoding to UTF-8 using htmlParse(..., encoding = 'UTF-8'). I then create the data table from the cleaned text.

2 answers

The warning comes from a mixture of encodings in your character vectors. ASCII-only strings are marked with the encoding "unknown", but the others are probably marked "latin1".
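To see which encoding marks a table actually carries, base R's Encoding() can be applied column by column. The following is only a minimal sketch: encoding_summary is a hypothetical helper name (not part of data.table), and the sample strings are built with explicit marks so the result does not depend on the encoding of the source file.

```r
library(data.table)

# Hypothetical helper (not part of data.table): for every character column,
# count how many values carry each declared encoding mark.
encoding_summary <- function(DT) {
  is_chr <- vapply(DT, is.character, logical(1))
  chr_cols <- names(DT)[is_chr]
  sapply(chr_cols, function(col) table(Encoding(DT[[col]])), simplify = FALSE)
}

# Build the same character ("é") with two different encoding marks.
latin1_e <- "\xe9"
Encoding(latin1_e) <- "latin1"
utf8_e <- enc2utf8(latin1_e)

dt <- data.table(text = c(latin1_e, utf8_e, "s"), title = "agstudy")

Encoding(dt$text)
# [1] "latin1"  "UTF-8"   "unknown"

print(encoding_summary(dt))
```

A column whose summary shows more than one of "latin1" and "UTF-8", or a different mark than the matching column in the other table, is a likely source of the warning.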

Use this to set the declared encoding of every character column to "unknown":

    dt[, names(dt) := lapply(.SD, function(x) {
      if (is.character(x)) Encoding(x) <- "unknown"
      x
    })]

If you do the same for the second data table (dt.new), you will avoid the warning.
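A different approach, not the one used in this answer, is to convert every character column in both tables to UTF-8 with base R's enc2utf8() before merging, so that the bytes and the marks agree. This is a rough sketch under assumptions: to_utf8 is a hypothetical helper name, the sample tables are made up for illustration, and strings marked "unknown" are treated by enc2utf8() as being in the native encoding, which may not be true for scraped text.

```r
library(data.table)

# Hypothetical helper (not part of data.table): re-encode all character
# columns of DT to UTF-8, modifying DT by reference.
to_utf8 <- function(DT) {
  is_chr <- vapply(DT, is.character, logical(1))
  for (col in names(DT)[is_chr]) {
    set(DT, j = col, value = enc2utf8(DT[[col]]))
  }
  invisible(DT)
}

# The same character ("é"), once marked latin1 and once marked UTF-8.
latin1_e <- "\xe9"
Encoding(latin1_e) <- "latin1"

dt1 <- data.table(text = c(latin1_e, "s"), v1 = 1:2)
dt2 <- data.table(text = c(enc2utf8(latin1_e), "h"), v2 = 3:4)

to_utf8(dt1)
to_utf8(dt2)

# After normalisation both "é" values are byte-identical UTF-8 strings.
res <- merge(dt1, dt2, by = "text", all = TRUE)
print(res)
```

Unlike resetting the mark to "unknown", this changes the underlying bytes of latin1 strings, so it is worth checking the result with Encoding() afterwards.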

Please note that you are using the development version. Behavior may change soon.


Encoding problems are fixed in v1.9.7 (the current development version). See the release notes, bug fixes #23. The merge should now work as intended, without any warnings and without the need to convert encodings. Please let us know if this is not the case.

    require(data.table) # v1.9.7+
    dt = data.table(text=c('é','à','s'), title='agstudy', hrefs='a')
    dt.new = data.table(text=c('é','à','h','a'),
                        hrefs=c(rep('a',2), rep('aa',2)),
                        title=c(rep('agstudy',2), rep('new',2)))

    merge(dt.new, dt, all=TRUE)
    #    text hrefs   title
    # 1:    a    aa     new
    # 2:    h    aa     new
    # 3:    s     a agstudy
    # 4:    à     a agstudy
    # 5:    é     a agstudy

    merge(dt.new, dt, all=TRUE, by=c("text", "title"))
    #    text   title hrefs.x hrefs.y
    # 1:    a     new      aa      NA
    # 2:    h     new      aa      NA
    # 3:    s agstudy      NA       a
    # 4:    à agstudy       a       a
    # 5:    é agstudy       a       a