UTF-8 encoding with dplyr and SQLite

I have a table in SQLite and Id to open it with dplyr. I am using SQLite Expert Version 35.58.2478, R Studio version 0.98.1062 on a PC with Win 7.

After connecting to the database using src_sqlite () and reading using tbl () I get a table. But the character who appeals is mistaken. Reading the same table from the csv file simply works by adding encoding = "utf-8" to the read.csv function, but in this case another error occurs in the name of the first column (please consider the minimal example below).

Please note that in the SQLite table, UTF-8 and SQLite encoding correctly displays the data.

I tried to change the encoding in the parameters of R Studio without success. Also changing a region in windows or in r has no effect.

Is there any solution to correctly enter characters in a table in r using dplyr?

Minimal example

library(dplyr) db <- src_sqlite("C:/Users/Jens/Documents/SQLite/my_db.sqlite") tbl(db, "prozesse") ## Source: sqlite 3.7.17 [C:/Users/Jens/Documents/SQLite/my_db.sqlite] ## From: prozesse [4 x 4] ## ## KH_ID Einschätzung Prozess Gruppe ## 1 1 3 Buchung IT ## 2 2 4 Buchung IT ## 3 3 3 Buchung OLP ## 4 4 5 Buchung OLP 

You see the wrong encoding in the name of the second column. This question also arises in columns with ä, ö, ü, etc.

The name of the second column is displayed correctly, but the first column is incorrect:

 read.csv("C:/Users/Jens/Documents/SQLite/prozess.csv", encoding = "UTF-8") ## XUFEFF.KH_ID Einschätzung Gruppe Prozess ## 1 1 3 PO visite ## 2 2 3 IT visite ## 3 3 3 IT visite ## 4 2 3 PO visite sessionInfo() ## R version 3.1.1 (2014-07-10) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## ## locale: ## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 ## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C ## [5] LC_TIME=German_Germany.1252 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] RSQLite.extfuns_0.0.1 RSQLite_0.11.4 DBI_0.3.0 ## [4] dplyr_0.2 ## ## loaded via a namespace (and not attached): ## [1] assertthat_0.1 digest_0.6.4 evaluate_0.5.5 formatR_1.0 ## [5] htmltools_0.2.6 knitr_1.6 parallel_3.1.1 Rcpp_0.11.2 ## [9] rmarkdown_0.3.3 stringr_0.6.2 tools_3.1.1 yaml_2.1.13 
+4
source share
1 answer

I had the same problem. I solved it as shown below. However, I cannot guarantee that the decision is solid. Try:

 library(dplyr) library(sqldf) # Modifying built-in mtcars dataset mtcars$test <- c("č", "ž", "š", "č", "ž", "š", letters) %>% enc2utf8(.) mtcars$češćžä <- c("č", "ž", "š", "č", "ž", "š", letters) %>% enc2utf8(.) names(mtcars) <- iconv(names(mtcars), "cp1250", "utf-8") # Connecting to sqlite database my_db <- src_sqlite("my_db.sqlite3", create = T) # exporting mtcars dataset to database copy_to(my_db, mtcars, temporary = FALSE) # dbSendQuery(my_db$con, "drop table mtcars") # getting data from sqlite database my_mtcars_from_db <- collect(tbl(my_db, "mtcars")) # disconnecting from database dbDisconnect(my_db$con) 

convert_to_encoding () function

 # a function that encodes # column names and values in character columns # with specified encodings convert_to_encoding <- function(x, from_encoding = "UTF-8", to_encoding = "cp1250"){ # names of columns are encoded in specified encoding my_names <- iconv(names(x), from_encoding, to_encoding) # if any column name is NA, leave the names # otherwise replace them with new names if(any(is.na(my_names))){ names(x) } else { names(x) <- my_names } # get column classes x_char_columns <- sapply(x, class) # identify character columns x_cols <- names(x_char_columns[x_char_columns == "character"]) # convert all string values in character columns to # specified encoding x <- x %>% mutate_each_(funs(iconv(., from_encoding, to_encoding)), x_cols) # return x return(x) } # use convert_to_encoding(my_mtcars_from_db, "UTF-8", "cp1250") 

results

 # before conversion my_mtcars_from_db Source: local data frame [32 x 13] mpg cyl disp hp drat wt qsec vs am gear carb ÄŤešćžä test 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ÄŤ ÄŤ 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Ĺľ Ĺľ 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 š š 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 ÄŤ ÄŤ 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Ĺľ Ĺľ 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 š š 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 aa 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 bb 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 cc 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 dd .. ... ... ... ... ... ... ... .. .. ... ... ... ... # after conversion convert_to_encoding(my_mtcars_from_db, "UTF-8", "cp1250") Source: local data frame [32 x 13] mpg cyl disp hp drat wt qsec vs am gear carb test češćžä 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 č č 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ž ž 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 š š 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 č č 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ž ž 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 š š 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 aa 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 bb 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 cc 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 dd .. ... ... ... ... ... ... ... .. .. ... ... ... ... 

Session Information

 devtools::session_info() Session info ------------------------------------------------------------------- setting value version R version 3.2.0 (2015-04-16) system x86_64, mingw32 ui RStudio (0.99.441) language (EN) collate Slovenian_Slovenia.1250 tz Europe/Prague Packages ----------------------------------------------------------------------- package * version date source assertthat * 0.1 2013-12-06 CRAN (R 3.2.0) chron * 2.3-45 2014-02-11 CRAN (R 3.2.0) DBI 0.3.1 2014-09-24 CRAN (R 3.2.0) devtools * 1.7.0 2015-01-17 CRAN (R 3.2.0) dplyr 0.4.1 2015-01-14 CRAN (R 3.2.0) gsubfn 0.6-6 2014-08-27 CRAN (R 3.2.0) lazyeval * 0.1.10 2015-01-02 CRAN (R 3.2.0) magrittr * 1.5 2014-11-22 CRAN (R 3.2.0) proto 0.3-10 2012-12-22 CRAN (R 3.2.0) R6 * 2.0.1 2014-10-29 CRAN (R 3.2.0) Rcpp * 0.11.6 2015-05-01 CRAN (R 3.2.0) RSQLite 1.0.0 2014-10-25 CRAN (R 3.2.0) rstudioapi * 0.3.1 2015-04-07 CRAN (R 3.2.0) sqldf 0.4-10 2014-11-07 CRAN (R 3.2.0) 
+5
source

All Articles