I think setting the encoding of (a copy of) str to "unknown" before using cat() is less magical and works just as well. This should avoid any unwanted character set conversions in cat().
Here is an extended example demonstrating what I think happens in the original example:
    print_info <- function(x) {
        print(x)
        print(Encoding(x))
        str(x)
        print(charToRaw(x))
    }

    cat("(1) Original string (UTF-8)\n")
    str <- "\xe1\xbb\x8f"
    Encoding(str) <- "UTF-8"
    print_info(str)
    cat(str, file="no-iconv")

    cat("\n(2) Conversion to UTF-8, wrong input encoding (latin1)\n")
    ## from = "" is conversion from current locale, forcing "latin1" here
    str2 <- iconv(str, from="latin1", to="UTF-8")
    print_info(str2)
    cat(str2, file="yes-iconv")

    cat("\n(3) Converting (2) explicitly to latin1\n")
    str3 <- iconv(str2, from="UTF-8", to="latin1")
    print_info(str3)
    cat(str3, file="latin")

    cat("\n(4) Setting encoding of (1) to \"unknown\"\n")
    str4 <- str
    Encoding(str4) <- "unknown"
    print_info(str4)
    cat(str4, file="unknown")
In a "Latin-1" locale (see ?l10n_info), as used by R on Windows, the output files "yes-iconv", "latin" and "unknown" should be correct (byte sequence 0xe1, 0xbb, 0x8f, which is "ỏ" in UTF-8).
In a "UTF-8" locale, the files "no-iconv" and "unknown" should be correct.
The output of the example code is as follows, using the 64-bit Windows version of R 3.3.2 running on Wine:
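To check which files actually contain the expected bytes, something like the following could be used. This is only a minimal sketch: has_expected_bytes() is a hypothetical helper name, not part of R, and the demonstration writes to a temporary file using the "unknown" approach from step (4) so that it can be run on its own. I would expect it to report TRUE in both locales discussed above, but I have only reasoned about that, not tested every locale.

```r
# Expected UTF-8 byte sequence for "ỏ" (U+1ECF)
expected <- as.raw(c(0xe1, 0xbb, 0x8f))

# Hypothetical helper: TRUE if the file holds exactly the expected bytes
has_expected_bytes <- function(path) {
    bytes <- readBin(path, what = "raw", n = file.size(path))
    identical(bytes, expected)
}

# Self-contained demonstration using the "unknown" approach from step (4)
s <- rawToChar(expected)      # build the string from raw bytes
Encoding(s) <- "unknown"      # make sure no encoding is declared
tmp <- tempfile()
cat(s, file = tmp)
check_unknown <- has_expected_bytes(tmp)
print(check_unknown)
unlink(tmp)
```

After running the extended example, the same helper can be applied to the four output files, e.g. sapply(c("no-iconv", "yes-iconv", "latin", "unknown"), has_expected_bytes).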
    (1) Original string (UTF-8)
    [1] "ỏ"
    [1] "UTF-8"
     chr "<U+1ECF>""| __truncated__
    [1] e1 bb 8f

    (2) Conversion to UTF-8, wrong input encoding (latin1)
    [1] "á»\u008f"
    [1] "UTF-8"
     chr "á»\u008f"
    [1] c3 a1 c2 bb c2 8f

    (3) Converting (2) explicitly to latin1
    [1] "á»"
    [1] "latin1"
     chr "á»"
    [1] e1 bb 8f

    (4) Setting encoding of (1) to "unknown"
    [1] "á»"
    [1] "unknown"
     chr "á»"
    [1] e1 bb 8f
The original iconv() example relies on the default argument from = "", which means conversion from the current locale, effectively "latin1" here. Since the actual encoding of str is "UTF-8", the byte representation of the string is distorted in step (2), but it is then implicitly restored by cat() when it (presumably) converts the string back to the current locale, as demonstrated by the equivalent conversion in step (3).
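The distortion and its reversal can also be checked on the raw bytes alone, without relying on how the console prints the strings. The following is a small sketch under the assumption that the platform's iconv maps latin1 byte 0x8f to U+008F and back, as it did in the output above; the variable names are only for illustration.

```r
# Bytes of "ỏ" (U+1ECF) in UTF-8, built from raw values
original <- rawToChar(as.raw(c(0xe1, 0xbb, 0x8f)))
Encoding(original) <- "UTF-8"

# Step (2): bytes misinterpreted as latin1 and converted to UTF-8
mangled <- iconv(original, from = "latin1", to = "UTF-8")

# Step (3): converting back to latin1 undoes the damage
restored <- iconv(mangled, from = "UTF-8", to = "latin1")

# Each of the three input bytes became a two-byte UTF-8 sequence
mangled_ok <- identical(charToRaw(mangled),
                        as.raw(c(0xc3, 0xa1, 0xc2, 0xbb, 0xc2, 0x8f)))

# The round trip yields the original byte sequence again
restored_ok <- identical(charToRaw(restored), charToRaw(original))

print(c(mangled_ok, restored_ok))
```

Note that restored is marked as "latin1" even though its bytes are the UTF-8 encoding of "ỏ", which is why it prints as "á»" in step (3) but still ends up correct in the file.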