LESSCHARSET=utf-8 less doesn't seem to work

I am trying to view a UTF-8 text file/stream in less, and even when I call it like this:

 cat file | LESSCHARSET=utf-8 less 

UTF-8 characters outside the ASCII range are not displayed correctly. Instead, their hexadecimal values are displayed in angle brackets, for example <F4>.

Reading the same text in vim with the encoding set to UTF-8 is not a problem, so I think something is wrong with the way I call less.

My locale output is as follows:

 LANG="en_US.UTF-8"
 LC_COLLATE="en_US.UTF-8"
 LC_CTYPE="en_US.UTF-8"
 LC_MESSAGES="en_US.UTF-8"
 LC_MONETARY="en_US.UTF-8"
 LC_NUMERIC="en_US.UTF-8"
 LC_TIME="en_US.UTF-8"
 LC_ALL=

My version of less is the one installed by Xcode on OS X Leopard:

 $ less --version | sed 's/^/ /'
 less 394
 Copyright (C) 1984-2005 Mark Nudelman
 less comes with NO WARRANTY, to the extent permitted by law.
 For information about the terms of redistribution,
 see the file named README in the less distribution.
 Homepage: http://www.greenwoodsoftware.com/less

locale -a | grep US | sed 's/^/ /' displays the following:

 en_AU.US-ASCII
 en_CA.US-ASCII
 en_GB.US-ASCII
 en_NZ.US-ASCII
 en_US
 en_US.ISO8859-1
 en_US.ISO8859-15
 en_US.US-ASCII
 en_US.UTF-8
+6
unix utf-8
5 answers
  • What does the locale command output? Is it a UTF-8 locale?

  • Are you sure your terminal is configured to display UTF-8? Does echo -e '\xe2\x82\xac' print a € (euro) sign?

  • Is the locale installed on the system? Is it in the list that locale -a displays?

  • Which version of less are you using? (Run less --version to find out.) Really old versions did not even support LESSCHARSET. This is the less likely cause, though: I have a Debian "sarge" system with less version 382, and it does not even need LESSCHARSET if the locale is set up correctly.
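The checklist above can be sketched as a small script. This is only a sketch: the function names are hypothetical, and it uses printf with octal escapes for the euro sign instead of echo -e, since echo -e is not portable across shells.

```shell
#!/bin/sh
# Hypothetical helper names; the commands are the ones from the checklist.

# Succeeds if a locale name such as "en_US.UTF-8" names a UTF-8 charset.
is_utf8_locale_name() {
    case "$1" in
        *.UTF-8|*.utf-8|*.UTF8|*.utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the locale that governs character handling:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-}"
    fi
}

# Runs the checks and prints what it finds; the reader judges the output.
check_less_utf8() {
    ctype=$(effective_ctype)
    if is_utf8_locale_name "$ctype"; then
        echo "locale: $ctype (UTF-8)"
    else
        echo "locale: ${ctype:-unset} (not UTF-8)"
    fi
    # E2 82 AC in octal; a UTF-8 terminal should render a euro sign here.
    printf 'terminal test: \342\202\254\n'
    # Installed locales for the same language/territory prefix.
    locale -a 2>/dev/null | grep -i "^${ctype%%.*}"
    # The first line of the version banner carries the version number.
    less --version | head -n 1
}
```

Calling check_less_utf8 in the terminal where less misbehaves prints one line per check.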

+8

I suspect that your file is not UTF-8 but ISO 8859. (Presumably the character shown as <F4> should be "ô"?)

Launch xterm using LANG=en_US.ISO-8859-1 xterm. Then check the locale (the locale output should be something like en_US.ISO-8859-1). Then use less to view the file. Is it displayed correctly?

Note that just setting LESSCHARSET=iso8859 without starting a new terminal is not enough. LESSCHARSET tells less that the terminal can interpret ISO 8859, but your terminal is probably set to UTF-8, since the euro sign is displayed correctly. Because \xf4 on its own is not a valid UTF-8 sequence, the terminal will probably display some placeholder glyph instead.
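One way to confirm that a lone 0xF4 byte is not valid UTF-8 is to push it through a strict decoder. This is a minimal sketch with a hypothetical function name; it assumes an iconv that exits non-zero on malformed input, and it writes the byte in octal (\364) because printf's \x escapes are not portable.

```shell
#!/bin/sh
# Succeeds if standard input decodes cleanly as UTF-8.
# Converting to UTF-16 forces iconv to actually decode the input.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-16 >/dev/null 2>&1
}

# 0xF4 alone only starts a four-byte sequence, so it is rejected.
if printf '\364' | is_valid_utf8; then
    echo "F4 alone: valid UTF-8"
else
    echo "F4 alone: not valid UTF-8"
fi

# The two-byte sequence C3 B4 is how UTF-8 encodes the same character.
if printf '\303\264' | is_valid_utf8; then
    echo "C3 B4: valid UTF-8"
else
    echo "C3 B4: not valid UTF-8"
fi
```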

+5

Try running file file.txt. If the output is, for example, "ISO-8859 English text", convert the file from ISO-8859 to UTF-8 with the command iconv -f ISO-8859-1 -t UTF-8 -o testfile.txt file.txt. If less testfile.txt displays it correctly, finish with mv testfile.txt file.txt.
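The conversion can also be wrapped in a helper. This is a sketch with a hypothetical function name; it assumes you have already confirmed with file that the input is ISO-8859-1. It uses shell redirection instead of -o, because the -o option may not be accepted by every iconv implementation.

```shell
#!/bin/sh
# Hypothetical helper: rewrites FILE from ISO-8859-1 to UTF-8 in place.
# The original is overwritten only if iconv succeeds.
latin1_file_to_utf8() {
    src=$1
    tmp=$(mktemp "${TMPDIR:-/tmp}/to-utf8.XXXXXX") || return 1
    if iconv -f ISO-8859-1 -t UTF-8 "$src" > "$tmp"; then
        # Copy the bytes back so the file keeps its permissions.
        cat "$tmp" > "$src"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

After latin1_file_to_utf8 file.txt, plain less file.txt in a UTF-8 terminal should show the accented characters.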

+2

On Mac OS, the encoding must be uppercase:

 bash-4.4$ less --version
 less 458 (POSIX regular expressions)
 Copyright (C) 1984-2012 Mark Nudelman
 bash-4.4$ LESSCHARSET=cp1251 less
 invalid charset name
 bash-4.4$ LESSCHARSET=CP1251 less
 Missing filename ("less --help" for help)

I found this list of encodings in the less source:

 { "ascii", NULL, "8bcccbcc18b95.b" },
 { "utf-8", &utf_mode, "8bcccbcc18b95.b126.bb" },
 { "iso8859", NULL, "8bcccbcc18b95.33b." },
 { "latin3", NULL, "8bcccbcc18b95.33b5.b8.b15.b4.b12.b18.b12.b." },
 { "arabic", NULL, "8bcccbcc18b95.33b.3b.7b2.13b.3b.b26.5b19.b" },
 { "greek", NULL, "8bcccbcc18b95.33b4.2b4.b3.b35.b44.b" },
 { "greek2005", NULL, "8bcccbcc18b95.33b14.b35.b44.b" },
 { "hebrew", NULL, "8bcccbcc18b95.33b.b29.32b28.2b2.b" },
 { "koi8-r", NULL, "8bcccbcc18b95.b." },
 { "KOI8-T", NULL, "8bcccbcc18b95.b8.b6.b8.bb5b7.3b4.b4.b3.bb3b." },
 { "georgianps", NULL, "8bcccbcc18b95.3b11.4b12.2b." },
 { "tcvn", NULL, "b..b...bcccbccbbb7.8b95.b48.5b." },
 { "TIS-620", NULL, "8bcccbcc18b95.b.4b.11b7.8b." },
 { "next", NULL, "8bcccbcc18b95.bb125.bb" },
 { "dos", NULL, "8bcccbcc12bc5b95.b." },
 { "windows-1251", NULL, "8bcccbcc12bc5b95.b24.b." },
 { "windows-1252", NULL, "8bcccbcc12bc5b95.b.b11.b.2b12.b." },
 { "windows-1255", NULL, "8bcccbcc12bc5b95.b.b8.b.5b9.b.4b." },
 { "ebcdic", NULL, "5bc6bcc7bcc41b.9b7.9b5.b..8b6.10b6.b9.7b9.8b8.17b3.3b9.7b9.8b8.6b10.bbb" },
 { "IBM-1047", NULL, "4cbcbc3b9cbccbccbb4c6bcc5b3cbbc4bc4bccbc191.b" },
 { NULL, NULL, NULL }

and their aliases:

 { "UTF-8", "utf-8" },
 { "ANSI_X3.4-1968", "ascii" },
 { "US-ASCII", "ascii" },
 { "latin1", "iso8859" },
 { "ISO-8859-1", "iso8859" },
 { "latin9", "iso8859" },
 { "ISO-8859-15", "iso8859" },
 { "latin2", "iso8859" },
 { "ISO-8859-2", "iso8859" },
 { "ISO-8859-3", "latin3" },
 { "latin4", "iso8859" },
 { "ISO-8859-4", "iso8859" },
 { "cyrillic", "iso8859" },
 { "ISO-8859-5", "iso8859" },
 { "ISO-8859-6", "arabic" },
 { "ISO-8859-7", "greek" },
 { "IBM9005", "greek2005" },
 { "ISO-8859-8", "hebrew" },
 { "latin5", "iso8859" },
 { "ISO-8859-9", "iso8859" },
 { "latin6", "iso8859" },
 { "ISO-8859-10", "iso8859" },
 { "latin7", "iso8859" },
 { "ISO-8859-13", "iso8859" },
 { "latin8", "iso8859" },
 { "ISO-8859-14", "iso8859" },
 { "latin10", "iso8859" },
 { "ISO-8859-16", "iso8859" },
 { "IBM437", "dos" },
 { "EBCDIC-US", "ebcdic" },
 { "IBM1047", "IBM-1047" },
 { "KOI8-R", "koi8-r" },
 { "KOI8-U", "koi8-r" },
 { "GEORGIAN-PS", "georgianps" },
 { "TCVN5712-1", "tcvn" },
 { "NEXTSTEP", "next" },
 { "windows", "windows-1252" }, /* backward compatibility */
 { "CP1251", "windows-1251" },
 { "CP1252", "windows-1252" },
 { "CP1255", "windows-1255" },
 { NULL, NULL }
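To illustrate why cp1251 is rejected while CP1251 is accepted, here is a toy lookup over a few rows copied from the two tables. It is not the code less runs: the function name is hypothetical, and the exact, case-sensitive comparison is an assumption based on the session shown earlier.

```shell
#!/bin/sh
# Toy model: prints the canonical charset name for NAME, or fails.
# The match is exact, so letter case matters.
resolve_charset_name() {
    case "$1" in
        # Canonical names from the first table.
        utf-8|iso8859|koi8-r|windows-1251|windows-1252)
            printf '%s\n' "$1" ;;
        # Aliases from the second table.
        UTF-8)      echo utf-8 ;;
        ISO-8859-1) echo iso8859 ;;
        KOI8-R)     echo koi8-r ;;
        CP1251)     echo windows-1251 ;;
        CP1252)     echo windows-1252 ;;
        *)          return 1 ;;
    esac
}

resolve_charset_name CP1251 || echo "invalid charset name"
resolve_charset_name cp1251 || echo "invalid charset name"
```

In this model the lowercase spelling matches neither a canonical name nor an alias, which mirrors the "invalid charset name" message in the session.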
+1

Worked for me:

 f='path/to/file/filename.extension'; LESSCHARSET=$(file -b --mime-encoding "${f}" | tr '[:lower:]' '[:upper:]') less "${f}"
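The same idea can be kept as a reusable shell function. This is a sketch with hypothetical function names; it assumes a file command that supports --mime-encoding.

```shell
#!/bin/sh
# Upper-cases a MIME charset name so that it matches the upper-case
# alias spellings listed in the previous answer (e.g. "ISO-8859-1").
charset_name_for_less() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Hypothetical wrapper: detects the encoding, then starts less with it.
lessauto() {
    enc=$(file -b --mime-encoding "$1") || return 1
    LESSCHARSET=$(charset_name_for_less "$enc") less "$1"
}
```

Note that file can also report values such as binary or unknown-8bit, which less will not accept as a charset name, so the wrapper only helps when file recognizes a real encoding.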
0