Convert text to 7-bit ASCII from the command line

I'm on OS X 10.5.5 (although it doesn't really matter, I think)

I have a set of text files containing fancy characters, such as curly double quotes, an ellipsis ("…") stored as a single character, etc.

I need to convert these files to good old 7-bit ASCII, preferably without losing the meaning of the characters (i.e., convert an ellipsis into three periods, curly quotes into regular ones, etc.).

Please suggest a command-line (bash) tool or script to do this.

+4
6 answers

The ELinks web browser converts Unicode characters to their ASCII equivalents, giving things like "--" for "—" and "..." for "…", etc. There is a Python module, python-elinks, that uses the same conversion table, and it is trivial to turn it into a shell filter, for example:

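  #!/usr/bin/env python
  # Python 2 filter: reads UTF-8 text on stdin, writes ASCII to stdout.
  import elinks  # provides the 'elinks' error handler used below
  import sys

  for line in sys.stdin:
      line = line.decode('utf-8')
      sys.stdout.write(line.encode('ASCII', 'elinks'))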
+2

iconv should do this, as far as I know. I'm not 100% sure how it handles conversions where one input character should or could become several output characters, as in the ellipsis example. I'll try it!

Update: I tried it, and it does not seem to work. It fails, perhaps because it does not know how to express the ellipsis (the test character I used) in the "lesser" encoding. Converting from UTF-8 to UTF-16 went fine. :/ However, iconv might still be worth exploring further.
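
The same limitation can be reproduced with Python's built-in codecs. They are a different implementation from iconv, so this is only an analogy and not a description of how iconv works internally: a strict conversion to ASCII has no byte for U+2026, while UTF-16 does.

```python
text = "Wait\u2026"  # "Wait" followed by U+2026 HORIZONTAL ELLIPSIS

# UTF-16 can represent every Unicode character, so this succeeds.
utf16_bytes = text.encode("utf-16-le")

# ASCII has no code for U+2026, so a strict conversion raises an error.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ASCII conversion failed:", exc.reason)
```

A converter has to be told explicitly what to substitute for such characters, which is what the transliteration approaches in the other answers do.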

+1

Look into transliteration tools; I like Unidecode (in Perl), and it is not too difficult to port to other languages.
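
As a rough illustration of what a transliteration step does, here is a minimal Python 3 sketch that uses only the standard library. It is not Unidecode and not a port of it: the `PUNCTUATION` table and the `to_ascii` name are made up for this example, and the table covers only a handful of characters.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping for characters that have no
# Unicode compatibility decomposition; extend it for your own files.
PUNCTUATION = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}
TABLE = str.maketrans(PUNCTUATION)


def to_ascii(text):
    """Return a 7-bit ASCII approximation of text."""
    # 1. Apply the explicit punctuation mapping.
    text = text.translate(TABLE)
    # 2. NFKD splits accented letters into base letter + combining mark
    #    and expands compatibility characters (U+2026 becomes "...").
    text = unicodedata.normalize("NFKD", text)
    # 3. Drop whatever is still outside ASCII (e.g. the combining marks).
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u201cWait\u2026\u201d \u2014 caf\u00e9"))  # prints: "Wait..." -- cafe
```

To use it as a shell filter, the function could be applied to each line of `sys.stdin`, assuming the input encoding matches what Python expects for standard input. Characters that are neither in the table nor decomposable are silently dropped, which may or may not be acceptable.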

+1

I used iconv to convert a file from UTF-16LE (little-endian, as I found out by trial and error), which had been created by TextPad on Windows, to ASCII on OS X as follows:

  cat utf16file.txt |iconv -f UTF-16LE -t ASCII > asciifile.txt 

You can pipe the output through hexdump to view the characters and make sure you are getting the correct output. The terminal knows how to interpret UTF-16 and displays it correctly, so you can't tell just by running "cat" on the file:

 cat utf16file.txt | iconv -f UTF-16LE -t ASCII | hexdump -C 

This shows a layout with hexadecimal char codes and ASCII characters on the right side, and you can try different encodings in the -f "from" parameter to find out what you are dealing with.

Use iconv -l to list the character sets that iconv can use on your system.
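
A similar inspection can be sketched in Python 3: rather than reading a hex dump, list each distinct non-ASCII character with its code point and Unicode name. The function name `non_ascii_report` and the sample string are made up for this example, and the UTF-16LE default is only an assumption taken from the case described above.

```python
import unicodedata


def non_ascii_report(data, encoding="utf-16-le"):
    """Decode bytes and describe each distinct non-ASCII character."""
    text = data.decode(encoding)
    seen = {}  # preserves first-seen order
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            name = unicodedata.name(ch, "<unnamed>")
            seen[ch] = "U+%04X %s" % (ord(ch), name)
    return list(seen.values())


# Sample bytes standing in for the contents of a UTF-16LE file.
sample = "Wait\u2026 \u201cwhat\u201d?".encode("utf-16-le")
for entry in non_ascii_report(sample):
    print(entry)
```

If the wrong encoding is passed, the decode may fail or the reported names will look implausible, which is a hint to try another one.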

+1

There was a question yesterday or the day before about renaming files, and I showed a Perl script, rename.pl, that could be used for that task. The hard part is knowing the encoding of the odd characters and working out the correct sequence of transliterations. I would probably do this with an adaptation of that script that performs all the mappings in sequence. Doing this one character at a time would be tedious.

Question: How to rename with a prefix / suffix

0

python3 version:

  #!/usr/bin/env python3
  import sys
  import elinks

  for line in sys.stdin:
      sys.stdout.write(line.encode('ASCII', 'elinks').decode('utf-8'))

It is worth noting that python-elinks is pure Python, so no real installation is required.

0
