How to convert a UTF-8 text file to lowercase on UNIX

I need to convert all the text to lowercase, but without using the traditional tr command, because it does not handle UTF-8 text properly.

Is there a good way to do this? I need some kind of UNIX filter, so I can process it in a pipe.

2 answers

GNU sed should be able to handle Unicode, as long as the current locale uses UTF-8. Try:

$ echo 'Some StrAngÉ LeTTeRs 123' | sed -e 's/./\L\0/g'
some strangé letters 123
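
To see why a byte-oriented conversion misses a letter such as É while a character-aware one does not, the difference can be sketched in Python. This is only an illustration of the principle, not a description of how tr or sed are implemented, and the variable names are made up for the example.

```python
text = "Some StrAngÉ LeTTeRs 123"

# Byte-wise lowercasing only maps the ASCII bytes A-Z, so the two
# UTF-8 bytes that encode "É" pass through unchanged.
bytewise = text.encode("utf-8").lower().decode("utf-8")
# bytewise == "some strangÉ letters 123"

# Character-aware lowercasing works on decoded code points,
# so "É" becomes "é".
charwise = text.lower()
# charwise == "some strangé letters 123"
```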

If you can use Python, then this code can help you:

import sys
import codecs

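# Python 3: wrap the underlying byte streams (.buffer), because
# sys.stdin and sys.stdout themselves are already text streams.
utf8input = codecs.getreader("utf-8")(sys.stdin.buffer)
utf8output = codecs.getwriter("utf-8")(sys.stdout.buffer)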

utf8output.write(utf8input.read().lower())

On my Windows machine (sorry :) I can use it as a filter:

cat big.txt | python tolowerutf8.py > lower.txt
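
The script above reads the whole input into memory before writing anything. For very large files, a line-by-line variant might be preferable. The following is a minimal sketch, assuming Python 3.7 or later (for `reconfigure()`); the names `lowercase_stream` and `main` are made up for this example.

```python
import sys


def lowercase_stream(src, dst):
    """Copy text from src to dst one line at a time, lowercasing each line."""
    for line in src:
        dst.write(line.lower())


def main():
    # Force UTF-8 on both ends regardless of locale or console code page.
    # newline="" leaves the file's original line endings untouched.
    # TextIOWrapper.reconfigure() needs Python 3.7 or later.
    sys.stdin.reconfigure(encoding="utf-8", newline="")
    sys.stdout.reconfigure(encoding="utf-8", newline="")
    lowercase_stream(sys.stdin, sys.stdout)
```

Saved as a script with a final call to `main()`, this can be used in the same kind of pipeline as the command above. Keeping the conversion in its own function also makes it easy to try out on in-memory streams such as `io.StringIO`.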