Even though this question was answered and accepted many years ago, the currently accepted answer is only correct for one-byte-per-character encodings such as ISO-8859-1, or for the single-byte subsets of variable-byte character sets (like Latin characters within UTF-8). Even using multi-byte splices instead would only work for fixed-width multi-byte encodings such as UTF-32 (or UTF-16 as long as no surrogate pairs are involved). Given that UTF-8 is now well on its way to being a universal standard, and looking at this list of languages by number of native speakers and this list of the top 30 languages by native/secondary usage, it is important to point out a simple technique that is variable-byte friendly, working on characters rather than bytes: cut -c, and tr / sed with character classes.
Compare the following, which doubly fails due to two common Latin-centric mistakes/presumptions about bytes vs. characters (one is head vs. cut, the other is [a-z][A-Z] vs. [:upper:][:lower:]):
    $ printf 'Πού μπορώ να μάθω σανσκριτικά;\n' | \
          head -c 1 | \
          sed -e 's/[A-Z]/[a-z]/g'
    [[unreadable binary mess, or nothing if the terminal filtered it]]
to this (note: this worked fine on FreeBSD, but both cut and tr on GNU/Linux still mangled Greek in UTF-8):
    $ printf 'Πού μπορώ να μάθω σανσκριτικά;\n' | \
          cut -c 1 | \
          tr '[:upper:]' '[:lower:]'
    π
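To see why the byte-based version produces garbage, it can help to look at the raw bytes. The following is a minimal sketch, assuming the script itself is saved as UTF-8; hex_bytes is a hypothetical helper name, not something from the answers here.

```shell
# hex_bytes: print stdin as a compact hex string (hypothetical helper).
# od works on raw bytes, so the result does not depend on the locale.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# Π (U+03A0) occupies two bytes in UTF-8: ce a0.
printf 'Π' | hex_bytes; echo                  # prints cea0

# head -c 1 keeps one byte, i.e. only half of the character.
printf 'Πού' | head -c 1 | hex_bytes; echo    # prints ce
```

The lone ce byte is an incomplete UTF-8 sequence, which is what the terminal shows as a binary mess or silently drops.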
Another, more recent answer already proposed cut, but only because of the side issue that it can be used to specify arbitrary offsets, not because of the directly relevant characters vs. bytes issue.
If your cut does not handle -c correctly with variable-byte encodings, for "the first X characters" (replace X with your number) you could try:
- sed -E -e '1 s/^(.{X}).*$/\1/' -e q (limited to the first line, though)
- head -n 1 | grep -E -o '^.{X}' (limited to the first line, and chains two commands)
- dd, which was already suggested in other answers but is really cumbersome
- a complicated sed script with a sliding-window buffer to handle characters spread over multiple lines, though this is probably more cumbersome/fragile than just using something like dd
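The sed variant can be wrapped so that X becomes a parameter. This is only a sketch: first_chars is a hypothetical name, and what counts as a "character" is decided by the current locale (LC_ALL / LC_CTYPE), so a UTF-8 locale is assumed for UTF-8 input.

```shell
# first_chars N: print the first N characters of the first line of stdin.
# Hypothetical helper; N is assumed to be a positive integer.
first_chars() {
    # On line 1 keep only the first N characters, then quit.
    # A line shorter than N characters does not match and is printed unchanged.
    sed -E -e "1 s/^(.{$1}).*\$/\\1/" -e q
}

printf 'Πού μπορώ να μάθω σανσκριτικά;\n' | first_chars 3
```

With a sed that is multi-byte aware and a UTF-8 locale, this should print the first three Greek letters; in the C locale the same command counts bytes instead.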
If your tr does not handle character classes correctly with variable-byte encodings, you could try:
- sed -E -e 's/[[:upper:]]/\L&/g' (GNU-specific)
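A small wrapper makes the GNU-specific fallback easy to drop into a pipeline. This is a sketch that assumes GNU sed (the \L replacement escape is an extension that BSD sed does not provide); to_lower_gnu is a hypothetical name.

```shell
# to_lower_gnu: lowercase stdin using GNU sed's \L replacement extension.
# Hypothetical helper; which letters match [[:upper:]] depends on the locale.
to_lower_gnu() {
    sed -E -e 's/[[:upper:]]/\L&/g'
}

printf 'Πού ΜΠΟΡΩ\n' | to_lower_gnu
```

In a UTF-8 locale the Greek capitals should come out lowercased; in the C locale only ASCII letters are affected.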
Rowan Thorpe Jul 01 '16 at 11:47