Here is a complete shell solution (you did not specify which language you are using).
    foo='au&#223;ergew&#246;hnlich'
    echo "$foo"
    au&#223;ergew&#246;hnlich
    eval "$(printf '%s' "$foo" | sed 's/^/printf "/;s/&#0*\([0-9]*\);/\$( [ \1 -lt 128 ] \&\& printf "\\\\$( printf \"%.3o\\201\" \1)" || \$(which printf) \\\\u\$( printf \"%.4x\" \1) )/g;s/$/\\n"/')" | sed "s/$(printf '\201')//g"
    außergewöhnlich
Comment: this ALSO works with dash (the default /bin/sh on Ubuntu). We need GNU printf in some places, because dash's built-in printf does not understand \u for converting to Unicode. On top of that, GNU printf, annoyingly, refuses to handle code points 0 to 127 through \u, even though they are perfectly legal in UTF. Therefore we have to make the conversion conditional and use octal escapes for the range 0-127. The final sed is needed if you have to convert characters such as Line Feed (&#10;) or Tab (&#9;). We use a trick so that command substitution preserves these trailing characters, then we remove the "trick" character with the last sed. The character used for this (octal 201) should NOT occur if your input is Unicode, so it should be safe.
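To make the two ideas in the comment easier to follow, here is a minimal sketch that pulls them out of the one-liner. It is not the original command: `decode_cp` is a hypothetical helper name, the external printf is assumed to be GNU coreutils printf found through `env`, and the sentinel is a plain `x` stripped with parameter expansion instead of the octal 201 byte stripped with sed.

```shell
#!/bin/sh
# decode_cp is a hypothetical helper, not part of the one-liner above.
# It prints the character for one decimal code point given as $1.
decode_cp() {
    if [ "$1" -lt 128 ]; then
        # ASCII range: build an octal escape such as \101 and let the
        # shell's own printf expand it.
        printf "\\$(printf '%03o' "$1")"
    else
        # Beyond ASCII: call an external printf through env, assumed here
        # to be GNU coreutils printf, since dash's builtin lacks \u.
        # Four hex digits cover code points up to U+FFFF only.
        env printf "\\u$(printf '%04x' "$1")"
    fi
}

# Command substitution drops trailing newlines, so a decoded Line Feed
# would vanish. Appending a sentinel and stripping it afterwards keeps it.
lf=$(decode_cp 10; printf 'x')
lf=${lf%x}
```

With this, `decode_cp 65` prints `A`, and `decode_cp 223` should print `ß` in a UTF-8 locale, provided an external GNU printf is on the PATH. The `lf` variable ends up holding exactly one newline, which a bare `$(decode_cp 10)` would have lost.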