Since URIs were introduced before Unicode was around, or at least before it was widely used, I assume this is very implementation-specific. UTF-8 encoding your text and then escaping it as normal sounds like the best idea, since that is completely backward compatible with any ASCII / ANSI systems already in place, although you might get the odd weird character or two.
At the other end, to decode, you unescape the text and get a UTF-8 string. If someone on an old system sends you data in ASCII / ANSI, no harm is done, since that is (almost) already UTF-8 encoded.
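The round trip described above can be sketched in Python with the standard library's `urllib.parse`. This is only an illustration under assumptions: the helper names `escape_for_uri` and `unescape_from_uri` are made up here, and a real system may use a different language, or leave some reserved characters unescaped.

```python
from urllib.parse import quote, unquote


def escape_for_uri(text: str) -> str:
    """Hypothetical helper: UTF-8 encode the text, then percent-escape it.

    safe="" means every byte outside the unreserved ASCII set
    (letters, digits, "-", ".", "_", "~") becomes %XX.
    """
    return quote(text, safe="", encoding="utf-8")


def unescape_from_uri(escaped: str) -> str:
    """Hypothetical helper: undo the percent-escaping, then decode as UTF-8."""
    return unquote(escaped, encoding="utf-8")


escaped = escape_for_uri("café")
print(escaped)                        # caf%C3%A9
print(unescape_from_uri(escaped))     # café

# Plain ASCII input passes through unchanged, which is why older
# ASCII-only senders stay compatible with this scheme.
print(escape_for_uri("plain-ascii"))  # plain-ascii
```

The escaped form contains only ASCII characters, so it should survive any older system that passes URIs through untouched.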
Matthew Scharley