The basic setup of the decorators looks like this:
    InputStream fileStream = new FileInputStream(filename);
    InputStream gzipStream = new GZIPInputStream(fileStream);
    Reader decoder = new InputStreamReader(gzipStream, encoding);
    BufferedReader buffered = new BufferedReader(decoder);
The key issue in this snippet is the value of encoding. This is the character encoding of the text in the file. Is it US-ASCII, UTF-8, SHIFT-JIS, ISO-8859-9, …? There are hundreds of possibilities, and the correct choice usually cannot be determined from the file itself. It must be specified through some out-of-band channel.
For example, maybe it is the platform default. In a networked environment, however, this is extremely fragile. The machine that wrote the file might sit in the neighboring cubicle and still have a different default encoding.
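A small sketch of what goes wrong when writer and reader disagree. The class name `WrongCharsetDemo` and its `decode` helper are made up for this illustration; the two charsets were picked only because every Java runtime is required to support them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {

    // Decodes raw bytes with the charset the *reader* assumes.
    static String decode(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // The writer encodes "cafe" with an accented e as UTF-8;
        // the accented letter becomes the two bytes C3 A9.
        byte[] written = "caf\u00E9".getBytes(StandardCharsets.UTF_8);

        String right = decode(written, StandardCharsets.UTF_8);
        String wrong = decode(written, StandardCharsets.ISO_8859_1);

        // Same bytes, different text: the wrong guess yields two
        // characters where the writer intended one.
        System.out.println(right.length()); // 4
        System.out.println(wrong.length()); // 5
    }
}
```

No exception is thrown in the mismatched case, which is what makes relying on a default so easy to miss.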
Most network protocols use a header or other metadata to explicitly mark character encodings.
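As an illustration of such metadata, here is a rough sketch that pulls the `charset` parameter out of an HTTP-style `Content-Type` header value. The class `ContentTypeCharset` and the method `charsetOf` are hypothetical names, and the parsing is deliberately simplistic (it is not a full implementation of the header grammar).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {

    // Returns the charset named in a value such as
    // "text/html; charset=ISO-8859-1", or the fallback when the
    // parameter is missing, malformed, or not supported by this JVM.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));
    }
}
```

The resulting `Charset` can be passed straight to `InputStreamReader` in place of a guessed encoding.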
In this case, the file extension suggests that the content is XML. XML includes the "encoding" attribute in the XML declaration for exactly this purpose. Furthermore, XML should really be processed with an XML parser, not as text. Reading XML line by line seems like a fragile special case.
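A minimal sketch of that approach, assuming the standard JAXP `DocumentBuilder` is acceptable for the job: the decompressed byte stream is handed to the parser directly, with no `Reader` in between, so the parser can pick up the encoding from the XML declaration itself. The class `GzipXmlDemo` and its helper methods are invented for the example, and the data is built in memory only to keep it self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class GzipXmlDemo {

    // Decompresses the stream and lets the XML parser decode the bytes.
    static Document parse(InputStream gzipped)
            throws IOException, SAXException, ParserConfigurationException {
        try (InputStream in = new GZIPInputStream(gzipped)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    // Helper for the demo only: gzip-compresses a byte array in memory.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // The declaration names the encoding that the bytes really use.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Jos\u00E9</name>";
        byte[] data = gzip(xml.getBytes(StandardCharsets.ISO_8859_1));

        Document doc = parse(new ByteArrayInputStream(data));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

For a file on disk, a `FileInputStream` would take the place of the `ByteArrayInputStream`.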
Failing to specify the encoding explicitly is a mistake. Use the default encoding at your peril!
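For completeness, here is a self-contained sketch of the decorator chain with the encoding passed in explicitly. The class `GzipTextDemo`, the methods `readLines` and `writeText`, and the choice of UTF-8 are assumptions made for the example; in real code the charset has to come from whatever out-of-band source applies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipTextDemo {

    // Reads all lines of a gzip-compressed text file, decoding the
    // decompressed bytes with the charset supplied by the caller.
    static List<String> readLines(Path file, Charset encoding) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the whole chain, outermost first.
        try (BufferedReader buffered = new BufferedReader(
                new InputStreamReader(
                        new GZIPInputStream(Files.newInputStream(file)), encoding))) {
            String line;
            while ((line = buffered.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Demo helper: writes text gzip-compressed in the given charset.
    static void writeText(Path file, String text, Charset encoding) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), encoding)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".txt.gz");
        try {
            writeText(file, "na\u00EFve\ncaf\u00E9\n", StandardCharsets.UTF_8);
            // Writer and reader agree on UTF-8, so the text survives.
            System.out.println(readLines(file, StandardCharsets.UTF_8).size()); // 2
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Passing a `Charset` object rather than a name also avoids the checked `UnsupportedEncodingException` that the `String`-based `InputStreamReader` constructor declares.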
erickson Jul 03 '09 at 18:24