I am trying to use ^A as the delimiter between the key and the value in my reducer output files. I found that the configuration setting "mapred.textoutputformat.separator" is what I want, and it correctly switches the delimiter to ",":
conf.set("mapred.textoutputformat.separator", ",");
But it cannot handle the ^A character:
conf.set("mapred.textoutputformat.separator", "\u0001");
causes this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 94; Character reference "
I found this ticket, https://issues.apache.org/jira/browse/HADOOP-7542, and I see that a fix was attempted but then reverted because of XML 1.1 problems.
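For what it's worth, the parser failure seems reproducible outside Hadoop. The sketch below assumes the job configuration ends up as an XML 1.0 document in which the ^A value is written as the numeric character reference `&#1;` (that is my reading of the error message, not something I verified in the Hadoop source). The class name `SeparatorXmlCheck` and its helper methods are made up for illustration; only the JDK's built-in XML parser is used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Hadoop: checks whether a separator value
// survives a round trip through an XML 1.0 parser.
public class SeparatorXmlCheck {

    // Builds a minimal configuration document in the style of a Hadoop job file.
    // The value is inserted as-is, so pass it already escaped (e.g. "&#1;").
    public static String propertyXml(String escapedValue) {
        return "<?xml version=\"1.0\"?><configuration><property>"
                + "<name>mapred.textoutputformat.separator</name>"
                + "<value>" + escapedValue + "</value>"
                + "</property></configuration>";
    }

    // Returns true if the JDK parser accepts the document, false if it
    // rejects it with a SAXException (SAXParseException is a subclass).
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("\",\" parses:    " + parses(propertyXml(",")));
        System.out.println("\"&#1;\" parses: " + parses(propertyXml("&#1;")));
    }
}
```

If that assumption holds, the comma case is accepted and the `&#1;` case is rejected, because XML 1.0 does not allow U+0001 even as a character reference. That would explain why "," works and "\u0001" does not.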
So I wonder whether anyone has succeeded in setting the delimiter to ^A (which seems pretty common) with a simple workaround, or whether I should just settle for the tab separator.
Thanks!
I am running Hadoop 0.20.2-cdh3u5 on CentOS 6.2
control-characters hadoop separator
alexP_Keaton