I am stuck on a tricky character-encoding problem.
I have a Perl program that talks to a Java endpoint through Thrift; Java then uses the data to query a deprecated PHP service. This is ugly, but it is part of a migration plan, so it only needs to work for a short time.
Perl creates a Thrift object in which some of the fields are JSON-encoded strings.
The problem is that when Perl makes the request to Java, one of the strings looks like this (this is Data::Dumper output; the value is subsequently JSON-encoded and added to the Thrift object):
'offer_message' => "<<>> && \x{c3}\x{82}\x{c2}\x{a9}© <script>alert(\"XSS\");</script> https://url.com/imghp?hl=uk",
However, when this data is received on the Java side, the sequence \x{c3}\x{82}\x{c2}\x{a9} has been converted, so that in Java we get the following:
<<>>\\n&&\\nà  à ©©\\n<script>alert(\"XSS\");</script>\\nhttps://www.google.com.ua/imghp?hl=uk
If I pass the second string to the legacy PHP program, it fails; if I pass the string taken from the Perl hash dump, it succeeds. Therefore, I believe I need to convert the received string to a different encoding (correct me if I'm wrong, I'm not sure this is the right solution).
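To compare what each side actually holds, the received string could be dumped on the Java side in the same `\x{..}` style that Data::Dumper uses. This is only a diagnostic sketch: the class name `DumpString`, its helper methods, and the sample value `received` are hypothetical stand-ins, not part of the real code.

```java
import java.nio.charset.StandardCharsets;

public class DumpString {
    // Render each UTF-16 code unit as \x{..}, similar to Perl's Data::Dumper output.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\x{%x}", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Render the UTF-8 bytes of the string as space-separated hex pairs.
    static String utf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value received over Thrift.
        String received = "\u00c3\u0082\u00c2\u00a9\u00a9";
        System.out.println(codeUnits(received));
        System.out.println(utf8Bytes(received));
    }
}
```

If the dump on the Java side differs from the Perl dump, the change happened somewhere between the two (JSON encoding, Thrift transport, or decoding), which narrows down where to look.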
I tried taking the parameters received in Java and converting them to every encoding I could think of, but nothing works. For example:
byte[] utf8 = templateParams.getBytes("UTF8"); normallisedTemplateParams = new String(utf8, "UTF8");
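For reference, a minimal sketch of why a round trip like the one above cannot change anything: encoding a well-formed Java string to UTF-8 and decoding it again with the same charset returns the same string. Changing the characters requires encoding with one charset and decoding with another. The second helper below assumes (and this is only an assumption about the data here) that each character of the string is really one byte of UTF-8 text that was decoded as ISO-8859-1 somewhere along the way. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Same shape as the attempt above: encode and decode with one charset.
    // For a well-formed string this is a no-op.
    static String sameCharsetRoundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Assumption: every char is <= U+00FF and stands for one raw byte of UTF-8 text.
    // Chars above U+00FF cannot be represented in ISO-8859-1 and would become '?'.
    static String reinterpretLatin1AsUtf8(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The four characters from the Perl dump: \x{c3}\x{82}\x{c2}\x{a9}
        String dumped = "\u00c3\u0082\u00c2\u00a9";

        System.out.println(sameCharsetRoundTrip(dumped).equals(dumped)); // true

        String once = reinterpretLatin1AsUtf8(dumped);   // U+00C2 U+00A9
        String twice = reinterpretLatin1AsUtf8(once);    // U+00A9, the copyright sign
        System.out.println(once.length() + " " + twice.length()); // 2 1
    }
}
```

Whether one or two such reinterpretation steps (or none) match what the PHP service expects depends on where the extra encoding layer is introduced, which this sketch does not determine.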
I have been cycling through encoding schemes in the hope of finding one that works.
What is the correct way to solve this problem? This dirty workaround is my only option for the short term, until the next refactoring takes place.