I am creating a GWT application that stores the text of random web pages in a Text field in the App Engine datastore. The text is often UTF-8 encoded. All of my application's files are saved as UTF-8, and when I run the application on my local machine, the whole process works fine: UTF-8 text is stored as such and comes back as UTF-8 from the local development server. However, when I deploy the application to Google App Engine, somewhere between storing the text and retrieving it, it is no longer UTF-8, and non-ASCII characters are displayed as ?.
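One possible explanation for the local-vs-deployed difference (an assumption, not something the post confirms) is that some step converts the text to bytes using a charset that cannot represent non-ASCII characters, for example a default charset that differs between the two environments. The sketch below shows that `String.getBytes(Charset)` silently replaces unmappable characters with `?` when the charset is US-ASCII, and that the loss cannot be undone, while a UTF-8 round trip preserves the text. `CharsetRoundTrip` and `roundTrip` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encode the text to bytes with the given charset, then decode it again.
    // getBytes(Charset) replaces characters the charset cannot represent
    // with the charset's replacement byte ('?' for US-ASCII) without throwing.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this example independent of the source file's encoding.
        String original = "Gr\u00fc\u00dfe, caf\u00e9, \u65e5\u672c\u8a9e";

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.UTF_8))); // true

        // US-ASCII cannot, so each non-ASCII character becomes '?'.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII)); // Gr??e, caf?, ???
    }
}
```

Because the `?` is written into the bytes themselves, no later decoding step can recover the original characters.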
When I browse the datastore in the App Engine admin console, all special characters are displayed as ?, which makes me think the problem happens when the text is written to the datastore.
Does anyone know how to fix this?
The application itself is fairly large, so here is some pseudocode:
Text webPageText = new Text("<STRING THAT CONTAINS UNICODE CHARACTERS>");
/* Some code to store the Text object in the datastore.
   Specifically, I'm using javax.jdo.PersistenceManager to do this.
   Some code to retrieve the text from the datastore. */
String retrievedText = webPageText.getValue();
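The pseudocode starts from a string that already contains the page text, so it does not show how that text was obtained. If the page is fetched as bytes (for example from `URL.openStream()`), the charset used to decode those bytes decides whether non-ASCII characters survive. A minimal sketch, assuming the page's charset is known (UTF-8 here; in practice it would come from the response's `Content-Type` header or the page itself). `readPageText` is a hypothetical helper, and a `ByteArrayInputStream` stands in for the network stream so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageReader {
    // Hypothetical helper: decode a page's byte stream with an explicit charset
    // instead of relying on the JVM's default charset.
    static String readPageText(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): the UTF-8 bytes of a page with non-ASCII text.
        String page = "<p>Gr\u00fc\u00dfe</p>";
        byte[] pageBytes = page.getBytes(StandardCharsets.UTF_8);

        String text = readPageText(new ByteArrayInputStream(pageBytes), StandardCharsets.UTF_8);

        // In the application, this string would then be wrapped in a Text and persisted.
        System.out.println(text.equals(page)); // true
    }
}
```

Passing the charset explicitly means the decoded string no longer depends on the default charset of whichever machine runs the code.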
The problem is that retrievedText comes back with ? in place of the non-ASCII characters.
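To tell whether the characters are actually lost or only displayed as `?`, it can help to log the code points of `retrievedText` (and of the text before it is stored). A sketch of such a diagnostic; `CodePointDump.dump` is a hypothetical name. A literal `?` shows up as `U+003F`, which typically means the text was encoded into a charset that could not represent the character; `U+FFFD` is the replacement character a decoder inserts for bytes it could not decode; intact code points suggest the stored data is fine and the `?` comes from how it is displayed.

```java
public class CodePointDump {
    // Hypothetical diagnostic: list each code point as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("caf\u00e9")); // U+0063 U+0061 U+0066 U+00E9
        System.out.println(dump("caf?"));      // U+0063 U+0061 U+0066 U+003F
    }
}
```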
Here is a similar problem in Python that I found: "Trying to store Utf-8 data in a data store, getting a UnicodeEncodeError". Unlike that case, my application does not raise any errors.
As far as I know, Java strings are already Unicode by default, and I cannot find any code that lets me explicitly declare them as UTF-8.
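On the point about declaring strings as UTF-8: a Java `String` does not carry an encoding at all. It is a sequence of UTF-16 code units, and an encoding only comes into play when text is converted to or from bytes, which is where UTF-8 has to be named explicitly (for example `getBytes(StandardCharsets.UTF_8)` or `new InputStreamReader(in, StandardCharsets.UTF_8)`). The sketch below shows one character occupying one `char` but a different number of bytes per encoding, and prints the JVM's default charset, which is what the no-argument `getBytes()` and `new String(byte[])` use. Comparing that output locally and on the deployed server is a suggested check, not something the post reports; `EncodingFacts.describe` is an illustrative name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingFacts {
    // Report how the byte size of a string depends on the encoding chosen.
    static String describe(String s) {
        return s.length() + " char(s), "
                + s.getBytes(StandardCharsets.UTF_8).length + " UTF-8 byte(s), "
                + s.getBytes(StandardCharsets.UTF_16BE).length + " UTF-16 byte(s)";
    }

    public static void main(String[] args) {
        // U+65E5 is a single UTF-16 code unit.
        System.out.println(describe("\u65e5")); // 1 char(s), 3 UTF-8 byte(s), 2 UTF-16 byte(s)

        // The no-argument getBytes() and new String(byte[]) use this default charset,
        // which is not guaranteed to be the same on every machine.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
    }
}
```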