UTF-8 encoding issue in Google App Engine

I am building a GWT application that stores the text of arbitrary web pages in a Text property in the datastore. The text is often encoded as UTF-8. All of my application's files are saved as UTF-8, and when I run the application on my local machine the whole process works fine: UTF-8 text is saved as such and comes back from the local App Engine development server as UTF-8. However, when I deploy the application to Google App Engine, somewhere between storing the text and retrieving it the text stops being UTF-8, and non-ASCII characters are displayed as "?".

When I browse the datastore in the App Engine admin console, all special characters are displayed as "?", which makes me think the problem occurs when writing to the datastore.

Does anyone know how to fix this?

The application itself is rather large, so here is some pseudocode:

Text webPageText = new Text(<STRING THAT CONTAINS UNICODE CHARACTERS>);

/*Some Code to store Text object on datastore
Specifically I'm using javax.jdo.PersistenceManager to do this.
Some Code to retrieve text from datastore. */

String retrievedText = webPageText.getValue();

The problem is that retrievedText comes back with "?" in place of the Unicode characters.

Here is a similar problem in Python that I found: "Trying to store Utf-8 data in a data store, getting a UnicodeEncodeError". My application, however, does not report any errors.

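Unfortunately, as far as I can tell Java strings are already Unicode by default (internally UTF-16, so a charset only comes into play when a string is converted to or from bytes), and I cannot find any code that would let me declare them explicitly as UTF-8.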
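To illustrate where a "?" can come from, here is a minimal sketch that is independent of App Engine. It assumes the corruption happens at a String-to-bytes boundary; the class and method names are hypothetical. In Java, encoding a string with a charset that cannot represent a character substitutes that charset's replacement byte, which is "?" for US-ASCII.

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode with a charset that cannot represent non-ASCII characters, then decode again.
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII); // unmappable chars become '?'
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Encode and decode with UTF-8, which can represent every Unicode character.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u4e2d\u6587"; // "cafe" with an accented e, plus two CJK characters
        System.out.println(roundTripAscii(original));                 // prints: caf? ??
        System.out.println(roundTripUtf8(original).equals(original)); // prints: true
    }
}
```

If the deployed output looks like the first line, some step between the page and the datastore is probably converting to bytes with a charset other than UTF-8.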



I ran into the same issue with UTF-8 text: it came back as "????..."

Client side: if you use the Apache HTTP client, pass the charset explicitly:

For a GET request:

NameValuePair... params;
...
String url = urlBase + URLEncodedUtils.format(Arrays.asList(params), "UTF-8");
HttpGet httpGet = new HttpGet(url);

For a POST request:

NameValuePair... params;
...
HttpPost httpPost = new HttpPost(url);
httpPost.setEntity(new UrlEncodedFormEntity(Arrays.asList(params), "UTF-8"));
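The "UTF-8" argument in these calls decides which bytes end up percent-encoded in the request. A small sketch using only the standard library's URLEncoder and URLDecoder (not the Apache client itself; the class and method names here are hypothetical) shows what happens when the two sides disagree on the charset:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlEncodingDemo {
    // Percent-encode a parameter value using UTF-8 bytes.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    // Decode a percent-encoded value, interpreting the bytes in the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodeUtf8("\u00e9");       // e with acute accent
        System.out.println(encoded);                 // prints: %C3%A9
        // Same charset on both sides: the original character comes back.
        System.out.println(decode(encoded, "UTF-8").equals("\u00e9"));
        // Mismatched charset: each UTF-8 byte is read as a separate Latin-1 character.
        System.out.println(decode(encoded, "ISO-8859-1").equals("\u00c3\u00a9"));
    }
}
```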

Server side: in your HttpServlet, set the charset on the response:

HttpServletResponse resp;
...
resp.setContentType("text/html; charset=utf-8");

Convert the String to a byte array and store it in the datastore as a Blob.

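// Save the String as a Blob, encoding explicitly as UTF-8
// (a bare getBytes() would use the platform default charset, which can differ between machines)
Blob webPageText = new Blob(<STRING THAT CONTAINS UNICODE CHARACTERS>.getBytes(StandardCharsets.UTF_8));

// Retrieve the Blob as a String, decoding with the same charset
String retrievedText = new String(webPageText.getBytes(), StandardCharsets.UTF_8);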
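A sketch of the same round trip with the charset spelled out on both sides. To keep it runnable without the App Engine SDK, a plain in-memory map stands in for the datastore and a byte array for the Blob; the class, the map, and the save/load helpers are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class Utf8BlobSketch {
    // Hypothetical stand-in for the datastore: key -> raw bytes (what a Blob would hold).
    private static final Map<String, byte[]> FAKE_STORE = new HashMap<>();

    // Encode explicitly as UTF-8 so the result does not depend on the platform default.
    static void save(String key, String text) {
        FAKE_STORE.put(key, text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode with the same charset that was used when saving.
    static String load(String key) {
        return new String(FAKE_STORE.get(key), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "na\u00efve \u2603"; // contains a diaeresis and a snowman
        save("page", original);
        System.out.println(load("page").equals(original)); // prints: true
    }
}
```

Because only bytes are stored, the datastore never has to interpret the characters; the application controls the encoding in both directions.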



Before: re-decoding the parameter from the "8859_1" charset

new String(req.getParameter("title").getBytes("8859_1"),"utf-8")

When I ran this application on my local machine, everything worked. But when I deployed it, I ran into the same problem as you. This is how I solved it:

After (in the code that saves to the datastore):

new String(req.getParameter("title").getBytes("utf-8"),"utf-8")
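To show what these two expressions do to a string, here is a standalone sketch with no servlet involved. It assumes the parameter may or may not have been decoded as ISO-8859-1 (the charset that "8859_1" names) by the container; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepairDemo {
    // Simulates a container that reads UTF-8 request bytes as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The "8859_1" form: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // The "utf-8"/"utf-8" form: encode and decode with the same charset.
    static String sameCharsetRoundTrip(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9t\u00e9"; // "ete" with accents
        String garbled = misdecode(original);
        System.out.println(repair(garbled).equals(original));               // true: the repair undoes the misdecoding
        System.out.println(sameCharsetRoundTrip(original).equals(original)); // true: leaves a correct string unchanged
        System.out.println(repair(original).equals(original));              // false: repairing a correct string damages it
    }
}
```

This may explain why the first form behaves differently across environments: it only helps when the parameter really was decoded as ISO-8859-1, and it damages text that was already decoded correctly.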