Solrj - encoding problem

Question


I have a document whose title field has the value Mörder (with an umlaut on the o).

When I retrieve it in Java using the code below, both print statements output Morder with the umlaut shifted from the o onto the r. It's strange.

When I look at the same document in the web interface provided by Solr, the title is shown correctly as Mörder (with the umlaut on the o).

Can someone tell me what is going wrong?

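```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// server, start and rows are defined elsewhere
SolrQuery query = new SolrQuery();
query.setParam("q", "<some query>");
query.setStart(start);
query.setRows(rows);
query.setFacet(false);
query.setFields("title");

QueryResponse rsp = server.query(query);
SolrDocumentList sdl = rsp.getResults();
for (SolrDocument sdOl : sdl) {
    // Print the title exactly as SolrJ returns it
    System.out.println(sdOl.getFieldValue("title"));
    // Encode with the platform default charset, then decode as UTF-8
    // (a no-op when the default charset is already UTF-8)
    System.out.println(new String(sdOl.getFieldValue("title").toString().getBytes(), "UTF-8"));
}
```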

EDIT

I am actually comparing document titles across two cores. One returns the umlauts correctly, but the other always shifts each umlaut onto the following character.
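To see exactly what each core returns, rather than how the console happens to draw it, one option is to print the code points of both values and find the first position where they differ. A minimal sketch using only the JDK; the class name and the two sample strings are hypothetical stand-ins for the titles read from each core via `getFieldValue("title")`, and the second sample is just one assumed way to produce an umlaut that displays on the r.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDiff {

    /** Formats every code point of s as U+XXXX, so invisible combining marks become visible. */
    static List<String> codePoints(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    /** Index of the first differing UTF-16 char, or -1 if the strings are equal. */
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : n;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for getFieldValue("title") from each core.
        String fromCore1 = "M\u00F6rder";   // precomposed ö (U+00F6)
        String fromCore2 = "Mor\u0308der";  // combining diaeresis (U+0308) after the r
        System.out.println(codePoints(fromCore1));
        System.out.println(codePoints(fromCore2));
        System.out.println("first difference at index " + firstDifference(fromCore1, fromCore2));
    }
}
```

If the code points from both cores are identical, the difference is in rendering rather than in the data Solr returns.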


java lucene solr solrj

Jhs Feb 11 '13 at 17:33


1 answer

Alexandre Rafalovitch · Answer 1 · 2013-02-11T18:20:39+0000

Could Unicode decomposition be getting mixed up with big-/little-endian byte conversion somewhere? Just a wild (half-baked) hunch.
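If the decomposition half of that hunch is right, one core might be returning the title in decomposed form (o followed by U+0308 COMBINING DIAERESIS) while the other returns the precomposed ö (U+00F6). The sketch below assumes that scenario and uses only `java.text.Normalizer` from the JDK to check whether a value is in NFC and to recompose it; the class name and sample string are hypothetical, and it does not address the byte-order half of the hunch.

```java
import java.text.Normalizer;

public class NormalizationCheck {

    /** True if s is already in NFC (precomposed) form. */
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    /** Recomposes base letter + combining mark pairs, e.g. o + U+0308 into U+00F6. */
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "Mo\u0308rder";  // hypothetical: 'o' + COMBINING DIAERESIS
        System.out.println(decomposed.length());             // 7
        System.out.println(isNfc(decomposed));               // false
        String composed = toNfc(decomposed);
        System.out.println(composed.length());               // 6
        System.out.println(composed.equals("M\u00F6rder"));  // true
    }
}
```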

This is not really an answer, but I would put Wireshark on the connection and look at exactly what the client requests and what the server returns. That will tell you whether the problem arises when the data leaves the server or when it arrives at the client.

I don't know your client configuration, but if the traffic uses the binary (javabin) format, there are client options that switch it to XML. If that alone makes the problem go away, then the problem is in the javabin format. If it doesn't, at least you will have the exact request and response to work with.
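As a rough illustration of such a client option, here is a configuration fragment assuming SolrJ 4.x, where the client class is `HttpSolrServer` (renamed `HttpSolrClient` in SolrJ 5). Setting an `XMLResponseParser` makes the client request `wt=xml` and parse XML instead of javabin. The core URL is hypothetical, and this is a fragment to drop into existing setup code, not a complete program.

```java
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

// Hypothetical URL of the core that returns the shifted umlauts.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core2");
// Request wt=xml and parse the XML response instead of the default javabin format.
server.setParser(new XMLResponseParser());
```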

