Warm up
Let me start with a universal fact we all know: a computer understands nothing but bits, 0 and 1.
Now, when you submit an HTML form over HTTP and the values travel over the wire to the target server, what actually travels is a long stream of bits, 0s and 1s.
- Before sending data to the server, the HTTP client (browser or curl, etc.) will encode it using some encoding scheme and expects the server to decode it using the same scheme so that the server knows exactly what the client sent.
- Before sending a response to the client, the server will encode it using some encoding scheme and expects the client to decode it using the same scheme so that the client knows exactly what the server sent.
An analogy: I send you a letter and tell you whether it is written in English, French, or Dutch, so that you get exactly the message I intended to send. When you reply, you likewise tell me which language I should read your answer in.
The important takeaway is that data is encoded when it leaves the client and decoded on the server side, and vice versa. If you do not specify anything, the content is encoded as application/x-www-form-urlencoded before going from the client to the server.
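As a rough illustration of that default, the sketch below uses the JDK's URLEncoder and URLDecoder (the Charset overloads, available from Java 10) to mimic what a client and a server do with one application/x-www-form-urlencoded value. The class name FormEncodingDemo, its two helper methods, and the sample value are made up for this example; in a real setup the browser and the servlet container do this work for you.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormEncodingDemo {
    // Client side: turn characters into percent-encoded bytes.
    static String clientEncode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // Server side: turn the percent-encoded bytes back into characters.
    static String serverDecode(String wireValue, Charset charset) {
        return URLDecoder.decode(wireValue, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        String onTheWire = clientEncode(original, StandardCharsets.UTF_8);
        System.out.println(onTheWire); // caf%C3%A9
        // Same scheme on both sides, so the original value comes back.
        System.out.println(serverDecode(onTheWire, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```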
Core concept
Reading the warm-up is important. There are a few things you need to get right for the expected results.
- Having the correct encoding set before sending data from the client to the server.
- Having the correct decoding and encoding set on the server side to read the request and write the response back to the client (this is why you were not getting the expected results).
- Making sure the same encoding scheme is used everywhere. It must not happen that the client encodes using ISO-8859-1 while the server decodes using UTF-8, otherwise the text will be garbled (in my analogy, I write to you in English and you read it as French).
- Having the correct encoding set for your log viewer, if you are verifying the logs using the Windows command line, the Eclipse log viewer, etc. (this contributed to your problem but was not the main reason, because first of all the data read from the request object was not decoded correctly. The encoding of Windows cmd or the Eclipse log view also matters.)
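The log viewer point can be sketched in plain Java. The example below prints the same string through a PrintStream with two different charsets and shows the bytes a console or log viewer would receive. LogEncodingDemo and its helpers are hypothetical names for illustration, and the PrintStream(OutputStream, boolean, Charset) constructor assumes Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LogEncodingDemo {
    // Returns the bytes that reach the log when text is printed with the given charset.
    static byte[] bytesWritten(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charset);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e with an acute accent
        System.out.println(hex(bytesWritten(text, StandardCharsets.UTF_8)));      // C3 A9
        System.out.println(hex(bytesWritten(text, StandardCharsets.ISO_8859_1))); // E9
    }
}
```

A viewer that interprets the bytes C3 A9 as ISO-8859-1 shows two characters instead of one, which is how correctly decoded data can still look broken in a log.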
Having the correct encoding set before sending data from the client to the server
To ensure this, several approaches get discussed, but I would say: use the HTTP Accept-Charset request header field. Judging by the code snippet you provided, you are already using it, and using it correctly, so you are fine on this front.
Some people say not to use it, or that it is not implemented, but I would very humbly disagree with them. Accept-Charset is part of the HTTP 1.1 specification (I provided the link), and a browser that implements HTTP 1.1 will implement it. They may also argue for using a charset attribute of the Accept request header, but
- It really is not there; check the link for the Accept request header field that I provided.
I am giving you data and facts, not just words, but if you are still not satisfied, run the following tests in different browsers.
- Set accept-charset="ISO-8859-1" in your HTML form and POST/GET the form with Chinese or accented French characters to the server.
- On the server, decode the data using the UTF-8 scheme.
- Now repeat the same test, swapping the client and server encodings.
You will find that you never see the expected characters on the server. But if you use the same encoding scheme on both sides, you will see the expected characters. So browsers do implement accept-charset, and it takes effect.
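If you want to see the effect without a browser, the mismatch test can be approximated in plain Java. This is only a simulation under stated assumptions: URLEncoder stands in for the browser, URLDecoder stands in for the server, and CharsetMismatchDemo and roundTrip are hypothetical names (the Charset overloads used here need Java 10 or later).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Encode as the client would, then decode as the server would.
    static String roundTrip(String value, Charset client, Charset server) {
        String wire = URLEncoder.encode(value, client);
        return URLDecoder.decode(wire, server);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with an acute accent
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;

        // Same scheme on both sides: the character survives.
        System.out.println(roundTrip(original, utf8, utf8).equals(original));   // true
        // Client ISO-8859-1, server UTF-8: the character is lost.
        System.out.println(roundTrip(original, latin1, utf8).equals(original)); // false
        // Swapped: still not the character that was sent.
        System.out.println(roundTrip(original, utf8, latin1).equals(original)); // false
    }
}
```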
Having the correct decoding and encoding set on the server side to read the request and write the response back to the client
There are several ways to achieve this (some scenario-specific configuration may occasionally be needed, but the options below solve 95% of cases and fit your case well). For example:
- Use a character encoding filter to set the encoding on the request and response.
- Use setCharacterEncoding on the request and response.
- Configure the web server or application server for the correct character encoding, using -Dfile.encoding=utf8, etc.
- Etc.
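For the JVM-level option, one quick check is to print what the running process actually reports. The sketch below is only a diagnostic, with DefaultCharsetDemo and describe as hypothetical names; how far a file.encoding setting influences the default charset depends on the JDK version, so treat the output as something to verify on your own server rather than a guarantee.

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Reports the file.encoding system property and the JVM's default charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run with and without -Dfile.encoding=utf8 and compare the output.
        System.out.println(describe());
    }
}
```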
My favorite, and the one that will solve your problem, is the character encoding filter, for the following reasons:
- All the encoding handling logic lives in one place.
- You have all the power through configuration: change it in one place and everyone is happy.
- You do not have to worry about some other code reading the request stream or flushing the response stream before you get to set the character encoding.
1. Character encoding filter
You can do the following to implement your own character encoding filter. If you use a framework such as Spring, you do not need to write your own class; you just configure the framework's filter in web.xml.
The basic logic below is very similar to what Spring does, leaving aside the many dependency and bean-aware things it handles.
web.xml (configuration)
<filter>
    <filter-name>EncodingFilter</filter-name>
    <filter-class>com.sks.hagrawal.EncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>EncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
EncodingFilter (character encoding implementation class)
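public class EncodingFilter implements Filter {
    private String encoding = "UTF-8";
    private boolean forceEncoding = false;

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain)
            throws IOException, ServletException {
        // Set the request encoding before anything reads parameters or the body.
        request.setCharacterEncoding(encoding);
        if (forceEncoding) {
            // The listing is cut off at this point. The rest of this method is a minimal
            // completion assumed from the surrounding text, not the original code.
            response.setCharacterEncoding(encoding);
        }
        filterChain.doFilter(request, response);
    }

    // init(FilterConfig), which would read the "encoding" and "forceEncoding"
    // init-params from web.xml, and destroy() are not part of the surviving listing.
}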
2. ServletRequest.setCharacterEncoding()
This is essentially the same code as in the character encoding filter, but instead of doing it in a filter, you do it in your servlet or controller class.
The idea, again, is to use request.setCharacterEncoding("UTF-8"); to set the encoding of the HTTP request stream before you start reading it.
Try the code below, and you will see that if no filter sets the encoding on the request object, the first log line will be null and the second will be "UTF-8".
System.out.println("CharacterEncoding = " + request.getCharacterEncoding());
request.setCharacterEncoding("UTF-8");
System.out.println("CharacterEncoding = " + request.getCharacterEncoding());
The following is an important excerpt from the setCharacterEncoding Javadoc. One more thing to note: you must provide a valid encoding scheme, otherwise you will get an UnsupportedEncodingException.
Overrides the name of the character encoding used in the body of this request. This method must be called prior to reading request parameters or reading input using getReader(). Otherwise, it has no effect.
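Why the order matters can be modelled with a small stand-in class. To be clear, FakeRequest below is hypothetical and is not the servlet API: it only imitates a container that decodes a parameter once, on first read, and then reuses the result. The ISO-8859-1 fallback is an assumption modelled on the default the Servlet specification describes for requests with no declared encoding.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LazyDecodingDemo {
    /** Hypothetical stand-in for a request object; this is NOT the servlet API. */
    static class FakeRequest {
        private final String rawValue; // percent-encoded value as received
        private Charset encoding;      // null until someone sets it
        private String decoded;        // filled in on first read, then reused

        FakeRequest(String rawValue) {
            this.rawValue = rawValue;
        }

        void setCharacterEncoding(Charset charset) {
            this.encoding = charset;
        }

        String getParameter() {
            if (decoded == null) {
                // Assumed fallback when no encoding was set: ISO-8859-1.
                Charset effective = (encoding != null) ? encoding : StandardCharsets.ISO_8859_1;
                decoded = URLDecoder.decode(rawValue, effective);
            }
            return decoded;
        }
    }

    public static void main(String[] args) {
        FakeRequest early = new FakeRequest("%C3%A9");
        early.setCharacterEncoding(StandardCharsets.UTF_8);        // set BEFORE reading
        System.out.println(early.getParameter().equals("\u00e9")); // true

        FakeRequest late = new FakeRequest("%C3%A9");
        late.getParameter();                                       // read first...
        late.setCharacterEncoding(StandardCharsets.UTF_8);         // ...then set: too late
        System.out.println(late.getParameter().equals("\u00e9"));  // false
    }
}
```

This is also the reason the filter approach is convenient: it runs before any servlet or controller code has a chance to read the parameters.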
Wherever necessary, I have tried to provide official links or accepted StackOverflow answers so that you can build trust.