International characters in a file name in multipart form data

I am using Apache HTTP Components (4.1-alpha2) to upload files to Dropbox. This is done using a multipart form. What is the correct way to encode file names that contain international (non-ASCII) characters in a multipart form?

If I use the standard API, the server returns the HTTP Forbidden status. If I change the upload code so that the file name is URL-encoded:

MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
FileBody bin = new FileBody(file_obj, URLEncoder.encode(file_obj.getName(), HTTP.UTF_8), HTTP.UTF_8, HTTP.OCTET_STREAM_TYPE);
entity.addPart("file", bin);
req.setEntity(entity);

The file is uploaded, but the resulting file name is still encoded, for instance: %D1%82%D0%B5%D1%81%D1%82.txt

3 answers

To solve this problem specifically for the Dropbox server, I had to encode the file name in UTF-8. To do this, I had to declare my multipart entity as follows:

MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE, null, Charset.forName(HTTP.UTF_8));

I was getting the Forbidden response because the OAuth-signed entity did not match the entity that was actually sent (the file name was being URL-encoded).
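To illustrate the difference between the two approaches, here is a minimal sketch using only the JDK. It does not use the Apache library; the class and method names are hypothetical, and the header layout is only an approximation of what a browser-compatible multipart part might carry. It assumes a server that decodes the header bytes as UTF-8 and does no further processing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FileNameOnTheWire {
    // Hypothetical helper: builds a part header that carries the name exactly as given.
    static String contentDisposition(String fileName) {
        return "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"";
    }

    // The name a server would see if it decodes the header bytes as UTF-8
    // and applies no URL-decoding (an assumption about the server).
    static String storedName(String fileName) {
        byte[] wire = contentDisposition(fileName).getBytes(StandardCharsets.UTF_8);
        String header = new String(wire, StandardCharsets.UTF_8);
        int start = header.indexOf("filename=\"") + "filename=\"".length();
        return header.substring(start, header.length() - 1);
    }

    // URL-encodes the name as in the question's workaround.
    static String urlEncoded(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String name = "\u0442\u0435\u0441\u0442.txt"; // Cyrillic "test" + ".txt"
        // URL-encoded name: the server stores the percent escapes literally.
        System.out.println(storedName(urlEncoded(name)));
        // Raw UTF-8 name: the original name survives the round trip.
        System.out.println(storedName(name).equals(name));
    }
}
```

The first line printed is the percent-encoded name from the question; the second shows that the raw UTF-8 name round-trips unchanged under the stated assumption.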

For those who are interested in what the standards say about this, I did some RFC reading. If the standard is followed strictly, all headers must be encoded in 7 bits, which would make UTF-8 encoding of the file name illegal. However, RFC 2388 states:

The original local file name may be supplied as well, either as a "filename" parameter either of the "content-disposition: form-data" header or, in the case of multiple files, in a "content-disposition: file" header of the subpart. The sending application MAY supply a file name; if the file name of the sender's operating system is not in US-ASCII, the file name might be approximated, or encoded using the method of RFC 2231.

Many posts mention using RFC 2231 or RFC 2047 to encode non-US-ASCII headers in 7 bits. However, RFC 2047 explicitly states in section 5 (3) that encoded-words MUST NOT be used in a parameter of a Content-Disposition field. That leaves only RFC 2231, but it is an extension that cannot be counted on to be implemented by all servers. In practice, most major browsers send non-US-ASCII characters as UTF-8 (hence the HttpMultipartMode.BROWSER_COMPATIBLE mode in the Apache HTTP client), and because of this most web servers support it. Also note that if you use HttpMultipartMode.STRICT for the multipart entity, the library will replace non-ASCII characters in file names with a question mark (?).
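For comparison, a minimal JDK-only sketch of the two 7-bit options discussed here: an RFC 2231 extended parameter value and a plain US-ASCII approximation. The class and method names are hypothetical. The set of characters left unescaped is deliberately conservative, and the second method only mimics the question-mark effect described above; it is not the library's actual code.

```java
import java.nio.charset.StandardCharsets;

class Rfc2231FileName {
    // Builds an RFC 2231 extended value: charset'language'percent-encoded-bytes.
    // The language tag is left empty. Only a conservative set of ASCII
    // characters is left unescaped (an assumption of this sketch).
    static String encodeExtendedValue(String value) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Approximates the name in US-ASCII: each unmappable character is
    // replaced by the charset's replacement byte, which is '?'.
    static String asciiApproximation(String value) {
        byte[] ascii = value.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String name = "\u0442\u0435\u0441\u0442.txt"; // Cyrillic "test" + ".txt"
        // prints: filename*=UTF-8''%D1%82%D0%B5%D1%81%D1%82.txt
        System.out.println("filename*=" + encodeExtendedValue(name));
        // prints: filename="????.txt"
        System.out.println("filename=\"" + asciiApproximation(name) + "\"");
    }
}
```

Whether a given server understands the `filename*` form has to be checked per server, which is the portability concern raised above.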


I would have thought that the FileBody implementation should take responsibility for applying the relevant rules from RFC 2047. The file name would then be encoded as =?UTF-8?Q?=D1=82=D0=B5=D1=81=D1=82.txt?= or something very similar.
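A minimal sketch of how such an encoded-word could be produced with the JDK alone. The class and method names are hypothetical, and this is not something FileBody is known to do. The sketch ignores the 75-character limit that RFC 2047 places on an encoded-word and leaves only letters, digits, '.' and '-' unescaped.

```java
import java.nio.charset.StandardCharsets;

class QEncodedWord {
    // Encodes text as a single RFC 2047 "Q" encoded-word in UTF-8.
    // Simplifications of this sketch: no length limit, no line folding.
    static String encode(String text) {
        StringBuilder sb = new StringBuilder("=?UTF-8?Q?");
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean literal = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '-';
            if (literal) {
                sb.append((char) c);
            } else if (c == ' ') {
                sb.append('_'); // Q encoding represents a space as an underscore
            } else {
                sb.append('=').append(String.format("%02X", c));
            }
        }
        return sb.append("?=").toString();
    }

    public static void main(String[] args) {
        // prints: =?UTF-8?Q?=D1=82=D0=B5=D1=81=D1=82.txt?=
        System.out.println(encode("\u0442\u0435\u0441\u0442.txt"));
    }
}
```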


Quick fix:

new String(multipartFile.getOriginalFilename().getBytes("iso-8859-1"), "UTF-8");
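This fix only applies on the receiving side, and only when the server has decoded UTF-8 header bytes as ISO-8859-1. A minimal JDK-only sketch of why the round trip works under that assumption (the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Simulates a server that reads UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: ISO-8859-1 maps every byte value to exactly one
    // character, so the original bytes can be recovered without loss.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0442\u0435\u0441\u0442.txt"; // Cyrillic "test" + ".txt"
        String broken = misdecode(original);
        // The misdecoded name has one character per UTF-8 byte: 8 + 4 = 12.
        System.out.println(broken.length());
        // prints: true
        System.out.println(repair(broken).equals(original));
    }
}
```

If the name was decoded correctly in the first place, applying the repair would corrupt it, so it is worth confirming which charset the framework actually used.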
