Faster downloads with HTTP byte range headers

Does anyone have experience using HTTP byte ranges in multiple concurrent requests to speed up loading?

I have an application that needs to download fairly large images (1 MB+) from a web service and then send modified versions (resized and cropped) to the browser. There are many such images, so caching is likely to be ineffective; that is, the cache will often be cold. When that happens we hit a fairly long wait for the image to download, 500 ms or more, which is over 60% of our application's total response time.

I am wondering if I can speed up the download of these images using a group of parallel HTTP range requests, e.g. each thread fetches a 100 KB chunk and the responses are stitched back into the complete file.
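
Roughly what I have in mind, sketched with Ruby's Net::HTTP and threads (the URL and chunk size are placeholders, and it assumes the server supports Range requests and reports Content-Length on HEAD):

    require 'net/http'

    # Fetch a file in parallel 100 KB chunks using HTTP Range requests,
    # then stitch the pieces back together in order.
    def parallel_fetch(url, chunk_size = 100 * 1024)
      uri = URI(url)
      ssl = uri.scheme == 'https'

      # Find the total size first (assumes the server answers HEAD with Content-Length).
      head  = Net::HTTP.start(uri.host, uri.port, use_ssl: ssl) { |http| http.head(uri.request_uri) }
      total = head['Content-Length'].to_i

      ranges = (0...total).step(chunk_size).map do |first|
        [first, [first + chunk_size - 1, total - 1].min]
      end

      ranges.map do |first, last|
        Thread.new do
          Net::HTTP.start(uri.host, uri.port, use_ssl: ssl) do |http|
            req = Net::HTTP::Get.new(uri.request_uri)
            req['Range'] = "bytes=#{first}-#{last}"
            http.request(req).body   # expects a 206 Partial Content response
          end
        end
      end.map(&:value).join
    end

    image_data = parallel_fetch('http://images.example.com/big-image.jpg')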

Does anyone have experience with this kind of thing? Does the overhead of the extra requests cancel out the speed gain, or can this technique work? The application is written in Ruby, but experience/examples from any language would help.

A few details about the setup:

  • The service has no bandwidth or connection limits (it is owned by my company).
  • Pre-generating all the cropped and resized images is impractical; there are millions of them with many potential permutations.
  • Hosting the application on the same hardware as the boxes holding the image disks is not an option (politics!).

thanks

+4
3 answers

I have written the backend and services for the kind of site you are pulling images from. Every site is different, so the details of what I have done may not apply to what you are trying to do.

Here are my thoughts:

  • If you have a service agreement with the company you are pulling images from (which you should, given the bandwidth you need), then pre-process their image catalog and store the thumbnails locally, either as database blobs or as files on disk with a database holding the file paths. A rough sketch of the files-on-disk approach follows this list.
  • Doesn't the service already make the images available as thumbnails? They are not going to serve a full-sized image to someone else's browser... unless they are crazy or sadistic and their users are crazy and masochistic. We pre-processed our images into three or four thumbnail sizes, so it would have been trivial to supply what you are trying to do.
  • If your use case is something they expect, then they should have an API, or at least some resources (programmers) who can help you access the images as quickly as possible. In fact, they should have a dedicated host for that purpose.
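
A rough sketch of the files-on-disk-plus-database idea, in Ruby since that is what the asker is using (the storage location, table layout, and choice of sqlite3 are all just placeholders for the sketch):

    require 'fileutils'
    require 'sqlite3'   # assumption: any database works; sqlite3 just keeps the sketch self-contained

    THUMB_ROOT = '/var/thumbs'   # hypothetical storage location

    FileUtils.mkdir_p(THUMB_ROOT)
    db = SQLite3::Database.new(File.join(THUMB_ROOT, 'index.db'))
    db.execute('CREATE TABLE IF NOT EXISTS thumbnails (original_id TEXT, size TEXT, path TEXT)')

    # Write a pre-generated thumbnail to disk and record its path in the database,
    # so the application can serve it later without touching the remote image service.
    def store_thumbnail(db, original_id, size, thumb_data)
      path = File.join(THUMB_ROOT, size, "#{original_id}.jpg")
      FileUtils.mkdir_p(File.dirname(path))
      File.binwrite(path, thumb_data)
      db.execute('INSERT INTO thumbnails (original_id, size, path) VALUES (?, ?, ?)',
                 [original_id, size, path])
      path
    end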

As a photographer, I should also mention that there may be copyright and/or terms-of-service issues with what you are doing, so make sure you are above board by consulting a lawyer and the site you are accessing. Do not assume everything is fine, KNOW it. Copyright law does not match the common-sense idea of what copyright is, so involving a lawyer up front can be genuinely educational and give you confidence that you are on solid footing. If you have already talked to one, then you know what I am saying.

+1

I found your post while googling to see whether someone had already written a parallel analogue of wget that does this. It is definitely possible and it helps for very large files over a relatively high-latency link: I have seen >10x speed improvements with multiple parallel TCP connections.

However, since your organization runs both the application and the web service, I would guess your link is high-bandwidth and low-latency, so I suspect this approach will not help you much.

Since you are transferring a large number of files that are small by modern standards, I suspect you are actually paying for connection setup more than for transfer speed. You can test this by timing a similar page of small images. In your situation you may want to go sequential rather than parallel: check whether your HTTP client library can use persistent HTTP connections, so that the three-way handshake happens once per page or less, rather than once per image.
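
For example, with Ruby's standard Net::HTTP (a minimal sketch; the host and paths are made up), a single start block keeps one connection open for all the requests made inside it:

    require 'net/http'

    # One TCP connection (and one handshake) reused for several images,
    # instead of a new connection per image.
    Net::HTTP.start('images.example.com', 80) do |http|
      paths  = ['/img/a.jpg', '/img/b.jpg', '/img/c.jpg']   # placeholder paths
      images = paths.map { |path| http.get(path).body }
    end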

If you end up getting really fanatical about TCP latency, it is also possible to cheat, as some of the major web services do.

(My own problem is at the other end of the TCP performance spectrum, where long-haul bandwidth really starts to drag when transferring multi-TB files, so if you do turn up a parallel HTTP library, I would like to hear about it. The only tool I found, called "puf", parallelizes by files rather than by byte ranges. If the above does not help you and you really do need a parallel transfer tool, likewise get in touch: I may have given up and written one by then.)

+1

I would suggest that using any P2P network would be futile, since there are more permutations than frequently used files.

Downloading multiple parts of a file in parallel can only help on slow networks (slower than 4-10 Mbit/s).

To get any improvement from parallel downloading, you also need to provide enough server power. Given your current problem (waiting more than 500 ms per image), I assume you already have problems with the servers:

  • you should add or improve load balancing,
  • you should consider switching the server software to something with better performance.

Again, if 500 ms is 60% of the total response time, then you are overloading the servers; if you think that is not the case, look for the bottleneck in connection or server performance.

0
