How is reading an InputStream from a local file different from reading one from the network (via Amazon S3)?

Question

I did not think there was any difference between an InputStream read from a local file and one read from a network source (Amazon S3 in this case), so hopefully someone can enlighten me.

Both programs were run on a virtual machine running CentOS 6.3. The test file in both cases is 10 MB.

Local File Code:

    InputStream is = new FileInputStream("/home/anyuser/test.jpg");

    int read = 0;
    int buf_size = 1024 * 1024 * 2;
    byte[] buf = new byte[buf_size];

    ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
    long t3 = System.currentTimeMillis();
    int i = 0;
    while ((read = is.read(buf)) != -1) {
        baos.write(buf, 0, read);
        System.out.println("reading for the " + i + "th time");
        i++;
    }
    long t4 = System.currentTimeMillis();
    System.out.println("Time to read = " + (t4 - t3) + "ms");

The output of this code is shown below. The loop reads 5 times, which makes sense, since the buffer is 2 MB and the file is 10 MB.

    reading for the 0th time
    reading for the 1th time
    reading for the 2th time
    reading for the 3th time
    reading for the 4th time
    Time to read = 103ms

Now the same code runs against the same 10 MB test file, except that this time the source is Amazon S3. We do not start timing the read until we have obtained the stream from S3. However, this time the read loop executes thousands of times when it should need only 5 reads.

    InputStream is;
    long t1 = System.currentTimeMillis();
    is = getS3().getFileFromBucket(S3Path, input);
    long t2 = System.currentTimeMillis();

    System.out.print("Time to get file " + input + " from S3: ");
    System.out.println((t2 - t1) + "ms");

    int read = 0;
    int buf_size = 1024 * 1024 * 2;
    byte[] buf = new byte[buf_size];

    ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
    long t3 = System.currentTimeMillis();
    int i = 0;
    while ((read = is.read(buf)) != -1) {
        baos.write(buf, 0, read);
        if ((i % 100) == 0)
            System.out.println("reading for the " + i + "th time");
        i++;
    }
    long t4 = System.currentTimeMillis();
    System.out.println("Time to read = " + (t4 - t3) + "ms");

The output is as follows:

    Time to get file test.jpg from S3: 2456ms
    reading for the 0th time
    reading for the 100th time
    reading for the 200th time
    reading for the 300th time
    reading for the 400th time
    reading for the 500th time
    reading for the 600th time
    reading for the 700th time
    reading for the 800th time
    reading for the 900th time
    reading for the 1000th time
    reading for the 1100th time
    reading for the 1200th time
    reading for the 1300th time
    reading for the 1400th time
    Time to read = 14471ms

The time taken to read the stream varies from run to run. Sometimes it takes 60 seconds, sometimes 15 seconds. It is never faster than 15 seconds. The read loop still goes through 1400+ iterations in every test run, although I think it should take only 5, as in the local file example.

Is this how an input stream behaves when its source is the network, even though we have already finished getting the file from that source? Thanks in advance for your help.

+7

java file inputstream amazon-s3 local

asked Nov 13 '12 at 0:16

2 answers

As @imel96 points out, nothing in the documentation guarantees the behavior you expect. You will never read 2 MB at a time from a socket, because the socket's receive buffer is usually smaller than that, quite apart from other factors such as throughput.

+1

Ejp Nov 13 '12 at 3:06

imel96 · Accepted Answer · 2012-11-13T00:36:49+0000

I do not think this is specific to Java. When you read from the network, each read request to the operating system returns only the data that has already arrived, typically about one packet's worth, no matter how large a buffer you allocate. If you print the number of bytes returned by each call (your read variable), it should reflect the size of the network packets in use.
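A minimal sketch of this behavior, under stated assumptions: `ShortReadStream` is a hypothetical wrapper (not part of any S3 library) that caps every read at a fixed size to imitate a socket, and the 8 KB cap is an arbitrary illustrative value, not a measured figure. The `readFully` helper keeps calling `read` until the buffer is full or the stream ends, which is one way to get back to one loop iteration per 2 MB.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {

    /** Hypothetical stand-in for a network stream: returns at most maxChunk bytes per read. */
    static class ShortReadStream extends FilterInputStream {
        private final int maxChunk;

        ShortReadStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Ask the underlying stream for no more than maxChunk bytes.
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Reads until buf is full or the stream ends; returns bytes read, or -1 if none were left. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return (total == 0 && buf.length > 0) ? -1 : total;
    }

    /** Counts loop iterations when calling read(buf) directly, as the question's code does. */
    static int countPlainReads(InputStream in, byte[] buf) throws IOException {
        int count = 0;
        while (in.read(buf) != -1) {
            count++;
        }
        return count;
    }

    /** Counts loop iterations when each iteration fills the buffer first. */
    static int countFullReads(InputStream in, byte[] buf) throws IOException {
        int count = 0;
        while (readFully(in, buf) != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10 * 1024 * 1024];  // 10 MB of zeros standing in for the file
        byte[] buf = new byte[2 * 1024 * 1024];    // 2 MB buffer, as in the question
        int cap = 8 * 1024;                        // assumed per-read cap

        int plain = countPlainReads(new ShortReadStream(new ByteArrayInputStream(data), cap), buf);
        int full = countFullReads(new ShortReadStream(new ByteArrayInputStream(data), cap), buf);
        System.out.println("plain read(buf) iterations: " + plain);
        System.out.println("readFully iterations: " + full);
    }
}
```

With the assumed 8 KB cap, the plain loop takes 1280 iterations and the filling loop takes 5. Either way the same 10 MB arrives; the iteration count only reflects how the data is delivered, not how much of it there is.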

This is one of the reasons people use a separate thread to read from the network, or avoid blocking altogether by using asynchronous I/O techniques.
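One way to move a slow read off the calling thread, sketched with `java.util.concurrent`. The names `BackgroundRead`, `drain`, and `readAllAsync` are hypothetical and chosen for illustration; an in-memory stream stands in for the stream returned by the question's `getFileFromBucket` call so that the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundRead {

    /** Drains the stream into a byte array; works whatever size each read returns. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Submits the blocking read to the executor and returns immediately with a Future. */
    static Future<byte[]> readAllAsync(ExecutorService executor, InputStream in) {
        return executor.submit(() -> {
            // Close the stream once it has been fully read or an error occurs.
            try (InputStream stream = in) {
                return drain(stream);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            InputStream in = new ByteArrayInputStream(new byte[1024 * 1024]);
            Future<byte[]> pending = readAllAsync(executor, in);
            // The caller is free to do other work here while the read proceeds.
            byte[] data = pending.get();  // blocks only at the point the result is needed
            System.out.println("read " + data.length + " bytes");
        } finally {
            executor.shutdown();
        }
    }
}
```

This does not make the transfer itself any faster; it only keeps the calling thread free while the bytes arrive.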
