Using TextIO.Read is converted with a large collection of compressed text files (1000+ files ranging in size from 100 MB to 1.5 GB), we sometimes get the following error:
java.util.zip.ZipException: too many length or distance symbols at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164) at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at java.io.FilterInputStream.read(FilterInputStream.java:133) at java.io.PushbackInputStream.read(PushbackInputStream.java:186) at com.google.cloud.dataflow.sdk.runners.worker.TextReader$ScanState.readBytes(TextReader.java:261) at com.google.cloud.dataflow.sdk.runners.worker.TextReader$TextFileIterator.readElement(TextReader.java:189) at com.google.cloud.dataflow.sdk.runners.worker.FileBasedReader$FileBasedIterator.computeNextElement(FileBasedReader.java:265) at com.google.cloud.dataflow.sdk.runners.worker.FileBasedReader$FileBasedIterator.hasNext(FileBasedReader.java:165) at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:169) at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:118) at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:66) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:204) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:151) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:118) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:139) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:124) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
Searching the Internet for the same ZipException will only result in this answer :
Zip file errors often occur when a hot deployer tries to deploy an application before it is fully copied to the deployment directory. This is quite common if it takes a few seconds to copy a file. The solution is to copy the file to a temporary directory on the same disk partition as the application server, and then move the file to the deployment directory.
Has anyone else encountered a similar exception? Or can we fix this problem anyway?
source share