How to fix "Connection reset by peer" messages from Apache Spark?

I often get the following exception, and I wonder why this happens. After some research, I found that I could do .set("spark.submit.deployMode", "nio"), but that didn't work either. I'm using Spark 2.0.0.

    WARN TransportChannelHandler: Exception in connection from /172.31.3.245:46014
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:898)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
1 answer

I was getting the same error even after trying a lot of things. My job used to get stuck throwing this error after running for a very long time. I found a workaround that helped me: although I still see the same error, at least my job keeps running.

  • One possible cause is that the executors kill themselves because they think they have lost the connection to the driver. I added the following settings to the spark-defaults.conf file (the same values can also be set programmatically, as shown in the sketch after this list).

        spark.network.timeout 10000000
        spark.executor.heartbeatInterval 10000000

    Basically, I increased the network timeout and the executor heartbeat interval.

  • For the specific step that used to get stuck, I simply cached the DataFrame that was being processed; a sketch of this follows the note below.
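For reference, here is a minimal sketch (Scala, Spark 2.x) of setting the same two values programmatically when building the session instead of editing spark-defaults.conf. The app name is hypothetical, and the values are the ones from this answer; note that Spark's configuration docs recommend keeping spark.executor.heartbeatInterval well below spark.network.timeout.

    import org.apache.spark.sql.SparkSession

    // Programmatic equivalent of the spark-defaults.conf entries above
    // (Spark 2.x). Values are taken from this answer; tune them for
    // your own cluster.
    val spark = SparkSession.builder()
      .appName("connection-reset-workaround") // hypothetical app name
      .config("spark.network.timeout", "10000000")
      .config("spark.executor.heartbeatInterval", "10000000")
      .getOrCreate()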

Note: this is a workaround; I still see the same error in the error logs, but my job no longer stops.
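And a sketch of the caching workaround, assuming df stands in for whatever DataFrame feeds the step that used to hang; the input path is a placeholder:

    // Persist the DataFrame used by the stuck step so it is not
    // recomputed across stages. The input path is hypothetical.
    val df = spark.read.parquet("hdfs:///path/to/input")
    df.cache() // or df.persist(StorageLevel.MEMORY_AND_DISK), from org.apache.spark.storage
    df.count() // run an action to materialize the cache before the heavy step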
