Elasticsearch 2.4 nodes do not form a cluster with ConnectTransportException

Question


I already run an ELK stack with Elasticsearch (ES) 1.7 in Docker on three nodes, each of which runs a single ES container behind an nginx server. Now I am trying to upgrade ES to 2.4.0. ES 2.4.0 refuses to run as the root user by default, so I start it with the -Des.insecure.allow.root=true option.

    #Pulling SLES12 thin base image
    FROM private-registry-1
    #Author
    MAINTAINER xyz
    # Pre-requisite - Adding repositories
    RUN zypper ar private-registry-2
    RUN zypper --no-gpg-checks -n refresh
    #Install required packages and dependencies
    RUN zypper -n in net-tools-1.60-764.185 wget-1.14-7.1 python-2.7.9-14.1 python-base-2.7.9-14.1 tar-1.27.1-7.1
    #Downloading elasticsearch executable
    ENV ES_VERSION=2.4.0
    ENV ES_CLUSTER_NAME=ccs-elasticsearch
    ENV ES_DIR="//opt//log-management//elasticsearch"
    ENV ES_DATA_PATH="//data"
    ENV ES_LOGS_PATH="//var//log"
    ENV ES_CONFIG_PATH="${ES_DIR}//config"
    ENV ES_REST_PORT=9200
    ENV ES_INTERNAL_COM_PORT=9300
    WORKDIR /opt/log-management
    RUN wget private-registry-3/elasticsearch/elasticsearch/${ES_VERSION}.tar/elasticsearch-${ES_VERSION}.tar.gz --no-check-certificate
    RUN tar -xzvf ${ES_DIR}-${ES_VERSION}.tar.gz \
     && rm ${ES_DIR}-${ES_VERSION}.tar.gz \
     && mv ${ES_DIR}-${ES_VERSION} ${ES_DIR} \
     && cp ${ES_DIR}/config/elasticsearch.yml ${ES_CONFIG_PATH}/elasticsearch-default.yml
    #Exposing elasticsearch server container port to the HOST
    EXPOSE ${ES_REST_PORT} ${ES_INTERNAL_COM_PORT}
    #Removing binary files which are not needed
    RUN zypper -n rm wget
    # Removing zypper repos
    RUN zypper rr caspiancs_common
    COPY query-crs-es.sh ${ES_DIR}/bin/query-crs-es.sh
    RUN chmod +x ${ES_DIR}/bin/query-crs-es.sh
    COPY query-crs-wrapper.py ${ES_DIR}/bin/query-crs-wrapper.py
    RUN chmod +x ${ES_DIR}/bin/query-crs-wrapper.py
    ENV CRS_PARSER_PYTHON_SCRIPT="${ES_DIR}//bin//query-crs-wrapper.py"
    #Copy elastic search bootstrap script
    COPY elasticsearch-bootstrap-and-run.sh ${ES_DIR}/
    RUN chmod +x ${ES_DIR}/elasticsearch-bootstrap-and-run.sh
    COPY config-es-cluster ${ES_DIR}/bin/config-es-cluster
    RUN chmod +x ${ES_DIR}/bin/config-es-cluster
    COPY elasticsearch-config-script ${ES_DIR}/bin/elasticsearch-config-script
    RUN chmod +x ${ES_DIR}/bin/elasticsearch-config-script
    #Running elasticsearch executable
    WORKDIR ${ES_DIR}
    ENTRYPOINT ${ES_DIR}/elasticsearch-bootstrap-and-run.sh

The configuration file is then rewritten by the elasticsearch-config-script and config-es-cluster scripts referenced in the Dockerfile, as follows:

    #Bootstrap script to configure elasticsearch.yml file
    echo "cluster.name: ${ES_CLUSTER_NAME}" > ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "path.data: ${ES_DATA_PATH}" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "path.logs: ${ES_LOGS_PATH}" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #Performance optimization settings
    echo "index.number_of_replicas: 1" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "index.number_of_shards: 3" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "discovery.zen.ping.multicast.enabled: false" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "bootstrap.mlockall: true" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "indices.memory.index_buffer_size: 50%" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #Search thread pool
    echo "threadpool.search.type: fixed" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "threadpool.search.size: 20" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "threadpool.search.queue_size: 100000" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #Index thread pool
    echo "threadpool.index.type: fixed" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "threadpool.index.size: 60" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "threadpool.index.queue_size: 200000" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #publish host as container host address
    #echo "network.publish_host: ${CONTAINER_HOST_ADDRESS}" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "network.bind_host: ${CONTAINER_HOST_ADDRESS}" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "network.publish_host: ${CONTAINER_PRIVATE_IP}" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "network.bind_host: ${CONTAINER_PRIVATE_IP}" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "network.host: ${CONTAINER_HOST_ADDRESS}" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "network.host: 0.0.0.0" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "htpp.port: 9200" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #echo "transport.tcp.port: 9300-9400" >> ${ES_CONFIG_PATH}/elasticsearch.yml
    #configure elasticsearch.yml for clustering
    echo 'discovery.zen.ping.unicast.hosts: [ELASTICSEARCH_IPS] ' >> ${ES_CONFIG_PATH}/elasticsearch.yml
    echo "discovery.zen.minimum_master_nodes: 1" >> ${ES_CONFIG_PATH}/elasticsearch.yml

ELASTICSEARCH_IPS is a placeholder for an array of the other nodes' IP addresses, which every node obtains by running a script called query-crs-es.sh. In the end, the array holds the IP addresses of the other two nodes in the cluster. Note that these are the node (host) IPs, not the containers' private IP addresses.
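The question does not show query-crs-es.sh or how the placeholder is filled in, so the following is only a minimal sketch of what the resulting line could look like. The function name build_unicast_hosts and the two peer addresses are made up for illustration; the only thing taken from the question is the discovery.zen.ping.unicast.hosts setting and the transport port 9300.

```shell
# Hypothetical helper (not the asker's script): turn a list of node IPs
# into the value for discovery.zen.ping.unicast.hosts.
# $1 is the transport port, the remaining arguments are peer node IPs.
build_unicast_hosts() (
    port="$1"; shift
    out=""
    for ip in "$@"; do
        # Append "ip:port", comma-separated after the first entry.
        out="${out:+$out, }\"${ip}:${port}\""
    done
    printf 'discovery.zen.ping.unicast.hosts: [%s]\n' "$out"
)

# Example with two made-up peer addresses:
build_unicast_hosts 9300 10.0.0.2 10.0.0.3
# prints: discovery.zen.ping.unicast.hosts: ["10.0.0.2:9300", "10.0.0.3:9300"]
```

Listing host IPs here only helps if each node also publishes an address that its peers can actually reach.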

I start the containers with Ansible. During startup, all nodes come up but do not form a cluster. I consistently get this error:

Node1:

    [2016-10-07 09:45:23,313][WARN ][bootstrap ] running as ROOT user. this is a bad idea!
    [2016-10-07 09:45:23,474][INFO ][node ] [Dragon Lord] version[2.4.0], pid[1], build[ce9f0c7/2016-08-29T09:14:17Z]
    [2016-10-07 09:45:23,474][INFO ][node ] [Dragon Lord] initializing ...
    [2016-10-07 09:45:23,970][INFO ][plugins ] [Dragon Lord] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
    [2016-10-07 09:45:23,994][INFO ][env ] [Dragon Lord] using [1] data paths, mounts [[/data (/dev/mapper/platform-data)]], net usable_space [2.5tb], net total_space [2.5tb], spins? [possibly], types [xfs]
    [2016-10-07 09:45:23,994][INFO ][env ] [Dragon Lord] heap size [989.8mb], compressed ordinary object pointers [true]
    [2016-10-07 09:45:24,028][WARN ][threadpool ] [Dragon Lord] requested thread pool size [60] for [index] is too large; setting to maximum [32] instead
    [2016-10-07 09:45:25,540][INFO ][node ] [Dragon Lord] initialized
    [2016-10-07 09:45:25,540][INFO ][node ] [Dragon Lord] starting ...
    [2016-10-07 09:45:25,687][INFO ][transport ] [Dragon Lord] publish_address {172.17.0.15:9300}, bound_addresses {[::]:9300}
    [2016-10-07 09:45:25,693][INFO ][discovery ] [Dragon Lord] ccs-elasticsearch/5wNwWJRFRS-2dRY5AGqqGQ
    [2016-10-07 09:45:28,721][INFO ][cluster.service ] [Dragon Lord] new_master {Dragon Lord}{5wNwWJRFRS-2dRY5AGqqGQ}{172.17.0.15}{172.17.0.15:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
    [2016-10-07 09:45:28,765][INFO ][http ] [Dragon Lord] publish_address {172.17.0.15:9200}, bound_addresses {[::]:9200}
    [2016-10-07 09:45:28,765][INFO ][node ] [Dragon Lord] started
    [2016-10-07 09:45:28,856][INFO ][gateway ] [Dragon Lord] recovered [20] indices into cluster_state

Node2:

    [2016-10-07 09:45:58,561][WARN ][bootstrap ] running as ROOT user. this is a bad idea!
    [2016-10-07 09:45:58,729][INFO ][node ] [Defensor] version[2.4.0], pid[1], build[ce9f0c7/2016-08-29T09:14:17Z]
    [2016-10-07 09:45:58,729][INFO ][node ] [Defensor] initializing ...
    [2016-10-07 09:45:59,215][INFO ][plugins ] [Defensor] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
    [2016-10-07 09:45:59,237][INFO ][env ] [Defensor] using [1] data paths, mounts [[/data (/dev/mapper/platform-data)]], net usable_space [2.5tb], net total_space [2.5tb], spins? [possibly], types [xfs]
    [2016-10-07 09:45:59,237][INFO ][env ] [Defensor] heap size [989.8mb], compressed ordinary object pointers [true]
    [2016-10-07 09:45:59,266][WARN ][threadpool ] [Defensor] requested thread pool size [60] for [index] is too large; setting to maximum [32] instead
    [2016-10-07 09:46:00,733][INFO ][node ] [Defensor] initialized
    [2016-10-07 09:46:00,733][INFO ][node ] [Defensor] starting ...
    [2016-10-07 09:46:00,833][INFO ][transport ] [Defensor] publish_address {172.17.0.16:9300}, bound_addresses {[::]:9300}
    [2016-10-07 09:46:00,837][INFO ][discovery ] [Defensor] ccs-elasticsearch/RXALMe9NQVmbCz5gg1CwHA
    [2016-10-07 09:46:03,876][WARN ][discovery.zen ] [Defensor] failed to connect to master [{Dragon Lord}{5wNwWJRFRS-2dRY5AGqqGQ}{172.17.0.15}{172.17.0.15:9300}], retrying...
    ConnectTransportException[[Dragon Lord][172.17.0.15:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /172.17.0.15:9300];
        at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:1002)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:937)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:911)
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:260)
        at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:444)
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:396)
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.ConnectException: Connection refused: /172.17.0.15:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        ... 3 more
    [2016-10-07 09:46:06,899][WARN ][discovery.zen ] [Defensor] failed to connect to master [{Dragon Lord}{5wNwWJRFRS-2dRY5AGqqGQ}{172.17.0.15}{172.17.0.15:9300}], retrying...
    ConnectTransportException[[Dragon Lord][172.17.0.15:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /172.17.0.15:9300];
        at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:1002)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:937)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:911)
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:260)
        at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:444)
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:396)
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.ConnectException: Connection refused: /172.17.0.15:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        ... 3 more
    [2016-10-07 09:46:09,917][WARN ][discovery.zen ] [Defensor] failed to connect to master [{Dragon Lord}{5wNwWJRFRS-2dRY5AGqqGQ}{172.17.0.15}{172.17.0.15:9300}], retrying...
    ConnectTransportException[[Dragon Lord][172.17.0.15:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /172.17.0.15:9300];

Node3:

    [2016-10-07 09:45:58,624][WARN ][bootstrap ] running as ROOT user. this is a bad idea!
    [2016-10-07 09:45:58,806][INFO ][node ] [Scarlet Beetle] version[2.4.0], pid[1], build[ce9f0c7/2016-08-29T09:14:17Z]
    [2016-10-07 09:45:58,806][INFO ][node ] [Scarlet Beetle] initializing ...
    [2016-10-07 09:45:59,341][INFO ][plugins ] [Scarlet Beetle] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
    [2016-10-07 09:45:59,363][INFO ][env ] [Scarlet Beetle] using [1] data paths, mounts [[/data (/dev/mapper/platform-data)]], net usable_space [2.5tb], net total_space [2.5tb], spins? [possibly], types [xfs]
    [2016-10-07 09:45:59,363][INFO ][env ] [Scarlet Beetle] heap size [989.8mb], compressed ordinary object pointers [true]
    [2016-10-07 09:45:59,390][WARN ][threadpool ] [Scarlet Beetle] requested thread pool size [60] for [index] is too large; setting to maximum [32] instead
    [2016-10-07 09:46:00,795][INFO ][node ] [Scarlet Beetle] initialized
    [2016-10-07 09:46:00,795][INFO ][node ] [Scarlet Beetle] starting ...
    [2016-10-07 09:46:00,927][INFO ][transport ] [Scarlet Beetle] publish_address {172.17.0.16:9300}, bound_addresses {[::]:9300}
    [2016-10-07 09:46:00,931][INFO ][discovery ] [Scarlet Beetle] ccs-elasticsearch/SFWrVwKRSUu--4KiZK4Kfg
    [2016-10-07 09:46:03,965][WARN ][discovery.zen ] [Scarlet Beetle] failed to connect to master [{Dragon Lord}{5wNwWJRFRS-2dRY5AGqqGQ}{172.17.0.15}{172.17.0.15:9300}], retrying...
    ConnectTransportException[[Dragon Lord][172.17.0.15:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /172.17.0.15:9300];
        at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:1002)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:937)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:911)
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:260)
        at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:444)
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:396)
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.ConnectException: Connection refused: /172.17.0.15:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        ... 3 more
    [2016-10-07 09:46:06,990][WARN ][discovery.zen ] [Scarlet Beetle] failed to connect to master [{Dragon Lord}{5wNwWJRFRS-2dRY5AGqqGQ}{172.17.0.15}{172.17.0.15:9300}], retrying...
    ConnectTransportException[[Dragon Lord][172.17.0.15:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /172.17.0.15:9300];
        at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:1002)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:937)
        at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:911)
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:260)
        at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:444)
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:396)
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

As the logs show, Node2 and Node3 know about the master (Node1) but cannot connect to it. I tried most of the network.host variants that you can see commented out in the configuration script, and none of them worked. Any pointers would be appreciated.

This is the state of the ports:

    netstat -nlp | grep 9200
    tcp 0 0 10.240.135.140:9200 0.0.0.0:* LISTEN 188116/docker-proxy
    tcp 0 0 10.240.137.112:9200 0.0.0.0:* LISTEN 187240/haproxy
    netstat -nlp | grep 9300
    tcp 0 0 :::9300 :::* LISTEN 188085/docker-proxy
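One thing worth checking in the logs above is the address each node publishes. The sketch below pulls the IP out of a transport log line and flags it if it falls in 172.17.x.x, which is assumed here to be Docker's default bridge range on these hosts (that is an assumption, not something the question states). The function names published_ip and looks_like_docker_bridge are made up for illustration.

```shell
# Print the IP from "publish_address {ip:port}" in a [transport] log line
# read on stdin; prints nothing if the line does not match.
published_ip() {
    sed -n 's/.*\[transport *].*publish_address {\([0-9.]*\):[0-9]*}.*/\1/p'
}

# Succeed if the address is in 172.17.x.x (assumed default docker0 range).
looks_like_docker_bridge() {
    case "$1" in
        172.17.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Transport line copied from the Node1 log in the question.
line='[2016-10-07 09:45:25,687][INFO ][transport ] [Dragon Lord] publish_address {172.17.0.15:9300}, bound_addresses {[::]:9300}'
ip="$(printf '%s\n' "$line" | published_ip)"
if looks_like_docker_bridge "$ip"; then
    echo "$ip looks container-private; nodes on other hosts may not be able to reach it"
fi
```

If the published address is container-private, peers on other hosts will try to join the master at an address that only exists on the master's own host, which would be consistent with the "Connection refused" errors.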


docker nginx dockerfile elasticsearch elk-stack

vvs14 Oct 7 '16 at 10:25


1 answer

vvs14 · Accepted Answer · 2016-10-12T08:04:13+0000

I managed to form a cluster with the following settings:

network.publish_host=CONTAINER_HOST_ADDRESS, i.e. the address of the node on which the container is running.

    network.bind_host=0.0.0.0
    transport.publish_port=9300
    transport.publish_host=CONTAINER_HOST_ADDRESS

transport.publish_host is important when you run ES behind a proxy / load balancer such as nginx or haproxy.
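As a rough sketch, these settings could be written to elasticsearch.yml in the same echo style as the bootstrap script from the question. The function name append_network_settings is made up, and the address passed in below is only an example taken from the netstat output; in practice it would be whatever CONTAINER_HOST_ADDRESS resolves to on each node. The example writes to a temporary file rather than the real config path.

```shell
# Hypothetical helper: append the network settings from the answer.
# $1: path to elasticsearch.yml
# $2: address of the node (Docker host) that runs the container
append_network_settings() {
    {
        echo "network.bind_host: 0.0.0.0"
        echo "network.publish_host: $2"
        echo "transport.publish_host: $2"
        echo "transport.publish_port: 9300"
    } >> "$1"
}

# Example run against a scratch file; the address is illustrative only.
cfg="$(mktemp)"
append_network_settings "$cfg" "10.240.135.140"
cat "$cfg"
```

Note that elasticsearch.yml uses "key: value" syntax, while the "key=value" form shown in the answer is how the same settings would be passed on the command line.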
