Access HTTP service running in GKE from Google Dataflow

I have an HTTP service running in a Google Container Engine cluster (behind a Kubernetes service).

My goal is to access this service from a Dataflow job running in the same GCP project using a fixed name (just as services can be reached from within GKE using DNS). Any ideas?


  • Most of the solutions I have read on Stack Overflow rely on kube-proxy being installed on the machines that try to reach the service. As far as I know, it is not possible to reliably set that up on every worker instance created by Dataflow.
  • One option is to create an external load balancer and an A record in public DNS. Although that works, I would rather not have an entry in my public DNS records pointing to this service.
google-container-engine google-cloud-dataflow
3 answers

A Dataflow job running on GCP is not part of the Google Container Engine cluster, so by default it has no access to the cluster's internal DNS.

Try setting up a load balancer for the service you want to expose; it knows how to route "external" traffic to the service. This lets you connect to the load balancer's IP address directly from a Dataflow job running on GCP.


Lukash's answer is probably the easiest way to expose your service to Dataflow. But if you really don't want a public IP and DNS record, you can use a GCE route to deliver traffic to your cluster's own IP range (something like option 1 in this answer).

This will let you reach your service's stable IP address. I'm not sure how to get the Kubernetes internal DNS to resolve from Dataflow.


EDIT: Now supported by GKE (now known as Kubernetes Engine): https://cloud.google.com/kubernetes-engine/docs/how-to/internal-load-balancing
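For the GKE-supported route mentioned in the edit, a minimal sketch of what the Service could look like is below. The annotation key `cloud.google.com/load-balancer-type` is the one I recall from the linked page at the time; newer GKE versions may expect a different key, so treat it as an assumption and check the docs. The function name, the `app: app` selector, and the port arguments are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Print a Service manifest that asks GKE for an internal load balancer.
# Arguments: service name, service port, container (target) port.
# Assumption: the annotation key below matches your GKE version; verify it
# against the linked documentation. The selector label is a placeholder.
internal_lb_manifest() {
  local name="$1" port="$2" target_port="$3"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
  - name: api
    protocol: TCP
    port: ${port}
    targetPort: ${target_port}
EOF
}
```

It could be applied with something like `internal_lb_manifest api-internal 8080 8080 | kubectl apply -f -`.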

I implemented this in a rather smooth way, IMHO. I will try to briefly outline how it works:

  • Remember that when you create a container cluster (or node pool), it consists of a set of GCE instances in an instance group, which is part of the default network. NB: Add a specific GCE network tag so that a firewall rule can later target only those instances, which is needed to let the load balancer check instance health.
  • This instance group is just a regular instance group.
  • Now remember that Kubernetes has something called NodePort, which exposes a service on that port on all nodes, i.e. on all GCE instances in your cluster. This is what we want!
  • Now that we know we have a set of GCE instances in an instance group, we can add this instance group to an internal load balancer in your default network, without it needing to know anything about Kubernetes internals or DNS.
  • The guide to follow (you can skip many of the initial steps) is here: https://cloud.google.com/compute/docs/load-balancing/internal/
  • Remember that this is regional, so the Dataflow job and everything else must be in the same region.
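To find the instance group name that the load balancer needs, something like the following sketch may help. It assumes that `gcloud container clusters describe` exposes an `instanceGroupUrls` field, that `value()` output joins list entries with `;`, and that the instance group shares its name with the last segment of each URL. The function names are mine, not part of any tool.

```shell
#!/usr/bin/env bash
# Return the last path segment of a GCE resource URL, i.e. the resource name.
resource_name() {
  printf '%s\n' "${1##*/}"
}

# List the instance group names backing a GKE cluster.
# Arguments: cluster name, cluster zone (both supplied by the caller).
# Assumption: list values in value() output are separated by ';'.
cluster_instance_groups() {
  local cluster="$1" zone="$2" url
  gcloud container clusters describe "$cluster" --zone "$zone" \
    --format='value(instanceGroupUrls)' | tr ';' '\n' | while read -r url; do
      if [ -n "$url" ]; then
        resource_name "$url"
      fi
    done
}
```

For example, `cluster_instance_groups my-cluster europe-west1-b` would print one instance group name per line, where `my-cluster` and the zone are placeholders.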

Here is the spec for the service:

    kind: Service
    apiVersion: v1
    metadata:
      name: name
      labels:
        app: app
    spec:
      selector:
        name: name
        app: app
        tier: backend
      ports:
      - name: health
        protocol: TCP
        port: 8081
        nodePort: 30081
      - name: api
        protocol: TCP
        port: 8080
        nodePort: 30080
      type: NodePort

Here is the code to set up the load balancer with its health check and forwarding rule, plus the firewall rule it needs:

    _region=<THE_REGION>
    _instance_group=<THE_NODE_POOL_INSTANCE_GROUP_NAME>
    # Can be different for your case
    _healtcheck_path=/liveness
    _healtcheck_port=30081
    _healtcheck_name=<THE_HEALTCHECK_NAME>
    _port=30080
    _tags=<TAGS>
    _loadbalancer_name=internal-loadbalancer-$_region
    _loadbalancer_ip=10.240.0.200

    gcloud compute health-checks create http $_healtcheck_name \
      --port $_healtcheck_port \
      --request-path $_healtcheck_path

    gcloud compute backend-services create $_loadbalancer_name \
      --load-balancing-scheme internal \
      --region $_region \
      --health-checks $_healtcheck_name

    gcloud compute backend-services add-backend $_loadbalancer_name \
      --instance-group $_instance_group \
      --instance-group-zone $_region-a \
      --region $_region

    gcloud compute forwarding-rules create $_loadbalancer_name-forwarding-rule \
      --load-balancing-scheme internal \
      --ports $_port \
      --region $_region \
      --backend-service $_loadbalancer_name \
      --address $_loadbalancer_ip

    # Allow Google Cloud to health-check your instances
    gcloud compute firewall-rules create allow-$_healtcheck_name \
      --source-ranges 130.211.0.0/22,35.191.0.0/16 \
      --target-tags $_tags \
      --allow tcp
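Once the forwarding rule exists, a quick way to confirm the setup is to probe the load balancer IP from any VM in the same network and region. The sketch below is only an illustration: the helper names are hypothetical, and the `/` path is an assumption about what the API serves on port 30080.

```shell
#!/usr/bin/env bash
# Build the URL for a service behind the internal load balancer.
# Arguments: IP address, port, optional path (defaults to "/").
lb_url() {
  local ip="$1" port="$2" path="${3:-/}"
  printf 'http://%s:%s%s\n' "$ip" "$port" "$path"
}

# Probe the URL and return curl's exit status:
# 0 when the server answers with an HTTP success status, non-zero otherwise.
check_lb() {
  curl --fail --silent --show-error --max-time 5 --output /dev/null "$(lb_url "$@")"
}
```

With the values from the script above, `check_lb 10.240.0.200 30080` should succeed once the backends report healthy. The Dataflow job can then use the same address, as long as it runs in the same region and network.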