GCP offers two types of load balancers: Network and HTTP(S). You can find the differences between Network and HTTP(S) load balancers here. From now on, I will refer to HTTP load balancing as HLB, since the full name is too long.
Here is the HLB architecture taken from GCP:

As the architecture above shows, there are many moving parts to configure in an HLB. We are going to build it in reverse order, starting with the instance template and the instance group and ending with the forwarding rules.
1. Instance template: An instance template is not strictly needed to configure the HLB, since you can use an unmanaged instance group and attach it to the HLB. I prefer an instance template with a managed instance group, so that autoscaling can also be added for a group of homogeneous instances. I wrote the steps for creating the instance template here.
Let's assume the instance template is named sample-template.
2. Managed Instance Group: Read the steps to create a managed instance group with autoscaling here. Autoscaling and load balancing are independent of each other, and both offer a health check. From my point of view, setting up a health check for both load balancing and autoscaling is redundant; a health check for load balancing alone is sufficient.
Let's assume the managed instance group is named autoscale-managed-instance-group; it is autoscaled and created from sample-template.
3. Service endpoint: You must specify the service endpoint that the HLB will use. Command to configure the service endpoint in gcloud:
gcloud compute instance-groups managed \ set-named-ports \ autoscale-managed-instance-group \
The above command defines a service endpoint (a named port) on the instance group, which the HLB uses to communicate with the homogeneous instances in the group.
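The command above ends at a line continuation, so its remaining flags are not shown. As a minimal sketch, a complete call might look like the one below. The port name http, port 80, and the regional group in asia-northeast1 are my assumptions (the region is borrowed from the add-backend step later on), not values confirmed by the text. The script prints the command by default so it can be reviewed first.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Run with GCLOUD=gcloud to execute it for real.
GCLOUD="${GCLOUD:-echo gcloud}"

# Assumptions (not from the original text): named port "http" on port 80,
# and a regional managed instance group in asia-northeast1.
set_named_ports() {
  $GCLOUD compute instance-groups managed set-named-ports \
    autoscale-managed-instance-group \
    --named-ports http:80 \
    --region asia-northeast1
}

set_named_ports
```

The name given to the port here is what a backend service would later refer to through its port name setting.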
4. Health Check: This is important to ensure that the HLB directs traffic only to healthy instances. Command to create a health check:
gcloud compute http-health-checks create sample-health-check
5. Backend services: A backend service routes traffic to all backend instances, i.e. the instances in the instance group. It also links to the health check, so that traffic is routed only to healthy instances. The command to create the backend service:
gcloud compute backend-services create \ sample-backend-service \
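The command above also stops at a line continuation. A hedged sketch of a full call follows, assuming the legacy HTTP health check from step 4, a named port called http (an assumption, not stated in the text), and a global backend service. It prints the command by default.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Run with GCLOUD=gcloud to execute it for real.
GCLOUD="${GCLOUD:-echo gcloud}"

# Assumptions (not from the original text): protocol HTTP, a named port
# called "http" on the instance group, and a global backend service.
create_backend_service() {
  $GCLOUD compute backend-services create sample-backend-service \
    --protocol HTTP \
    --port-name http \
    --http-health-checks sample-health-check \
    --global
}

create_backend_service
```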
The above command creates a backend service and associates it with the health check created in the previous step. Now add the instance group to the backend service by running the following command:
gcloud compute backend-services add-backend sample-backend-service \
    --instance-group autoscale-managed-instance-group \
    --balancing-mode RATE \
    --max-rate-per-instance 100 \
    --instance-group-region asia-northeast1
The above command attaches the instance group to the backend service and distributes load by request rate, as selected by the --balancing-mode RATE flag. --max-rate-per-instance is used by the autoscaler if you set the autoscaling policy based on load balancing utilization.
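To make the role of --max-rate-per-instance more concrete, here is a rough back-of-the-envelope sketch. It assumes the autoscaler tries to keep each instance at a target percentage of its max rate; the 80% target and the 450 requests per second are made-up numbers for illustration, and the real autoscaler has its own smoothing and limits.

```shell
#!/bin/sh
# instances_needed RPS MAX_RATE_PER_INSTANCE TARGET_UTILIZATION_PERCENT
# Prints the smallest instance count that keeps every instance at or below
# the target share of its max rate. Integer math only, rounded up.
instances_needed() {
  rps=$1
  max_rate=$2
  util_pct=$3
  # Usable requests per second per instance, scaled by 100 to avoid fractions.
  usable_x100=$((max_rate * util_pct))
  echo $(( (rps * 100 + usable_x100 - 1) / usable_x100 ))
}

# 450 req/s, 100 req/s per instance, 80% target: 450 / 80 = 5.625, so 6.
instances_needed 450 100 80
```

With a 100% target the same traffic would fit on 5 instances, which is why a lower target leaves headroom for spikes.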
6. URL map: Create a URL map that maps the URL of an HTTP request to your backend service. Command to create a URL map:
gcloud compute url-maps create sample-map \
When you set up a content-based load balancer, you need to add more entries to the URL map so that it routes each request to the appropriate backend service.
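As a sketch of both cases: the first command below completes the URL map creation with a default service, and the second adds a path rule for content-based routing. The names sample-matcher and video-backend-service and the /video/* path are hypothetical, invented only for this example. The commands are printed by default.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run with GCLOUD=gcloud to execute them for real.
GCLOUD="${GCLOUD:-echo gcloud}"

create_url_map() {
  # Send everything to the backend service from step 5 by default.
  $GCLOUD compute url-maps create sample-map \
    --default-service sample-backend-service

  # Hypothetical content-based rule: "sample-matcher" and
  # "video-backend-service" are illustrative names, not from the text.
  $GCLOUD compute url-maps add-path-matcher sample-map \
    --default-service sample-backend-service \
    --path-matcher-name sample-matcher \
    --path-rules "/video/*=video-backend-service"
}

create_url_map
```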
7. Target HTTP proxy: This step associates the target proxy with the URL map. The command to create the target HTTP proxy is:
gcloud compute target-http-proxies \ create sample-target-proxy \
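The proxy command above is cut off before its flags. A minimal sketch of the full call, assuming the proxy should point at the sample-map URL map from step 6, could look like this (printed by default):

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Run with GCLOUD=gcloud to execute it for real.
GCLOUD="${GCLOUD:-echo gcloud}"

# Assumption: the proxy targets the URL map created in step 6.
create_target_proxy() {
  $GCLOUD compute target-http-proxies create sample-target-proxy \
    --url-map sample-map
}

create_target_proxy
```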
8. Forwarding Rules: This is the last step; it gives the HLB its global external IP address.
gcloud compute forwarding-rules \ create sample-forward \
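The forwarding rule command is also truncated. The sketch below assumes a global rule on port 80 that points at sample-target-proxy, followed by a describe call to read back the assigned IP address. The port and the use of an automatically assigned address are my assumptions. Both commands are printed by default.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run with GCLOUD=gcloud to execute them for real.
GCLOUD="${GCLOUD:-echo gcloud}"

# Assumptions (not from the original text): plain HTTP on port 80 and an
# automatically assigned (ephemeral) global IP address.
create_forwarding_rule() {
  $GCLOUD compute forwarding-rules create sample-forward \
    --global \
    --target-http-proxy sample-target-proxy \
    --ports 80
}

# Reads back the external IP address that the rule received.
show_lb_ip() {
  $GCLOUD compute forwarding-rules describe sample-forward \
    --global \
    --format "value(IPAddress)"
}

create_forwarding_rule
show_lb_ip
```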
Opening the HLB IP address in a browser now shows the web page served by the instances in the instance group. This completes the setup of a highly scalable web application that scales and balances load automatically.
This is part 3 of a 3-part series about creating an autoscaled, load-balanced backend.