API with NestJS #99. Scaling the number of application instances with Amazon ECS

March 13, 2023

This entry is part 99 of 121 in the API with NestJS series

In recent articles, we’ve learned how to deploy our NestJS application with Docker and AWS. In addition, we’ve managed to deploy multiple instances of our API using a load balancer. Still, it was always a fixed number of instances. In this article, we learn how to adapt to traffic by changing the number of deployed resources.

A single instance of our application might handle the load on most days. However, we must prepare for events that might increase traffic, such as Black Friday. One of the ways to solve this problem is scaling vertically by adding memory or faster CPUs to our existing server. This is not easy to achieve without downtime, though.

An alternative is scaling horizontally by deploying more instances of our application and distributing the traffic between them. Then, if one of the instances malfunctions, we can effortlessly distribute the workload to other instances. Also, it’s easier to accomplish without downtime.

We must keep an eye on the bill, though, because additional resources cost money. Fortunately, we can scale our resources automatically based on the traffic. Using various metrics, we can detect when our existing application instances are reaching their limit and act accordingly.

Cluster Auto Scaling

In one of the previous parts of this series, we learned how to deploy multiple instances of our API using a load balancer. When doing so, we defined the Auto Scaling Group for our cluster.

The Auto Scaling Group contains a set of EC2 instances. The number of instances depends on the capacity we set when configuring the group.

In the mentioned article, we set both the minimum and maximum capacity to 2. Because of that, our cluster always launches two EC2 instances. So instead, let’s create a cluster with a broader range of available capacity.

If you want to know more about the basics of creating clusters with Elastic Container Service, check out the following articles:

API with NestJS #93. Deploying a NestJS app with Amazon ECS and RDS

API with NestJS #94. Deploying multiple instances on AWS with a load balancer

Thanks to the broader capacity range, our cluster can:

scale out by adding EC2 instances to match the increased traffic,
scale in by removing EC2 instances to match the decreased traffic.

When we want to run our NestJS application in the cluster, we do so by running a task. Thanks to Cluster Auto Scaling, whenever we try running a task that our existing EC2 instances can’t handle, the cluster scales out by adding instances.

If the EC2 instances in our cluster are underutilized, the cluster scales in by removing instances as long as it does not disrupt running tasks.
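To make scaling out and scaling in more concrete, here is a minimal sketch of the idea. It is not the algorithm AWS uses. It assumes that every EC2 instance fits the same fixed number of tasks, and instancesNeeded, tasksPerInstance, minInstances, and maxInstances are hypothetical names rather than ECS settings.

```typescript
// Simplified model: how many EC2 instances a cluster needs for a given number of tasks.
// Assumption: every instance can host the same number of tasks (tasksPerInstance).
function instancesNeeded(
  taskCount: number,
  tasksPerInstance: number,
  minInstances: number,
  maxInstances: number,
): number {
  if (tasksPerInstance <= 0) {
    throw new Error('tasksPerInstance must be greater than zero');
  }
  const required = Math.ceil(taskCount / tasksPerInstance);
  // The Auto Scaling Group stays between its minimum and maximum capacity.
  return Math.min(maxInstances, Math.max(minInstances, required));
}

// Three tasks, one task per instance: the cluster scales out to three instances.
const afterScaleOut = instancesNeeded(3, 1, 1, 5);
// Traffic drops and one task remains: the cluster can scale in to one instance.
const afterScaleIn = instancesNeeded(1, 1, 1, 5);
```

In this model, the capacity range of the Auto Scaling Group is the only thing that stops the cluster from following the number of tasks exactly.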

Creating the ECS service

When we run the specified number of tasks in our ECS cluster, we do that through the ECS service. We have done it multiple times so far in this series, but we need to make a few adjustments for it to scale.

Compute configuration

When creating an ECS service that integrates with cluster auto scaling, it is essential to configure the compute options correctly.

When we configured cluster auto scaling for our cluster, we created a capacity provider under the hood. A capacity provider manages the scaling of infrastructure for the tasks in our cluster.

Since we didn’t use cluster auto scaling in the previous parts of this series, we chose the “Launch type” compute option. This time, we need to select the capacity provider strategy.

When we select the capacity provider strategy, our capacity provider is chosen by default.
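If we created the service through the API instead of the console, the same choice could be expressed as data. The sketch below is only an illustration. The field names mirror the capacityProviderStrategy and launchType parameters of the ECS CreateService request, which can't be used together. The interfaces, the validation helper, and the nestjs-capacity-provider name are hypothetical.

```typescript
// Local types that mirror a small part of the ECS CreateService request.
interface CapacityProviderStrategyItem {
  capacityProvider: string;
  weight?: number;
  base?: number;
}

interface ComputeConfiguration {
  launchType?: 'EC2' | 'FARGATE' | 'EXTERNAL';
  capacityProviderStrategy?: CapacityProviderStrategyItem[];
}

// Hypothetical helper: rejects a configuration that sets both compute options.
function assertValidComputeConfiguration(config: ComputeConfiguration): void {
  const hasStrategy = (config.capacityProviderStrategy?.length ?? 0) > 0;
  if (config.launchType !== undefined && hasStrategy) {
    throw new Error('Use either a launch type or a capacity provider strategy, not both');
  }
}

// The capacity provider name is a placeholder for the one created with our cluster.
const computeConfiguration: ComputeConfiguration = {
  capacityProviderStrategy: [
    { capacityProvider: 'nestjs-capacity-provider', weight: 1, base: 0 },
  ],
};

assertValidComputeConfiguration(computeConfiguration);
```

Leaving out launchType is what lets the capacity provider, and therefore cluster auto scaling, decide where the tasks run.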

Service auto scaling

When we create the service, we have a chance to set up service auto scaling. Service auto scaling fundamentally differs from the cluster auto scaling we’ve set up before.

When we run one task in our cluster, it deploys one instance of our NestJS application on one of our EC2 instances. Thanks to service auto scaling, we can adjust the number of tasks based on traffic.

When our service tries to run a new task to deploy an additional instance of our NestJS application, there is a chance that we don’t have any free EC2 instances that can handle it. This is where the cluster auto scaling kicks in to run an additional EC2 instance in our cluster.

We can set up the service auto scaling when creating our service. The first thing to do is to set up the minimum and maximum number of tasks that our service handles.

With the minimum set to one task and the maximum set to five, we only run one instance of our NestJS application if there is not much traffic. However, as the traffic increases, we can run up to five separate instances of our app.
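As a rough illustration, a range of one to five tasks maps to the MinCapacity and MaxCapacity fields of the Application Auto Scaling RegisterScalableTarget request. The cluster and service names in ResourceId are placeholders, and clampTaskCount is a hypothetical helper that shows how the limits bound whatever number of tasks a scaling policy asks for.

```typescript
// Shape of a RegisterScalableTarget request for an ECS service.
const scalableTarget = {
  ServiceNamespace: 'ecs',
  ScalableDimension: 'ecs:service:DesiredCount',
  ResourceId: 'service/my-cluster/my-nestjs-service', // placeholder names
  MinCapacity: 1,
  MaxCapacity: 5,
};

// Hypothetical helper: keeps a requested task count inside the configured range.
function clampTaskCount(requested: number): number {
  return Math.min(
    scalableTarget.MaxCapacity,
    Math.max(scalableTarget.MinCapacity, requested),
  );
}

const quietDay = clampTaskCount(0); // never below one task
const blackFriday = clampTaskCount(12); // never above five tasks
```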

The second thing we need to do is to define a point where our existing NestJS instances are reaching their limit. We do that by defining the scaling policy.

Defining the scaling policy

The new Elastic Container Service interface allows us to set the target tracking policy type. With it, we increase and decrease the number of tasks our service runs based on a specific metric. When we select one of the possible metrics, Amazon ECS creates the CloudWatch alarms. The scaling policy manages the number of tasks to keep the metric close to the specified value. AWS collects metrics in one-minute intervals.

If you want to know more about CloudWatch, check out API with NestJS #97. Introduction to managing logs with Amazon CloudWatch

Let’s go through all three metrics we can choose.

ECSServiceAverageCPUUtilization

The first metric we can choose is ECSServiceAverageCPUUtilization. It measures the average CPU utilization of all the tasks in our service.

ECSServiceAverageMemoryUtilization

Another available metric is ECSServiceAverageMemoryUtilization. It monitors the average memory utilization of tasks in our service.

ALBRequestCountPerTarget

The last possible metric is ALBRequestCountPerTarget. It measures the number of requests completed by each target in our target group. If the average number of requests per target during the previous minute is greater than the specified value, our service can scale out and increase the number of instances.
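The comparison behind ALBRequestCountPerTarget can be sketched as plain arithmetic. This is a simplified illustration, not how the load balancer computes the metric internally. The function names and the numbers are made up.

```typescript
// Average number of requests handled by a single target during the last minute.
function averageRequestsPerTarget(
  requestsLastMinute: number,
  targetCount: number,
): number {
  if (targetCount <= 0) {
    throw new Error('targetCount must be greater than zero');
  }
  return requestsLastMinute / targetCount;
}

// The service can scale out when the average exceeds the configured target value.
function shouldScaleOut(
  requestsLastMinute: number,
  targetCount: number,
  targetValue: number,
): boolean {
  return averageRequestsPerTarget(requestsLastMinute, targetCount) > targetValue;
}

// 1200 requests across 2 tasks is 600 per target, above a target value of 500.
const busyMinute = shouldScaleOut(1200, 2, 500);
// 600 requests across 2 tasks is 300 per target, below the target value.
const calmMinute = shouldScaleOut(600, 2, 500);
```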

Target value

After choosing the metric, we must select the value our policy should maintain.

When we choose ECSServiceAverageCPUUtilization or ECSServiceAverageMemoryUtilization, the value represents the utilization as a percentage. With ALBRequestCountPerTarget, it is the number of requests per target.
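One way to build intuition for the target value is to think of target tracking as proportional scaling. The function below is a simplified model under that assumption and not the exact algorithm AWS runs. desiredTaskCount and its parameters are hypothetical names.

```typescript
// Simplified model of target tracking: scale the task count in proportion
// to how far the current metric is from the target value.
function desiredTaskCount(
  currentTasks: number,
  currentMetric: number,
  targetValue: number,
  minTasks: number,
  maxTasks: number,
): number {
  if (targetValue <= 0) {
    throw new Error('targetValue must be greater than zero');
  }
  const proportional = Math.ceil((currentTasks * currentMetric) / targetValue);
  // The result always stays within the minimum and maximum number of tasks.
  return Math.min(maxTasks, Math.max(minTasks, proportional));
}

// 2 tasks at 90% average CPU with a 60% target: 2 * 90 / 60 = 3 tasks.
const tasksUnderLoad = desiredTaskCount(2, 90, 60, 1, 5);
// 4 tasks at 15% average CPU with a 60% target: 4 * 15 / 60 = 1 task.
const tasksWhenIdle = desiredTaskCount(4, 15, 60, 1, 5);
```

A lower target value makes the service add tasks earlier, which leaves more headroom but costs more.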

Scale-out cooldown period

As mentioned in this article, scaling out means adding more resources to handle the increased traffic.

By specifying the scale-out cooldown period, we can configure the cooldown after adding a new instance to our group. During this time, no new instances will be added.

The above period is measured in seconds.

Scale-in cooldown period

In this article, we’ve mentioned that scaling in means removing resources to match the decreased traffic.

When configuring the scale-in cooldown period, we specify the cooldown after removing instances from our group. During the cooldown period, no instances will be removed.

Optionally, we can disable scale-in to ensure that the number of our instances never decreases.
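Put together, the metric, the target value, both cooldown periods, and the scale-in switch can be written down as one object. The field names below follow the TargetTrackingScalingPolicyConfiguration structure of the Application Auto Scaling API, narrowed to the options discussed in this article. The values are arbitrary, and isScalingAllowed is a hypothetical, simplified model of the cooldown behavior described above.

```typescript
interface TargetTrackingPolicy {
  TargetValue: number;
  PredefinedMetricSpecification: {
    PredefinedMetricType:
      | 'ECSServiceAverageCPUUtilization'
      | 'ECSServiceAverageMemoryUtilization'
      | 'ALBRequestCountPerTarget';
  };
  ScaleOutCooldown: number; // seconds
  ScaleInCooldown: number; // seconds
  DisableScaleIn: boolean;
}

// Example values only; pick numbers that match your own traffic.
const policy: TargetTrackingPolicy = {
  TargetValue: 70,
  PredefinedMetricSpecification: {
    PredefinedMetricType: 'ECSServiceAverageCPUUtilization',
  },
  ScaleOutCooldown: 60,
  ScaleInCooldown: 300,
  DisableScaleIn: false,
};

// Simplified model: a scaling action is blocked while its cooldown is running,
// and scale-in is blocked entirely when it is disabled.
function isScalingAllowed(
  direction: 'out' | 'in',
  secondsSinceLastAction: number,
  config: TargetTrackingPolicy,
): boolean {
  if (direction === 'in' && config.DisableScaleIn) {
    return false;
  }
  const cooldown =
    direction === 'out' ? config.ScaleOutCooldown : config.ScaleInCooldown;
  return secondsSinceLastAction >= cooldown;
}

const canAddTaskAfter30s = isScalingAllowed('out', 30, policy); // still cooling down
const canRemoveTaskAfter400s = isScalingAllowed('in', 400, policy); // cooldown passed
```

A longer scale-in cooldown than scale-out cooldown, as in this example, makes the service quick to add tasks and slow to remove them.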

Viewing the metrics

One way to view the metrics is to open the “Health and metrics” tab in the dashboard displaying the details of our service.

To inspect our metric alarms, we can go to the “All alarms” page in the CloudWatch console.

For the sake of demonstration, I’ve set the ALBRequestCountPerTarget policy with a very low target value to observe the auto scaling. We can see that one of our alarms is in the “in alarm” state. Let’s take a closer look at it.

Because I’ve made multiple requests to the API I deployed, the alarm was triggered. Let’s go to the “Deployments and events” tab in our service and see what happened.

Since we set up the compute configuration of our service correctly, the cluster created additional EC2 instances to be able to run the new tasks. As a result, we can see that the number of registered container instances increased from 1 to 3.

Summary

In this article, we’ve learned about two different types of auto scaling in AWS. Thanks to cluster auto scaling, we were able to scale the number of EC2 instances based on the running tasks.

By configuring the service auto scaling, we’ve managed to scale the number of tasks running in our service based on the traffic. If the cluster can’t run the increased number of tasks, it runs additional EC2 instances to handle the load thanks to cluster auto scaling.

There is still more to know to master NestJS deployments with AWS, so stay tuned!
