A Detailed Guide to How to Scale Microservices

You’ve taken on the microservices beast, so congratulations! You’ve split your system into services that each do one thing well, but somehow they still don’t scale the way you expected. Sadly, scaling doesn’t come automatically. You need to ensure your architecture is set up to scale correctly and appropriately for what you’re doing.

In this post, we’ll go through things to consider when you’re trying to scale microservices and how one would architect a system to do just that.


What Are the Goals of Scaling Microservices?

Now that you have your architecture split up into different services, you want to ensure that the services are isolated in their responsibilities. You do this not only to make certain that your services are doing their one thing well but also so that you’re able to scale.

You can scale out different parts of the system at different times. For instance, let’s say you have an authentication service that makes sure people can log in to the system, and an invoice service that reports invoices back to customers. The authentication service will likely get more requests at any one time than the invoice service, because it’s used across a lot of different parts of the system, while the invoice service is only used in certain situations. This is the best outcome you can have with microservices: separated responsibilities.

This is the primary goal of scaling out microservices: getting resources to the parts of the system that need them. Because resources are finite in any system, it’s best to give them to the parts that need them and to avoid over- or underutilizing any of them. With a monolithic app, that’s difficult to achieve: you just throw resources at one app and hope it’s enough. Microservices give you the ability to target resources where they’re needed most, which can also help you achieve your operational goals.

The Different Ways of Scaling Microservices

With a microservices architecture, there are a few ways one can scale out the infrastructure. There are two primary types of scaling: vertical and horizontal. The one you use will depend on the situation.

Let’s talk about vertical scaling first. Vertical scaling is when you give more resources (e.g., more CPU or memory) to the individual hosts or containers of a service. In theory, more resources per host means each one can respond to more requests per second and achieve higher concurrency. Depending on your architecture and where you deploy (i.e., cloud vs. on-prem), that could either be as easy as flipping a switch or require more thought and attention from an operations team. Scaling CPU and memory is usually the easier endeavor. However, sometimes just adding CPU and memory won’t be enough. There are limits to how much a single application can use, and there are also limits on what the platform can offer. While this is usually the first way to scale things out, you can hit those limits fairly fast.

An alternative to vertical scaling is horizontal scaling. This means adding more units or hosts of a single service. This type of scaling is definitely more difficult to achieve, as it requires architecting your service to utilize multiple hosts concurrently. However, this is where microservices can shine: as you see hot spots in your system, you can target the parts that need to be scaled out. And depending on the platform you’re on, this could be as easy as adding another pod in Kubernetes, for example.
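
To make “adding another pod” concrete, here is a minimal sketch of the proportional rule that the Kubernetes Horizontal Pod Autoscaler documents for choosing a replica count: desired = ceil(current replicas × current metric / target metric). The function name, the replica bounds, and the sample numbers are assumptions for illustration, and the real controller also applies a tolerance and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count from the proportional scaling rule (illustrative only)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90.0, 60.0))  # 6
# 4 replicas averaging 30% CPU against a 60% target: scale in.
print(desired_replicas(4, 30.0, 60.0))  # 2
```

The same rule scales in as well as out, which is what lets you hand resources back when a hot spot cools down.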


Architecture Considerations to Ensure You Can Scale Out

To achieve horizontal scaling, you need to ensure that your services are isolated enough to be scaled out appropriately. This doesn’t just mean isolated in the overall architectural sense; it also means isolated as individual processes. A service instance might store state in its own memory, or somewhere else that can’t easily be shared across instances. When this happens, disaster is bound to show up, because requests from the same client don’t always get routed to the same service instance.
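
A small sketch shows how this goes wrong. It assumes a hypothetical cart service that keeps carts in process memory, with `itertools.cycle` standing in for a round-robin load balancer; none of these names come from a real system.

```python
from itertools import cycle


class CartInstance:
    """One running copy of a hypothetical cart service with in-memory state."""

    def __init__(self, name):
        self.name = name
        self.carts = {}  # visible only to this process

    def add_item(self, user, item):
        self.carts.setdefault(user, []).append(item)

    def get_cart(self, user):
        return self.carts.get(user, [])


instances = [CartInstance("a"), CartInstance("b")]
round_robin = cycle(instances)  # stand-in for a load balancer

next(round_robin).add_item("alice", "book")      # handled by instance a
cart_seen = next(round_robin).get_cart("alice")  # handled by instance b
print(cart_seen)  # [] because instance b never saw the write
```

The write succeeded, but the very next request landed on a different instance and found nothing. Moving that state out of the process is what makes the instances interchangeable.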

For example, let’s say you have a price service that gives back information about the price of your products. Due to some tax laws and other situations, you have a fairly complex price calculation to do. This makes price look-ups potentially slow, so you decide to cache the calculated final price. The first attempt at this is caching the price in the memory of the local running process. However, because this service is used a lot, you notice that it still isn’t performing well: every instance is now doing the same calculation for itself. So even though you’re adding more and more instances to your cluster, you aren’t achieving the performance goals you were expecting. To alleviate this, you need to share these calculations across the instances of your service. Usually, this means adding shared caching between service instances.
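
The difference is easy to count in a sketch. Here a plain dictionary stands in for the shared cache (in practice this would be an external cache service), and the class name, the placeholder price formula, and the instance counts are all assumptions.

```python
class PriceInstance:
    """One instance of a hypothetical price service."""

    def __init__(self, cache):
        self.cache = cache      # private dict, or one shared by all instances
        self.calculations = 0

    def _calculate(self, product_id):
        self.calculations += 1
        return 100.0 + product_id  # placeholder for the slow tax logic

    def price(self, product_id):
        if product_id not in self.cache:
            self.cache[product_id] = self._calculate(product_id)
        return self.cache[product_id]


def total_calculations(instances, product_ids):
    """Ask every instance for every price; return how many were computed."""
    for instance in instances:
        for product_id in product_ids:
            instance.price(product_id)
    return sum(instance.calculations for instance in instances)


products = [1, 2, 3, 4]
local = [PriceInstance({}) for _ in range(3)]             # one cache each
shared_store = {}                                         # stand-in for a shared cache
shared = [PriceInstance(shared_store) for _ in range(3)]  # one cache for all

print(total_calculations(local, products))   # 12
print(total_calculations(shared, products))  # 4
```

With local caches, the work grows with the number of instances. With the shared cache, each price is calculated once no matter how far you scale out.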


Getting Concurrency Right

Now that you’re starting to scale out, you need to ensure that things still work correctly. One of the hardest areas to get correct is the concurrency of services. In the last example, you started to add shared caching between service instances. You need to check that instances and services aren’t going to collide with each other as they get scaled out. A data governance model that spells out who owns what data is very important: it makes certain that each service is isolated as a whole and that each service knows what it can and can’t do. One step toward this type of isolation is ensuring that each service owns its data. This means that there are no shared databases between services.
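
One common way to keep concurrent writers from colliding on shared data is optimistic concurrency: each write states which version it read, and the store rejects it if the data has changed since. The sketch below is one option rather than a prescribed method. `VersionedStore` is a hypothetical in-memory stand-in for a shared store, and threads stand in for service instances.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a shared store with compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # stale read: the caller must retry
            self._data[key] = (version + 1, value)
            return True


def increment(store, key):
    """Read, modify, write; retry whenever another writer got there first."""
    while True:
        version, value = store.get(key)
        if store.compare_and_set(key, version, (value or 0) + 1):
            return


def worker(store):
    for _ in range(100):
        increment(store, "hits")


store = VersionedStore()
threads = [threading.Thread(target=worker, args=(store,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(store.get("hits"))  # (400, 400): no increment was lost
```

A plain read-then-write without the version check could lose updates when two instances read the same value at the same time.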

Avoiding shared databases is almost always one of the first steps toward microservices. Scaling out the architecture will be difficult if your service’s data is not isolated correctly behind appropriate interfaces. The problems can include data collisions or, worse, bad data generated because nobody knows who owns what data. Getting your data model right and making sure that each service is isolated will also ensure that as you scale out your architecture, your data will scale as well.
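
As a rough sketch of data ownership behind an interface, the two hypothetical services below each keep their own store. The invoice service gets customer names by calling the customer service, never by reading its data directly. In a real system the call would cross the network; the class names and sample records are assumptions.

```python
class CustomerService:
    """Owns customer data; other services only use its public methods."""

    def __init__(self):
        self._customers = {1: {"name": "Ada", "email": "ada@example.com"}}

    def get_customer_name(self, customer_id):
        return self._customers[customer_id]["name"]


class InvoiceService:
    """Owns invoice data and depends on the customer interface only."""

    def __init__(self, customers):
        self._customers = customers  # the interface, not the database
        self._invoices = {1: [{"id": "INV-1", "total": 40.0}]}

    def invoices_for(self, customer_id):
        name = self._customers.get_customer_name(customer_id)
        return [{"customer": name, **invoice}
                for invoice in self._invoices.get(customer_id, [])]


invoice_service = InvoiceService(CustomerService())
print(invoice_service.invoices_for(1))
```

Because the invoice service only knows the interface, the customer service can change how it stores or scales its data without breaking anyone else.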

Tools to Assist in Ensuring That You’re Scaling Correctly

As you start to scale out your system, you need to be mindful of everything that’s going on. While this isn’t always easy to achieve, there are tools out there now that give you visibility across the system. Usually, this means dashboards that let you see all parts of the system and that put the most important things to watch right up front.

OpsLevel can help you in this area. OpsLevel can show you all parts of your system at a glance. It ensures that your systems are working as expected and can highlight areas that need more consideration. It gives you the single pane of glass you need to see what’s going on in your system and make sure that things are working out correctly.


What Metrics to Track

With that monitoring in place, you now need to pick the right metrics to track.

The first metrics to track are fairly easy: CPU, memory, and disk. It’s important to know how much of each resource is being used, as well as how often each one is fully utilized. This shows you where to allocate more resources and where you have room to spare. Then, as your system scales, you can predict how many more resources you’ll need in the future.
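
A minimal sketch of turning raw usage readings into an allocation decision might look like this. It assumes you already collect usage samples in the same unit as the allocated amount, and the 30% and 80% thresholds are hypothetical values you would tune for your own system.

```python
from statistics import mean


def utilization_report(samples, allocated, low=0.3, high=0.8):
    """Compare usage samples with the allocated amount (same unit)."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    if not samples:
        raise ValueError("samples must not be empty")
    average = mean(samples) / allocated
    peak = max(samples) / allocated
    if peak >= high:
        verdict = "add resources"      # peaks are close to the limit
    elif peak <= low:
        verdict = "reclaim resources"  # never gets near the allocation
    else:
        verdict = "ok"
    return {"average": round(average, 2), "peak": round(peak, 2),
            "verdict": verdict}


# CPU samples in millicores for a service allocated 500 millicores.
print(utilization_report([200, 350, 450], allocated=500))
```

Running the same report per service over time also gives you the trend you need to predict future capacity.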

The next bit of information that starts easy, then becomes a little bit more difficult, is how much traffic is going to each service. The first metric to track is just raw network traffic. This helps you quickly identify hot spots across your services. However, you’ll quickly shift to wanting to know what exactly is inside of that network traffic. So the next thing to do is get more granular in that network traffic.
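
Getting more granular can start as simply as counting requests per service and then per endpoint. The sketch below assumes request records shaped like access-log entries with hypothetical `service` and `path` fields.

```python
from collections import Counter


def traffic_breakdown(requests):
    """Count requests per service and per (service, path) pair."""
    by_service = Counter()
    by_endpoint = Counter()
    for request in requests:
        by_service[request["service"]] += 1
        by_endpoint[(request["service"], request["path"])] += 1
    return by_service, by_endpoint


sample = [
    {"service": "auth", "path": "/login"},
    {"service": "auth", "path": "/login"},
    {"service": "auth", "path": "/refresh"},
    {"service": "invoice", "path": "/invoices"},
]
by_service, by_endpoint = traffic_breakdown(sample)
print(by_service.most_common(1))   # [('auth', 3)]
print(by_endpoint.most_common(1))  # [(('auth', '/login'), 2)]
```

The per-service count finds the hot spot, and the per-endpoint count tells you what inside that service is actually hot.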

The next step in microservice monitoring is the Four Golden Signals: latency, errors, traffic, and saturation. Saturation is a measure of how “full” your service is and can be captured with CPU, memory, network, and disk metrics. Latency, errors, and traffic are closely related to the RED metrics: rate, errors, and duration. Specifically:

  • Rate: Number of requests per second
  • Errors: Percentage / number of failing requests
  • Duration: How long those requests take
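
The three RED metrics can be computed from a window of request records, as in the sketch below. It assumes each record is a `(status_code, duration_ms)` pair, counts any status of 500 or above as a failing request, and reports duration as a nearest-rank 95th percentile. These are illustrative choices, not the only reasonable ones.

```python
def red_metrics(requests, window_seconds):
    """Rate, error percentage, and p95 duration for one time window."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_pct": 0.0, "p95_ms": None}
    errors = sum(1 for status, _ in requests if status >= 500)
    durations = sorted(duration for _, duration in requests)
    # Nearest-rank percentile: rank = ceil(0.95 * total), in integer math.
    rank = (95 * total + 99) // 100
    return {
        "rate_per_s": total / window_seconds,
        "error_pct": 100.0 * errors / total,
        "p95_ms": durations[rank - 1],
    }


# 20 requests in a 10-second window; the slowest one failed.
window = [(200, 10 * i) for i in range(1, 20)] + [(500, 200)]
print(red_metrics(window, window_seconds=10))
# {'rate_per_s': 2.0, 'error_pct': 5.0, 'p95_ms': 190}
```

A percentile is used for duration instead of an average because a few very slow requests can hide behind a healthy-looking mean.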


Hopefully, this post helps you out on your microservices journey. Scaling out an architecture isn’t always the easiest, most straightforward thing to do. But this guide will give you ideas of what to look at in your architecture, a tool and metrics to ensure that you are successful, and most of all, the ability to hit whatever goals lie ahead.

This post was written by Erik Lindblom. Erik has been a full stack developer for the last 13 years. During that time, he’s tried to understand everything that’s required to deliver high quality, valuable software.
