As we mentioned in the first post of this blog series for #DogOps, when registering services to OpsLevel, your initial task is to build your service catalog. After you have inventoried your services and their metadata, you can begin to locate gaps and areas for improvement.
Today we'll cover the different ways we register services to OpsLevel, and why we chose these methods.
We registered our first service in OpsLevel by hand using the Rails app, filling out all the applicable metadata ourselves. That metadata includes details like: the language and framework used, the responsible team (in this case, engineering), the aliases we use in our systems, and quick links to tools like our Datadog dashboard and PagerDuty rotation.
When starting out with OpsLevel, we recommend registering your first few services by hand, in order to familiarize yourself with the possible metadata.
I can already hear you screaming, "WHY WOULD YOU DO THAT??!!!"
In our opinion, by-hand registration encourages you to consider where data can be collected from within your existing infrastructure and tools. As your understanding of OpsLevel grows, you will more easily see how our automated collection methods map data to a service.
Often, customers who use Kubernetes jump straight into the Kubernetes sync, then have a hard time because their resources don't carry the metadata needed to derive an owner or lifecycle for a service. They might have been better served by our CLI and a bash for loop.
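As an illustration, a one-off bulk registration with the CLI could be sketched like this (the `opslevel create service` invocation and the service names are assumptions; check your CLI version's help for the exact syntax):

```shell
#!/usr/bin/env bash
# Bulk-register a list of services with the OpsLevel CLI.
# DRY_RUN=1 (the default here) prints each command instead of running it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
# Placeholder service names; in practice this list might come from a
# file, a directory listing, or `kubectl get deployments`.
services=("shopping-cart" "checkout" "recommendations")
planned=()

for svc in "${services[@]}"; do
  cmd="opslevel create service \"$svc\""
  planned+=("$cmd")
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    opslevel create service "$svc"
  fi
done
```

Running it once in dry-run mode lets you eyeball the commands before pointing them at your real account.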
As our company grew, so did our number of engineering teams, and with them the number of things we wanted to register and track in OpsLevel.
In our last post, we listed the various things we register as services in OpsLevel. The main driver of the increase was the need to track ownership and maturity of each one individually.
We choose to register things that do not have a long-lived process component via opslevel.yml. For example, our CLI and our Terraform provider will never run in our infrastructure, but we still want to track and improve them. Therefore, they count as services. We will delve into more detail about how we do this in a future post focused on the maturity rubric and campaigns.
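For illustration, an opslevel.yml for something like our CLI might look roughly like the sketch below (field names follow the opslevel.yml schema as we understand it, and the values and tag key are placeholders; consult the current docs for the authoritative format):

```yaml
# Hypothetical opslevel.yml for a service with no long-lived process.
version: 1
service:
  name: opslevel-cli
  owner: engineering
  lifecycle: generally_available
  tier: tier_3
  description: Command-line interface for interacting with OpsLevel.
  language: Go
  tags:
    - key: managed_by          # placeholder convention
      value: opslevel-yml
  tools:
    - category: code
      name: GitHub
      url: https://github.com/opslevel/cli
```

Because this file lives alongside the code, every metadata change arrives as a pull request and is captured in git history.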
We also treat the “main” branch as the current source of truth and cut our releases from that branch when we are ready. This means that we can easily change any of the metadata with a simple pull request, which gives us git history on the changes to that metadata.
At OpsLevel, we run almost everything in Kubernetes clusters. Those clusters include applications like the NGINX ingress controller, cert-manager, the Rails application, and Sidekiq workers.
We use Flux, the GitOps toolkit, to apply our Kubernetes manifests to all our clusters. That leaves us with a question: should we just drop an opslevel.yml file in there too, or is there a better way?
In this case, we opted to use our Kubernetes sync to register and reconcile the deployment resources that appear in our clusters. The benefit: when someone goes around the GitOps flow and manually creates a deployment in one of our clusters, it still shows up in OpsLevel. A regular audit of "unowned" services now helps us head off the "what is this thing?" question 6 months later.
This process also surfaced an observability gap in our own product: how is one notified when new services are created? We took that feature request back to the product team as a user-experience improvement.
There is a downside to the Kubernetes sync: we have to put the metadata that would normally live in opslevel.yml directly onto the deployment resources as annotations. That works well for inspecting the data when all you have is kubectl, but it also moves the data out of the service's code repository and into our Flux repository, where all our Kubernetes manifests live. We deemed this trade-off acceptable for the time being and may re-evaluate it in the future.
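Concretely, that means annotating the deployment manifests in the Flux repository along these lines (the `opslevel.com/*` annotation keys reflect our understanding of the Kubernetes sync's conventions; the values and image tag are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cert-manager
  annotations:
    # Metadata the sync maps onto the OpsLevel service.
    opslevel.com/owner: "infrastructure"
    opslevel.com/lifecycle: "generally_available"
    opslevel.com/tier: "tier_1"
    opslevel.com/description: "Automates TLS certificate issuance."
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cert-manager
  template:
    metadata:
      labels:
        app: cert-manager
    spec:
      containers:
        - name: cert-manager
          image: quay.io/jetstack/cert-manager-controller:v1.5.0  # placeholder tag
```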
We are currently experimenting with a hybrid approach where fields like description, language, and framework come from opslevel.yml, while tags and tools come from Kubernetes. This approach shines when you deploy something whose code repository you don't own, such as cert-manager: we can simply adjust the annotations on the manifests we pulled from the cert-manager project.
In the end, the Kubernetes sync is a great way to bootstrap a large number of services, even if it often lacks the metadata needed for a good service catalog. You can double down on Kubernetes annotations and even bake them into a template, so that all future services are automatically registered with all the necessary metadata. It is a larger effort, but it pays off in the long run as you democratize the creation of new services.
Lastly, we use Terraform to provision our Kubernetes clusters. We have recently started using our Terraform provider to register each cluster as a service, so we can track relevant metadata and PagerDuty alerts about our clusters' health.
In the long term, we envision many more Kubernetes clusters across different regions. That prospect prompted us to start leveraging the Terraform provider preemptively, setting us up for success as we scale and build a global customer base that requires many more Kubernetes clusters.
In our case, Terraform was the best place to get and set the metadata for each cluster, and our Terraform provider easily facilitated this. Because we use a Terraform module for Kubernetes cluster creation, any time we spin up a new cluster it is automatically registered with all its metadata.
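Inside such a cluster module, the registration could be sketched with the provider's `opslevel_service` resource (resource and field names are from the provider docs as we recall them; the variable names and naming convention are illustrative):

```hcl
# Register a Kubernetes cluster as an OpsLevel service.
# Invoked from our cluster-creation module, so every new cluster
# is registered automatically with consistent metadata.
resource "opslevel_service" "cluster" {
  name        = "k8s-${var.region}-${var.cluster_name}"
  description = "Kubernetes cluster in ${var.region}"

  # Aliases help integrations (e.g. PagerDuty) match alerts
  # back to this service.
  aliases = ["k8s-${var.cluster_name}"]

  # Standard tag key recording where this metadata is managed.
  tags = ["managed_by:terraform"]
}
```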
Bonus - Tags
As you can see, we use a variety of methods to register services. So allow us one key recommendation for long-term success as your catalog grows: use tags! Standardize on a tag key whose value records how each service was registered. That way, in the future, you'll know exactly where to look to change any of the metadata.
If you saw such a tag on a service, you would instantly know that to change the lifecycle field you need to dig around in the Terraform configuration. This learning led us to give product feedback that this kind of functionality be built into the product natively, so keep your eyes peeled for this small feature popping up in the near future.
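In our setup, that convention might look like a single standard tag (the key `managed_by` is purely illustrative):

```yaml
tags:
  - key: managed_by
    value: terraform   # or: opslevel-yml, kubernetes-sync
```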
Until Next Time
We look forward to continuing this series, and we hope you find the content useful as you continue your service maturity journey at your company.