Validating Kubernetes Best Practices

Kubernetes is great because of its almost limitless configurability. But this configurability makes it hard to ensure that best practices are followed consistently across your cluster.

This article covers five Kubernetes best practices and how you can use OpsLevel to validate them in your cluster. We’ll show you how to push your Kubernetes state into OpsLevel using custom event checks, and then how OpsLevel’s jq integration can slice and dice that data to produce a pass/fail result for your registered services.


Getting Kubernetes data into OpsLevel

There are many mechanisms you could use to send payloads of Kubernetes data to OpsLevel.

We will work under the assumption that you are using kubewatch to submit payloads to our custom event check.

The hardest part of configuring a custom event check is determining how to derive the OpsLevel service alias from any given Kubernetes resource payload. If you are using our Kubernetes integration, every imported service should by default have an alias in the format k8s:<metadata.name>-<metadata.namespace>. Because of this, all of our examples assume "k8s:\(.metadata.name)-\(.metadata.namespace)" for the Service Specifier field, which should match the service aliases imported by our Kubernetes integration.

This is also where robust service aliases can play a big role in simplifying this step if you do not have consistent naming in your cluster.
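If you want to sanity-check the alias derivation outside of OpsLevel, the same string interpolation can be sketched in Python. This is only an illustration of the format described above; the function name `service_alias` is hypothetical and not part of OpsLevel or kubewatch.

```python
def service_alias(resource):
    """Build the alias k8s:<metadata.name>-<metadata.namespace> from a resource payload.

    Assumption: the payload is a parsed Kubernetes resource (a dict) that has
    both metadata.name and metadata.namespace set.
    """
    metadata = resource["metadata"]
    return "k8s:{}-{}".format(metadata["name"], metadata["namespace"])


# Example: a resource named "web" in the "production" namespace
example = {"metadata": {"name": "web", "namespace": "production"}}
print(service_alias(example))  # prints k8s:web-production
```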

All of the examples in this post have links to working jq playgrounds, where you can interactively experiment with jq expressions against any given JSON payload. When you find yourself needing more complex expressions, I suggest digging into the jq documentation, since it has a lot of robust features that you may not expect.

With this assumption out of the way, let’s look at how to validate the first best practice example.

Best Practice 1 - Infrastructure Adoption Tracking


User Story

As an infrastructure engineer, I want my organization to migrate to a new Ingress controller without suffering through the long tail of adoption.

Change adoption is commonly overlooked early on in Kubernetes, which leads to a long tail of supporting old ways of doing things. OpsLevel can help your organization adopt a change by making it easier to track the status of that effort. It can even help you determine whether starting the decommission phase early is risky, based on which services would be affected, possibly saving you time and money.

How do I measure this in Kubernetes?

Ingress resources have an annotation that Ingress controllers look at to determine if the Ingress resource is meant for them. This annotation takes the form of kubernetes.io/ingress.class: "internal" and in this use case you will want to determine if the value is set to internal-v2.

Using kubectl we can get an example ingress resource in JSON format

kubectl get ingress <ingress_name> -o json

Here is an example Kubernetes ingress resource as JSON

{
  "apiVersion": "networking.k8s.io/v1",
  "kind": "Ingress",
  "metadata": {
    "name": "web",
    "namespace": "production",
    "annotations": {
      "kubernetes.io/ingress.class": "internal"
    }
  },
  "spec": {}
}

How do I verify this in OpsLevel?

With an example JSON payload in hand, we can craft a jq expression that results in a “true” or “false” value (pass/fail), which OpsLevel will use to determine the status of the check. The only tricky part is dealing with the exotic key name, which jq thankfully makes trivial.

Here is a jq expression that extracts the annotation from an Ingress resource and checks for the desired value of internal-v2.

.metadata.annotations."kubernetes.io/ingress.class" == "internal-v2"

You can play around with the above payload in this jq playground.
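For testing the logic locally, the same comparison can be mirrored in Python. This is a rough sketch rather than anything OpsLevel runs; the function name `uses_ingress_class` is hypothetical, and the payload is the example Ingress from above.

```python
import json

# The example Ingress resource from above, as valid JSON
PAYLOAD = json.loads("""
{
  "apiVersion": "networking.k8s.io/v1",
  "kind": "Ingress",
  "metadata": {
    "name": "web",
    "namespace": "production",
    "annotations": {
      "kubernetes.io/ingress.class": "internal"
    }
  },
  "spec": {}
}
""")


def uses_ingress_class(ingress, wanted="internal-v2"):
    """Return True when the ingress class annotation equals the wanted value."""
    annotations = ingress.get("metadata", {}).get("annotations") or {}
    # A missing annotation yields None, which never equals the wanted class
    return annotations.get("kubernetes.io/ingress.class") == wanted


print(uses_ingress_class(PAYLOAD))  # prints False, the example still uses "internal"
```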

Another benefit to OpsLevel checks is crafting a message template for the service owner reading up on their failed checks. We can use this message to precisely describe the problem and the remedy so the service owner can self-service fix the problem the check is looking for. Here is an example message you could use for this check.


{% if check.failed %}  
  ### Check failed
  Ingress `{{ data.metadata.namespace }}/{{ data.metadata.name }}` still uses class `{{ data.metadata.annotations["kubernetes.io/ingress.class"]}}`.

  Please upgrade to ingress class: `internal-v2`
{% else %}
  ### Check passed
{% endif %}

Here is the check entered directly into the OpsLevel custom event check configuration UI, including the more helpful check failure message.

Change Adoption Custom Event Check Part 1 Change Adoption Custom Event Check Part 2

Best Practice 2 - Trusted Image Registry Validation


User Story

As a security engineer, I want to make sure my organization is only using container images from one or more trusted container registries.

This is a very common security stance of only wanting to use things from a trusted source. With this kind of check you can enable your security team to validate across your organization that all of your services are using trusted sources of artifacts.

How do I measure this in Kubernetes?

Kubernetes exposes this data in the pod’s lists of containers and initContainers. Both of these are lists, so for any given pod you will end up with a list of container images to check. Now you just need to decide which Kubernetes resources to inspect: Pod, Deployment, ReplicaSet, etc. In this example we will target the Deployment resource, so let’s grab an example JSON payload from our Kubernetes cluster.

kubectl get deployment <deployment_name> -o json

Here is an example Kubernetes deployment resource as JSON

{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "web",
    "namespace": "production",
    "labels": {
      "app": "web",
      "environment": "prod"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "matchLabels": {
        "app": "web",
        "environment": "prod"
      }
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "web",
          "environment": "prod"
        }
      },
      "spec": {
        "initContainers": [
          {
            "name": "startup",
            "image": "gcr.io/busybox:latest"
          }
        ],
        "containers": [
          {
            "name": "web",
            "image": "cilium/echoserver",
            "imagePullPolicy": "Always"
          }
        ]
      }
    }
  }
}

The following is an example of the kind of list we want to build once all of the other data in the JSON has been filtered out.

["gcr.io/team/service:v1.0", "docker.yourcompany.com/team/service:master", "nginx:latest"]

The following jq expression can be used to validate (pass/fail) the above example list. It fails for this list because nginx:latest does not come from a trusted registry.

map(.| contains("gcr.io") or contains("docker.yourcompany.com")) | all(.)

How do I verify this in OpsLevel?

Now that we have some of the working parts, we just need to pull together a full jq expression. We need to concatenate the initContainers and containers image fields into one list and then check if all entries in that list contain one of the two trusted registries - in our case ["gcr.io", "docker.yourcompany.com"].

[
  ((.spec.template.spec.initContainers[]? | .image),
   (.spec.template.spec.containers[]? | .image))
] | 
map(.| contains("gcr.io") or contains("docker.yourcompany.com")) | 
all(.)

And here is the jqplay link.
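If you would like to reason about this check away from jq, here is a small Python sketch of the same steps: collect every image from initContainers and containers, then require each one to mention a trusted registry. The function names are hypothetical. Like the jq expression, it uses a substring match; matching on the registry prefix instead would be stricter.

```python
TRUSTED_REGISTRIES = ("gcr.io", "docker.yourcompany.com")


def container_images(deployment):
    """Collect image names from initContainers and containers of a Deployment."""
    pod_spec = deployment["spec"]["template"]["spec"]
    # Either list may be absent, so fall back to an empty list
    containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
    return [container["image"] for container in containers]


def all_images_trusted(deployment, trusted=TRUSTED_REGISTRIES):
    """Return True when every image contains one of the trusted registry names."""
    return all(
        any(registry in image for registry in trusted)
        for image in container_images(deployment)
    )


# Trimmed-down version of the example Deployment from above
example = {
    "spec": {"template": {"spec": {
        "initContainers": [{"name": "startup", "image": "gcr.io/busybox:latest"}],
        "containers": [{"name": "web", "image": "cilium/echoserver"}],
    }}}
}
print(container_images(example))    # prints ['gcr.io/busybox:latest', 'cilium/echoserver']
print(all_images_trusted(example))  # prints False, cilium/echoserver is not trusted
```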

Best Practice 3 - Ensure Percentage Based Rolling Update


User Story

As a site reliability engineer, I want to ensure our tier 1 services use a percentage based rolling update strategy.

When first starting out with Kubernetes most organizations have a static number of replicas they want to run for their deployment. Then they throw on a rolling update strategy tuned for that static number of replicas. Maybe something along the lines of:

spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

If each pod takes around 3 minutes to roll, that means a change would only take about 15 minutes total to be realized. Not bad.

Then in the future, your service hits the front page of Hacker News and starts growing and seeing more traffic. A Horizontal Pod Autoscaler is added into the mix allowing the deployment to scale from 5 pods to 40 pods to handle the higher load.

Now during any change, such as deploying a new container image, the number of pods running could be as high as 40, causing the total time to roll to be approximately 2 hours (40 pods at roughly 3 minutes each). This is where ensuring a deployment is using a percentage based rolling update strategy is very important.

You would be looking to ensure the following is used instead:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "20%"
      maxUnavailable: 0
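To make the arithmetic concrete, here is a back-of-the-envelope Python sketch of the rollout times discussed above. It is a deliberately simplified model and not how Kubernetes actually schedules a rollout: it assumes maxUnavailable is 0, that pods are replaced in batches the size of maxSurge, and that every batch takes the same 3 minutes. The function names are hypothetical. The one Kubernetes-specific detail it relies on is that a percentage maxSurge is rounded up to a whole number of pods.

```python
import math


def surge_pods(max_surge, replicas):
    """Convert maxSurge (an int or a percentage string) into a number of pods."""
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        # Percentage values for maxSurge are rounded up
        return math.ceil(replicas * int(max_surge[:-1]) / 100)
    return int(max_surge)


def estimated_rollout_minutes(replicas, max_surge, minutes_per_batch=3):
    """Estimate rollout time, assuming one batch of surge pods rolls at a time."""
    batches = math.ceil(replicas / surge_pods(max_surge, replicas))
    return batches * minutes_per_batch


print(estimated_rollout_minutes(5, 1))       # prints 15
print(estimated_rollout_minutes(40, 1))      # prints 120
print(estimated_rollout_minutes(40, "20%"))  # prints 15
```

With a percentage, the batch size grows along with the replica count, so the estimate stays at roughly 15 minutes whether the deployment runs 5 pods or 40.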

How do I measure this in Kubernetes?

As seen above, you will be digging into the deployment’s strategy. There are two fields you will need to check: maxSurge and maxUnavailable. You will also need to deal with the fact that the strategy type may be set to Recreate rather than RollingUpdate.

Here is a jq expression that can handle those edge cases:

if (.spec.strategy.type == "RollingUpdate") then 
  (.spec.strategy.rollingUpdate | 
    (.maxSurge | tostring | contains("%")) or 
    (.maxUnavailable | tostring | contains("%")))
else
  false
end
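The same decision logic can be written out in Python if you want to unit test it against your own manifests. This is a sketch that mirrors the jq expression above; the function name `uses_percentage_rollout` is hypothetical.

```python
def uses_percentage_rollout(deployment):
    """Return True when a RollingUpdate strategy uses a percentage value."""
    strategy = deployment.get("spec", {}).get("strategy") or {}
    if strategy.get("type") != "RollingUpdate":
        # Recreate (or a missing type) never counts as percentage based
        return False
    rolling = strategy.get("rollingUpdate") or {}
    # Values may be ints or strings, so convert to text before looking for "%"
    return any("%" in str(rolling.get(field)) for field in ("maxSurge", "maxUnavailable"))


static = {"spec": {"strategy": {"type": "RollingUpdate",
                                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}}}}
percent = {"spec": {"strategy": {"type": "RollingUpdate",
                                 "rollingUpdate": {"maxSurge": "20%", "maxUnavailable": 0}}}}
recreate = {"spec": {"strategy": {"type": "Recreate"}}}

print(uses_percentage_rollout(static))    # prints False
print(uses_percentage_rollout(percent))   # prints True
print(uses_percentage_rollout(recreate))  # prints False
```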

How do I verify this in OpsLevel?

Remember, the original user story had the requirement of only being applicable to tier 1 services. To achieve this you can use a checklist with a filter applied to it:

Tier 1 Checklist Filter

And here is the jqplay link.

Best Practices 4 & 5 - CPU Quality of Service Validation


User Story

As a service owner with a high impact but also CPU sensitive service, I want to achieve optimum performance while running in the multitenant environment of Kubernetes.

Once you have been running workloads on Kubernetes for a while, you will inevitably start digging into the performance characteristics of your services, and may find that your own services are causing issues for each other because of Kubernetes’ multitenant nature.

This is when you dig deep into the Kubernetes documentation and find out about CPU Quality of Service (QOS) rules and how your Kubernetes manifest for resource requests and limits could actually be hurting you in a multitenant environment.

To summarize the Kubernetes CPU Manager feature and CPU Management Policies:

There are 2 things that drive your ability to have better access to a CPU:

  • Requests & Limits are set to whole integer values
  • Requests & Limits are equivalent

What this effectively means is that you need to constrain yourself to using certain settings for your resource’s requests & limits.

Additionally there are 2 parts to your overall CPU QOS - levels and pools.

Three levels ordered from top to bottom, best to worst:

  • Guaranteed
  • Burstable
  • BestEffort

Two pools of CPU scheduling:

  • Dedicated
  • Shared

There are a lot of nitty gritty details in the Kubernetes documentation on this so I recommend you familiarize yourself with this topic if you are a Kubernetes cluster operator.

How do I measure this in Kubernetes?

In a deployment there is a resources stanza that can be set per container and looks like this:

resources:
  requests:
    cpu: "100m"
  limits:
    cpu: "2"

CPU values in this context are expressed either in full cores (for example 1) or in millicores (for example 100m, which is equivalent to 0.1), where one millicore is 1/1000 of a CPU core.

You also need to consider that in any pod spec there could be any number of containers so you will be dealing with an array of values.

Due to the complexity of this topic, we will break it down into 2 separate checks. That way each check can carry a robust failure message for its specific problem, so the service owner viewing the failed check knows how to resolve it.

  • Check if the CPU for requests and limits is set to use whole CPUs
  • Check if the CPU requests match limits

You will also need to deal with the fact that the values can represent whole numbers, fractional numbers, or millicore quantities, all encoded as strings like the following:

["1", "0.1", "100m", "1000m"]

Here is a jq expression that, given the above array of values, can determine if it’s a “whole CPU”

map(.| if contains("m") then .[0:length-1] | tonumber / 1000 else . | tonumber end | floor == .)
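For comparison, here is roughly the same “whole CPU” test as a Python sketch. The helper names `cpu_cores` and `is_whole_cpu` are hypothetical. It checks for a trailing m rather than any m in the string, which behaves the same for the values shown above.

```python
import math


def cpu_cores(value):
    """Convert a CPU quantity string such as "1", "0.1" or "100m" into cores."""
    if value.endswith("m"):
        # Millicores: drop the trailing "m" and divide by 1000
        return float(value[:-1]) / 1000
    return float(value)


def is_whole_cpu(value):
    """Return True when the quantity is a whole number of cores."""
    cores = cpu_cores(value)
    return math.floor(cores) == cores


values = ["1", "0.1", "100m", "1000m"]
print([is_whole_cpu(v) for v in values])  # prints [True, False, False, True]
```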

For the second check you will be creating an array of structures you can compare

[{"requests": "1", "limits": "1000m"},{"requests": "1", "limits": "2000m"}]

Note that before testing for equality in the following jq expression, you need to convert all the different CPU values to a common unit so you can compare them accurately.

map(
  (.requests | if contains("m") then .[0:length-1] | tonumber / 1000 else . | tonumber end) 
  ==
  (.limits | if contains("m") then .[0:length-1] | tonumber / 1000 else . | tonumber end)
)
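The unit conversion is the important step here, so it may help to see it spelled out. The following Python sketch normalizes both sides to cores before comparing them; the helper names are hypothetical.

```python
def cpu_cores(value):
    """Convert a CPU quantity string such as "1" or "1000m" into cores."""
    if value.endswith("m"):
        return float(value[:-1]) / 1000
    return float(value)


def requests_match_limits(pairs):
    """For each requests/limits pair, report whether both are the same amount of CPU."""
    return [cpu_cores(pair["requests"]) == cpu_cores(pair["limits"]) for pair in pairs]


pairs = [
    {"requests": "1", "limits": "1000m"},  # same amount, different notation
    {"requests": "1", "limits": "2000m"},  # limit is twice the request
]
print(requests_match_limits(pairs))  # prints [True, False]
```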

How do I verify this in OpsLevel?

Here is the final jq expression for each check where the payload is a Deployment

Requests and limits use whole CPUs

.spec.template.spec |
[(
  (select(.initContainers != null) | .initContainers | .[].resources),
  (select(.containers != null) | .containers | .[].resources)
)] | 
map(.requests.cpu, .limits.cpu) | 
map(.| 
  if contains("m") then .[0:length-1] | tonumber / 1000 else . | tonumber end | floor == .) | 
all(.)

And here is the jqplay link.

Requests and limits match

.spec.template.spec |
[(
  (select(.initContainers != null) | .initContainers | .[].resources),
  (select(.containers != null) | .containers | .[].resources)
)] | 
map({requests: .requests.cpu, limits: .limits.cpu}) | 
map(
  (.requests | if contains("m") then .[0:length-1] | tonumber / 1000 else . | tonumber end) 
  == 
  (.limits | if contains("m") then .[0:length-1] | tonumber / 1000 else . | tonumber end)
) | 
all(.)

And here is the jqplay link.
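To try both checks against a full Deployment payload outside of jq, here is a combined Python sketch. All function names are hypothetical. One assumption differs from the jq expressions above: a container with no CPU request or limit is treated as failing the check instead of raising an error.

```python
import math


def cpu_cores(value):
    """Convert a CPU quantity such as "2", "0.5" or "500m" into cores."""
    text = str(value)
    if text.endswith("m"):
        return float(text[:-1]) / 1000
    return float(text)


def cpu_pairs(deployment):
    """Return (request, limit) CPU values for every init and regular container."""
    pod_spec = deployment["spec"]["template"]["spec"]
    containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
    pairs = []
    for container in containers:
        resources = container.get("resources") or {}
        request = (resources.get("requests") or {}).get("cpu")
        limit = (resources.get("limits") or {}).get("cpu")
        pairs.append((request, limit))
    return pairs


def uses_whole_cpus(deployment):
    """Check 1: every CPU request and limit is a whole number of cores."""
    values = [value for pair in cpu_pairs(deployment) for value in pair]
    # Assumption: a missing value fails the check
    return all(v is not None and math.floor(cpu_cores(v)) == cpu_cores(v) for v in values)


def requests_equal_limits(deployment):
    """Check 2: every container requests exactly as much CPU as its limit."""
    return all(
        req is not None and lim is not None and cpu_cores(req) == cpu_cores(lim)
        for req, lim in cpu_pairs(deployment)
    )


example = {"spec": {"template": {"spec": {
    "initContainers": [
        {"name": "startup", "resources": {"requests": {"cpu": "500m"}, "limits": {"cpu": "500m"}}}
    ],
    "containers": [
        {"name": "web", "resources": {"requests": {"cpu": "2"}, "limits": {"cpu": "2000m"}}}
    ],
}}}}
print(uses_whole_cpus(example))        # prints False, the init container uses 500m
print(requests_equal_limits(example))  # prints True
```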

Robust check message

The last piece of the puzzle is writing a nicely formatted message for whoever inspects the status of these checks in the future. OpsLevel checks allow for composing robust messages using Markdown and Liquid. Because the target audience of this check is a service owner, it is important to give them the information they need to self-service the problem.

Here is an example message crafted for the Requests and Limits uses whole CPU check.


{% if check.failed %}  
  ### Check failed  
  Deployment `{{ data.metadata.namespace }}/{{ data.metadata.name }}` is a Tier 1 Service that does not specify using whole CPUs for requests & limits.

  It currently uses:
  {%- if data.spec.template.spec contains "initContainers" %}
    {%- assign containers = data.spec.template.spec.initContainers %}
    {%- for container in containers %}

      #### Init Container: `{{ container.name }}`
      ```
      {{ container.resources | jsonify }}
      ```

    {%- endfor %}
  {%- endif %}

  {%- if data.spec.template.spec contains "containers" %}
    {%- assign containers = data.spec.template.spec.containers %}
    {%- for container in containers %}
      #### Container: `{{ container.name }}`
      ```
      {{ container.resources | jsonify }}
      ```
    {%- endfor %}
  {%- endif %}

  This means you are not being given the highest level of CPU Quality of Service by Kubernetes.

  Please change any requests and limits CPU value to a whole integer, e.g. `2`, or a multiple of 1000 millicores, e.g. `2000m`

  You can read more about k8s [CPU Policies Here](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#cpu-management-policies)
{% else %}
  ### Check Passed
{% endif %}

This results in the following message to your audience when they view the failed check on their service. How awesome!

Check Failed Message

If you made it this far, I leave the creation of the message for the Requests and Limits match check to you as homework to flex your newfound knowledge and skill set!


Conclusion


Hopefully these examples have armed you with how-to knowledge for validating your Kubernetes cluster with OpsLevel. I hope you are inspired to get out there and start crafting Kubernetes based validation checks for your organization!

If you have any questions don’t hesitate to reach out to info@opslevel.com.
