The OpsLevel Ownership Framework

OpsLevel

January 18, 2023

Reach Service Ownership through Service Maturity

Service Ownership is more than just assigning a service to a particular team. True ownership means continuous improvement.

It means empowering developers to be autonomous and accountable stewards of their services, and that kind of ownership needs to be deeply intertwined with a maturity program like our Service Maturity.

Achieving service ownership is much harder than answering who owns what—but we’ve helped thousands of developers take true ownership of their services over the last few years. Working with our customers, we’ve learned what the ideal path to ownership looks like, and have shaped OpsLevel around it.

Why ownership matters

As software orgs move towards microservice and cloud-native architectures, the number and variety of components used to build and run production systems increases. No top-down mandate or single team can begin to handle all the operational complexity that arises.

OpsLevel’s blend of ownership and maturity gives software engineering teams a framework for managing this complexity, without stifling developers.

When it's easy for developers, who have the most context about their individual services, to be good owners and stewards, engineering orgs can ship high-quality software and drive successful outcomes for their companies. Let’s take a look at how the OpsLevel Ownership Framework came to be, and how your organization can put it into practice.

The need behind the need

OpsLevel was founded on the idea that software teams needed a better solution to track who owned different services, what services did, and other associated service metadata. Spreadsheets and wikis weren’t robust enough and the vast majority of companies couldn’t afford to build custom tooling.

But less than a year into OpsLevel’s journey, we introduced Checks. Why? One of the first things a solid catalog reveals is where you have gaps.

Why doesn’t this service have an owner?
Why doesn’t this Tier 1 service have a Datadog metrics dashboard associated with it?
Why doesn’t this service have a README file in its repo?
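To make this concrete, here is a minimal Python sketch of how a catalog surfaces the gaps listed above. The `Service` fields and the `find_gaps` helper are hypothetical illustrations, not OpsLevel's data model or API. Each rule simply mirrors one of the three questions.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical catalog entry; field names are illustrative only.
@dataclass
class Service:
    name: str
    owner: Optional[str] = None
    tier: int = 3
    dashboards: list[str] = field(default_factory=list)
    repo_files: set[str] = field(default_factory=set)


def find_gaps(catalog: list[Service]) -> list[str]:
    """Return a human-readable gap for each rule a service fails."""
    gaps = []
    for svc in catalog:
        # "Why doesn't this service have an owner?"
        if svc.owner is None:
            gaps.append(f"{svc.name}: no owner")
        # "Why doesn't this Tier 1 service have a metrics dashboard?"
        if svc.tier == 1 and not svc.dashboards:
            gaps.append(f"{svc.name}: Tier 1 without a metrics dashboard")
        # "Why doesn't this service have a README file in its repo?"
        if not any(f.upper().startswith("README") for f in svc.repo_files):
            gaps.append(f"{svc.name}: no README in repo")
    return gaps


catalog = [
    Service("checkout", owner="payments-team", tier=1,
            dashboards=["checkout-latency"], repo_files={"README.md"}),
    Service("legacy-cron", tier=1, repo_files={"main.go"}),
]
for gap in find_gaps(catalog):
    print(gap)
```

Once services and their metadata live in one place, questions like these become simple queries rather than manual audits.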

It was clear that there was a “need behind the need”. Everyone who wanted to solve service ownership really wanted to understand how their production services were configured and maintained in order to drive change.

Driving change with Service Maturity

The underlying goal of driving change led to the introduction of Service Maturity. While checks themselves have become more powerful and broader in scope over time, reorganizing checks within the Service Maturity rubric has been most impactful.

Flat, unprioritized lists of checks weren’t doing enough to close gaps—and there were lots of gaps! Nothing in software is static, and so truly solving ownership means being able to handle change effectively. Conversations with our customers revealed two key ways we were falling short:

The lack of prioritization was slowing progress. Service owners weren’t sure which checks were most important or which failing checks to address first.
Without clear shared language or context around checks—my service is passing 64% of its checks…is that good?—checks weren’t sticky within organizations.

With the Service Maturity rubric we addressed both these shortcomings:

Checks are bucketed into levels, which gives clear prioritization to a service owner’s tasks.
Levels provide intuitive terminology for describing a service’s status or progress.
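The following Python sketch illustrates why levels prioritize better than a flat percentage. It assumes a rubric where a service reaches a level only when it passes every check in that level and in all levels below it. The level and check names (`has_owner`, `has_dashboard`, and so on) and the helper functions are hypothetical.

```python
from typing import Optional

# Hypothetical rubric: levels ordered lowest to highest, each listing the
# checks bucketed into it. Names are illustrative, not a real configuration.
RUBRIC = [
    ("Bronze", ["has_owner", "has_readme"]),
    ("Silver", ["has_runbook", "has_dashboard"]),
    ("Gold", ["has_slo", "deps_up_to_date"]),
]


def maturity_level(passing: set[str]) -> Optional[str]:
    """Highest level whose checks, and all lower levels' checks, pass."""
    achieved = None
    for level, checks in RUBRIC:
        if not all(c in passing for c in checks):
            break  # a failing check blocks this level and every level above
        achieved = level
    return achieved


def next_tasks(passing: set[str]) -> list[str]:
    """Failing checks in the lowest unmet level: what to fix first."""
    for _, checks in RUBRIC:
        failing = [c for c in checks if c not in passing]
        if failing:
            return failing
    return []


def percent_passing(passing: set[str]) -> float:
    """The flat, unprioritized view: share of all checks that pass."""
    all_checks = [c for _, checks in RUBRIC for c in checks]
    return 100 * sum(c in passing for c in all_checks) / len(all_checks)


service = {"has_owner", "has_readme", "has_runbook", "has_slo"}
print(f"{percent_passing(service):.0f}% passing")  # 67% passing
print(maturity_level(service))                     # Bronze
print(next_tasks(service))                         # ['has_dashboard']
```

On a flat list this service is "67% passing," which says little about what to do next. The rubric places it at Bronze and points to a single task that would unlock Silver.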

The rubric has become the central place for engineering teams to define what good looks like. With clear language and definitions to build on, it’s easier to create a culture of continuous improvement and turn developers into proactive stewards of their services.

Steps on the journey to full ownership

Interested in building a culture like this in your engineering organization? Here’s what moving through the four steps of our Ownership Framework typically looks like for OpsLevel customers.

Catalog

The foundation. Create an inventory of all the services or components in your production software ecosystem, along with associated metadata.

Why

Everything else is built on top of the catalog. It shows your entire organization who owns which services and defines the relationships between them. It also captures key metadata like service tier, who’s on-call, and internal technical documentation.

When a question comes up about one of your services, whether it’s a routine matter or part of an urgent incident response, your catalog should have the answer.

How

We know there’s no one-size-fits-all approach to bootstrapping a catalog, so mix and match methods as needed. Connect OpsLevel to your git repos, your Kubernetes clusters, or your deploy pipelines (to name a few), and we’ll automatically pull information about your services into your catalog.

At Duolingo, creating their catalog was painless. The team imported more than 300 services—representing 97% of their architecture—in under 10 minutes. If you’re not sure which approach is right for you, our solutions, success, and support teams can strategize with you.

Connect OpsLevel to multiple git forges and inventory all your repos and all the linked services

Challenges

There are decisions to make. Will catalog creation be a decentralized project with service owners across your org cataloging services? Managing service definitions via YAML files is a great fit then! Or will it be a centrally managed effort primarily driven by an SRE, DevOps, or Platform team? In that case, consider our Terraform provider.

Measure

Evaluate the state of your services to determine if they are meeting baseline requirements for reliability, security, documentation, or any domain you’re concerned with.

Why

You can’t improve what you don’t measure!

How

Use OpsLevel's Checks to determine your current status. Initially, these checks can be about exploring and learning instead of governance or enforcement. Through this exploration, you’ll surface appropriate standards for your services and begin to separate must-haves from nice-to-haves.

Importantly, the rubric isn’t set in stone. You’ll be able to introduce new checks and raise your quality bar over time.

Find needles in haystacks and easily evaluate granular code or config values with glob and regex pattern support
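As a rough illustration of glob and regex matching in a repository check, the sketch below models a repo as an in-memory mapping of file paths to contents. The `repo_file_check` helper and the example patterns are hypothetical, not OpsLevel's check configuration; the sketch relies only on Python's standard `fnmatch` and `re` modules.

```python
import fnmatch
import re


def repo_file_check(files: dict[str, str], glob: str, pattern: str) -> bool:
    """Pass if at least one file whose path matches `glob` has content
    matching the regex `pattern` (hypothetical check semantics)."""
    regex = re.compile(pattern, re.MULTILINE)
    # fnmatchcase is case-sensitive on every OS; note that its "*" also
    # matches "/", unlike some shell-style glob implementations.
    matches = [path for path in files if fnmatch.fnmatchcase(path, glob)]
    return any(regex.search(files[path]) for path in matches)


repo = {
    "README.md": "# Payments service\n",
    "Dockerfile": "FROM python:3.12-slim\nCOPY . /app\n",
}

# Has a non-empty README?
print(repo_file_check(repo, "README*", r"\S"))
# Base image pinned to a tag other than "latest"?
print(repo_file_check(repo, "Dockerfile", r"^FROM \S+:(?!latest)\S+"))
```

The glob narrows the haystack to the relevant files, and the regex finds the needle inside them, so one small helper covers both "does this file exist" and "is this value configured correctly" style checks.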

Challenges

It’s tempting to do everything at once and implement all the best practices you’ve always wished your teams were following. Our support and success teams will help you resist that urge and walk before you run.

In the discovery process, you may find some best practices that you care about don't have obvious or easily accessible data sources to validate against. With tools like our CLI and our custom event check, we can help you close those gaps.

Improve

Codify the configurations, best practices, and standards that are most important for your organization's ability to continuously deliver high-quality software. Prioritize and map out your definitions of good, better, and best. Drive adoption.

Drive change on high profile initiatives with time-boxed Campaigns that include the ability to send targeted reminders via email or Slack

Why

Modern software that isn’t continuously improving will eventually deteriorate, lose ground to competition, or both. It’s essential to stay on top of both the inputs (is this open source package still supported and secure?) and outputs (does the current architecture scale with increased traffic?) of your software in order to maintain security, reliability, and usability.

How

In order to improve software quality across many microservices, you first need to have a well-maintained catalog and clear measurement standards in place. From there, there are many levers to pull:

Run a campaign. Bring extra attention to specific Checks via deadlines and targeted outreach.
Make sure all the Checks you create have complete context around why they matter. You can also use check notes and results messages to give detailed guidance to service owners on how to pass the Check.
Incorporate incentives. Use service maturity levels within OKRs or use OpsLevel within your deploy pipelines to create “express lanes” for mature services.
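For the "express lanes" idea, a deploy pipeline step could branch on a service's maturity level. The sketch below is hypothetical: the level names, the `deploy_lane` helper, and the Silver threshold are assumptions, and it presumes the pipeline has already looked up the service's current level from OpsLevel (that lookup is not shown).

```python
# Hypothetical rubric levels, ordered lowest to highest.
LEVEL_ORDER = ["Beginner", "Bronze", "Silver", "Gold"]


def deploy_lane(level: str, express_min: str = "Silver") -> str:
    """Return "express" if the service is at or above `express_min`,
    otherwise "standard". Unknown levels fail loudly rather than
    silently falling into either lane."""
    if level not in LEVEL_ORDER:
        raise ValueError(f"unknown maturity level: {level!r}")
    if LEVEL_ORDER.index(level) >= LEVEL_ORDER.index(express_min):
        return "express"
    return "standard"


for level in ["Bronze", "Silver", "Gold"]:
    print(level, deploy_lane(level))
```

A pipeline might use the `express` result to skip a manual approval step, for example, while `standard` keeps it. Keeping the threshold as a parameter lets a team raise the bar as its rubric matures.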

Challenges

Resources are limited, so there are tradeoffs to consider. There’s always more on the product roadmap than can feasibly get done in a quarter. Creating a strong Service Ownership culture that asks developers to become stewards of their services from IDE to prod will make roadmap planning exercises harder, especially in the short term.

The tradeoffs will never disappear, but in the long term embracing Service Ownership and Maturity will make them more predictable and manageable. Planning gets easier. The team is shipping more consistently. The roadmap isn’t so intimidating.

Automate

Automate foundational or rote workflows to ease the burden on developers and platform teams. Consistently apply your best practices across the organization.

Why

Movements like Shift Left and DevSecOps are asking developers to do more than code new products and features. They now need to be skilled in domains across the software lifecycle: testing, security, operations, and observability.

Rather than expecting them to learn these new skills from the ground up, organizations should make it easy to do the right thing. And that should not always mean opening a ticket with the platform team.

Even when developers do have the necessary skills, automation reduces the risk of errors and streamlines processes so they are quickly auditable.

How

With OpsLevel, automation begins at the foundation. Our catalog keeps service metadata up to date and runs checks automatically, so everyone in your organization can spend time on higher value activities.

And with Service Creation from templates and Custom Actions, platform teams can set up developers for success by turning complex or tedious workflows into one-click actions.

Provide Custom Actions to service owners and make recurring day 2 operational tasks simpler for everyone

Challenges

There are normal growing pains around automation. Initial setup costs have a payback period. But some organizations are especially hesitant to automate. They struggle with giving up full control over every step of a process. The underlying issue can be summed up by the maxim, “exceptions are the enemy of automation.”

For these organizations, automation is intimidating because their tech stack is too varied, similar teams operate very differently, or processes are too ad hoc. Automation will always be a struggle when edge cases are the norm.

Fortunately, organizations that adopt OpsLevel and invest in Service Maturity naturally evolve to be more consistent. When most teams and services are handling the basics in the same manner, it’s logical to increase automation.

Get started today

The OpsLevel Ownership Framework provides a comprehensive approach for managing and de-risking operational complexity, empowering developers to truly take ownership of their services. Modern software teams rely on OpsLevel to turn their developers into service owners.

It's become that crucial for how we manage our microservices. We can’t go back to life without it. – Senior Engineering Manager, Platform Operations at Podium

If you’re ready to do the same, keep our four-step Ownership Framework in mind.

Catalog: get visibility. Discover all services in your architecture. Assign owners, break down silos, and lay the foundations for ownership.
Measure: get clarity. Determine the current state of your services. Map out your good-better-best engineering standards.
Improve: get consistency. Make best practices routine with our guided maturity rubric.
Automate: ship faster and safer. Reduce the burden of ownership and streamline workflows.

Not yet an OpsLevel customer? Request access to OpsLevel here.