
How to level up your team’s production readiness with automation

Kenneth Rose | June 9, 2023

As software engineers, we can all agree that there's no such thing as perfect software. Whether we like it or not, something can always go wrong. Rather than strive for perfection, engineering teams should do everything they can to minimize disruptions by proactively addressing the most common preventable causes. This is where production readiness comes in.

Production readiness helps engineering teams answer whether their production services meet the operational standards that matter to their organization. Production readiness reviews and checklists measure readiness across a number of categories, including reliability, security, observability, quality, and maintainability. Getting it right helps your team avoid incidents that lead to rework, cause revenue loss and reputational damage, and drag down developer velocity and morale.

Teams in the early stages of production readiness often rely on manual checklists and reviews. That is a good starting point, but a mature production readiness process should be streamlined, comprehensive, and continuous. Let's explore how you can take your production readiness to the next level with automation.

The three challenges that get in the way of production readiness

When engineering leaders start looking for ways to improve their production readiness, they often run into three challenges.

  • A discoverability problem. Teams don't have complete or up-to-date visibility into all the services and software that exist within their organization, and ownership and accountability are unclear.
  • A measurement problem. There isn't a clear understanding of what needs to be measured to ensure readiness across multiple categories, or of which metrics are required.
  • A cultural problem. Developers aren't motivated to trade product development time for ownership and production readiness tasks.
Production readiness often comes with three problems: discoverability, measurability, and cultural adoption.

To build a truly effective production readiness model, you need to address all three of these problems. At the end of the day, you can’t improve the things you don’t know exist or the things you can’t measure. And you definitely won’t get very far improving things that nobody cares about or has time for. 

The solution to each of these challenges hinges on introducing automation into your approach to production readiness.

How to solve production readiness challenges

Automation is a key driver in addressing all three challenges, but it's not a silver bullet. Setting your organization up for success will take time, but the investment will pay off in the long run.

Solving the discovery problem

It’s hard to know what services need improvement if you don’t fully know what services you have in the first place. For production readiness to be effective, you need to know what all your services are, where they are, and who owns them.

Many teams rely on spreadsheets and Notion pages to catalog their services by hand. That may work for small teams, but these documents quickly become incomplete and out of date as a team scales.

An automated service catalog provides real-time visibility into each service, with easily accessible information and metadata including ownership, past changes and deploys, where the code lives, where the service sits in the toolchain, dependencies, and more. It integrates with systems such as Kubernetes and pulls in deployment information automatically, so developers don't have to make manual updates.
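To make the idea concrete, here is a minimal Python sketch of how a catalog could be assembled from deployment metadata rather than by hand. It assumes input shaped like the `metadata` of Kubernetes Deployment objects (for example, items from `kubectl get deployments -o json`) and a hypothetical labeling convention: an `app` label names the service, a `team` label names the owner, and a `repo` annotation points at the code. The `CatalogEntry`, `build_catalog`, and `unowned` names and the sample data are illustrative, not part of any product API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CatalogEntry:
    name: str
    owner: Optional[str] = None
    repo: Optional[str] = None
    namespaces: Set[str] = field(default_factory=set)


def build_catalog(deployments: List[dict]) -> Dict[str, CatalogEntry]:
    """Merge deployment metadata into one catalog entry per service."""
    catalog: Dict[str, CatalogEntry] = {}
    for dep in deployments:
        meta = dep["metadata"]
        labels = meta.get("labels", {})
        annotations = meta.get("annotations", {})
        # Hypothetical convention: the "app" label names the service;
        # fall back to the Deployment's own name.
        name = labels.get("app", meta["name"])
        entry = catalog.setdefault(name, CatalogEntry(name=name))
        # Hypothetical conventions: "team" label = owner, "repo" annotation = code location.
        # Keep the first non-empty value seen for each field.
        entry.owner = entry.owner or labels.get("team")
        entry.repo = entry.repo or annotations.get("repo")
        entry.namespaces.add(meta.get("namespace", "default"))
    return catalog


def unowned(catalog: Dict[str, CatalogEntry]) -> List[str]:
    """Services that no deployment assigns to a team."""
    return sorted(name for name, entry in catalog.items() if entry.owner is None)


# Example input shaped like the metadata of `kubectl get deployments -o json` items.
deployments = [
    {"metadata": {"name": "checkout-v2", "namespace": "prod",
                  "labels": {"app": "checkout", "team": "payments"},
                  "annotations": {"repo": "github.com/acme/checkout"}}},
    {"metadata": {"name": "checkout-v2", "namespace": "staging",
                  "labels": {"app": "checkout"}}},
    {"metadata": {"name": "legacy-cron", "namespace": "prod"}},
]
catalog = build_catalog(deployments)
print(unowned(catalog))  # ['legacy-cron']
```

Because entries are rebuilt from live metadata, a service missing an owner surfaces automatically instead of waiting for someone to update a spreadsheet.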

Solving the measurement problem

When it comes to identifying what to measure as part of your production readiness, start by listing everything that's important to your organization. An early iteration might be a rudimentary checklist that service owners review before taking anything to production, but that won't scale well.

For instance:

  • Is an owner defined?
  • Are backups set up? Stored cross-region?
  • Is data encrypted at rest? 
  • Is data encrypted in transit? 
  • Are secrets stored in Vault? 
  • Are logs emitted to ELK?
  • Does the service store PII? 
  • Is it on the latest version of $Framework?
  • Are instances running in prod VPC? Using the right security group? 
  • Is container scanning enabled? 

Teams that take this approach run into two core challenges. The first is data collection. Gathering the data for a production readiness checklist takes a lot of manual effort: service owners have to investigate every item, and answers may differ from one person to the next, which makes the results unreliable.

The second challenge is the evaluation process. Today, production readiness is mostly assessed once, as a single large task, right before a feature or application goes to production. As a result, when a new check is introduced or a dependency changes, services already in production aren't reassessed, and that opens the door to risk.

The solution, once again, is automation. An automated check system that integrates with detection tools and tracks all the measurements you care about removes the inconsistency of manual answers. In other words, the goal is a measuring system that queries your sources of truth rather than asking a human who might not actually know the answer.
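As an illustration, several items from the checklist above can be expressed as predicates over facts gathered from systems of record. This is a minimal sketch, not a definitive implementation: the fact keys (`owner`, `backup_regions`, `secret_store`, and so on) are hypothetical, and in practice each value would come from an integration such as a cloud provider API or a secret manager rather than from a hand-written dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Check:
    name: str
    passes: Callable[[dict], bool]


# Each predicate reads facts collected from systems of record (cloud APIs,
# secret manager, image scanner) instead of asking a person.
# The fact keys are hypothetical.
CHECKS: List[Check] = [
    Check("owner defined", lambda s: bool(s.get("owner"))),
    # Cross-region: at least one backup lives outside the service's own region.
    Check("backups stored cross-region",
          lambda s: any(r != s.get("region") for r in s.get("backup_regions", []))),
    Check("encrypted at rest", lambda s: s.get("storage_encrypted") is True),
    Check("secrets in Vault", lambda s: s.get("secret_store") == "vault"),
    Check("container scanning enabled", lambda s: s.get("image_scanning") is True),
]


def evaluate(service: dict, checks: List[Check] = CHECKS) -> Dict[str, bool]:
    """Run every check against one service's facts; True means the check passes."""
    return {check.name: check.passes(service) for check in checks}


service = {
    "owner": "payments",
    "region": "us-east-1",
    "backup_regions": ["us-west-2"],
    "storage_encrypted": True,
    "secret_store": "env",
    "image_scanning": True,
}
results = evaluate(service)
failing = [name for name, ok in results.items() if not ok]
print(failing)  # ['secrets in Vault']
```

Because every answer comes from the same predicate applied to the same source data, two people evaluating the same service can no longer get different results.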

Automation also allows for continuous evaluation, turning production readiness into a steady burn rather than a high-intensity, one-time effort. Services are re-evaluated as they change and as new checks are added, so they are always measured against your most complete and current checklist. In turn, this reduces friction for developers and makes production readiness efforts more efficient and effective.
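Continuous evaluation largely comes down to re-running the checks on a schedule or on every change and comparing the results with the previous run. The sketch below assumes each run produces a mapping of check name to pass/fail; the `diff_runs` helper is hypothetical. It separates regressions (a check that used to pass) from failures of newly introduced checks, the two cases a one-time review misses.

```python
from typing import Dict, List, Tuple


def diff_runs(previous: Dict[str, bool],
              current: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Compare two evaluation runs of the same service.

    Returns (regressions, new_failures):
    - regressions: checks that passed last run and fail now, e.g. after a
      dependency or configuration change;
    - new_failures: checks absent from the last run (newly introduced) that
      fail on their first evaluation.
    """
    regressions = sorted(name for name, ok in current.items()
                         if not ok and previous.get(name) is True)
    new_failures = sorted(name for name, ok in current.items()
                          if not ok and name not in previous)
    return regressions, new_failures


previous = {"owner defined": True, "encrypted at rest": True}
current = {"owner defined": True, "encrypted at rest": False,
           "container scanning enabled": False}
print(diff_runs(previous, current))
# (['encrypted at rest'], ['container scanning enabled'])
```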

Checks in OpsLevel show service owners how their services rank based on which standards they meet.

At OpsLevel, we've taken the production readiness checklist a step further with a graduated approach. Rather than a flat checklist where every item carries the same weight, we recommend partitioning your checklist into multiple (typically at least three) levels or grades. As you can see in the image above, our customers often use Bronze, Silver, and Gold. Bronze defines the minimum threshold a service needs to meet; Silver is the core level, with baseline requirements for a service that has cleared the bare minimum; and Gold covers future-proofing and aspirational standards that will be critical later but are less of a priority now.

In this approach, each service is given a grade based on the checks it passes. If a service passes all of its Bronze checks but only some of its Silver checks, it's given a Bronze grade. This makes it easier to compare maturity across services while ensuring that the must-have checks are prioritized. It also gives service owners intermediate milestones to hit instead of a single 100% target.
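The grading rule described here fits in a few lines. In this minimal sketch, a service's grade is the highest level for which it passes every check at that level and at all lower levels; a missing result counts as a failure. The level names follow the Bronze/Silver/Gold example, while the check names and the `grade` function are hypothetical.

```python
from typing import Dict, List, Optional

LEVELS = ["Bronze", "Silver", "Gold"]  # ordered from lowest to highest


def grade(results: Dict[str, bool],
          checks_by_level: Dict[str, List[str]]) -> Optional[str]:
    """Return the highest level whose checks, and all lower levels' checks, pass.

    A check with no recorded result counts as failing. Returns None when even
    the Bronze checks are not all passing.
    """
    achieved: Optional[str] = None
    for level in LEVELS:
        if all(results.get(name, False) for name in checks_by_level.get(level, [])):
            achieved = level
        else:
            break  # a gap at this level caps the grade, whatever passes above it
    return achieved


checks_by_level = {
    "Bronze": ["owner defined", "encrypted at rest"],
    "Silver": ["secrets in Vault", "container scanning enabled"],
    "Gold": ["canary deploys"],
}
results = {"owner defined": True, "encrypted at rest": True,
           "secrets in Vault": True, "container scanning enabled": False,
           "canary deploys": True}
print(grade(results, checks_by_level))  # Bronze
```

The passing Gold check does not lift the grade here: the Silver gap caps the service at Bronze, which is what keeps the must-have checks first in line.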

If you currently have a long flat checklist for production readiness and want to evolve to this model, your organization will have to go through a pretty rigorous prioritization process to figure out where each check lives. Another important thing to consider is the natural sequence of the checks. For example, you wouldn’t implement canary deploys before you had reliable rollbacks in place, so that sequencing can help you decide which checks need to happen first.
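Sequencing constraints like "reliable rollbacks before canary deploys" form a dependency graph, so a topological sort gives a valid rollout order, and a simple comparison flags any check placed at a lower level than something it depends on. This sketch uses Python's standard `graphlib` module (Python 3.9+); the `automated deploys` check, the level assignments, and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

LEVEL_RANK = {"Bronze": 0, "Silver": 1, "Gold": 2}


def rollout_order(prerequisites: Dict[str, Set[str]]) -> List[str]:
    """Order checks so each comes after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(prerequisites).static_order())


def misplaced(prerequisites: Dict[str, Set[str]],
              level_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """(check, prerequisite) pairs where the prerequisite sits at a higher level."""
    return sorted(
        (check, pre)
        for check, pres in prerequisites.items()
        for pre in pres
        if LEVEL_RANK[level_of[pre]] > LEVEL_RANK[level_of[check]]
    )


prerequisites = {
    "canary deploys": {"reliable rollbacks"},
    "reliable rollbacks": {"automated deploys"},
    "automated deploys": set(),
}
level_of = {"automated deploys": "Bronze",
            "reliable rollbacks": "Gold",
            "canary deploys": "Silver"}
print(rollout_order(prerequisites))
# ['automated deploys', 'reliable rollbacks', 'canary deploys']
print(misplaced(prerequisites, level_of))
# [('canary deploys', 'reliable rollbacks')]
```

In this example the flagged pair says canary deploys were placed in Silver while their prerequisite sits in Gold, so one of the two needs to move during prioritization.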

This approach also helps with visibility. Service owners can quickly check any of their services to see its maturity level and identify what still needs to be done. At a higher level, leaders can see how the services in their purview are doing from a maturity perspective and where the gaps are.
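For a service owner, "what still needs to be done" is the list of failing checks at the lowest level not yet reached. A minimal sketch, assuming per-check pass/fail results and a hypothetical mapping of levels to check names; `next_steps` is an illustrative name:

```python
from typing import Dict, List, Optional, Tuple

LEVELS = ["Bronze", "Silver", "Gold"]  # ordered from lowest to highest


def next_steps(results: Dict[str, bool],
               checks_by_level: Dict[str, List[str]]) -> Tuple[Optional[str], List[str]]:
    """Lowest level not yet reached and the checks still failing there.

    Returns (None, []) once every level passes.
    """
    for level in LEVELS:
        failing = [name for name in checks_by_level.get(level, [])
                   if not results.get(name, False)]
        if failing:
            return level, failing
    return None, []


checks_by_level = {
    "Bronze": ["owner defined", "encrypted at rest"],
    "Silver": ["secrets in Vault", "container scanning enabled"],
    "Gold": ["canary deploys"],
}
results = {"owner defined": True, "encrypted at rest": True,
           "secrets in Vault": True, "container scanning enabled": False}
print(next_steps(results, checks_by_level))
# ('Silver', ['container scanning enabled'])
```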

Solving the cultural problem

For a production readiness system to succeed, it needs to be embedded in your organization's culture, and that won't happen overnight. Building a culture that prioritizes production readiness is a multi-step process that takes time.

  • Step 1: Start at the top. Buy-in from leadership will be a key driver in moving things forward and encouraging adoption throughout the rest of the organization.
  • Step 2: Implement ruthless prioritization. This is the time to make hard decisions. What feature development trade-offs will your team make to fit in production readiness work? Having the executives from step 1 on board helps here, since they can rule on disagreements and advocate for the changes that need to happen.
  • Step 3: Incentivize teams to do the work. Giving teams the right data and tools is a great starting point, but it's not enough. Developers need to feel that they are collectively contributing toward organizational objectives.

When it comes to incentivizing team members, we've seen our customers succeed in three ways. The first is embedding production readiness into top-down goals. In practice, this can mean adding service maturity to your regular goal- and objective-setting cycle. For instance, you could have OKRs tied specifically to production readiness, as well as team and manager performance metrics tied to service maturity levels. Failing checks and lagging services can also be added to the agenda for operational reviews. Leadership, in turn, needs complete visibility into performance against these goals and objectives through an automated reporting process.
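The operational review agenda can be generated rather than compiled by hand. This sketch assumes you already know each service's current level and owning team; it groups services below a target level by team, listing services without an owner under `unowned`. The Silver target, the `review_agenda` name, and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# None means the service has not reached Bronze yet.
LEVEL_RANK = {None: -1, "Bronze": 0, "Silver": 1, "Gold": 2}


def review_agenda(levels: Dict[str, Optional[str]],
                  owners: Dict[str, str],
                  target: str = "Silver") -> Dict[str, List[str]]:
    """Group services below the target level by owning team."""
    agenda: Dict[str, List[str]] = defaultdict(list)
    for service, level in sorted(levels.items()):
        if LEVEL_RANK[level] < LEVEL_RANK[target]:
            agenda[owners.get(service, "unowned")].append(service)
    return dict(agenda)


levels = {"checkout": "Silver", "ledger": "Bronze", "refunds": None,
          "search": "Gold", "legacy-cron": None}
owners = {"checkout": "payments", "ledger": "payments",
          "refunds": "payments", "search": "discovery"}
print(review_agenda(levels, owners))
# {'payments': ['ledger', 'refunds'], 'unowned': ['legacy-cron']}
```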

The second approach our customers use is reserving capacity exclusively for production readiness. This means carving out dedicated time or resources within each team for ownership work: 20% of the points in a sprint, one team member per sprint, or even every fourth or fifth sprint dedicated to ownership tasks.
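The three capacity models are simple enough to express directly. The helpers below are hypothetical and assume sprints are numbered from 1 and story points are whole numbers; the percentage uses integer arithmetic so a 20% reservation always rounds down.

```python
from typing import List


def reserved_points(sprint_points: int, percent: int = 20) -> int:
    """Points held back for ownership work, rounded down.

    Integer arithmetic avoids float rounding surprises.
    """
    return sprint_points * percent // 100


def rotation_member(team: List[str], sprint_number: int) -> str:
    """Team member on ownership duty this sprint (simple round-robin)."""
    return team[sprint_number % len(team)]


def is_ownership_sprint(sprint_number: int, every: int = 5) -> bool:
    """True for every Nth sprint, assuming sprints are numbered from 1."""
    return sprint_number % every == 0


print(reserved_points(34))                                          # 6
print(rotation_member(["ana", "ben", "chi"], 4))                    # ben
print([n for n in range(1, 13) if is_ownership_sprint(n, every=4)])  # [4, 8, 12]
```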

Lastly, you can integrate service maturity into the software development lifecycle. Automating production readiness can be a stepping stone toward continuous development, and we've seen customers integrate our service maturity features into their CI/CD pipelines.

A diagram shows how OpsLevel customers integrate our service maturity features into their CI/CD pipeline.
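One common pattern for wiring service maturity into CI/CD is a gate step that fails the pipeline when a service is below a required level. This is a minimal, tool-agnostic sketch: how the current level is fetched from your catalog is left out and passed in as an argument, and the `gate` and `main` names are hypothetical rather than part of any OpsLevel tooling.

```python
import sys
from typing import List, Optional

LEVEL_RANK = {"Bronze": 0, "Silver": 1, "Gold": 2}


def gate(current_level: Optional[str], required_level: str) -> int:
    """Exit code for a pipeline step: 0 lets the deploy proceed, 1 blocks it."""
    # Unknown or missing levels (e.g. None) rank below Bronze.
    if LEVEL_RANK.get(current_level, -1) >= LEVEL_RANK[required_level]:
        print(f"ok: at {current_level}, pipeline requires {required_level}")
        return 0
    print(f"blocked: at {current_level or 'no level'}, "
          f"pipeline requires {required_level}", file=sys.stderr)
    return 1


def main(argv: List[str]) -> int:
    """argv: [current_level, required_level]; required defaults to Bronze."""
    current = argv[0] if argv else None
    required = argv[1] if len(argv) > 1 else "Bronze"
    return gate(current, required)
```

In a pipeline, the script would end with `sys.exit(main(sys.argv[1:]))` so that a non-zero exit code stops the deploy. Whether a given level should block or only warn is a policy choice for your organization.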

The approach or approaches that you choose to implement will depend largely on your organization’s culture and way of doing things. Regardless, investing in automation will be key to reducing friction, making the trade-offs easier to negotiate, and ultimately making it easier for dev teams to do this work. 

Achieving maturity in production readiness requires automation

Today’s engineering teams are being asked to be quicker, more agile, and more efficient than ever before. Often, this means that seemingly “non-essential” tasks like production readiness can be quickly deprioritized, leaving the organization open to increased risk. Leveraging automated processes—and embedding them within a culture of continuous improvement and alignment within the organization—can help teams stay agile and focused on the product while simultaneously prioritizing reliability, security, maintainability, and more. To us, that’s the ideal balance for engineering teams everywhere.

Want to learn more about what a great IDP looks like in practice? Our co-founders John and Ken recently shared their learnings in one of our webinars. You can watch the recording here. 

