How Podium Uses OpsLevel to Keep Libraries Up-to-Date, Banish “Bad Guys,” and Better Standardize Microservices
A sprawling, diverse microservices architecture that was increasingly difficult to maintain
An intuitive, automated production readiness solution, complete with guardrails and gamified levels
Up-to-date libraries, no gatekeepers, and better standardization across the board
Case Study Summary
Learn why Podium's Platform Engineering team can't go back to life without OpsLevel.
Podium is a platform that helps local businesses around the world manage customer communications, improve the customer experience, and garner better online reviews.
Kelvin, Senior Engineering Manager of Site Reliability, Scott, Senior Site Reliability Engineer, and Andrew, Software Architect, are responsible for managing microservice quality for the 1,200-person company.
The team's focus is the reliability of the Podium platform. This ensures local businesses—and their customers—always have the best experience when using Podium. Friction at any point in the customer experience is in direct conflict with Podium’s mission.
“Our mission is to make sure everything always works, that the platform is healthy and strong, that our teams are working together to make sure everything is as reliable as possible,” said Kelvin.
Although Podium began as a monolithic Rails application, the company has moved to a microservices model over the past five years. The microservices architecture serves Podium well, but it has also presented some challenges. Over time, the company amassed hundreds of microservices, and each new one added complexity, making the whole increasingly difficult to maintain.
The Podium team decided to try OpsLevel, as they saw the potential for the platform to bring consistency and reliability to their microservice architecture and remove manual work that no one enjoyed.
Microservice sprawl and no system to coordinate reliability and maintenance
As Podium grew, it accumulated more and more microservices, which became difficult to monitor, manage, and maintain. Ultimately, the team was dealing with “microservice sprawl.”
To get control over the sprawl of services and make sure they complied with company standards, Kelvin created a production readiness checklist in the company wiki. This checklist included “hundreds of items” that team members needed to complete before creating a new service. “The checklist was my way of pumping the brakes so that each microservice deployed was reliable and compliant,” said Kelvin.
Although this manual checklist did its job, it made Kelvin the “bad guy,” as he was the roadblock to getting many services out the door. It also didn’t give much insight into the services that already existed.
The team also had challenges with ownership. “Part of our sprawl issue was a lack of ownership,” said Andrew. “We had microservices that hadn’t been touched in years that didn’t have a clear owner, so it was difficult to get questions answered about those services or resources dedicated to them.”
An intuitive product complete with guardrails and gamified levels
When the team began their initial hunt for a solution, they knew they needed something beyond a service catalog. “What we needed was something that could help us manage our services so that they’d always be up to date,” said Kelvin.
The team began exploring OpsLevel and loved how intuitive the UI was, as they’d also looked into a competitive product that was difficult for engineers to understand. When the engineers tried out OpsLevel, there was no friction, and they understood it right away. “OpsLevel’s UI and the paradigms and UX that are used are very much a pit of success compared to other tools we tried,” said Kelvin.
Scott noted that developers were given access to OpsLevel without any guidance or support and still found it extremely easy to use, which was a huge selling point. They also liked that it was possible to define checks in Terraform so that they could store and manage their best practices as code.
Additionally, the idea of service maturity was extremely attractive, especially since maintaining service quality was so important to the team. “We knew we wanted to track quality and OpsLevel offered a gamified experience using levels,” said Andrew. “Using OpsLevel, each microservice would be tagged as gold, silver, bronze, or ‘not production ready.’”
“The structure of OpsLevel really resonated with us. The gamified levels (gold, silver, and bronze) were the perfect way to categorize our services.” – Andrew, Software Architect
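To make the idea of graduated levels concrete, here is a minimal Python sketch of how a rubric like this could assign a level to a service. It is an illustration only, not OpsLevel's implementation: the check names, the `CheckResult` type, and the rule that a service earns the highest level whose checks (and all lower levels' checks) pass are assumptions made for this example.

```python
from dataclasses import dataclass

# Level names come from the article; their ordering logic here is an assumption.
LEVELS = ["bronze", "silver", "gold"]
NOT_READY = "not production ready"


@dataclass
class CheckResult:
    name: str     # hypothetical check name
    level: str    # the level this check belongs to
    passed: bool


def maturity_level(results: list[CheckResult]) -> str:
    """Return the highest level for which every check at that level and below passes.

    A level that has no checks is treated as passed.
    """
    achieved = NOT_READY
    for level in LEVELS:
        checks_at_level = [r for r in results if r.level == level]
        if not all(r.passed for r in checks_at_level):
            break  # a failed check blocks this level and every level above it
        achieved = level
    return achieved


# Hypothetical results for one service
example = [
    CheckResult("has_owner", "bronze", True),
    CheckResult("elixir_version_current", "silver", True),
    CheckResult("slo_defined", "gold", False),
]
print(maturity_level(example))  # silver
```

In this sketch a single failing bronze check leaves a service at "not production ready" even if its gold checks pass, which is what makes the levels read as a progression rather than a score.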
Once the team signed on with OpsLevel, they realized that the service maturity rubric and reporting capabilities made it extremely easy to present the health of their services to the CTO, who strongly supports the use of OpsLevel. “Every six weeks, as part of a larger meeting, teams go to our CTO and show him their OpsLevel dashboards, which is amazing,” said Kelvin. “Because the CTO has insight into the health of microservices and ownership, he then encourages the team to maintain service quality and enable services to achieve a higher standard of excellence.”
“OpsLevel’s team, both in sales and continuing support, has been amazing. I've worked with a lot of vendors and OpsLevel has set a very high bar for customer interaction. The team is extremely on top of it.” – Scott, Senior Site Reliability Engineer
Up-to-date libraries, no bad guys, and better standardization across the board
Thanks to Podium’s partnership with OpsLevel, standardization is easier than ever before, which was a top goal for the Podium team.
This standardization makes the team more efficient and gives it more leverage: automation is easier when every service looks the same and there are fewer edge cases. Lessons learned and best practices can also be rolled out more easily, because every service gets the same "lesson."
Additionally, developers are more interchangeable between teams, which is a huge win for the business in terms of flexibility and the ability to grow without hiring or retraining.
The team is now able to ensure their libraries are up-to-date while maintaining high standards. Additionally, conversations between teams are easier. Kelvin doesn’t have to be the bad guy: a scalable system of checks now ensures that services are maintained without his involvement.
“You can put this on a billboard: OpsLevel has helped us manage our microservice sprawl. I'm a lot less concerned about reliability than I was before. It’s hands-down one of my favorite tools.” – Kelvin, Senior Engineering Manager of Site Reliability
Libraries are always up-to-date
For Podium, it’s crucial that libraries, both those the team writes and third-party dependencies, are kept up to date. “Ensuring that libraries are up to date across 100+ microservices was very inefficient,” said Scott. “You can track tickets, but that is a slow, manual, inefficient process.”
Now that Podium has OpsLevel, they have a single place to see whether libraries are up to date. For example, Elixir is one of their primary languages, and they can now see at a glance whether each service is running a current Elixir version.
If the services are at risk of becoming outdated, it’s now obvious that they need to be maintained. And, because OpsLevel requires ownership, the team knows who to talk to about maintaining libraries at the right level.
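As a rough illustration of the kind of check described above, the Python sketch below flags services whose Elixir version falls below a minimum and reports each one's owner. It does not use OpsLevel's API; the `Service` type, the service and team names, and the version numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    owner: str           # team responsible for the service (hypothetical names below)
    elixir_version: str  # dotted numeric version, e.g. "1.14.5"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '1.14.5' into (1, 14, 5)."""
    return tuple(int(part) for part in text.split("."))


def outdated_services(services: list[Service], minimum: str) -> list[Service]:
    """Return the services whose Elixir version is below the required minimum."""
    floor = parse_version(minimum)
    return [s for s in services if parse_version(s.elixir_version) < floor]


# Made-up inventory for illustration
services = [
    Service("billing", "payments-team", "1.15.2"),
    Service("inbox", "messaging-team", "1.12.3"),
]

for s in outdated_services(services, "1.14.0"):
    print(f"{s.name}: Elixir {s.elixir_version} is below 1.14.0, contact {s.owner}")
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.9.0" would sort after "1.14.0", while as tuples (1, 9, 0) correctly compares lower than (1, 14, 0).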
Checks are set in stone with no “bad guys”
Now that Podium is using OpsLevel for checks, it’s much easier to have conversations with team members about maintaining microservices. The tool is now the gatekeeper—Kelvin, Scott, and Andrew are focused on enabling rather than acting as “gatekeeping bad guys.”
“With checks set in stone in OpsLevel, we're having fewer hard conversations,” said Andrew. “Instead, we are proactive. We're talking about roadmap items and future plans and ways that we can help our teams accomplish their goals.”
Since Podium started categorizing the maturity of its microservices based on graduated levels, they’ve seen an increase in standardization. Teams know that if their services pass the bronze and silver levels, they’re moving in the right direction and helping to meet company goals. For example, they now have 70+ services that are instrumented for telemetry in the same way. Using and analyzing the data in observability tools is now much easier.
Since that baseline is nearly ubiquitous, the team is working on more ambitious projects: “Over the past year, we've finally gotten our whole department on fully automated SLOs and SLI tracking—OpsLevel has helped to drive that.”
“If we didn’t have OpsLevel, we’d need to build or find another OpsLevel! It’s become that crucial for how we manage our microservices. We can’t go back to life without it.” – Kelvin, Senior Engineering Manager of Site Reliability