Cloud security metrics: what to track and where to track them
Engineering teams are moving faster than ever, driven by advancements in developer infrastructure, expanded self-service, and AI-assisted workflows. That speed also raises the risk of security incidents, a risk that multiplies with AI code generation. Combined with the evolving risk of data breaches, unauthorized access, and compliance concerns, this keeps cloud security top of mind for engineering and security teams. Tracking cloud security metrics helps teams monitor risk and continuously adapt to the evolving threat landscape.
In this article, we’ll cover important cloud security metrics, and why and where you should track them.
Vulnerability management metrics
Identifying and addressing security vulnerabilities is critical to lowering the risk of exploitation. Companies utilizing open source technologies have long been familiar with the risks of introducing vulnerabilities through third-party code and outdated libraries. Rapid adoption of AI-generated code is increasing this risk–making vulnerability management a top priority for every engineering organization.
These metrics focus on detecting and managing vulnerabilities across your codebase.
Number of vulnerabilities by level of severity
- Why it matters: Tracking vulnerabilities based on their severity (critical, high, medium, low) helps allocate resources based on prioritization.
- Impact: High – Severity breakdown helps quantify risk and determine where to focus attention.
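As a rough illustration, a severity breakdown can be computed from a scanner export in a few lines. This is a minimal sketch: the record fields (`id`, `severity`, `status`) are hypothetical and will differ from tool to tool.

```python
from collections import Counter

# Severity levels named in the article, in priority order.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def count_by_severity(findings):
    """Count open findings per severity level.

    `findings` is a list of dicts with hypothetical `severity` and
    `status` keys; adapt the field names to your scanner's export.
    """
    counts = Counter(
        f["severity"].lower() for f in findings if f.get("status") == "open"
    )
    # Always report every level so dashboards show explicit zeros.
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}


findings = [
    {"id": "VULN-1", "severity": "Critical", "status": "open"},
    {"id": "VULN-2", "severity": "high", "status": "open"},
    {"id": "VULN-3", "severity": "high", "status": "resolved"},
    {"id": "VULN-4", "severity": "low", "status": "open"},
]

print(count_by_severity(findings))
```

Reporting only open findings is a design choice here; some teams also chart resolved counts to show throughput.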
Mean time to remediate (MTTR) vulnerabilities
- Why it matters: MTTR measures how quickly vulnerabilities are addressed once identified, tracking the lifecycle from discovery to resolution.
- Impact: Critical – Faster remediation dramatically lowers your risk profile and demonstrates security agility.
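One way to compute this metric, assuming each finding carries a discovery timestamp and an optional resolution timestamp (the field names `discovered_at` and `resolved_at` are illustrative, not taken from any particular tool):

```python
from datetime import datetime
from statistics import mean


def mean_time_to_remediate(findings):
    """Mean hours from discovery to resolution, over resolved findings only.

    Returns None when nothing has been resolved yet.
    """
    durations = [
        (f["resolved_at"] - f["discovered_at"]).total_seconds() / 3600
        for f in findings
        if f.get("resolved_at") is not None
    ]
    return mean(durations) if durations else None


findings = [
    # Resolved after 24 hours.
    {"discovered_at": datetime(2024, 1, 1, 9), "resolved_at": datetime(2024, 1, 2, 9)},
    # Resolved after 48 hours.
    {"discovered_at": datetime(2024, 1, 3, 9), "resolved_at": datetime(2024, 1, 5, 9)},
    # Still open, so excluded from the average.
    {"discovered_at": datetime(2024, 1, 6, 9), "resolved_at": None},
]

print(mean_time_to_remediate(findings))
```

Because open findings are excluded, a backlog of old unresolved vulnerabilities will not show up in this number. Pairing it with the age of the oldest open finding gives a more honest picture.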
Code coverage
- Why it matters: Understanding the percentage of code that is covered by security scanning tools (as distinct from test coverage) helps to limit blind spots and highlights potential risk.
- Impact: High – Visibility of code coverage helps identify where additional tooling or attention is needed.
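A simple approximation is to measure coverage at the repository level: which repositories have at least one security scanner enabled. The sketch below assumes you can export both lists; the repository names are made up for illustration.

```python
def scan_coverage(repos, scanned):
    """Return (percent of repos scanned, sorted list of unscanned repos)."""
    all_repos = set(repos)
    if not all_repos:
        return 0.0, []
    unscanned = sorted(all_repos - set(scanned))
    pct = 100 * (len(all_repos) - len(unscanned)) / len(all_repos)
    return pct, unscanned


repos = ["api", "web", "billing", "auth"]      # hypothetical inventory
scanned = ["api", "auth", "web"]               # hypothetical scanner export

pct, gaps = scan_coverage(repos, scanned)
print(f"{pct:.0f}% of repositories scanned; gaps: {gaps}")
```

Repository-level coverage is coarser than the per-line percentage described above, but the list of gaps is usually the more actionable output.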
Incident response and recovery metrics
When incidents do happen, speed is key. How quickly an organization can detect and respond to security incidents directly impacts the extent of damage. Faster detection and response reduce downtime, financial losses, and reputational damage. These metrics measure how fast your team can detect, contain, and recover from threats.
Mean time to detect (MTTD)
- Why it matters: A shorter MTTD means your security team is detecting threats quickly, minimizing potential damage.
- Impact: Critical – The longer a threat goes undetected, the more damage it can cause.
Mean time to respond (MTTR)
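- Why it matters: Here, MTTR tracks how long it takes to contain and remediate a security threat once detected. It shares an acronym with mean time to remediate for vulnerabilities, so label the two clearly on dashboards.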
- Impact: Critical – Rapid containment prevents escalation, and correlates with reduced impact from security incidents.
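Both detection and response times fall out of the same incident records if each one stores when the incident began, when it was detected, and when it was resolved. A minimal sketch, with hypothetical field names:

```python
from datetime import datetime, timedelta


def mean_delta(incidents, start_key, end_key):
    """Average elapsed time between two timestamps across incidents."""
    deltas = [i[end_key] - i[start_key] for i in incidents]
    if not deltas:
        return None
    return sum(deltas, timedelta()) / len(deltas)


incidents = [
    {
        "occurred_at": datetime(2024, 3, 1, 10, 0),
        "detected_at": datetime(2024, 3, 1, 12, 0),   # detected after 2 h
        "resolved_at": datetime(2024, 3, 1, 13, 0),   # resolved 1 h later
    },
    {
        "occurred_at": datetime(2024, 3, 5, 8, 0),
        "detected_at": datetime(2024, 3, 5, 12, 0),   # detected after 4 h
        "resolved_at": datetime(2024, 3, 5, 15, 0),   # resolved 3 h later
    },
]

mttd = mean_delta(incidents, "occurred_at", "detected_at")
mttr = mean_delta(incidents, "detected_at", "resolved_at")
print(f"MTTD: {mttd}, MTTR: {mttr}")
```

In practice the start time of an incident is often an estimate made during the postmortem, so MTTD tends to be less precise than response time.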
Number of security incidents
- Why it matters: Tracking the frequency of security incidents helps organizations understand trends, assess risks, and allocate resources accordingly to strengthen defenses.
- Impact: High – Patterns in security incidents reveal vulnerabilities in your security architecture.
Data recovery time
- Why it matters: Monitoring data recovery times supports business continuity and helps refine backup and recovery strategies.
- Impact: High – Rapid data recovery reduces downtime and helps meet SLA commitments.
User behavior anomalies
- Why it matters: Monitoring for unusual login locations, access times, or data downloads can help detect compromised accounts and vulnerabilities.
- Impact: High – Behavioral analytics can detect threats that might bypass traditional security controls.
Threat detection rate
- Why it matters: Threat detection rate is the share of actual threats that your tools identify, so it reflects how effective those tools are. The higher the rate, the stronger the detection posture.
- Impact: High – The effectiveness of security tools in identifying threats is a key indicator of your overall security posture.
API security and data security metrics
APIs are often targeted as entry points to sensitive data, which is why tracking activity here protects against data leakage and unauthorized access. Rapid adoption of MCP and AI agents further increases the need for engineering and security teams to collaborate on securing APIs and data.
Data exfiltration attempts
- Why it matters: Detecting unusual data transfers, particularly involving sensitive information, can prevent data breaches.
- Impact: Critical – Data exfiltration is often the ultimate goal of cyberattacks.
Access to sensitive data
- Why it matters: Regularly auditing who accesses personal, financial, or proprietary data helps prevent unauthorized access, ensure compliance and catch misuse.
- Impact: High – Inappropriate access to sensitive data is both a security and compliance risk.
Encryption coverage
- Why it matters: Ensuring sensitive data is encrypted both at rest and in transit protects against unauthorized access.
- Impact: High – Encryption is a last line of defense if other security controls fail.
API call anomalies
- Why it matters: Unusual API activity, such as unexpected spikes in requests or unauthorized access attempts, can indicate an attack in progress.
- Impact: Medium-High – API vulnerabilities are increasingly targeted by attackers.
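A basic way to flag request spikes is to compare the current request count with a recent baseline using a z-score. This is only a sketch: the threshold of 3 standard deviations and the per-minute counts are assumptions, and production systems typically account for seasonality as well.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it deviates from the baseline by more than
    `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical requests-per-minute for one API key.
baseline = [100, 104, 98, 101, 97, 103, 99, 102]

print(is_anomalous(baseline, 105))   # within normal variation
print(is_anomalous(baseline, 450))   # sudden spike
```

Keeping the candidate value out of the baseline matters: if the spike is included in the history, it inflates the standard deviation and can hide itself.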
Authentication and access control metrics
Unauthorized access remains one of the most direct paths to data breaches. Monitoring authentication and access metrics ensures robust access controls and helps detect suspicious behavior early.
Failed login attempts
- Why it matters: Spikes may indicate brute-force attacks, credential stuffing, or other unauthorized access attempts. Monitoring this metric helps identify potential security breaches in real time.
- Impact: High – This can be your first line of defense against credential-based attacks.
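To make "spike" concrete, one common approach is a sliding window per account. The sketch below flags any account with five or more failures inside five minutes; both limits, and the `(timestamp, account)` event shape, are illustrative assumptions.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_bruteforce(events, window=timedelta(minutes=5), limit=5):
    """Return accounts with at least `limit` failed logins in any window.

    `events` is an iterable of (timestamp, account) tuples.
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, account in sorted(events):
        q = recent[account]
        q.append(ts)
        # Drop failures that have aged out of the window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) >= limit:
            flagged.add(account)
    return flagged


start = datetime(2024, 1, 1, 12, 0)
events = [(start + timedelta(seconds=20 * i), "alice") for i in range(5)]
events += [(start, "bob"), (start + timedelta(hours=1), "bob")]

print(flag_bruteforce(events))
```

Grouping by account catches brute-force attempts against one user. Credential stuffing spreads failures across many accounts, so a second pass grouped by source IP is usually worth adding.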
Multifactor authentication (MFA) adoption rate
- Why it matters: Enabling MFA is one of the most effective ways to prevent credential-based attacks and significantly reduce the risk of unauthorized access.
- Impact: High – Microsoft has reported that MFA can block more than 99.9% of account compromise attacks.
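The adoption rate itself is straightforward to calculate from an identity provider export. In this sketch the user records and their `active` and `mfa_enabled` fields are hypothetical.

```python
def mfa_adoption_rate(users):
    """Return (percent of active users with MFA, active users without it)."""
    active = [u for u in users if u["active"]]
    if not active:
        return 0.0, []
    missing = sorted(u["name"] for u in active if not u["mfa_enabled"])
    pct = 100 * (len(active) - len(missing)) / len(active)
    return pct, missing


users = [
    {"name": "ana", "active": True, "mfa_enabled": True},
    {"name": "ben", "active": True, "mfa_enabled": True},
    {"name": "cy", "active": True, "mfa_enabled": False},
    {"name": "dee", "active": True, "mfa_enabled": True},
    {"name": "eli", "active": True, "mfa_enabled": True},
    {"name": "old-svc", "active": False, "mfa_enabled": False},  # ignored
]

pct, missing = mfa_adoption_rate(users)
print(f"MFA adoption: {pct:.0f}%, missing: {missing}")
```

Excluding deactivated accounts keeps the denominator honest; whether to count service accounts is a policy decision this sketch does not make for you.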
Privilege escalation attempts
- Why it matters: Privilege escalation attempts signal that a user or attacker may be trying to gain access to admin functions.
- Impact: High – Privilege escalation is often a key step in sophisticated attacks, and stopping these attempts early can prevent catastrophic breaches.
Number of users with privileged application access
- Why it matters: Keeping this number small prevents access sprawl and ensures privileged access is granted only to those who require it.
- Impact: High – Every additional privileged account widens the blast radius if credentials are compromised.
Session duration and timeout events
- Why it matters: Unusually long or active sessions could be signs of compromised accounts or session hijacking.
- Impact: Medium-High – Abnormal session patterns often help detect stealthy attackers. Note: The patterns could also represent legitimate activity.
Compliance metrics
Percentage of services in alignment with compliance frameworks
- Why it matters: Engineering teams need visibility into which services meet relevant compliance mandates, like SOC2 and PCI-DSS, so that those requirements stay top of mind across services.
- Impact: High – A lack of adherence to important compliance policies could result in fines and other negative impacts to the business.
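If each service records a pass or fail result per framework, the percentage can be derived directly. The data shape below is an assumption for illustration; services that are out of scope for a framework simply omit it.

```python
def compliance_rate(services, framework):
    """Return (percent of in-scope services passing, failing services).

    `services` maps a service name to {framework name: passed?}.
    Returns (None, []) when no service is in scope for the framework.
    """
    in_scope = {
        name: checks[framework]
        for name, checks in services.items()
        if framework in checks
    }
    if not in_scope:
        return None, []
    failing = sorted(name for name, ok in in_scope.items() if not ok)
    pct = 100 * (len(in_scope) - len(failing)) / len(in_scope)
    return pct, failing


# Hypothetical check results per service.
services = {
    "payments": {"SOC2": True, "PCI-DSS": False},
    "search": {"SOC2": True},
    "auth": {"SOC2": False},
    "docs": {"SOC2": True},
}

print(compliance_rate(services, "SOC2"))
print(compliance_rate(services, "PCI-DSS"))
```

Computing the rate only over in-scope services avoids penalizing, for example, a documentation site for not meeting payment card requirements.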
Percentage of internal security training completion
- Why it matters: Measuring completion of security training helps ensure compliance with existing policies.
- Impact: High – A shared level of awareness around company security policies reduces the potential of an employee knowingly or unknowingly introducing a vulnerability. Security training may also be a compliance requirement for companies in highly regulated environments.
Observability and logging metrics
Comprehensive logging and monitoring provide visibility into application behavior and security events, enabling proactive threat detection and compliance.
Log volume and anomaly detection
- Why it matters: Security logs provide valuable insights into potential threats. Anomalies in log patterns can reveal breaches in progress. Monitoring log volume and analyzing anomalies helps identify suspicious activities before they escalate.
- Impact: High – Logs are often the primary source of evidence for security investigations.
SIEM (Security Information and Event Management) alerts
- Why it matters: SIEM solutions unify and correlate alerts across systems. Tracking alert volume and response times ensures that critical threats are addressed promptly.
- Impact: High – SIEM alerts help prioritize security responses based on threat severity.
Tracking cloud security metrics in an internal developer portal
Organizations use many different tools to maintain cloud security posture. As a result, alerts, activity, and metrics commonly become siloed within individual tools–or centralized in noisy Slack channels that are all too tempting to mute.
An internal developer portal centralizes activity, alerts, and metrics from multiple cloud security tools, giving developers access to important security information in the context of their software ecosystem. This improves collaboration across teams and expedites awareness of critical vulnerabilities and threats.
An internal developer portal enhances visibility and actionability of cloud security metrics in the following ways:
- Consolidate real-time security data from cloud security and application security tools for shared context and faster response times.
- Build dashboards for cloud security metrics, including incident response times and number of vulnerabilities by severity across your codebase.
- Automate continuous security checks to maintain compliance and quickly identify any services that deviate from security standards.
- Establish and enforce cloud security best practices like encryption enforcement, secure configuration, and keeping secrets out of your source code.
OpsLevel: The Internal Developer Portal for Secure Development
OpsLevel is an internal developer portal that empowers engineering teams to move faster while staying secure. OpsLevel integrates with 20+ cloud security tools to provide comprehensive visibility into the security of your software ecosystem, enabling organizations to centralize critical security information in a single place.
Using OpsLevel, engineering teams can:
- Integrate directly with cloud security solutions like Prisma Cloud, Aqua Security, Lacework, GitHub Advanced Security, and 20+ other cloud security tools to centralize security alerts and insights.
- Automate checks for cloud security standards like SOC2 compliance and scanning for secrets in source code.
- Clearly define and assess adherence to cloud security best practices through the OpsLevel Service Maturity Rubric, driving consistent quality across services.
- Build custom graphs and dashboards to surface and centralize essential cloud security metrics.
See why Hootsuite chose OpsLevel as the developer portal for securing and ensuring compliance across 700+ microservices and 50+ engineering teams. Or, to get an in-depth look at OpsLevel’s security features, set up a demo with a technical expert.