How to Build an Engineering Metrics System the Business Actually Cares About (Part 2/3)

This is the second of three posts based on a consulting engagement I did for a LATAM payments fintech. Names, data, and details have been changed. In the previous post I did the organizational diagnosis. Here’s what comes next: how to build the metrics system that sustains that diagnosis over time.

A diagnosis is a snapshot. Metrics are the video. Without a system that measures continuously, the diagnosis becomes obsolete in weeks.

Key terms:

  • Velocity: story points completed per sprint
  • Sprint Completion Rate: percentage of committed tickets that were completed
  • Carry-over rate: percentage of tickets that roll into the next sprint without being completed
  • Deployment Frequency: number of deploys per week per service
  • Lead Time: time from commit to production

Weekly KPIs and formulas

Speed

  • Cycle Time P50/P75 = percentile of (done date - start date) per squad
  • Velocity = story points completed per sprint
  • Sprint Completion Rate = tickets completed / tickets committed Ă— 100
  • Carry-over rate = tickets rolled to next sprint / total committed Ă— 100
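The speed formulas above can be sketched in a few lines of Python. The ticket records here are hypothetical; in practice they would come from your Jira/Linear export. The percentile uses the nearest-rank method, which is good enough for a weekly report.

```python
import math
from datetime import date

# Hypothetical ticket records (real data would come from Jira/Linear).
tickets = [
    {"start": date(2024, 1, 2), "done": date(2024, 1, 9), "committed": True},
    {"start": date(2024, 1, 3), "done": date(2024, 1, 6), "committed": True},
    {"start": date(2024, 1, 4), "done": None, "committed": True},   # carried over
    {"start": date(2024, 1, 5), "done": date(2024, 1, 15), "committed": False},  # added mid-sprint
]

cycle_times = sorted((t["done"] - t["start"]).days for t in tickets if t["done"])

def percentile(sorted_values, p):
    """Nearest-rank percentile on a sorted list."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank - 1, 0)]

p50 = percentile(cycle_times, 50)
p75 = percentile(cycle_times, 75)

committed = [t for t in tickets if t["committed"]]
completed = [t for t in committed if t["done"]]
sprint_completion = len(completed) / len(committed) * 100
carry_over = (len(committed) - len(completed)) / len(committed) * 100
```

Note that the sprint completion denominator is *committed* tickets only; the mid-sprint addition counts toward cycle time but not toward the commitment.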

Quality

  • CFR = deploys with incident / total deploys Ă— 100
  • MTTR = average (incident resolved - incident detected) in hours
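A minimal sketch of both quality formulas, assuming you can tag each deploy with whether it caused an incident (the deploy and incident records below are made up; real ones would come from PagerDuty/Datadog):

```python
from datetime import datetime

# Hypothetical deploy log: one flag per deploy marking whether it caused an incident.
deploys = [
    {"id": 1, "incident": False},
    {"id": 2, "incident": True},
    {"id": 3, "incident": False},
    {"id": 4, "incident": False},
]
# Hypothetical incident log with detection and resolution timestamps.
incidents = [
    {"detected": datetime(2024, 1, 5, 10, 0), "resolved": datetime(2024, 1, 5, 12, 30)},
    {"detected": datetime(2024, 1, 12, 9, 0), "resolved": datetime(2024, 1, 12, 10, 0)},
]

# CFR = deploys with incident / total deploys x 100
cfr = sum(d["incident"] for d in deploys) / len(deploys) * 100

# MTTR = average (resolved - detected), in hours
mttr_hours = sum(
    (i["resolved"] - i["detected"]).total_seconds() / 3600 for i in incidents
) / len(incidents)
```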

Flow

  • Flow Efficiency = active time / (active + wait time) Ă— 100
    • Active time = time in “In Progress” or “Coding” status
    • Wait time = time in “Blocked”, “In Review”, “Waiting for deploy” statuses
  • Deployment Frequency = deploys / week / service
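Flow Efficiency is the one metric here that usually needs manual tracking at first. A sketch of the calculation, assuming you have per-ticket time-in-status totals (the status names match the list above; the hours are invented):

```python
# Hypothetical time-in-status totals for one ticket, in hours.
status_hours = {
    "In Progress": 20,
    "Coding": 12,
    "Blocked": 8,
    "In Review": 16,
    "Waiting for deploy": 4,
}

ACTIVE = {"In Progress", "Coding"}
WAIT = {"Blocked", "In Review", "Waiting for deploy"}

active = sum(h for status, h in status_hours.items() if status in ACTIVE)
wait = sum(h for status, h in status_hours.items() if status in WAIT)

# Flow Efficiency = active / (active + wait) x 100
flow_efficiency = active / (active + wait) * 100
```

The interesting part is never the arithmetic; it is agreeing on which statuses count as active versus wait, because that decision changes the number more than anything else.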

Sprint composition

  • % Bugs = bug tickets / total tickets Ă— 100
  • % Features = feature tickets / total tickets Ă— 100
  • % Tech Debt = tech debt tickets / total tickets Ă— 100
  • % Interruptions = tickets added after sprint start / total tickets Ă— 100
  • Tickets in progress = started but not completed tickets
  • Blocked tickets = tickets with active impediment

Business

  • Revenue at Risk per squad = sum of (Deploys Ă— CFR) Ă— (GMV/week) Ă— Severity Ă— (MTTR/168), where 168 is the number of hours in a week, so MTTR/168 is the fraction of the week exposed
  • Engineering Leverage = revenue generated / engineering cost
  • Acceptance rate = accepted transactions / total Ă— 100
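Revenue at Risk is the least obvious formula of the set, so here it is worked through for a single squad. All the input values below are hypothetical; severity is modeled as the fraction of GMV a typical incident affects.

```python
# Hypothetical weekly inputs for one squad.
deploys_per_week = 10
cfr = 0.12               # change failure rate, as a fraction (12%)
gmv_per_week = 500_000   # GMV processed weekly by the squad's services
severity = 0.3           # fraction of GMV a typical incident affects
mttr_hours = 2.0

# 168 = hours in a week, so MTTR/168 is the fraction of the week exposed.
revenue_at_risk = (deploys_per_week * cfr) * gmv_per_week * severity * (mttr_hours / 168)
```

With these numbers the squad expects roughly 1.2 incident-causing deploys a week, each putting 30% of GMV at risk for about 2 of the week's 168 hours, which works out to a few thousand dollars of exposure. The absolute number matters less than the trend and the comparison across squads.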

Technical health

  • Uptime / Availability = percentage of time the service is operational
  • API Latency = average response time (P50/P95)
  • Transaction error rate = failed transactions / total Ă— 100

Alerts for the CTO

Not every metric needs immediate attention. The key is defining clear thresholds so the CTO knows when to act and when to just observe.

Red (immediate action)

  • CFR > 15%
  • MTTR > 4 hours
  • Sprint Completion Rate < 50%
  • % Interruptions > 30%

Yellow (weekly review)

  • Cycle Time P75 > 12 days
  • Flow Efficiency < 25%
  • Engineering Leverage < 1.5x
  • Deployment Frequency < 1/week
  • % Bugs > 25%
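The red/yellow thresholds above are easy to encode so the dashboard (or even the interim spreadsheet script) can flag squads automatically. A sketch, with the comparison direction taken straight from each rule; the metric key names are assumptions:

```python
# Red = immediate action; yellow = weekly review. Each rule returns True when tripped.
RED_RULES = {
    "cfr_pct": lambda v: v > 15,
    "mttr_hours": lambda v: v > 4,
    "sprint_completion_pct": lambda v: v < 50,
    "interruptions_pct": lambda v: v > 30,
}
YELLOW_RULES = {
    "cycle_time_p75_days": lambda v: v > 12,
    "flow_efficiency_pct": lambda v: v < 25,
    "engineering_leverage": lambda v: v < 1.5,
    "deploys_per_week": lambda v: v < 1,
    "bugs_pct": lambda v: v > 25,
}

def evaluate(metrics: dict) -> dict:
    """Return the tripped red and yellow alerts for one squad's metrics."""
    return {
        "red": [m for m, rule in RED_RULES.items() if m in metrics and rule(metrics[m])],
        "yellow": [m for m, rule in YELLOW_RULES.items() if m in metrics and rule(metrics[m])],
    }

alerts = evaluate({"cfr_pct": 18, "mttr_hours": 3, "flow_efficiency_pct": 20})
# alerts["red"] == ["cfr_pct"]; alerts["yellow"] == ["flow_efficiency_pct"]
```

Metrics you are not yet collecting are simply skipped, which matches the pragmatic rollout below: you can start alerting on three metrics and add the rest as the data sources come online.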

Connecting engineering metrics to business outcomes

This is the table every engineering leader should be able to present to their CTO. If you can’t explain how what you measure in engineering affects the business, the metrics won’t gain traction.

| Engineering metric | Business impact |
|---|---|
| CFR + MTTR | Revenue at Risk |
| Cycle Time | Time to market, client agreement compliance |
| Deployment Frequency | Feature delivery speed |
| Flow Efficiency | Productive time vs wasted time |
| Sprint Completion | Delivery predictability |
| Engineering Leverage | Team ROI |
| Uptime / Availability | Lost transactions |
| API Latency | Client experience, abandonment |
| Error rate | Failed transactions |
| Acceptance rate | Business logic working correctly |

Dashboard structure

Each audience needs to see different things. A dashboard that works for the EM doesn’t work for the CTO, and vice versa.

Executive View (CTO — weekly)

  • Revenue per squad
  • Revenue at Risk per squad
  • Engineering Leverage
  • Active red and yellow alerts

Squad View (EM — daily)

  • Cycle time trend
  • Flow efficiency
  • Sprint Completion Rate
  • Sprint composition (features vs bugs vs debt vs interruptions)

Service View (EM — daily)

  • Deployment Frequency
  • Lead Time
  • CFR
  • MTTR

Product View (PM — daily)

These metrics come from the product team, not from the engineering diagnosis. But they complete the picture.

  • Transactions processed
  • Accepted vs rejected transactions
  • Conversion rate
  • GMV processed

Ownership

  • EM updates squad and service metrics weekly
  • CTO reviews executive view in the weekly meeting

Pragmatic implementation

Don’t start with sophisticated tooling. The temptation to build a beautiful dashboard from day one is real, but what matters is validating that the metrics you chose actually tell the right story.

  • Week 1-2: Manual spreadsheet to validate that the metrics make sense and that the data is accessible
  • Week 3-4: Script that pulls from APIs (Jira/Linear, GitHub, PagerDuty) into Google Sheets
  • Month 2+: Dashboard in Metabase or Looker if the volume justifies it
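The week 3-4 script doesn't need to be fancy. A sketch of the Jira half, assuming Jira Cloud's REST search endpoint; the JQL string, base URL, and credentials are placeholders you'd replace for your instance. The fetch is commented out so the flattening logic can be tested against a fixture:

```python
# import requests  # uncomment to actually fetch
#
# resp = requests.get(
#     "https://yourcompany.atlassian.net/rest/api/2/search",
#     params={"jql": "project = PAY AND resolved >= -14d",
#             "fields": "created,resolutiondate,issuetype"},
#     auth=("user@example.com", "API_TOKEN"),
# )
# issues = resp.json()["issues"]

from datetime import datetime

def to_rows(issues):
    """Flatten Jira-style issue payloads into (key, type, cycle_days) rows for a sheet."""
    rows = []
    for issue in issues:
        fields = issue["fields"]
        if not fields.get("resolutiondate"):
            continue  # still open: no cycle time yet
        created = datetime.fromisoformat(fields["created"])
        resolved = datetime.fromisoformat(fields["resolutiondate"])
        rows.append((issue["key"], fields["issuetype"]["name"], (resolved - created).days))
    return rows

sample = [{"key": "PAY-1", "fields": {"created": "2024-01-02T09:00:00",
                                      "resolutiondate": "2024-01-09T17:00:00",
                                      "issuetype": {"name": "Bug"}}}]
# to_rows(sample) -> [("PAY-1", "Bug", 7)]
```

From there, appending the rows to Google Sheets (or just a CSV) is enough for month one; Metabase can read the same data later without changing the extraction.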

Data sources

  • Jira/Linear — cycle time, velocity, sprint completion, sprint composition
  • GitHub — deployment frequency, lead time, PRs
  • PagerDuty/Datadog — CFR, MTTR, incidents, uptime, latency
  • Manual tracking — flow efficiency (until you have tooling to automate it)
  • Database / Analytics — transactions, GMV, conversion, error and acceptance rates

This is a first approach with limited context. Once you’re inside the organization studying the teams and the real data, you can adjust which metrics to prioritize and how to present them. But as a starting point, this system already gives you visibility.

Metrics aren’t the goal — they’re the tool for having better conversations. When the CTO can see at a glance which squads are at risk and why, decisions get made faster and with better information.

Next post: the 90-day execution plan that puts all of this into motion.

If you want to dig deeper into the metrics I used as reference, check out the DORA framework.