How to Build an Engineering Metrics System the Business Actually Cares About (Part 2/3)

This is the second of three posts based on a consulting engagement I did for a LATAM payments fintech. Names, data, and details have been changed. In the previous post I did the organizational diagnosis. Here’s what comes next: how to build the metrics system that sustains that diagnosis over time.

A diagnosis is a snapshot. Metrics are the video. Without a system that measures continuously, the diagnosis becomes obsolete in weeks.

Key terms:

  • Velocity: story points completed per sprint
  • Sprint Completion Rate: percentage of committed tickets that were completed
  • Carry-over rate: percentage of tickets that roll into the next sprint without being completed
  • Deployment Frequency: number of deploys per week per service
  • Lead Time: time from commit to production

Weekly KPIs and formulas

Speed

  • Cycle Time P50/P75 = percentile of (done date - start date) per squad
  • Velocity = story points completed per sprint
  • Sprint Completion Rate = tickets completed / tickets committed Ă— 100
  • Carry-over rate = tickets rolled to next sprint / total committed Ă— 100
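The speed formulas above can be sketched in a few lines of Python. The ticket records here are hypothetical; in practice they would come from your Jira/Linear export. The percentile uses the nearest-rank method, which is good enough for a weekly report.

```python
import math
from datetime import date

# Hypothetical ticket records (real data would come from Jira/Linear).
tickets = [
    {"start": date(2024, 1, 2), "done": date(2024, 1, 9), "committed": True},
    {"start": date(2024, 1, 3), "done": date(2024, 1, 6), "committed": True},
    {"start": date(2024, 1, 4), "done": None, "committed": True},   # carried over
    {"start": date(2024, 1, 5), "done": date(2024, 1, 15), "committed": False},  # added mid-sprint
]

cycle_times = sorted((t["done"] - t["start"]).days for t in tickets if t["done"])

def percentile(sorted_values, p):
    """Nearest-rank percentile on a sorted list."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank - 1, 0)]

p50 = percentile(cycle_times, 50)
p75 = percentile(cycle_times, 75)

committed = [t for t in tickets if t["committed"]]
completed = [t for t in committed if t["done"]]
sprint_completion = len(completed) / len(committed) * 100
carry_over = (len(committed) - len(completed)) / len(committed) * 100
```

Note that the sprint completion denominator is *committed* tickets only; the mid-sprint addition counts toward cycle time but not toward the commitment.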

Quality

  • CFR = deploys with incident / total deploys Ă— 100
  • MTTR = average (incident resolved - incident detected) in hours
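A minimal sketch of both quality formulas, assuming you can tag each deploy with whether it caused an incident (the deploy and incident records below are made up; real ones would come from PagerDuty/Datadog):

```python
from datetime import datetime

# Hypothetical deploy log: one flag per deploy marking whether it caused an incident.
deploys = [
    {"id": 1, "incident": False},
    {"id": 2, "incident": True},
    {"id": 3, "incident": False},
    {"id": 4, "incident": False},
]
# Hypothetical incident log with detection and resolution timestamps.
incidents = [
    {"detected": datetime(2024, 1, 5, 10, 0), "resolved": datetime(2024, 1, 5, 12, 30)},
    {"detected": datetime(2024, 1, 12, 9, 0), "resolved": datetime(2024, 1, 12, 10, 0)},
]

# CFR = deploys with incident / total deploys x 100
cfr = sum(d["incident"] for d in deploys) / len(deploys) * 100

# MTTR = average (resolved - detected), in hours
mttr_hours = sum(
    (i["resolved"] - i["detected"]).total_seconds() / 3600 for i in incidents
) / len(incidents)
```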

Flow

  • Flow Efficiency = active time / (active + wait time) Ă— 100
    • Active time = time in “In Progress” or “Coding” status
    • Wait time = time in “Blocked”, “In Review”, “Waiting for deploy” statuses
  • Deployment Frequency = deploys / week / service
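Flow Efficiency is the one metric here that usually needs manual tracking at first. A sketch of the calculation, assuming you have per-ticket time-in-status totals (the status names match the list above; the hours are invented):

```python
# Hypothetical time-in-status totals for one ticket, in hours.
status_hours = {
    "In Progress": 20,
    "Coding": 12,
    "Blocked": 8,
    "In Review": 16,
    "Waiting for deploy": 4,
}

ACTIVE = {"In Progress", "Coding"}
WAIT = {"Blocked", "In Review", "Waiting for deploy"}

active = sum(h for status, h in status_hours.items() if status in ACTIVE)
wait = sum(h for status, h in status_hours.items() if status in WAIT)

# Flow Efficiency = active / (active + wait) x 100
flow_efficiency = active / (active + wait) * 100
```

The interesting part is never the arithmetic; it is agreeing on which statuses count as active versus wait, because that decision changes the number more than anything else.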

Sprint composition

  • % Bugs = bug tickets / total tickets Ă— 100
  • % Features = feature tickets / total tickets Ă— 100
  • % Tech Debt = tech debt tickets / total tickets Ă— 100
  • % Interruptions = tickets added after sprint start / total tickets Ă— 100
  • Tickets in progress = started but not completed tickets
  • Blocked tickets = tickets with active impediment

Business

  • Revenue at Risk per squad = sum of (Deploys Ă— CFR) Ă— (GMV/week) Ă— Severity Ă— (MTTR/168), where 168 is the number of hours in a week, so MTTR/168 is the fraction of the week exposed
  • Engineering Leverage = revenue generated / engineering cost
  • Acceptance rate = accepted transactions / total Ă— 100
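Revenue at Risk is the least obvious formula of the set, so here it is worked through for a single squad. All the input values below are hypothetical; severity is modeled as the fraction of GMV a typical incident affects.

```python
# Hypothetical weekly inputs for one squad.
deploys_per_week = 10
cfr = 0.12               # change failure rate, as a fraction (12%)
gmv_per_week = 500_000   # GMV processed weekly by the squad's services
severity = 0.3           # fraction of GMV a typical incident affects
mttr_hours = 2.0

# 168 = hours in a week, so MTTR/168 is the fraction of the week exposed.
revenue_at_risk = (deploys_per_week * cfr) * gmv_per_week * severity * (mttr_hours / 168)
```

With these numbers the squad expects roughly 1.2 incident-causing deploys a week, each putting 30% of GMV at risk for about 2 of the week's 168 hours, which works out to a few thousand dollars of exposure. The absolute number matters less than the trend and the comparison across squads.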

Technical health

  • Uptime / Availability = percentage of time the service is operational
  • API Latency = average response time (P50/P95)
  • Transaction error rate = failed transactions / total Ă— 100

Alerts for the CTO

Not every metric needs immediate attention. The key is defining clear thresholds so the CTO knows when to act and when to just observe.

Red (immediate action)

  • CFR > 15%
  • MTTR > 4 hours
  • Sprint Completion Rate < 50%
  • % Interruptions > 30%

Yellow (weekly review)

  • Cycle Time P75 > 12 days
  • Flow Efficiency < 25%
  • Engineering Leverage < 1.5x
  • Deployment Frequency < 1/week
  • % Bugs > 25%
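The red/yellow thresholds above are easy to encode so the dashboard (or even the interim spreadsheet script) can flag squads automatically. A sketch, with the comparison direction taken straight from each rule; the metric key names are assumptions:

```python
# Red = immediate action; yellow = weekly review. Each rule returns True when tripped.
RED_RULES = {
    "cfr_pct": lambda v: v > 15,
    "mttr_hours": lambda v: v > 4,
    "sprint_completion_pct": lambda v: v < 50,
    "interruptions_pct": lambda v: v > 30,
}
YELLOW_RULES = {
    "cycle_time_p75_days": lambda v: v > 12,
    "flow_efficiency_pct": lambda v: v < 25,
    "engineering_leverage": lambda v: v < 1.5,
    "deploys_per_week": lambda v: v < 1,
    "bugs_pct": lambda v: v > 25,
}

def evaluate(metrics: dict) -> dict:
    """Return the tripped red and yellow alerts for one squad's metrics."""
    return {
        "red": [m for m, rule in RED_RULES.items() if m in metrics and rule(metrics[m])],
        "yellow": [m for m, rule in YELLOW_RULES.items() if m in metrics and rule(metrics[m])],
    }

alerts = evaluate({"cfr_pct": 18, "mttr_hours": 3, "flow_efficiency_pct": 20})
# alerts["red"] == ["cfr_pct"]; alerts["yellow"] == ["flow_efficiency_pct"]
```

Metrics you are not yet collecting are simply skipped, which matches the pragmatic rollout below: you can start alerting on three metrics and add the rest as the data sources come online.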

Connecting engineering metrics to business outcomes

This is the table every engineering leader should be able to present to their CTO. If you can’t explain how what you measure in engineering affects the business, the metrics won’t gain traction.

| Engineering metric | Business impact |
|---|---|
| CFR + MTTR | Revenue at Risk |
| Cycle Time | Time to market, client agreement compliance |
| Deployment Frequency | Feature delivery speed |
| Flow Efficiency | Productive time vs wasted time |
| Sprint Completion | Delivery predictability |
| Engineering Leverage | Team ROI |
| Uptime / Availability | Lost transactions |
| API Latency | Client experience, abandonment |
| Error rate | Failed transactions |
| Acceptance rate | Business logic working correctly |

Dashboard structure

Each audience needs to see different things. A dashboard that works for the EM doesn’t work for the CTO, and vice versa.

Executive View (CTO — weekly)

  • Revenue per squad
  • Revenue at Risk per squad
  • Engineering Leverage
  • Active red and yellow alerts

Squad View (EM — daily)

  • Cycle time trend
  • Flow efficiency
  • Sprint Completion Rate
  • Sprint composition (features vs bugs vs debt vs interruptions)

Service View (EM — daily)

  • Deployment Frequency
  • Lead Time
  • CFR
  • MTTR

Product View (PM — daily)

These metrics come from the product team, not from the engineering diagnosis. But they complete the picture.

  • Transactions processed
  • Accepted vs rejected transactions
  • Conversion rate
  • GMV processed

Ownership

  • EM updates squad and service metrics weekly
  • CTO reviews executive view in the weekly meeting

Pragmatic implementation

Don’t start with sophisticated tooling. The temptation to build a beautiful dashboard from day one is real, but what matters is validating that the metrics you chose actually tell the right story.

  • Week 1-2: Manual spreadsheet to validate that the metrics make sense and that the data is accessible
  • Week 3-4: Script that pulls from APIs (Jira/Linear, GitHub, PagerDuty) into Google Sheets
  • Month 2+: Dashboard in Metabase or Looker if the volume justifies it
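The week 3-4 script doesn't need to be fancy. A sketch of the Jira half, assuming Jira Cloud's REST search endpoint; the JQL string, base URL, and credentials are placeholders you'd replace for your instance. The fetch is commented out so the flattening logic can be tested against a fixture:

```python
# import requests  # uncomment to actually fetch
#
# resp = requests.get(
#     "https://yourcompany.atlassian.net/rest/api/2/search",
#     params={"jql": "project = PAY AND resolved >= -14d",
#             "fields": "created,resolutiondate,issuetype"},
#     auth=("user@example.com", "API_TOKEN"),
# )
# issues = resp.json()["issues"]

from datetime import datetime

def to_rows(issues):
    """Flatten Jira-style issue payloads into (key, type, cycle_days) rows for a sheet."""
    rows = []
    for issue in issues:
        fields = issue["fields"]
        if not fields.get("resolutiondate"):
            continue  # still open: no cycle time yet
        created = datetime.fromisoformat(fields["created"])
        resolved = datetime.fromisoformat(fields["resolutiondate"])
        rows.append((issue["key"], fields["issuetype"]["name"], (resolved - created).days))
    return rows

sample = [{"key": "PAY-1", "fields": {"created": "2024-01-02T09:00:00",
                                      "resolutiondate": "2024-01-09T17:00:00",
                                      "issuetype": {"name": "Bug"}}}]
# to_rows(sample) -> [("PAY-1", "Bug", 7)]
```

From there, appending the rows to Google Sheets (or just a CSV) is enough for month one; Metabase can read the same data later without changing the extraction.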

Data sources

  • Jira/Linear — cycle time, velocity, sprint completion, sprint composition
  • GitHub — deployment frequency, lead time, PRs
  • PagerDuty/Datadog — CFR, MTTR, incidents, uptime, latency
  • Manual tracking — flow efficiency (until you have tooling to automate it)
  • Database / Analytics — transactions, GMV, conversion, error and acceptance rates

This is a first approach with limited context. Once you’re inside the organization studying the teams and the real data, you can adjust which metrics to prioritize and how to present them. But as a starting point, this system already gives you visibility.

Metrics aren’t the goal — they’re the tool for having better conversations. When the CTO can see at a glance which squads are at risk and why, decisions get made faster and with better information.

Next post: the 90-day execution plan that puts all of this into motion.

If you want to dig deeper into the metrics I used as reference, check out the DORA framework.