
Goodhart's & Gilb's Laws: The Science of Software Metrics

Master the art of software metrics with Goodhart's and Gilb's Laws. Learn how to avoid measurement traps and build actionable monitoring systems for engineering teams.

Islam Neddar
7 min read
metrics
monitoring
engineering-management
team-performance
DORA-metrics
software-engineering

Every engineering organization wrestles with the same fundamental challenge: How do you measure what matters without destroying what you're trying to improve?

This question sits at the heart of modern software development. We need metrics to understand our systems, track progress, and make informed decisions. Yet poorly implemented metrics can create perverse incentives that actually harm the very outcomes we're trying to achieve.

Two powerful laws provide the framework for navigating this challenge: Goodhart's Law and Gilb's Law. Understanding these principles will transform how you approach metrics in your engineering organization, helping you build monitoring systems that drive genuine improvement rather than gaming behaviors.

Goodhart's Law: When Metrics Become Targets

"When a measure becomes a target, it ceases to be a good measure."

This is perhaps the most important principle in organizational measurement. Goodhart's Law reveals why so many well-intentioned metric programs backfire spectacularly in engineering teams.

The Mechanics of Metric Corruption

The moment you make a metric a target for reward or evaluation, human behavior adapts to optimize for that specific number—often at the expense of the actual desired outcome.

Real-world examples:

  • Lines of code as productivity measure → Verbose, bloated codebases
  • Story points for velocity tracking → Inflated estimates and artificially small tasks
  • Code coverage targets → Meaningless tests that inflate percentages without improving quality
  • Bug closure rates → Tickets marked "resolved" without actually fixing underlying issues

Why Gaming Always Wins

There are three fundamental reasons why Goodhart's Law is so pervasive:

1. The Path of Least Resistance

It's almost always easier to manipulate a proxy metric than achieve the real goal it represents. Hitting a numeric target becomes the primary job, displacing the complex work of delivering genuine value.

2. Malicious Compliance

When pressured to hit specific numbers, people comply with the letter of the law while violating its spirit. They'll deploy code that meets technical specifications but fails to solve user problems.

3. Output vs. Outcome Confusion

Simple, countable metrics (outputs) like commits or deployments are easy to measure and target. The real goals (outcomes) like improved user satisfaction or reduced system downtime are harder to quantify, so organizations default to measuring what's convenient rather than what's valuable.

Defending Against Goodhart's Law

Use Process Metrics, Not People Metrics

Focus on understanding system health rather than evaluating individual performance:

```yaml
# Good: System Health Metrics
Cycle Time: "How long from idea to production?"
Deployment Success Rate: "What percentage of deployments succeed?"
Mean Time to Recovery: "How quickly do we resolve incidents?"

# Avoid: Individual Performance Metrics
Developer Lines of Code: "How productive is Sarah?"
Commits per Day: "Is John working hard enough?"
Tickets Closed: "Who's the most efficient?"
```

Build Balanced Dashboards

Never rely on a single metric. Create dashboards with counter-balancing measures that make gaming difficult:

DORA Metrics (Excellent Example):

  • Deployment Frequency (Speed) ↔ Change Fail Rate (Quality)
  • Lead Time for Changes (Efficiency) ↔ Mean Time to Recovery (Reliability)
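The counter-balancing idea can be made concrete in a few lines of code. This is only a sketch, not a real DORA tooling API; the deployment log and its field names are assumed for illustration:

```javascript
// Hypothetical deployment log; `failed` marks deployments that caused a change failure.
const deployments = [
  { date: '2024-06-03', failed: false },
  { date: '2024-06-05', failed: true },
  { date: '2024-06-10', failed: false },
  { date: '2024-06-12', failed: false },
];

// Speed: deployments per week over the observed window.
function deploymentFrequency(deploys, weeks) {
  return deploys.length / weeks;
}

// Quality: share of deployments that caused a failure in production.
function changeFailRate(deploys) {
  const failures = deploys.filter((d) => d.failed).length;
  return failures / deploys.length;
}

console.log(deploymentFrequency(deployments, 2)); // 2 per week
console.log(changeFailRate(deployments));         // 0.25
```

Reported together, the pair resists gaming: pushing deployment frequency up by shipping recklessly shows up immediately as a rising change fail rate.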

Tie Targets to Business Outcomes

When you must set targets, connect them directly to customer or business value:

```diff
- Target: "Ship 10 features this quarter"
+ Target: "Reduce customer churn by 5% through improved UX"

- Target: "Achieve 90% test coverage"
+ Target: "Reduce production bugs by 50% while maintaining deployment velocity"
```

Gilb's Law: The Measurement Imperative

"Anything you need to quantify can be measured in some way that is superior to not measuring it at all."

Gilb's Law directly challenges the common excuse: "But that's too subjective to measure!" It argues that an imperfect metric is vastly superior to no metric at all.

Why Measurement Matters

Vagueness Prevents Action

Without measurement, goals remain wishful thinking:

  • "Improve user experience" → Vague wish
  • "Reduce core workflow clicks from 7 to 4" → Actionable goal

Measurement Forces Clarity

Attempting to measure fuzzy concepts like "code quality" forces definition. Is it test coverage? Cyclomatic complexity? Bug rates? This definitional process creates shared team understanding.
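One way to force that definition is to write it down as an explicit composite score. The weights and inputs below are purely illustrative; the point is that choosing them makes the team's definition of "quality" visible and debatable:

```javascript
// Illustrative composite: the weights encode what this team means by "quality".
const weights = { coverage: 0.4, complexity: 0.3, bugRate: 0.3 };

// Each input is normalized to 0..1; higher score is better.
function qualityScore({ coverage, complexity, bugRate }) {
  return weights.coverage * coverage
       + weights.complexity * (1 - complexity)  // lower complexity is better
       + weights.bugRate * (1 - bugRate);       // lower bug rate is better
}

// 0.4*0.8 + 0.3*0.7 + 0.3*0.9 ≈ 0.8
console.log(qualityScore({ coverage: 0.8, complexity: 0.3, bugRate: 0.1 }));
```

Whether these are the right weights matters less than the conversation required to pick them.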

Progress Over Perfection

Teams get stuck in analysis paralysis, endlessly debating metric flaws. Gilb's Law encourages pragmatism: start with "good enough" now, learn and iterate.

Practical Implementation Strategies

Start with Proxy Metrics

For seemingly unmeasurable qualities, find reasonable proxies:

```yaml
Developer Morale:
  Proxies:
    - Voluntary team event attendance
    - Weekly happiness poll responses
    - Internal tool usage rates
    - Code review participation

Platform Stability:
  Proxies:
    - Mean Time Between Failures (MTBF)
    - Mean Time to Recovery (MTTR)
    - P1 incident count per month
    - System uptime percentage
```
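MTTR, one of the proxies above, is straightforward to compute once incidents carry open and resolve timestamps. A minimal sketch, assuming a simple incident record shape (the field names are illustrative):

```javascript
// Illustrative incident records with open/resolve timestamps.
const incidents = [
  { openedAt: new Date('2024-06-01T10:00:00Z'), resolvedAt: new Date('2024-06-01T10:30:00Z') },
  { openedAt: new Date('2024-06-08T22:00:00Z'), resolvedAt: new Date('2024-08T23:30:00Z'.replace('2024-08', '2024-06-08')) },
];

// Mean Time to Recovery in minutes: average of (resolve - open).
function mttrMinutes(records) {
  const totalMs = records.reduce(
    (sum, i) => sum + (i.resolvedAt - i.openedAt), 0);
  return totalMs / records.length / 60000;
}

console.log(mttrMinutes(incidents)); // 60
```

Even this crude average is enough to spot trends, which is exactly Gilb's point: an imperfect number you can track beats a stability goal you can only assert.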

Decompose Abstract Goals

Break large concepts into measurable components:

"Improve Development Velocity" becomes:

  • Reduce average pull request review time
  • Decrease build pipeline duration
  • Increase deployment success rate
  • Minimize rollback frequency
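The first component above, pull request review time, could be tracked with a small script over PR data. The record shape here is assumed for illustration, not the GitHub API:

```javascript
// Assumed PR records: when review was requested and when the PR was approved.
const pullRequests = [
  { requestedAt: new Date('2024-06-01T09:00:00Z'), approvedAt: new Date('2024-06-01T13:00:00Z') },
  { requestedAt: new Date('2024-06-02T09:00:00Z'), approvedAt: new Date('2024-06-02T11:00:00Z') },
];

// Average review turnaround in hours.
function avgReviewHours(prs) {
  const totalMs = prs.reduce(
    (sum, pr) => sum + (pr.approvedAt - pr.requestedAt), 0);
  return totalMs / prs.length / 3600000;
}

console.log(avgReviewHours(pullRequests)); // 3
```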

Iterate Your Metrics

Treat metrics like code—your first version will have bugs:

  1. Implement the initial metric
  2. Observe its behavior and shortcomings
  3. Learn from unintended consequences
  4. Refactor or replace as needed

Here's how these principles apply to common engineering tools:

Datadog/New Relic Implementation

```javascript
// Good: Balanced metric collection
const metrics = {
  performance: ['response_time', 'throughput'],
  quality: ['error_rate', 'success_rate'],
  user_experience: ['page_load_time', 'bounce_rate']
};

// Avoid: Single-metric focus
const badMetrics = {
  only_speed: ['requests_per_second'] // Missing quality context
};
```

GitHub Analytics Approach

Focus on team process health, not individual ranking:

```yaml
Team Health Metrics:
  - Pull request review time distribution
  - Deployment frequency trends
  - Incident response effectiveness
  - Technical debt tracking

Individual Growth Metrics (Private):
  - Skill development progress
  - Mentoring contributions
  - Learning goal achievement
```

Synthesis: Building Effective Measurement Systems

The combination of Goodhart's and Gilb's Laws provides a powerful framework:

The Four-Step Approach

  1. Identify What Matters (Gilb): Define the business outcomes you actually care about
  2. Find Proxy Metrics (Gilb): Create measurable approximations of those outcomes
  3. Balance Your Dashboard (Goodhart): Use multiple, counter-balancing metrics
  4. Focus on Learning (Both): Use metrics for insight, not punishment

Real-World Example: Developer Productivity

Instead of measuring individual output, focus on system effectiveness:

```yaml
Productivity System Health:
  Flow Metrics:
    - Work in Progress limits adherence
    - Cycle time variability
    - Queue time analysis

  Quality Metrics:
    - Defect escape rate
    - Technical debt trends
    - Code review effectiveness

  Learning Metrics:
    - Experiment success rate
    - Knowledge sharing frequency
    - Cross-team collaboration index
```

Key Takeaways for Engineering Leaders

Goodhart's Law teaches us:

  • Metrics become corrupted when used as targets
  • Focus on process improvement, not people evaluation
  • Balance speed metrics with quality metrics
  • Connect measurements to genuine business value

Gilb's Law reminds us:

  • An imperfect metric beats no metric
  • Measurement forces clarity of thought
  • Start simple and iterate
  • Use metrics for insight, not weapons

Take Action: Start Measuring What Matters

The next time you're designing metrics for your team, ask yourself:

  1. What business outcome am I trying to achieve?
  2. How might this metric be gamed?
  3. What counter-balancing metrics should I include?
  4. Am I measuring the system or the people?

Remember: The goal isn't perfect measurement—it's actionable insight that drives genuine improvement.

Ready to implement better metrics in your organization? Start with one balanced pair of metrics this week. Measure both speed and quality, both efficiency and effectiveness. Your future self (and your team) will thank you.

Want more insights on engineering leadership and building high-performing teams? Subscribe to my newsletter for practical advice delivered weekly, or explore my other articles on software engineering management.
