How I Think About Rollouts and Gradual Launches

I haven't shipped a feature to 100% of users on day one in years. Not because I don't trust my code, but because I've learned that production is the only environment that tells the full truth, and gradual rollouts let you listen before committing.

The Rollout Ladder

Every meaningful change follows the same pattern:

Day 1:    1% of users   → Binary check: does it crash?
Day 2-3:  5% of users   → Signal check: are metrics directionally correct?
Day 4-7:  20% of users  → Confidence check: is the effect statistically significant?
Day 8-14: 50% of users  → Full validation: does it hold at scale?
Day 15+:  100% of users → Ship it

Each stage has a different purpose. At 1%, you're looking for crashes and data pipeline issues. At 5%, you're looking for directional signals on key metrics. At 20%, you have enough statistical power to detect meaningful changes. At 50%, you're validating that the effect doesn't change at higher traffic.
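The ladder above can be sketched as plain data, which makes the schedule easy to review and enforce in tooling. Everything here is illustrative (a sketch, not any rollout framework's API):

```kotlin
// Hypothetical sketch of the rollout ladder as plain data.
// Names and fields are illustrative, not from any rollout framework.
data class RolloutStage(
    val percent: Int,     // share of users exposed
    val minDays: Int,     // minimum time at this stage before ramping
    val purpose: String
)

val rolloutLadder = listOf(
    RolloutStage(1, 1, "Binary check: does it crash?"),
    RolloutStage(5, 2, "Signal check: are metrics directionally correct?"),
    RolloutStage(20, 4, "Confidence check: is the effect statistically significant?"),
    RolloutStage(50, 7, "Full validation: does it hold at scale?"),
    RolloutStage(100, 0, "Ship it")
)

// Advance one rung at a time; null means the rollout is complete.
fun nextStage(current: RolloutStage): RolloutStage? =
    rolloutLadder.getOrNull(rolloutLadder.indexOf(current) + 1)
```

Encoding the ladder as data rather than tribal knowledge means the "skip a stage" conversation has to happen in a code review, not a hallway.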

What I Monitor at Each Stage

At 1%: Stability

  • Crash rate (overall and specific to the changed code path)
  • ANR rate
  • Error rates in logs
  • Data pipeline health (are events flowing correctly?)

If anything looks off at 1%, I pause. The cost of investigating is low. The cost of ramping up a broken change is high.
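As a sketch, the 1% gate can be a single boolean check over a stability snapshot. The field names and the noise tolerance here are hypothetical, not from any monitoring system:

```kotlin
// Hypothetical stability gate for the 1% stage. Field names and the
// noise tolerance are illustrative, not from any monitoring system.
data class StabilitySnapshot(
    val crashRateDelta: Double,   // relative change vs. control
    val anrRateDelta: Double,
    val errorRateDelta: Double,
    val eventsFlowing: Boolean    // data pipeline health
)

fun stageOnePasses(s: StabilitySnapshot, noiseTolerance: Double = 0.01): Boolean =
    s.eventsFlowing &&
        s.crashRateDelta <= noiseTolerance &&
        s.anrRateDelta <= noiseTolerance &&
        s.errorRateDelta <= noiseTolerance
```

Anything above the noise band pauses the ramp. Investigating a false alarm at 1% is cheap; ramping a real one is not.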

At 5-20%: Metrics

  • Primary metric (the thing the change is supposed to improve)
  • Guardrail metrics (the things that should NOT get worse)
  • Session-level metrics (duration, depth, frequency)
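A minimal version of the guardrail check at these stages might look like this; metric names and tolerances are made up for illustration:

```kotlin
// Hypothetical guardrail check for the 5-20% stages.
// Deltas are relative changes vs. control (e.g. 0.03 = +3%);
// metric names and tolerances are illustrative.
data class MetricDelta(val name: String, val delta: Double, val tolerance: Double)

fun shouldContinueRamp(primaryDelta: Double, guardrails: List<MetricDelta>): Boolean {
    val primaryImproving = primaryDelta > 0.0
    val noGuardrailRegressed = guardrails.all { it.delta >= -it.tolerance }
    return primaryImproving && noGuardrailRegressed
}
```

The key property is the asymmetry: the primary metric has to improve, while guardrails only have to stay within tolerance.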

At 50%: Edge Cases

  • Performance on low-end devices
  • Behavior in specific markets or locales
  • Interaction with other concurrent experiments
  • Long-term retention signals (D1, D7)

The Kill Criteria

Before every rollout, I define the criteria that would make me stop:

data class RolloutKillCriteria(
    val maxCrashRateIncrease: Double = 0.05,  // 5% relative increase
    val maxLatencyRegression: Long = 200,      // 200ms at p95
    val minRetentionThreshold: Double = -0.001, // No more than 0.1% D7 drop
    val maxErrorRateIncrease: Double = 0.02    // 2% relative increase
)

These aren't arbitrary. They're based on the team's historical tolerance and the specific risk profile of the change. A cosmetic change has loose kill criteria. A change to the core monetization logic has tight ones.
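One way these thresholds could feed an automated stop decision, as a sketch (the observed-delta fields are hypothetical, mirroring the criteria fields):

```kotlin
// RolloutKillCriteria as defined above, repeated so this sketch compiles alone.
data class RolloutKillCriteria(
    val maxCrashRateIncrease: Double = 0.05,
    val maxLatencyRegression: Long = 200,
    val minRetentionThreshold: Double = -0.001,
    val maxErrorRateIncrease: Double = 0.02
)

// Hypothetical observed deltas vs. control, mirroring the criteria fields.
data class ObservedDeltas(
    val crashRateIncrease: Double,
    val p95LatencyRegressionMs: Long,
    val d7RetentionDelta: Double,
    val errorRateIncrease: Double
)

// Kill if ANY threshold is breached; one bad signal is enough to stop.
fun shouldKill(c: RolloutKillCriteria, o: ObservedDeltas): Boolean =
    o.crashRateIncrease > c.maxCrashRateIncrease ||
        o.p95LatencyRegressionMs > c.maxLatencyRegression ||
        o.d7RetentionDelta < c.minRetentionThreshold ||
        o.errorRateIncrease > c.maxErrorRateIncrease
```

Note the ORs: kill criteria compose pessimistically. A rollout has to clear every bar, and a single breach is enough to stop the ramp.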

Common Rollout Mistakes

Ramping too fast. Going from 5% to 50% overnight because "the numbers look good." Two days of data at 5% isn't enough to detect retention effects. Patience pays.

Ignoring guardrail metrics. The primary metric looks great, so you ship. Three weeks later, retention shows a regression that was building slowly. Guardrails exist for a reason.

Not accounting for experiment interactions. Your change looks neutral. But it's interacting with another experiment that's running concurrently. The combined effect is negative, and neither experiment detected it in isolation.

Celebrating too early. A positive result at 20% doesn't mean you're done. Novelty effects can inflate early metrics. Validate that the effect persists at 50% over a full week.

The Human Side

Gradual rollouts require patience, which is hard when there's pressure to ship. Stakeholders want results. The product roadmap has deadlines. The temptation is to skip stages.

I've learned to frame it differently. A gradual rollout isn't slow shipping. It's safe shipping. The alternative isn't "shipping faster." The alternative is "shipping and then scrambling to roll back when something goes wrong."

The fastest rollout is one that doesn't need to be reverted.
