How I Think About Rollouts and Gradual Launches

I haven't shipped a feature to 100% of users on day one in years. Not because I don't trust my code, but because I've learned that production is the only environment that tells the full truth, and gradual rollouts let you listen before committing.

The Rollout Ladder

Every meaningful change follows the same pattern:

Day 1:    1% of users   → Binary check: does it crash?
Day 2-3:  5% of users   → Signal check: are metrics directionally correct?
Day 4-7:  20% of users  → Confidence check: is the effect statistically significant?
Day 8-14: 50% of users  → Full validation: does it hold at scale?
Day 15+:  100% of users → Ship it

Each stage has a different purpose. At 1%, you're looking for crashes and data pipeline issues. At 5%, you're looking for directional signals on key metrics. At 20%, you have enough statistical power to detect meaningful changes. At 50%, you're validating that the effect doesn't change at higher traffic.
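The ladder above can be sketched as plain data, which makes the schedule easy to review and enforce in tooling. Everything here is illustrative (a sketch, not any rollout framework's API):

```kotlin
// Hypothetical sketch of the rollout ladder as plain data.
// Names and fields are illustrative, not from any rollout framework.
data class RolloutStage(
    val percent: Int,     // share of users exposed
    val minDays: Int,     // minimum time at this stage before ramping
    val purpose: String
)

val rolloutLadder = listOf(
    RolloutStage(1, 1, "Binary check: does it crash?"),
    RolloutStage(5, 2, "Signal check: are metrics directionally correct?"),
    RolloutStage(20, 4, "Confidence check: is the effect statistically significant?"),
    RolloutStage(50, 7, "Full validation: does it hold at scale?"),
    RolloutStage(100, 0, "Ship it")
)

// Advance one rung at a time; null means the rollout is complete.
fun nextStage(current: RolloutStage): RolloutStage? =
    rolloutLadder.getOrNull(rolloutLadder.indexOf(current) + 1)
```

Encoding the ladder as data rather than tribal knowledge means the "skip a stage" conversation has to happen in a code review, not a hallway.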

What I Monitor at Each Stage

At 1%: Stability

  • Crash rate (overall and specific to the changed code path)
  • ANR rate
  • Error rates in logs
  • Data pipeline health (are events flowing correctly?)

If anything looks off at 1%, I pause. The cost of investigating is low. The cost of ramping up a broken change is high.
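As a sketch, the 1% gate can be a single boolean check over a stability snapshot. The field names and the noise tolerance here are hypothetical, not from any monitoring system:

```kotlin
// Hypothetical stability gate for the 1% stage. Field names and the
// noise tolerance are illustrative, not from any monitoring system.
data class StabilitySnapshot(
    val crashRateDelta: Double,   // relative change vs. control
    val anrRateDelta: Double,
    val errorRateDelta: Double,
    val eventsFlowing: Boolean    // data pipeline health
)

fun stageOnePasses(s: StabilitySnapshot, noiseTolerance: Double = 0.01): Boolean =
    s.eventsFlowing &&
        s.crashRateDelta <= noiseTolerance &&
        s.anrRateDelta <= noiseTolerance &&
        s.errorRateDelta <= noiseTolerance
```

Anything above the noise band pauses the ramp. Investigating a false alarm at 1% is cheap; ramping a real one is not.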

At 5-20%: Metrics

  • Primary metric (the thing the change is supposed to improve)
  • Guardrail metrics (the things that should NOT get worse)
  • Session-level metrics (duration, depth, frequency)
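A minimal version of the guardrail check at these stages might look like this; metric names and tolerances are made up for illustration:

```kotlin
// Hypothetical guardrail check for the 5-20% stages.
// Deltas are relative changes vs. control (e.g. 0.03 = +3%);
// metric names and tolerances are illustrative.
data class MetricDelta(val name: String, val delta: Double, val tolerance: Double)

fun shouldContinueRamp(primaryDelta: Double, guardrails: List<MetricDelta>): Boolean {
    val primaryImproving = primaryDelta > 0.0
    val noGuardrailRegressed = guardrails.all { it.delta >= -it.tolerance }
    return primaryImproving && noGuardrailRegressed
}
```

The key property is the asymmetry: the primary metric has to improve, while guardrails only have to stay within tolerance.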

At 50%: Edge Cases

  • Performance on low-end devices
  • Behavior in specific markets or locales
  • Interaction with other concurrent experiments
  • Long-term retention signals (D1, D7)

The Kill Criteria

Before every rollout, I define the criteria that would make me stop:

data class RolloutKillCriteria(
    val maxCrashRateIncrease: Double = 0.05,  // 5% relative increase
    val maxLatencyRegression: Long = 200,      // 200ms at p95
    val minRetentionThreshold: Double = -0.001, // No more than 0.1% D7 drop
    val maxErrorRateIncrease: Double = 0.02    // 2% relative increase
)

These aren't arbitrary. They're based on the team's historical tolerance and the specific risk profile of the change. A cosmetic change has loose kill criteria. A change to the core monetization logic has tight ones.
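One way these thresholds could feed an automated stop decision, as a sketch (the observed-delta fields are hypothetical, mirroring the criteria fields):

```kotlin
// RolloutKillCriteria as defined above, repeated so this sketch compiles alone.
data class RolloutKillCriteria(
    val maxCrashRateIncrease: Double = 0.05,
    val maxLatencyRegression: Long = 200,
    val minRetentionThreshold: Double = -0.001,
    val maxErrorRateIncrease: Double = 0.02
)

// Hypothetical observed deltas vs. control, mirroring the criteria fields.
data class ObservedDeltas(
    val crashRateIncrease: Double,
    val p95LatencyRegressionMs: Long,
    val d7RetentionDelta: Double,
    val errorRateIncrease: Double
)

// Kill if ANY threshold is breached; one bad signal is enough to stop.
fun shouldKill(c: RolloutKillCriteria, o: ObservedDeltas): Boolean =
    o.crashRateIncrease > c.maxCrashRateIncrease ||
        o.p95LatencyRegressionMs > c.maxLatencyRegression ||
        o.d7RetentionDelta < c.minRetentionThreshold ||
        o.errorRateIncrease > c.maxErrorRateIncrease
```

Note the ORs: kill criteria compose pessimistically. A rollout has to clear every bar, and a single breach is enough to stop the ramp.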

Common Rollout Mistakes

Ramping too fast. Going from 5% to 50% overnight because "the numbers look good." Two days of data at 5% isn't enough to detect retention effects. Patience pays.

Ignoring guardrail metrics. The primary metric looks great, so you ship. Three weeks later, retention shows a regression that was building slowly. Guardrails exist for a reason.

Not accounting for experiment interactions. Your change looks neutral. But it's interacting with another experiment that's running concurrently. The combined effect is negative, and neither experiment detected it in isolation.

Celebrating too early. A positive result at 20% doesn't mean you're done. Novelty effects can inflate early metrics. Validate that the effect persists at 50% over a full week.

The Human Side

Gradual rollouts require patience, which is hard when there's pressure to ship. Stakeholders want results. The product roadmap has deadlines. The temptation is to skip stages.

I've learned to frame it differently. A gradual rollout isn't slow shipping. It's safe shipping. The alternative isn't "shipping faster." The alternative is "shipping and then scrambling to roll back when something goes wrong."

The fastest rollout is one that doesn't need to be reverted.
