Lessons From Building Android Features Used at Scale

Shipping code that runs on millions of devices is a different game than shipping to a few thousand users. The stakes are higher, the feedback loops are longer, and the blast radius of a bug is enormous. Here's what I've learned.
Key Takeaways
- At scale, every edge case is someone's main case.
- Experimentation is not optional - it's how you make decisions.
- Monitoring matters more than testing alone.
- Rollbacks need to be instant, not "we'll fix it in the next release."
Edge Cases Become the Norm
When your feature reaches millions of users, statistical improbabilities become daily occurrences. A bug that affects 0.01% of sessions? That's thousands of people. A race condition that only happens on slow networks? It happens constantly in emerging markets.
I've learned to think about edge cases differently:
- What happens if this API call takes 30 seconds? Not just "timeout" - what does the user see during those 30 seconds?
- What if the user rapidly taps this button? Debouncing isn't just nice-to-have, it prevents duplicate transactions.
- What if the device has 512MB of RAM? Your bitmap loading strategy matters a lot more than you think.
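The rapid-tap case lends itself to a small time-window guard. A minimal sketch in plain Kotlin, assuming a 500 ms window; `DebouncedAction` and the injected clock are illustrative names, not from any framework:

```kotlin
// Wraps an action so repeated triggers inside the window are ignored.
// Useful for buttons that kick off non-idempotent work (e.g. payments).
class DebouncedAction(
    private val windowMs: Long = 500,
    private val clock: () -> Long = System::currentTimeMillis,
    private val action: () -> Unit,
) {
    private var lastRun: Long? = null

    // Runs the action only if the window has elapsed; returns whether it ran.
    fun trigger(): Boolean {
        val now = clock()
        val last = lastRun
        if (last != null && now - last < windowMs) return false
        lastRun = now
        action()
        return true
    }
}
```

Injecting the clock as a lambda keeps the guard testable without real delays.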
Ship Behind Experiments, Always
One of the biggest lessons I've internalized: never ship a feature directly to 100% of users. Always use experiments.
The rollout strategy I follow:
Employee testing (dogfooding)
↓
1% of users (canary)
↓
10% of users (validate metrics)
↓
50% of users (watch for regressions)
↓
100% (full launch)
At each stage, I'm looking at specific metrics: crash rate, latency, engagement, and revenue signals. If any metric regresses beyond the threshold, we pause and investigate before moving to the next stage.
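One way to keep the ladder consistent is deterministic bucketing, so a user who saw the feature at 1% keeps seeing it at 10% and 50%. A hedged sketch of hash-based bucketing (`rolloutBucket`, `isInRollout`, and the salt are hypothetical names; in practice an experimentation platform does this for you):

```kotlin
// Deterministically maps a user id to a bucket in [0, 100). Because the
// bucket is stable, a user included at 1% stays included at 10%, 50%, 100%.
// The salt keeps buckets independent across different features.
fun rolloutBucket(userId: String, salt: String = "my_feature_v1"): Int {
    val h = (salt + userId).hashCode()
    return ((h % 100) + 100) % 100 // normalize to 0..99 for negative hashes
}

fun isInRollout(userId: String, percent: Int, salt: String = "my_feature_v1"): Boolean =
    rolloutBucket(userId, salt) < percent
```

The monotonicity is the important property: every stage's audience is a superset of the previous one, so no user flaps in and out of the experience mid-rollout.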
The Metric That Matters Most
Early in my career, I focused on crash-free rate as the primary quality signal. It's important, but it's not enough. A feature can be crash-free and still be broken - slow renders, janky scrolling, or incorrect data all degrade the user experience without crashing.
The metrics I watch closely:
- ANR (Application Not Responding) rate - More insidious than crashes. A frozen app is worse than a crashed one because the user is left waiting.
- P95 render time - Average render time hides problems. The 95th percentile tells you how bad it gets.
- Engagement delta - If users start using your feature less after a change, something's wrong even if it's technically "working."
- Error rate by device tier - Low-end devices often reveal problems that high-end devices mask.
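The P95 point is easy to see with a toy computation. A sketch using the nearest-rank percentile method (the `percentile` helper and the frame-time numbers are illustrative, not a real metrics-pipeline API):

```kotlin
import kotlin.math.ceil

// Nearest-rank percentile over a list of samples, e.g. frame render times
// in milliseconds. Illustrative helper for the P95-vs-average comparison.
fun percentile(samples: List<Double>, p: Double): Double {
    require(samples.isNotEmpty() && p in 0.0..100.0)
    val sorted = samples.sorted()
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceIn(1, sorted.size)
    return sorted[rank - 1]
}
```

Ninety smooth 16 ms frames plus ten 900 ms stalls average out to roughly 104 ms, which looks tolerable; the P95 of the same data is 900 ms. The tail is where the jank lives.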
What I Got Wrong Early On
Premature optimization
I once spent two weeks optimizing a RecyclerView adapter for a list that had at most 20 items. The optimization was technically correct but practically pointless. Now I profile first and optimize only where the data tells me to.
Ignoring configuration changes
I assumed ViewModel would handle everything. It survives configuration changes, but not process death. I learned to use SavedStateHandle after losing user input on a critical flow - in production.
Testing only the happy path
My early tests verified that things worked when everything went right. They never tested what happened when the network returned a 500 or the database was corrupted. Those are the tests that actually catch production bugs.
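These days the tests I care about assert the failure paths first. A hedged sketch in plain Kotlin (`ProfileRepo`, the lambda-injected API and cache, and the result types are hypothetical stand-ins for real networking and storage layers):

```kotlin
// A repository that degrades to cached data when the network call fails,
// and admits it when there is nothing to show. Names are illustrative.
sealed class ProfileResult {
    data class Fresh(val name: String) : ProfileResult()
    data class Cached(val name: String) : ProfileResult()
    object Unavailable : ProfileResult()
}

class ProfileRepo(
    private val api: () -> String,    // throws on HTTP errors (e.g. a 500)
    private val cache: () -> String?, // null when the cache is empty
) {
    fun load(): ProfileResult = try {
        ProfileResult.Fresh(api())
    } catch (e: Exception) {
        cache()?.let { ProfileResult.Cached(it) } ?: ProfileResult.Unavailable
    }
}
```

The tests that earn their keep are the ones where `api` throws: server down with a warm cache, and server down on a cold start. The happy path is the easy third case.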
Building for Resilience
The features that survive at scale share common patterns:
- Idempotent operations - If the user retries, nothing breaks.
- Graceful degradation - If a dependency is down, show what you can.
- Circuit breakers - If an API is failing, stop calling it and use cached data.
- Feature flags - If something goes wrong, turn it off without an app update.
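The circuit-breaker pattern above can be sketched in a few lines. A minimal count-based version, assuming a consecutive-failure threshold (`CircuitBreaker` here is illustrative, not a library class; real implementations add a half-open probe and a time-based reset):

```kotlin
// After maxFailures consecutive failures the breaker "opens": the upstream
// is no longer called and the fallback (e.g. cached data) is served instead.
// A single success closes the breaker again.
class CircuitBreaker<T>(
    private val maxFailures: Int = 3,
    private val upstream: () -> T,
    private val fallback: () -> T,
) {
    private var consecutiveFailures = 0

    fun call(): T {
        if (consecutiveFailures >= maxFailures) return fallback() // open: skip upstream
        return try {
            upstream().also { consecutiveFailures = 0 }           // success closes it
        } catch (e: Exception) {
            consecutiveFailures++
            fallback()
        }
    }
}
```

The point is the third bullet made concrete: once the API is known to be failing, you stop paying the latency and battery cost of calling it at all.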
Final Thought
Scale teaches you humility. You learn that your assumptions are wrong, your tests are incomplete, and your monitoring is never good enough. The best engineers I've worked with aren't the ones who write perfect code - they're the ones who build systems that fail gracefully and recover quickly.