Why I Always Start With the Failure Case

When I start building something new, I don't think about the happy path first. I think about what happens when things go wrong.
What if the network is unavailable? What if the API returns an error? What if the user taps the button twice? What if the device is low on memory? What if the data is corrupted?
This isn't pessimism. It's the single most valuable habit I've developed as an engineer. The happy path usually takes care of itself. The failure cases are where systems break, users get frustrated, and teams lose sleep at 2 AM.
The Problem With Happy-Path-First
When you design for the happy path first, failure handling becomes an afterthought. You bolt on error handling at the end, usually in a rush, usually incomplete.
// Happy-path-first thinking:
fun loadUserProfile(userId: String): UserProfile {
    val response = api.getProfile(userId)
    return response.toUserProfile()
}
// "Oh right, I should handle errors..."
fun loadUserProfile(userId: String): UserProfile? {
    return try {
        val response = api.getProfile(userId)
        response.toUserProfile()
    } catch (e: Exception) {
        null // Now every caller has to deal with null
    }
}
The problem compounds. Every downstream function now receives a nullable type. The UI layer has to guess what "null" means. Was it a network error? A 404? A malformed response? The information is lost because error handling was an afterthought.
Failure-First Design
When I start with failure cases, the code structure naturally accommodates both success and failure from the beginning.
import java.io.IOException

// One subclass per outcome the caller needs to tell apart.
sealed class ProfileResult {
    data class Success(val profile: UserProfile) : ProfileResult()
    data class NotFound(val userId: String) : ProfileResult()
    data class NetworkError(val cause: Throwable) : ProfileResult()
    data class ServerError(val code: Int, val message: String) : ProfileResult()
}
fun loadUserProfile(userId: String): ProfileResult {
    // Transport failures never reach the status-code check below.
    val response = try {
        api.getProfile(userId)
    } catch (e: IOException) {
        return ProfileResult.NetworkError(e)
    }
    return when (response.code) {
        200 -> ProfileResult.Success(response.body.toUserProfile())
        404 -> ProfileResult.NotFound(userId)
        else -> ProfileResult.ServerError(response.code, response.message)
    }
}
Now every caller knows exactly what can go wrong and can handle each case appropriately. The UI can show "no internet" for network errors, "user not found" for 404s, and "something went wrong" for server errors. No guessing.
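To make the caller's side concrete, here is a minimal sketch of a UI-layer helper that turns each outcome into a message. The `messageFor` function, the message strings, and the `name` field on `UserProfile` are assumptions for illustration, not part of the original code.

```kotlin
// Minimal stand-in for UserProfile; the `name` field is assumed.
data class UserProfile(val name: String)

sealed class ProfileResult {
    data class Success(val profile: UserProfile) : ProfileResult()
    data class NotFound(val userId: String) : ProfileResult()
    data class NetworkError(val cause: Throwable) : ProfileResult()
    data class ServerError(val code: Int, val message: String) : ProfileResult()
}

// Hypothetical UI helper: maps every outcome to user-facing text.
// `when` is used as an expression over a sealed class, so it must be
// exhaustive: adding a new ProfileResult subclass without a branch here
// is a compile error rather than a silent gap.
fun messageFor(result: ProfileResult): String = when (result) {
    is ProfileResult.Success -> "Welcome back, ${result.profile.name}"
    is ProfileResult.NotFound -> "User not found"
    is ProfileResult.NetworkError -> "No internet connection"
    is ProfileResult.ServerError -> "Something went wrong (code ${result.code})"
}
```

The exhaustiveness check is the payoff: the compiler, not a code reviewer, reminds you when a new failure case has no handling.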
My Failure-Case Checklist
For every feature I build, I run through these categories:
Network failures
- Connection timeout
- Request timeout
- No internet connectivity
- Partial response (connection drops mid-transfer)
- SSL/certificate errors
Data failures
- Missing required fields in API response
- Unexpected data types (string where you expect int)
- Empty collections where you expect at least one item
- Stale cache data that conflicts with fresh data
User behavior failures
- Double-tap on submit buttons
- Navigating away mid-operation
- Rotating the device during a long-running task
- Backgrounding the app during a critical flow
- Rapidly switching between screens
Device failures
- Low memory (OS may kill your process)
- Low storage (can't write to disk)
- Slow CPU (operations take longer than expected)
- Screen sizes you didn't test on
Timing failures
- Race conditions between concurrent operations
- Operations completing after the UI is destroyed
- Stale callbacks from previous screen instances
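Taking the data failures from the checklist as an example, here is a rough sketch of how the first three items can become explicit outcomes instead of crashes. It assumes the response has already been decoded into a `Map<String, Any?>`; the `Article` type and the field names `id` and `tags` are made up for illustration.

```kotlin
// Hypothetical payload type; the fields are assumptions for this sketch.
data class Article(val id: Int, val tags: List<String>)

sealed class ParseResult {
    data class Ok(val article: Article) : ParseResult()
    data class MissingField(val field: String) : ParseResult()
    data class WrongType(val field: String, val actual: String) : ParseResult()
    data class EmptyCollection(val field: String) : ParseResult()
}

fun parseArticle(raw: Map<String, Any?>): ParseResult {
    // Missing required field (an absent key and an explicit null are treated alike).
    val id = raw["id"] ?: return ParseResult.MissingField("id")
    // Unexpected data type, e.g. a string where an int is expected.
    if (id !is Int) return ParseResult.WrongType("id", id.javaClass.simpleName)

    val tags = raw["tags"] ?: return ParseResult.MissingField("tags")
    if (tags !is List<*>) return ParseResult.WrongType("tags", tags.javaClass.simpleName)
    // Empty collection where at least one item is expected.
    if (tags.isEmpty()) return ParseResult.EmptyCollection("tags")

    // Reject lists that contain anything other than strings.
    val stringTags = tags.filterIsInstance<String>()
    if (stringTags.size != tags.size) {
        return ParseResult.WrongType("tags", "List with non-String elements")
    }
    return ParseResult.Ok(Article(id, stringTags))
}
```

Whether an empty `tags` list is really an error depends on the feature. The point is that the decision is written down in the type rather than discovered later.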
A Real Example
I was building a feature that let users save content to a collection. The happy path was simple: tap save, call API, show confirmation.
Before writing any code, I listed the failure cases:
1. User taps save while offline
2. User taps save twice quickly
3. API call succeeds but the UI has already been dismissed
4. Collection is full (server-side limit)
5. Content was deleted by the time the save request reaches the server
6. User's auth token expired between tapping save and the request firing
Each of these required a different response. For #1, I queue the save and retry when connectivity returns. For #2, I debounce the tap and disable the button during the request. For #3, I use a ViewModel that survives the UI lifecycle. For #4 and #5, I show specific error messages. For #6, I silently refresh the token and retry.
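The offline and double-tap cases can be sketched without any Android types. This is a simplified illustration, not the code from that feature: `SaveTapHandler`, `TapOutcome`, and the `isOnline` callback are hypothetical names, it uses an in-flight flag rather than a time-based debounce, and it assumes all calls happen on a single thread (for example the main thread).

```kotlin
// What the UI should do in response to a tap; names are illustrative.
sealed class TapOutcome {
    data class SendNow(val contentId: String) : TapOutcome()
    data class Queued(val contentId: String) : TapOutcome()
    object Ignored : TapOutcome()
}

// Not thread-safe: assumes every call comes from the same thread.
class SaveTapHandler(private val isOnline: () -> Boolean) {
    private var requestInFlight = false
    // A set, so queuing the same content twice while offline keeps one entry.
    private val offlineQueue = mutableSetOf<String>()

    // The UI can disable the save button while this is true.
    val isBusy: Boolean get() = requestInFlight

    fun onSaveTapped(contentId: String): TapOutcome {
        // Double-tap: a request is already running, so drop this tap.
        if (requestInFlight) return TapOutcome.Ignored
        // Offline: remember the save so it can be retried later.
        if (!isOnline()) {
            offlineQueue.add(contentId)
            return TapOutcome.Queued(contentId)
        }
        requestInFlight = true
        return TapOutcome.SendNow(contentId)
    }

    // Call when the request completes, whether it succeeded or failed.
    fun onRequestFinished() {
        requestInFlight = false
    }

    // Call when connectivity returns; hands back the queued ids and clears the queue.
    fun drainQueue(): List<String> {
        val pending = offlineQueue.toList()
        offlineQueue.clear()
        return pending
    }
}
```

Keeping this logic out of the view makes each failure case testable on its own: tap twice and expect `Ignored`, go offline and expect `Queued`.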
If I had started with the happy path, most of these would have been bugs discovered in production.
The Counterargument
People push back on this approach. "You're over-engineering. YAGNI. Ship it and fix bugs later."
There's some truth to that. Not every failure case needs handling upfront. But there's a difference between "I know about this edge case and I'm choosing to defer it" and "I never considered this edge case and it bit me in production."
Failure-first thinking isn't about handling every possible error. It's about being aware of every possible error. You still make trade-offs. You still ship incrementally. But you make those trade-offs consciously, with a list of known risks.
What This Looks Like in Practice
I keep a simple format in my design notes:
Feature: Save to Collection
Failure cases:
[HANDLE] Offline - queue and retry
[HANDLE] Double-tap - debounce + disable
[HANDLE] UI dismissed - ViewModel survives
[HANDLE] Collection full - show error with suggestion
[DEFER] Content deleted - show generic error, revisit if frequent
[DEFER] Token expired - rely on existing auth refresh middleware
The [HANDLE] items get built into the implementation. The [DEFER] items get documented so future engineers (including future me) know they exist.
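If you would rather keep the list next to the code than in a separate document, the same notes fit in a few lines of Kotlin. This is only a sketch of one option; `Decision`, `FailureCase`, `renderNotes`, and `deferred` are hypothetical names.

```kotlin
enum class Decision { HANDLE, DEFER }

data class FailureCase(val name: String, val decision: Decision, val plan: String)

// Renders the cases in the same plain-text layout as the design notes.
fun renderNotes(feature: String, cases: List<FailureCase>): String =
    buildString {
        appendLine("Feature: $feature")
        appendLine("Failure cases:")
        for (case in cases) {
            appendLine("[${case.decision}] ${case.name} - ${case.plan}")
        }
    }.trimEnd()

// The deferred items are the ones worth revisiting when production data comes in.
fun deferred(cases: List<FailureCase>): List<FailureCase> =
    cases.filter { it.decision == Decision.DEFER }
```

Calling `deferred(cases)` during a bug triage gives a quick answer to "did we know about this one?".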
This takes 10 minutes and prevents hours of debugging. More importantly, it builds a habit of defensive thinking that compounds over time. The engineer who thinks about failure first doesn't produce perfect code. They produce resilient code, which is much more valuable.