HTTP Streaming for Reels Ad Delivery
Eliminated first-session ad slot loss on Reels at Meta by replacing batch JSON responses with HTTP chunked streaming, delivering each ranked ad to the client as soon as it finishes ranking.
HTTP Streaming for Reels
Company: Meta
Surface: Instagram Reels
Role: Android Software Engineer
Impact Area: Ad Delivery Latency, Impression Recovery
The Problem
Standard ad delivery on Reels followed a batch request-response model: when a user enters the Reels surface, the client sends an ad request to the server, the server ranks a batch of ads, and the full batch comes back in a single JSON response.
The challenge with this batch model is latency. The client cannot inject any ads until the entire batch is ranked and the response is fully received. In many cases, particularly on slower networks or when the ranking compute is heavy, the user has already scrolled past the first few eligible ad slots before the batch response arrives.
This results in "missed slots"—opportunities for high-value impressions that are lost because the ad delivery was not fast enough. At Meta's scale, even a small percentage of missed slots across millions of sessions represents significant unrealized revenue.
The Insight
Before building anything, I analyzed the server-side ranking latency profile. I found that while the total time to rank a full batch of ads was meaningful, the time to rank the first ad in the batch was significantly lower.
The bottleneck wasn't just the ranking compute itself; it was the "all-or-nothing" nature of the batch JSON response. The server had the first ranked ad ready, but it was sitting idle while the rest of the batch was being processed.
If the client could receive and inject ads as they were ranked, rather than waiting for the full batch, we could eliminate the first-slot latency bottleneck and recover those missed impressions.
The Solution: HTTP Chunked Streaming
I designed and implemented a streaming ad delivery system for Reels. Instead of a standard JSON response, the server uses HTTP Chunked Transfer Encoding to stream individual ad objects to the client as soon as they are ranked.
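Conceptually, the client-side read loop is simple: instead of waiting for the end of the response body, the client reads bytes as they arrive and hands them onward. A minimal sketch under stated assumptions (the HTTP client de-frames the chunked encoding, so the body is just an incrementally arriving byte stream; names are illustrative, not the production API):

```kotlin
import java.io.InputStream

// Hypothetical sketch: drain a chunked HTTP response body incrementally,
// handing each raw chunk of bytes to a consumer instead of buffering the
// whole body before parsing.
fun drainStream(body: InputStream, onChunk: (String) -> Unit) {
    val buf = ByteArray(4096)
    while (true) {
        val n = body.read(buf)     // blocks until bytes arrive or EOF
        if (n < 0) break           // server closed the stream
        onChunk(String(buf, 0, n, Charsets.UTF_8))
    }
}
```

The consumer here would be the incremental parser, which receives data at network speed rather than after the last ad is ranked.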
The system operates through three primary components:
StreamingResponseParser. A custom client-side parser that can handle a continuous stream of JSON chunks. Unlike standard parsers that require a complete document, this parser identifies individual ad objects within the stream, validates them, and passes them to the injection layer in real time.
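One way to build such a parser is to track brace depth while respecting string literals and escapes, emitting each top-level object the moment its closing brace arrives. A hypothetical sketch, not the actual Meta implementation:

```kotlin
// Hypothetical sketch: splits a stream of concatenated JSON ad objects by
// tracking brace depth, emitting each complete object as soon as it closes.
class StreamingResponseParser(private val onAd: (String) -> Unit) {
    private val buffer = StringBuilder()
    private var depth = 0
    private var inString = false
    private var escaped = false

    fun feed(chunk: String) {
        for (c in chunk) {
            if (depth == 0 && c != '{') continue  // skip bytes between objects
            buffer.append(c)
            when {
                escaped -> escaped = false            // char after a backslash
                inString && c == '\\' -> escaped = true
                c == '"' -> inString = !inString      // enter/leave a string
                !inString && c == '{' -> depth++
                !inString && c == '}' -> {
                    depth--
                    if (depth == 0) {                 // a full ad object closed
                        onAd(buffer.toString())
                        buffer.setLength(0)
                    }
                }
            }
        }
    }
}
```

Note that the callback fires mid-stream: the first ad can be handed to the injection layer while later ads are still being ranked server-side.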
EagerInjectionOrchestrator. This component manages the Reels feed queue and looks for the earliest eligible ad slot. As soon as the first ad chunk is parsed, the orchestrator injects it into the next available slot, often while the server is still ranking the second or third ad in the stream.
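The slot-selection logic can be sketched as a sorted set of open slot positions, claiming the earliest one still ahead of the user's current position (names and slot semantics here are illustrative assumptions):

```kotlin
// Hypothetical sketch: inject each parsed ad into the earliest eligible
// slot the user has not yet scrolled past.
class EagerInjectionOrchestrator(slotPositions: List<Int>) {
    private val openSlots = slotPositions.toSortedSet()
    val injected = mutableMapOf<Int, String>()  // slot position -> ad id

    // currentPosition: index of the reel the user is watching right now.
    // Returns the slot the ad was injected into, or null if none remain.
    fun onAdParsed(adId: String, currentPosition: Int): Int? {
        val slot = openSlots.firstOrNull { it > currentPosition } ?: return null
        openSlots.remove(slot)
        injected[slot] = adId
        return slot
    }
}
```

Because injection happens per ad rather than per batch, the first slot is filled as soon as the first object clears the parser.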
ConnectionMonitor. A reliability layer that tracks the health of the streaming connection. If the stream is interrupted or times out, it gracefully falls back to the remaining cached ads or triggers a fresh request, ensuring that the user experience is never degraded by a connection issue.
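A stall detector for this layer can be as simple as stamping the time of the last received chunk and choosing a fallback once the quiet period exceeds a threshold. A hedged sketch; the class shape and decision rule are assumptions:

```kotlin
enum class Fallback { SERVE_CACHED, REFETCH }

// Hypothetical sketch: flags a stalled stream and picks a recovery path,
// preferring cached ads before issuing a fresh request.
class ConnectionMonitor(
    private val stallTimeoutMs: Long,
    private val cachedAdsRemaining: () -> Int,
) {
    private var lastEventMs = 0L

    fun onStreamOpened(nowMs: Long) { lastEventMs = nowMs }
    fun onChunk(nowMs: Long) { lastEventMs = nowMs }

    // Called periodically; returns null while healthy, else the fallback.
    fun checkHealth(nowMs: Long): Fallback? {
        if (nowMs - lastEventMs < stallTimeoutMs) return null
        return if (cachedAdsRemaining() > 0) Fallback.SERVE_CACHED else Fallback.REFETCH
    }
}
```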
Key Engineering Decisions
Incremental Parsing vs. Batch Parsing. Moving from batch to incremental parsing was the most significant architectural shift. I chose a custom streaming parser over an off-the-shelf event-based one (such as Gson's streaming JsonReader) because it allowed for more granular control over error handling and partial object validation, which is critical when dealing with a live network stream.
SLA and Timeout Calibration. Streaming introduces new failure modes, such as a "stalled" stream where the connection is open but no data is being sent. I established a tiered timeout strategy: a short timeout for the first ad chunk (to ensure we don't miss the first slot) and a more relaxed timeout for subsequent chunks.
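The tiered policy reduces to picking a budget based on how many chunks have already arrived. The numbers below are illustrative placeholders, not the production values:

```kotlin
// Hypothetical sketch of the tiered timeout policy: a tight budget for the
// first ad chunk, a relaxed one for the rest. Budgets are illustrative.
class TieredTimeout(
    private val firstChunkBudgetMs: Long = 400,
    private val nextChunkBudgetMs: Long = 2_000,
) {
    fun isTimedOut(chunksReceived: Int, lastEventMs: Long, nowMs: Long): Boolean {
        val budget = if (chunksReceived == 0) firstChunkBudgetMs else nextChunkBudgetMs
        return nowMs - lastEventMs > budget
    }
}
```

The asymmetry mirrors the product priority: a late first ad is a missed slot, while a late third ad usually still lands ahead of the scroll.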
Memory Management. Handling a continuous stream of data requires careful memory management to avoid heap fragmentation or leaks. I implemented a pooling mechanism for the parser's buffers and ensured that each ad object is cleanly handed off to the injection layer and then released from the parser's memory.
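Buffer pooling along these lines keeps steady-state parsing allocation-free; this is a sketch, and the real pool presumably has sizing and ownership rules tuned to the parser:

```kotlin
import java.util.ArrayDeque

// Hypothetical sketch: reuse a fixed set of byte buffers across chunks so
// the hot parsing path allocates nothing once warmed up.
class BufferPool(poolSize: Int, private val bufferBytes: Int) {
    private val free = ArrayDeque<ByteArray>().apply {
        repeat(poolSize) { add(ByteArray(bufferBytes)) }
    }

    // Falls back to a fresh allocation if the pool is exhausted.
    fun acquire(): ByteArray = free.pollFirst() ?: ByteArray(bufferBytes)

    fun release(buf: ByteArray) { free.addLast(buf) }
}
```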
Measurement and Experimentation
The primary metric for the A/B experiment was the First Ad Impression Rate—the percentage of sessions where the first eligible ad slot was successfully filled. Secondary metrics included total ad impressions per session and server-side compute efficiency.
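For concreteness, the primary metric reduces to a per-session aggregate like the following (the session fields are illustrative, not the internal logging schema):

```kotlin
// Hypothetical sketch of the First Ad Impression Rate: among sessions where
// the first slot was eligible, the fraction where it was actually filled.
data class Session(val firstSlotEligible: Boolean, val firstSlotFilled: Boolean)

fun firstAdImpressionRate(sessions: List<Session>): Double {
    val eligible = sessions.filter { it.firstSlotEligible }
    if (eligible.isEmpty()) return 0.0
    return eligible.count { it.firstSlotFilled }.toDouble() / eligible.size
}
```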
The results were definitive. HTTP Streaming significantly increased the first-slot impression rate by reducing the time-to-first-ad-injection. It effectively "compressed" the delivery latency, allowing the ads to keep pace with the user's scroll speed even on slower connections.
The feature also showed a positive impact on total impressions, as the "head" of the ad batch was now being captured more reliably, which cascaded into better delivery for the rest of the session.
Outcome
HTTP Streaming is now the standard delivery model for Reels ads at Meta. It has eliminated a major source of impression loss, improved the responsiveness of the ads monetization stack, and demonstrated the power of moving from batch to event-driven architectures in high-scale mobile environments.
The core takeaway: in high-velocity surfaces like Reels, every millisecond counts. Moving the delivery logic closer to the "speed of ranking" is one of the most effective ways to recover lost revenue without increasing server load.