Notification Fanout Architecture
Notification fanout comes down to fan-out-on-write vs on-read, and the celebrity problem that breaks the naive choice. The hybrid design that scales, explained.
Notification fanout architecture is one of the most-asked system design problems, and it collapses into a single decision: do you do the delivery work when content is written, or when it is read? Fan-out-on-write makes reads cheap and writes expensive; fan-out-on-read does the opposite. The naive push design breaks on one input, the account with millions of followers, and the naive pull design makes every read expensive, which is why every system at scale ends up hybrid.
This is the Twitter-timeline problem, the push-notification problem, and the activity-feed problem, all the same shape underneath. Get the fan-out model right and the system scales smoothly; get it wrong and one popular user takes the whole thing down.
Why fanout is the core of feed and notification systems
Any system where one action reaches many people, a post hitting followers, an event triggering notifications, a comment alerting subscribers, has a fan-out problem at its heart. The number of deliveries per action can be enormous, and where you spend that work determines whether the system is fast and affordable or slow and expensive.
The reason this shows up in every senior system design interview is that it forces a real tradeoff with no free answer. You are moving cost between write time and read time, and the right move depends on your access patterns and your worst-case input. This post is part of the Distributed systems patterns series.
What is the difference between fan-out-on-write and fan-out-on-read?
Fan-out-on-write does the delivery work at publish time: when a user posts, the system immediately writes that post into every follower’s feed. Reads are then trivial, just read your own pre-built feed. Fan-out-on-read does almost nothing at write time and instead assembles each user’s feed on demand by pulling and merging the posts of everyone they follow when they open the app.
The two models simply move the cost to opposite ends:
| | Fan-out-on-write (push) | Fan-out-on-read (pull) |
|---|---|---|
| Work at write time | High (write to N feeds) | Minimal (store once) |
| Work at read time | Minimal (read own feed) | High (gather + merge live) |
| Best when | Reads >> writes | Writes >> reads, or huge fan-out |
| Storage | High (feed per user) | Low (store once) |
| Weakness | The celebrity problem | Slow, expensive reads |
Most consumer products are read-heavy (people scroll far more than they post), which biases toward fan-out-on-write so the common operation, reading the feed, is cheap. That bias is correct right up until one account has millions of followers.
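To make the cost shift concrete, here is a minimal in-memory sketch of both models. The class names `PushFeeds` and `PullFeeds`, the `(timestamp, post_id)` tuples, and the dictionary-backed stores are illustrative assumptions, not any particular product's design; a real system would back them with a database or cache.

```python
import heapq
from collections import defaultdict
from itertools import chain


class PushFeeds:
    """Fan-out-on-write: pay at publish time, read a pre-built feed."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow them
        self.feeds = defaultdict(list)     # user -> [(timestamp, post_id)]

    def follow(self, user, author):
        self.followers[author].add(user)

    def publish(self, author, ts, post_id):
        # One post costs one feed write per follower.
        for user in self.followers[author]:
            self.feeds[user].append((ts, post_id))
        return len(self.followers[author])  # number of feed writes performed

    def read(self, user, limit=10):
        # Only the reader's own feed is touched.
        return heapq.nlargest(limit, self.feeds[user])


class PullFeeds:
    """Fan-out-on-read: store once, gather and merge at read time."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(timestamp, post_id)]

    def follow(self, user, author):
        self.following[user].add(author)

    def publish(self, author, ts, post_id):
        self.posts[author].append((ts, post_id))
        return 1  # a single write, regardless of follower count

    def read(self, user, limit=10):
        # Every followed author's posts are visited on each read.
        candidates = chain.from_iterable(
            self.posts.get(author, []) for author in self.following[user]
        )
        return heapq.nlargest(limit, candidates)
```

Both classes return the same feed for the same inputs; what differs is which method does the work. `PushFeeds.publish` loops over followers, while `PullFeeds.read` loops over followed authors.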
What is the celebrity problem in fan-out?
The celebrity problem is what happens to fan-out-on-write when one account has millions of followers. A single post from that account must be written into millions of feeds, turning one write into a massive, slow, expensive operation that can overwhelm the system and delay delivery for everyone. The cost of fan-out-on-write scales with follower count, and celebrities break the model.
This is the specific failure that pushes systems off the naive design. You cannot serve a power-law follower distribution with a single fan-out model, because the model that is efficient for the millions of normal accounts is catastrophic for the handful of huge ones.
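A quick back-of-the-envelope calculation shows how the cost scales. The follower counts and the write throughput below are made-up numbers chosen only to illustrate the arithmetic, and `fanout_seconds` is a hypothetical helper.

```python
def fanout_seconds(follower_count: int, feed_writes_per_second: int) -> float:
    """Time to push one post into every follower's feed at a fixed write rate."""
    return follower_count / feed_writes_per_second


# Illustrative assumption only: a fan-out tier that sustains 50,000 feed writes per second.
ASSUMED_WRITES_PER_SECOND = 50_000

for label, count in [("typical", 300), ("popular", 100_000), ("celebrity", 30_000_000)]:
    secs = fanout_seconds(count, ASSUMED_WRITES_PER_SECOND)
    print(f"{label:>9}: {count:>10,} feed writes, about {secs:,.3f} s of fan-out per post")
```

Under these assumed numbers, a single post from the 30-million-follower account occupies the whole fan-out tier for 600 seconds, while a typical account's post is a rounding error. The write cost is linear in follower count, so the largest accounts dominate it.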
How do you scale a notification system? The hybrid model
The answer used by large feed systems is a hybrid: fan-out-on-write for normal accounts, fan-out-on-read for celebrity accounts. Normal users get their content pushed into followers’ feeds so reads stay cheap; high-follower accounts skip the write fan-out, and their posts are pulled in at read time and merged into the requesting user’s feed. You get cheap reads in the common case and avoid the write explosion in the extreme case.
The mechanics of the hybrid:
- Classify accounts by follower count. Above a threshold, an account is “celebrity” and is excluded from write fan-out.
- Normal posts fan out on write into follower feeds as usual.
- Celebrity posts are stored once and not pushed.
- At read time, a user’s feed is their pre-built feed (from normal accounts) merged with a live pull of the celebrities they follow.
- Cache aggressively, because a celebrity’s recent posts are read by millions and should be served from cache, not recomputed per reader.
This keeps the expensive operation rare: only the small number of celebrity accounts incur read-time merging, and their content is cacheable precisely because so many people want the same thing.
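The mechanics above can be sketched in a few lines. This is a simplified in-memory illustration, not a production design: `HybridFeeds`, the tiny `threshold` default, and the dictionary stores are assumptions, and the caching layer is left out for brevity.

```python
import heapq
from collections import defaultdict
from itertools import chain


class HybridFeeds:
    def __init__(self, threshold=2):
        # Hypothetical cutoff; real systems would use a far larger number.
        self.threshold = threshold
        self.followers = defaultdict(set)         # author -> users who follow them
        self.following = defaultdict(set)         # user -> authors they follow
        self.feeds = defaultdict(list)            # pre-built feeds (normal authors only)
        self.celebrity_posts = defaultdict(list)  # stored once per celebrity author

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        # Classify by follower count: above the threshold means "celebrity".
        return len(self.followers[author]) > self.threshold

    def publish(self, author, ts, post_id):
        if self.is_celebrity(author):
            # Store once, skip the write fan-out entirely.
            self.celebrity_posts[author].append((ts, post_id))
            return 0
        for user in self.followers[author]:
            self.feeds[user].append((ts, post_id))
        return len(self.followers[author])  # feed writes performed

    def read(self, user, limit=10):
        # Pre-built feed merged with a live pull of followed celebrities.
        pulled = chain.from_iterable(
            self.celebrity_posts.get(author, []) for author in self.following[user]
        )
        return heapq.nlargest(limit, chain(self.feeds[user], pulled))
```

Each post lands in exactly one place, either follower feeds or the celebrity store, so the merged read does not produce duplicates even if an account crosses the threshold between posts. In a real deployment the `celebrity_posts` lookup is the part you would serve from cache.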
The infrastructure underneath: queues and idempotency
Fan-out is bursty by nature. A popular post creates a spike of delivery work, and you cannot let that spike hit your databases synchronously. The standard pattern is a queue between the write and the fan-out workers: the post is accepted quickly, a fan-out job is enqueued, and workers drain it at a controlled rate. This is backpressure applied to delivery, the same principle covered in Backpressure Design for Real-Time Systems.
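As a rough sketch of that pattern, the example below uses Python's standard `queue.Queue` and threads as a stand-in for a real message broker. The function name `run_fanout`, the `followers` mapping, and the `workers` and `max_pending` parameters are hypothetical; the point is only that the queue is bounded and that a fixed pool of workers drains it.

```python
import queue
import threading


def run_fanout(posts, followers, workers=2, max_pending=100):
    """Enqueue one fan-out job per post and let a fixed worker pool drain them."""
    jobs = queue.Queue(maxsize=max_pending)  # bounded queue: put() blocks when full
    delivered = []
    lock = threading.Lock()

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work for this worker
                return
            author, post_id = job
            # Expand the single job into per-recipient deliveries.
            for recipient in followers.get(author, ()):
                with lock:
                    delivered.append((post_id, recipient))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in posts:
        jobs.put(job)   # the producer slows down here if workers fall behind
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return delivered
```

The number of workers caps how fast deliveries hit downstream stores, and the bounded queue pushes back on producers instead of letting pending work grow without limit. Delivery order across workers is not guaranteed in this sketch.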
Delivery must also be idempotent. Fan-out workers retry on failure, and without idempotency a retry sends a duplicate notification, which users notice and hate. Key each delivery by (event, recipient) and skip ones already delivered, the same discipline as Idempotency Keys for Distributed Systems. For push notifications specifically, deduplication before the final send is what stops the dreaded double-buzz.
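A minimal sketch of the (event, recipient) key, assuming an in-memory set where a real system would use a durable store with a unique constraint. `IdempotentSender` and the `send` callback are hypothetical names for illustration.

```python
class IdempotentSender:
    def __init__(self, send):
        self._send = send        # callable(recipient_id, payload) that performs the real send
        self._delivered = set()  # stand-in for a durable store keyed by (event, recipient)

    def deliver(self, event_id, recipient_id, payload):
        """Send at most once per (event, recipient); return True if a send happened."""
        key = (event_id, recipient_id)
        if key in self._delivered:
            return False  # a retry of something already delivered: skip it
        self._send(recipient_id, payload)
        # Record only after the send succeeds, so a failed send can be retried.
        # Caveat: a crash between the send and this line can still produce one
        # duplicate; closing that gap needs an atomic check-and-set in the store.
        self._delivered.add(key)
        return True
```

A worker that retries a whole batch after a partial failure can call `deliver` for every recipient again; only the ones that never went out are actually sent.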
How do you store and trim notification feeds?
Cap each feed and trim it, because under fan-out-on-write a feed grows without bound and most of it is never read. The standard move is to keep a bounded recent window per user (the last few hundred items) in the fast store, and fall back to a query against durable storage for the rare deep scroll. Storing every notification forever in every feed is the cost that quietly sinks fan-out-on-write.
The economics are the point. Fan-out-on-write trades storage for read speed, and if you do not bound that storage, the trade gets worse every day as feeds accumulate items nobody scrolls to. A capped feed keeps the hot path small and cheap: writes append and trim, reads hit a small structure, and the long tail lives in cheaper storage that is queried only when someone actually asks for it.
This also bounds the cost of each feed write. Trimming as you write means a feed never grows past its cap, so the per-recipient cost stays constant rather than creeping upward over a user’s lifetime. Decide the cap from real scroll behavior, keep the recent window fast, and let durable storage hold the history that is rarely requested.
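One way to picture the bounded window is a fixed-length deque in front of a durable log. The sketch below is an assumption-laden illustration: `CappedFeed` is a hypothetical class, and a plain Python list stands in for durable storage.

```python
from collections import deque


class CappedFeed:
    def __init__(self, cap, durable_store):
        self.recent = deque(maxlen=cap)  # fast window; the oldest item drops on overflow
        self.durable = durable_store     # list standing in for durable storage, oldest first

    def append(self, item):
        self.durable.append(item)  # full history is kept in the cheaper store
        self.recent.append(item)   # append and trim in a single step

    def read(self, offset=0, limit=10):
        """Return items newest first, using the fast window whenever it covers the page."""
        if offset + limit <= len(self.recent):
            newest_first = list(reversed(self.recent))
        else:
            # Rare deep scroll: fall back to the durable history.
            newest_first = self.durable[::-1]
        return newest_first[offset:offset + limit]
```

With `deque(maxlen=cap)` the trim is implicit in the append, so the hot structure never exceeds the cap no matter how long the account has existed.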
A notification fanout checklist
Before you ship a feed or notification system:
- You have chosen a fan-out model based on your real read/write ratio, not a default.
- High-follower accounts are special-cased (fan-out-on-read) so one post cannot trigger millions of writes.
- A queue sits between the triggering event and fan-out workers to absorb bursts.
- Delivery is idempotent, keyed by (event, recipient), so retries cannot duplicate.
- Celebrity content is cached, since many readers want the same recent posts.
- You have a plan for feed storage growth under fan-out-on-write (it is not free).
- Delivery has a deadline and a dead-letter path, so a stuck recipient does not block the batch.
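The last item, a deadline plus a dead-letter path, can be sketched as follows. The helper names, the retry count, and the in-memory dead-letter list are assumptions for illustration; a real system would typically publish failures to a separate queue or table.

```python
import time


def try_send(recipient, send, max_attempts, deadline, clock):
    """Attempt one recipient until success, attempts run out, or the deadline passes."""
    for _ in range(max_attempts):
        if clock() >= deadline:
            return False
        try:
            send(recipient)
            return True
        except Exception:
            pass  # treat as transient and retry
    return False


def deliver_batch(recipients, send, deadline_seconds=5.0, max_attempts=3,
                  clock=time.monotonic):
    """Deliver to each recipient; failures go to a dead-letter list instead of blocking."""
    deadline = clock() + deadline_seconds
    delivered, dead_letter = [], []
    for recipient in recipients:
        ok = try_send(recipient, send, max_attempts, deadline, clock)
        (delivered if ok else dead_letter).append(recipient)
    return delivered, dead_letter
```

A recipient that keeps failing costs at most `max_attempts` tries before the batch moves on, and anything left when the deadline passes is dead-lettered rather than retried indefinitely.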
What I’d do differently
The mistake in interviews and in real systems alike is committing to one fan-out model and trying to make it serve every account. Pure fan-out-on-write is elegant until a celebrity joins; pure fan-out-on-read is simple until reads dominate your cost. Neither survives contact with a real follower graph.
If I were designing this from scratch, I would start with fan-out-on-write for the read-heavy common case, identify the high-degree accounts early, and route them to read fan-out before they ever become an incident. The hybrid is not a premature optimization here; it is the known shape of the problem, and building it in from the start is cheaper than retrofitting it after a popular user melts your write path. The closely related real-time delivery layer is covered in WebSocket Capacity Planning for Social Products.
Sources
- The System Design Primer, feeds and fan-out: github.com/donnemartin/system-design-primer
- Redis, solutions and patterns for feeds: redis.io/solutions
- Apache Kafka, use cases (event fan-out): kafka.apache.org/uses
Frequently asked questions
What is fan-out in a notification system?
Fan-out is the act of taking one event and delivering it to many recipients. The core design choice is when to do that work: fan-out-on-write pushes the event into every recipient's feed at publish time, while fan-out-on-read assembles each recipient's feed on demand when they open the app.
What is the difference between fan-out-on-write and fan-out-on-read?
Fan-out-on-write does the delivery work up front, so reads are fast but a single write can fan out to millions of feeds. Fan-out-on-read does little at write time but makes every read expensive, since it gathers and merges content live. Most large systems use a hybrid of both.
What is the celebrity problem in fan-out?
The celebrity problem is when one account has millions of followers, so fan-out-on-write must update millions of feeds for a single post. That one write becomes a massive, slow, expensive operation. The fix is to handle high-follower accounts with fan-out-on-read instead.
How do you scale a notification system?
Use a hybrid fan-out: fan-out-on-write for normal accounts so reads stay cheap, and fan-out-on-read for celebrity accounts so a single post does not update millions of feeds. Add a queue to absorb fan-out bursts and make delivery idempotent so retries cannot duplicate notifications.