Kafka Replay Strategy Without Duplicate Events
Replaying a Kafka topic re-delivers events, so duplicates are guaranteed unless consumers are idempotent. The safe replay playbook: dedup, offsets, and isolation.
Part of Distributed Systems Patterns That Hold Up in Production
A safe Kafka replay strategy starts from one uncomfortable fact: replaying a topic re-delivers events your consumers already processed, so duplicates are not a risk, they are a guarantee. The entire job of a replay plan is making sure those guaranteed duplicates do no harm. That means idempotent consumers first, deliberate offset handling second, and isolation from live side effects third.
Replay is one of Kafka’s best features. It lets you rebuild a broken read model, reprocess after a bug fix, or seed a new service from history. It is also one of the easiest ways to double-charge a customer or send a million duplicate emails if you treat it casually.
Why a replay strategy matters
Kafka retains events, which means the log is a source of truth you can read again. That is powerful: you can rebuild a corrupted projection, backfill a new feature, or recover from a consumer bug by reprocessing from before it shipped.
The danger is that “reprocess from before” also means re-running every side effect those events triggered. Without a plan, a replay that fixes your database also re-sends notifications, re-calls payment APIs, and re-emits downstream events. This post is part of the Distributed systems patterns series. The dedup problem here is exactly what idempotency keys for distributed systems solve at the consumer, and backpressure design for real-time systems covers what to do when a replay floods downstream.
Does replaying a Kafka topic create duplicate events?
Yes, always. A replay rewinds the consumer to an earlier offset and re-delivers every message from that point, including ones already processed. There is no replay mode that magically skips what a consumer saw before; duplicate delivery is the defining behavior. Your only defense is making reprocessing idempotent.
This is not a flaw to fix; it is the nature of an append-only log. Kafka’s own “exactly-once semantics” prevent duplicates caused by retries and failures within a Kafka-to-Kafka pipeline, but they do nothing about the external side effects your consumer performs. The moment your consumer calls an email API or writes to an external system, exactly-once is your responsibility, not Kafka’s.
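To make the duplicate guarantee concrete, here is a toy in-memory model. It is not Kafka client code: `log`, `consume_from`, and both handlers are hypothetical names, and list indexes merely stand in for offsets. It shows the same replay doubling a naive total while an idempotent handler is unaffected.

```python
# Toy model of an append-only log: list indexes stand in for offsets.
log = [
    {"id": "evt-1", "amount": 10},
    {"id": "evt-2", "amount": 25},
    {"id": "evt-3", "amount": 5},
]


def consume_from(offset, handler):
    """Deliver every record from `offset` to the end, as a rewound consumer would."""
    for record in log[offset:]:
        handler(record)


# Non-idempotent handler: blindly accumulates, so a replay double-counts.
naive_total = 0


def naive_handler(record):
    global naive_total
    naive_total += record["amount"]


# Idempotent handler: remembers applied event IDs and skips repeats.
seen_ids = set()
safe_total = 0


def idempotent_handler(record):
    global safe_total
    if record["id"] in seen_ids:
        return
    seen_ids.add(record["id"])
    safe_total += record["amount"]


for handler in (naive_handler, idempotent_handler):
    consume_from(0, handler)  # first pass over the log
    consume_from(0, handler)  # replay from the earliest offset

print(naive_total)  # 80: every amount counted twice
print(safe_total)   # 40: repeats skipped
```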
How do you avoid duplicates when replaying Kafka events?
Make consumers idempotent so processing an event twice yields the same result as processing it once. The two reliable approaches are tracking a unique event ID and skipping IDs you have already applied, or designing every write as an upsert keyed by the event ID so a repeat simply overwrites with identical data.
Idempotency is the foundation, and it is worth designing in before you ever need a replay. A consumer that is idempotent can be replayed fearlessly; one that is not turns every replay into a careful, risky operation.
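The upsert approach can be sketched with SQLite from Python's standard library. The `payments` table, its columns, and the event shape are assumptions made for illustration; the point is only that the event ID is the primary key, so a redelivered event rewrites the same row instead of adding a second one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    "event_id TEXT PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
)


def apply_event(event):
    """Write the event as an upsert keyed by its ID; a repeat overwrites the same row."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO payments (event_id, customer, amount_cents) "
            "VALUES (?, ?, ?)",
            (event["id"], event["customer"], event["amount_cents"]),
        )


event = {"id": "evt-42", "customer": "c-1", "amount_cents": 1999}
apply_event(event)
apply_event(event)  # simulated replay of the same event

row_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
total = conn.execute("SELECT SUM(amount_cents) FROM payments").fetchone()[0]
```

This only works when the write is a full overwrite. An increment such as `balance = balance + amount` is not made safe by an upsert and needs the ID-tracking approach instead.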
On each event:
    if event.id in processed_ids:   # dedup table / cache
        skip
    else:
        apply(event)                # idempotent write (upsert by key)
        record(event.id)            # atomically with the write if possible
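The subtlety is atomicity: recording “I processed event X” and applying its effect should happen together, because a crash between the two leaves them out of sync. If the effect is applied but the ID is not yet recorded, the event is reprocessed after restart, which is fine when the write is idempotent. If the ID is recorded but the effect never lands, the event is skipped on redelivery and its effect is lost, which is bad. Where possible, fold the dedup key into the same transaction as the write. This is the same idempotency discipline that protects any distributed write.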
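One way to get that atomicity, sketched here with SQLite from Python's standard library, is to insert the event ID and apply the effect inside a single transaction. The table names, the `handle` function, and the event shape are assumptions for illustration. The effect is deliberately a non-idempotent increment, so the dedup row is the only thing preventing a double-apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL);
    INSERT INTO balances VALUES ('acct-1', 0);
    """
)


def handle(event):
    """Apply the event once; return False if it was already processed."""
    try:
        with conn:  # one transaction: both statements commit or neither does
            # The primary key makes a repeated ID raise IntegrityError.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["id"],),
            )
            conn.execute(
                "UPDATE balances SET cents = cents + ? WHERE account = ?",
                (event["cents"], event["account"]),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate delivery: the transaction rolled back, nothing changed.
        return False
```

Because the dedup row and the balance update commit together, a crash can never leave one without the other. This assumes the dedup table lives in the same database as the write; when the effect is in another system, a transaction cannot span both and the guard has to live in that system.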
How do you reset a Kafka consumer offset to replay?
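Stop the consumer group, reset its committed offsets to the desired position (a timestamp, a specific offset, or the earliest offset), then restart it. Do not try to move offsets under a live group: the kafka-consumer-groups tool refuses to reset offsets while the group has active members, and workarounds that seek running consumers invite rebalances and races. The clean path is always stop, reset, resume.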
The common reset targets are:
- To earliest: reprocess the entire retained history. Use for rebuilding a projection from scratch.
- By timestamp: rewind to just before a bug shipped. The most common surgical replay.
- To a specific offset: precise control when you know exactly where to start.
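The three targets map onto flags of the `kafka-consumer-groups` tool. The helper below is a small sketch that only builds the argument list. The function name and its parameters are hypothetical, and the script may be named `kafka-consumer-groups` or `kafka-consumer-groups.sh` depending on the distribution. It defaults to `--dry-run` so the planned offsets can be inspected before anything is committed.

```python
def reset_offsets_command(bootstrap, group, topic, target, value=None, execute=False):
    """Build argv for kafka-consumer-groups; run it only while the group is stopped."""
    flags = {
        "earliest": ["--to-earliest"],
        "timestamp": ["--to-datetime", value],  # e.g. 2024-05-01T09:00:00.000
        "offset": ["--to-offset", str(value)],
    }
    if target not in flags:
        raise ValueError(f"unknown reset target: {target}")
    if target != "earliest" and value is None:
        raise ValueError(f"target {target!r} needs a value")
    return [
        "kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap,
        "--group", group,
        "--topic", topic,
        "--reset-offsets",
        *flags[target],
        "--execute" if execute else "--dry-run",
    ]


# Plan a surgical replay to just before a bug shipped (names are examples).
cmd = reset_offsets_command(
    "localhost:9092", "orders-projector", "orders",
    target="timestamp", value="2024-05-01T09:00:00.000",
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the dry-run output looks right and `execute=True` is set.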
Should you replay into the same topic or a new one?
Prefer replaying through an isolated consumer group or a separate environment rather than the live group that drives production side effects. Replaying into the group that sends emails or calls payment APIs re-fires all of them unless every side effect is idempotent. Isolation contains the blast radius.
There are three common patterns, in increasing safety:
- Replay the live group in place. Only safe if every consumer and every downstream side effect is idempotent. Fastest, riskiest.
- Replay through a parallel consumer group. A second group reads the same topic and rebuilds a shadow read model, which you swap in once verified. Side effects are disabled in the replay group.
- Replay into a separate environment. Reprocess in staging or a dedicated rebuild cluster, validate the result, then promote. Safest, slowest.
The right choice depends on what the consumer does. A pure read-model projection can often replay in place. A consumer that triggers irreversible external actions should never replay against live side effects.
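The parallel-group pattern depends on the consumer being able to run with side effects switched off. A minimal sketch, assuming the handler receives its notifier by injection: every class and field name here is hypothetical, and the Kafka wiring (a second `group.id` reading the same topic) is left out.

```python
class RecordingNotifier:
    """Stand-in for a real email client; records what it would send."""

    def __init__(self):
        self.sent = []

    def send(self, order_id):
        self.sent.append(order_id)


class NullNotifier:
    """Used by the replay group: accepts calls and does nothing."""

    def send(self, order_id):
        pass


class OrderProjector:
    """Builds a read model from order events and notifies through the injected sink."""

    def __init__(self, read_model, notifier):
        self.read_model = read_model
        self.notifier = notifier

    def handle(self, event):
        self.read_model[event["order_id"]] = event["status"]  # upsert by key
        self.notifier.send(event["order_id"])


events = [
    {"order_id": "o-1", "status": "paid"},
    {"order_id": "o-2", "status": "shipped"},
]

live_notifier = RecordingNotifier()
live = OrderProjector(read_model={}, notifier=live_notifier)
shadow = OrderProjector(read_model={}, notifier=NullNotifier())  # replay group

for event in events:
    live.handle(event)    # original processing: notifications go out once
for event in events:
    shadow.handle(event)  # replay: rebuilds state, sends nothing
```

The shadow read model ends up identical to the live one, which is the comparison to run before swapping it in.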
How long should you retain Kafka data for replay?
Retain as far back as you would ever need to rebuild from, which is a deliberate capacity decision, not a default. Replay can only go back as far as the log is retained, so a 7-day retention means a 7-day replay ceiling. If you need to rebuild a read model from all history, you need either long (or infinite) retention or a compacted topic that keeps the latest value per key.
The two retention models serve different replay goals. Time/size-based retention keeps a rolling window, which is fine for “reprocess the last few days after a bug” but useless for “rebuild from the beginning of time.” Log compaction keeps the latest record per key forever, which is ideal for replaying current state into a new consumer but does not preserve the full event history.
The cost tradeoff is real: longer retention means more storage, which is exactly what tiered storage addresses by moving older log segments to cheap object storage while keeping them replayable. If replay-from-history is a requirement, decide the retention model up front, because you cannot replay events that retention already deleted.
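The retention ceiling is simple arithmetic, sketched below for time-based retention. The function names are hypothetical, and the check is an approximation: Kafka deletes whole log segments rather than individual records, and size-based limits can remove data sooner, so treat the result as a planning estimate rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone


def earliest_replayable(now, retention_ms):
    """Approximate oldest timestamp still in the log under time-based retention."""
    return now - timedelta(milliseconds=retention_ms)


def can_replay_from(target, now, retention_ms):
    """True if `target` falls inside the retention window."""
    return target >= earliest_replayable(now, retention_ms)


SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000  # 604800000, a 7-day retention.ms

now = datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc)
bug_shipped = datetime(2024, 5, 6, 9, 0, tzinfo=timezone.utc)    # about 4 days ago
old_incident = datetime(2024, 4, 20, 9, 0, tzinfo=timezone.utc)  # about 20 days ago

print(can_replay_from(bug_shipped, now, SEVEN_DAYS_MS))   # True
print(can_replay_from(old_incident, now, SEVEN_DAYS_MS))  # False
```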
A Kafka replay checklist
Before you rewind a single offset in production:
- Every consumer in the path is idempotent, verified, not assumed.
- Side effects (emails, charges, downstream events) are either idempotent or disabled for the replay.
- You are resetting offsets on a stopped group, with a known target (timestamp/offset/earliest).
- You have estimated the reprocessing volume and the lag it will create for live traffic.
- The replay runs in an isolated group or environment unless in-place is provably safe.
- You can stop the replay midway and know the system is still consistent.
- You have a way to verify the replayed result before it serves real users.
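For the volume estimate in the checklist, a rough back-of-the-envelope calculation is usually enough. This sketch assumes you have already looked up the per-partition reset target and log-end offsets (for example from a dry-run reset and the group's describe output) and measured the consumer's throughput. The function name and the numbers are illustrative.

```python
def estimate_replay(start_offsets, end_offsets, records_per_second):
    """Return (records to reprocess, estimated seconds) across all partitions."""
    total = sum(
        max(end_offsets[p] - start_offsets[p], 0) for p in start_offsets
    )
    return total, total / records_per_second


# Partition -> offset. Start is the reset target, end is the current log end.
start = {0: 1_000, 1: 1_500, 2: 900}
end = {0: 61_000, 1: 61_500, 2: 60_900}

records, seconds = estimate_replay(start, end, records_per_second=500)
print(records, seconds)  # 180000 360.0
```

Offset differences are an upper bound on record count, since compaction and transaction markers leave gaps in the offset sequence. The estimate also ignores live traffic arriving during the replay.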
Can Kafka’s exactly-once semantics eliminate replay duplicates?
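Only within Kafka, and only for Kafka-to-Kafka flows. Exactly-once semantics (EOS) make a read-process-write pipeline that stays inside Kafka idempotent and transactional: consumed offsets and produced records commit atomically, so a consumer that reads from one topic and writes to another will not double-apply when a retry or crash interrupts it. A deliberate offset reset is a different matter, because it reprocesses the input and writes the output again, so readers of that output topic still need to be idempotent. EOS is genuinely useful, and it is not the same as protecting external side effects.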
The boundary that EOS does not cross is the one that matters most during a replay: the call to a payment API, the email send, the write to an external database. The moment your consumer does something outside Kafka, exactly-once guarantees end, and your own idempotency is the only thing standing between a replay and a duplicate charge. This is why “we have exactly-once enabled” is not a substitute for idempotent consumers; it covers a narrower surface than people assume.
Treat EOS as a helpful layer for internal stream processing and idempotency as the durable guarantee for everything that touches the outside world. The two work together: EOS keeps your Kafka-internal pipeline clean, and consumer-side idempotency keeps your external effects safe when you rewind.
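Consumer-side idempotency for an external call usually means passing an idempotency key derived from the event ID. The sketch below uses a fake in-memory gateway in place of a real payment API. `FakePaymentGateway`, `handle_order_paid`, and the key format are all assumptions, and a real provider's idempotency support should be checked in its own documentation.

```python
class FakePaymentGateway:
    """Hypothetical stand-in for a payment API that honors idempotency keys."""

    def __init__(self):
        self.charges = {}

    def charge(self, idempotency_key, customer, cents):
        # A repeated key returns the original charge instead of creating a new one.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {"customer": customer, "cents": cents}
        return self.charges[idempotency_key]


def handle_order_paid(event, gateway):
    """Charge for an event, using a key derived only from the event's stable ID."""
    key = f"charge:{event['id']}"
    return gateway.charge(key, event["customer"], event["cents"])


gateway = FakePaymentGateway()
event = {"id": "evt-9", "customer": "c-1", "cents": 4200}

first = handle_order_paid(event, gateway)
replayed = handle_order_paid(event, gateway)  # same event delivered again
```

The key must depend only on the event, never on the delivery attempt or the current time, or a replay produces a fresh key and a second charge.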
What I’d do differently
The mistake that teaches this lesson is replaying a topic to fix a read model and discovering, an hour later, that you also re-sent every notification in the window. The damage is not the replay; it is that the consumer’s side effects were never idempotent, so replay was never actually safe.
If I were designing the consumer from scratch, I would make idempotency a day-one property, not a thing I bolt on when a replay goes wrong. Assign every event a stable ID at production time, dedup on it at every consumer, and keep irreversible side effects behind an idempotency guard. Do that, and replay stops being a high-stakes operation and becomes the routine, boring recovery tool it should be.
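One way to get a stable ID at production time is to derive it from the business facts of the event instead of generating a random value on each send. This is a sketch under assumptions: the namespace, the `make_event` helper, and the `order_id:version` scheme are hypothetical, and it only works if that pair uniquely identifies the event.

```python
import uuid

# Hypothetical namespace for this producer; any fixed UUID works.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")


def make_event(order_id, version, payload):
    """Build an event whose ID is derived from what happened, not when it was sent."""
    event_id = uuid.uuid5(EVENT_NAMESPACE, f"{order_id}:{version}")
    return {"id": str(event_id), "order_id": order_id, "version": version, **payload}


original = make_event("o-1", 3, {"status": "paid"})
retried = make_event("o-1", 3, {"status": "paid"})  # producer retry of the same change
next_change = make_event("o-1", 4, {"status": "shipped"})
```

A deterministic ID means a producer retry and a consumer replay both present the same key to every dedup check downstream.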
Sources
- Apache Kafka, Consumer offsets and kafka-consumer-groups tool: kafka.apache.org/documentation/#basic_ops_consumer_group
- Apache Kafka, Exactly-once semantics: kafka.apache.org/documentation/#semantics
- Confluent, Message delivery guarantees: docs.confluent.io/kafka/design/delivery-semantics.html
Frequently asked questions
Does replaying a Kafka topic create duplicate events?
Yes, by definition. Replaying re-delivers messages a consumer has already processed, so duplicates are guaranteed. Safe replay depends entirely on consumers being idempotent, so reprocessing the same event twice produces the same result as processing it once.
How do you reset a Kafka consumer offset to replay?
Reset the consumer group's committed offset to an earlier position, by timestamp or to the earliest offset, while the group is stopped, then restart it. Resetting offsets on a live group risks rebalancing chaos, so stop consumers first, reset, then resume.
How do you avoid duplicates when replaying Kafka events?
Make consumers idempotent. Track a unique event ID per message and skip IDs you have already applied, or design writes to be naturally idempotent (upserts keyed by event ID). Exactly-once semantics help within Kafka, but idempotency at the consumer is what protects external side effects.
Should you replay into the same topic or a new one?
Prefer replaying into an isolated consumer group or a separate environment, not blindly into live consumers. Replaying through the same group that feeds production side effects can re-trigger emails, charges, or downstream writes unless every one of those is idempotent.