Idempotency Keys for Distributed Systems

Idempotency keys make retried requests safe, so a timed-out payment or duplicate POST applies exactly once. This post covers the design, storage, and TTL decisions that matter.

Part of Distributed Systems Patterns That Hold Up in Production

By Colson · Distinguished Software Engineer, Founder

July 7, 2026

Idempotency keys visualized as a client retry sending the same key to a dedup store that returns one stored result

Idempotency keys are how you make a retried request safe. A client attaches a unique token to an operation, the server does the work once and stores the result under that token, and every later request carrying the same token returns the stored result instead of repeating the side effect. That is the whole idea: the same logical operation can be sent any number of times and still charge the card, create the order, or send the email exactly once.

You need this because retries are not optional in distributed systems. A request times out, the network drops the response, a load balancer resets the connection, and the client has no idea whether the work happened. Its only safe move is to retry. Without idempotency, that retry double-charges. With it, the retry is a no-op that returns the original answer.

What is an idempotency key?

An idempotency key is a unique, client-generated token attached to a request so the server can tell a retry apart from a genuinely new operation. The first request under a given key does the work and stores its result. Any later request with the same key returns that stored result, so the side effect happens once no matter how many times the request arrives.

The mental shift is treating a retry as the client re-asking the same question, not issuing a new command. “Charge this card $40, operation a1b2c3” sent five times is one charge with one answer, returned five times. The key is what lets the server collapse those five physical requests into one logical operation.

This only matters for operations with side effects. A GET is already idempotent by definition: reading twice changes nothing. The hard cases are POST-style operations that create or mutate state: payments, order creation, account provisioning, sending a message. Those need an explicit key because the HTTP method alone does not promise safety. This post is part of the Distributed systems patterns series.

How do idempotency keys work?

The server checks a dedup store on every keyed request. If the key is new, it executes the operation and saves the response under that key, atomically with the side effect. If the key already exists, it skips the work and returns the stored response. That check-execute-store cycle is the entire mechanism, and the atomicity is the part people get wrong.

The flow looks like this:

On request with Idempotency-Key K:
  begin transaction
    row = lookup(K)
    if row exists:
        if row.status == COMPLETED:
            return row.stored_response          # retry: replay the answer
        else:
            return 409 / "request in progress"  # concurrent duplicate
    else:
        insert(K, status=IN_PROGRESS)           # claim the key
  commit

  result = do_the_work()                          # the side effect

  begin transaction
    update(K, status=COMPLETED, stored_response=result)
  commit
  return result

Two failure modes drive the design. First, two retries can race and both see the key as new, so claiming the key needs a unique constraint or conditional write that lets exactly one winner proceed and forces the other to wait or replay. Second, the process can crash after do_the_work() but before storing the result, which is why the cleanest implementations fold the side effect and the key-completion write into a single database transaction. When the side effect is an external API call you cannot enroll in your transaction, you accept an IN_PROGRESS window and reconcile, which is the genuinely hard part.
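For the local-write case, a minimal sketch of the single-transaction variant might look like the following. It uses SQLite from the Python standard library purely for illustration; the table names, the idem_key column, and the charge function are assumptions of this example, not a prescribed schema. Because the charge row and the key row commit together, no IN_PROGRESS state is needed here.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so the code
    # below controls transactions explicitly with BEGIN / COMMIT / ROLLBACK.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " idem_key TEXT PRIMARY KEY,"  # uniqueness: at most one row per key
        " response TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE charges ("
        " id INTEGER PRIMARY KEY, card TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    return conn


def charge(conn: sqlite3.Connection, idem_key: str, card: str, amount_cents: int) -> dict:
    # BEGIN IMMEDIATE takes the write lock up front, so a concurrent
    # duplicate waits here instead of racing past the lookup.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT response FROM idempotency_keys WHERE idem_key = ?", (idem_key,)
        ).fetchone()
        if row is not None:
            conn.execute("ROLLBACK")   # nothing to write on a replay
            return json.loads(row[0])  # retry: return the stored answer

        # New key: the side effect and the key record share one transaction.
        cur = conn.execute(
            "INSERT INTO charges (card, amount_cents) VALUES (?, ?)",
            (card, amount_cents),
        )
        response = {"status": 201, "charge_id": cur.lastrowid}
        conn.execute(
            "INSERT INTO idempotency_keys (idem_key, response) VALUES (?, ?)",
            (idem_key, json.dumps(response)),
        )
        conn.execute("COMMIT")
        return response
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # a failure undoes the charge and the key together
        raise
```

Calling charge twice with the same key returns the same response and leaves one row in charges. A different key produces a second charge.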

Where should you store idempotency keys?

Store keys in a fast, durable store the request path already trusts. For most services that is the same transactional database as the operation, because then the dedup record and the side effect commit together. Redis works when you need lower latency, as long as you run it with persistence and accept its weaker durability story relative to your primary database.

The store choice is really a question of whether you can make the key write atomic with the side effect. Pick based on that:

Approach	What it is	Atomic with side effect?	Best for
Natural / business key	Dedup on existing fields (order number, request hash)	Yes, same DB	Operations that already have a unique business identifier
Client-generated key + dedup table	Client UUID stored in a dedicated table	Yes, same transaction as the write	General-purpose API idempotency (the default)
External dedup store (Redis/KV)	Key tracked outside the system of record	No, two-phase, needs reconciliation	Cross-service or very high-throughput paths

The natural-key approach is the cheapest when it fits: if every order already has a unique client order ID, you do not need a separate idempotency table, just a unique constraint. The client-generated key in your own database is the workhorse for general APIs. The external store buys throughput and cross-service reach but reintroduces the two-write problem, so you only reach for it when the primary database cannot absorb the dedup load.
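As a rough illustration of the natural-key row, here is a sketch where the unique constraint on a client-supplied order ID does all the dedup work. The orders table, its columns, and create_order are hypothetical names chosen for the example, and SQLite stands in for whatever database holds the orders.

```python
import sqlite3


def open_orders_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " client_order_id TEXT NOT NULL UNIQUE,"  # the natural key
        " sku TEXT NOT NULL)"
    )
    return conn


def create_order(conn: sqlite3.Connection, client_order_id: str, sku: str) -> int:
    """Create the order once; a retry returns the existing order's id."""
    with conn:  # commits on success, rolls back if an exception escapes
        # INSERT OR IGNORE is a no-op when the unique constraint already holds.
        conn.execute(
            "INSERT OR IGNORE INTO orders (client_order_id, sku) VALUES (?, ?)",
            (client_order_id, sku),
        )
        row = conn.execute(
            "SELECT id FROM orders WHERE client_order_id = ?", (client_order_id,)
        ).fetchone()
    return row[0]
```

No separate idempotency table appears anywhere: the retry hits the constraint, the insert is skipped, and the caller gets the original order back.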

On TYPEMUSE, the polyglot platform I run, the keyed-write services keep the idempotency record in the same Postgres transaction as the mutation wherever the side effect is a local write. The places that genuinely hurt are the ones calling a third-party API, where there is no shared transaction and you live with an in-progress window. That asymmetry, local versus external side effects, is the single biggest factor in how hard idempotency is to get right.
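To make the external case concrete, here is a deliberately simplified sketch of an in-progress window with a reconciliation pass. Everything in it is an assumption for illustration: the in-memory dict stands in for a real dedup store with a conditional write, and lookup_at_provider assumes the third party lets you query an operation by the key you sent, which not every provider offers.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KeyRecord:
    status: str                      # "IN_PROGRESS" or "COMPLETED"
    claimed_at: float                # when the key was claimed, in seconds
    response: Optional[dict] = None  # filled in once the call completes


def handle_external(store: dict, key: str,
                    call_provider: Callable[[str], dict],
                    now: Callable[[], float] = time.time) -> dict:
    record = store.get(key)
    if record is not None:
        if record.status == "COMPLETED":
            return record.response  # retry: replay the stored answer
        return {"status": 409, "error": "request in progress"}

    # Claim the key before the call. A real store needs a conditional
    # write here; a plain dict is only safe in a single thread.
    store[key] = KeyRecord("IN_PROGRESS", now())
    response = call_provider(key)  # a crash here leaves the key IN_PROGRESS
    store[key] = KeyRecord("COMPLETED", now(), response)
    return response


def reconcile(store: dict,
              lookup_at_provider: Callable[[str], Optional[dict]],
              stale_after_s: float,
              now: Callable[[], float] = time.time) -> None:
    """Resolve keys stuck IN_PROGRESS by asking the provider what happened."""
    for key, record in list(store.items()):
        if record.status != "IN_PROGRESS":
            continue
        if now() - record.claimed_at <= stale_after_s:
            continue  # may still be running; leave it alone
        outcome = lookup_at_provider(key)
        if outcome is None:
            del store[key]  # provider never saw it: release the key for a retry
        else:
            record.status = "COMPLETED"  # the work did happen: record the answer
            record.response = outcome
```

The design choice worth noticing is that reconciliation never re-runs the work. It only asks the provider for the outcome, then either records it or releases the key.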

How long should you keep idempotency keys?

Keep keys at least as long as a client might realistically retry the same operation, then expire them with a TTL. In practice that is 24 hours to a few days for synchronous APIs. The window has to outlast your longest retry chain and end-to-end timeout budget, because a key that expires before the last retry arrives reopens exactly the duplicate window you were trying to close.

The retention decision is a direct function of your retry policy. If a client retries with backoff for up to ten minutes, a one-minute TTL is a bug: the final retry lands after the key is gone, the server treats it as new, and you double-apply. Size the TTL against the worst-case retry horizon, which means it is coupled to your timeout budgets across service chains, not chosen in isolation.
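The arithmetic is simple enough to write down. This sketch assumes a client that retries with capped exponential backoff and no jitter, where every attempt may run to its full timeout; the function names, parameters, and the 2x margin are illustrative choices, not a standard.

```python
def worst_case_retry_horizon(max_attempts: int, attempt_timeout_s: float,
                             base_backoff_s: float, max_backoff_s: float) -> float:
    """Upper bound, in seconds, from the first send to the end of the last attempt."""
    total = 0.0
    for attempt in range(max_attempts):
        total += attempt_timeout_s  # assume each attempt uses its whole timeout
        if attempt < max_attempts - 1:
            # capped exponential backoff before the next attempt
            total += min(max_backoff_s, base_backoff_s * 2 ** attempt)
    return total


def minimum_ttl_s(horizon_s: float, margin: float = 2.0) -> float:
    """Smallest TTL to configure: the retry horizon times a safety margin."""
    return horizon_s * margin


def ttl_is_safe(ttl_s: float, horizon_s: float) -> bool:
    """A TTL at or below the retry horizon lets the last retry look new."""
    return ttl_s > horizon_s
```

For six attempts with a 30-second timeout and backoff starting at 1 second and capped at 60, the horizon is 180 seconds of timeouts plus 31 seconds of backoff, so 211 seconds. A one-minute TTL fails ttl_is_safe against that horizon, and the 2x margin gives 422 seconds.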

The opposing pressure is store growth. Keys are write-once and read rarely after the retry window closes, so an unbounded store accumulates dead records that cost storage and slow lookups. A TTL is mandatory, not optional. The right answer is the smallest window that safely covers your retry budget plus a margin, expired automatically.

Are idempotency keys the same as exactly-once delivery?

No, and conflating them causes real outages. Exactly-once delivery across a network you do not control is essentially a myth: you cannot guarantee a message arrives once and only once when acknowledgements themselves can be lost. What you can guarantee is exactly-once effect on top of at-least-once delivery. Messages may arrive twice, but idempotency keys ensure the operation applies once.

This reframes the entire problem. Stop trying to prevent duplicate delivery, which is unachievable end to end, and instead make duplicate processing harmless, which is achievable. At-least-once plus idempotency is the combination that production systems actually ship, and it is strictly more robust than chasing a delivery guarantee that breaks the moment a network partition eats an ack.

Property	At-least-once + idempotency	“Exactly-once delivery”
What it promises	Effect applied once, despite duplicate delivery	Message delivered once (in theory)
Holds across untrusted network	Yes	No, breaks on lost acks / partitions
Where the work lives	Receiver (dedup on key)	Transport / broker
Real-world status	Shippable, standard practice	Mostly marketing outside a single system
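The left-hand column fits in a few lines. In this sketch the receiver dedups on a message ID, so a redelivered message is acknowledged but not re-applied. Ledger and its fields are made-up names, and the in-memory set is an assumption for brevity: in a real consumer the applied-ID record and the balance change would need to commit together.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    balance_cents: int = 0
    applied: set = field(default_factory=set)  # message IDs already processed

    def apply(self, message_id: str, amount_cents: int) -> bool:
        """Apply a credit once. Returns False when the message is a duplicate."""
        if message_id in self.applied:
            return False  # duplicate delivery: harmless no-op
        self.balance_cents += amount_cents
        self.applied.add(message_id)
        return True


def consume(ledger: Ledger, deliveries: list) -> int:
    """Process an at-least-once stream of (message_id, amount) pairs.

    Returns how many deliveries were duplicates.
    """
    duplicates = 0
    for message_id, amount_cents in deliveries:
        if not ledger.apply(message_id, amount_cents):
            duplicates += 1
    return duplicates
```

Feeding it m1, m2, and then m1 again leaves the balance at the sum of m1 and m2, with one duplicate counted. The broker delivered three times; the effect happened twice, once per logical message.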

The systems that advertise exactly-once, like Kafka’s transactional semantics, deliver it only inside their own boundary. The moment a consumer calls an external API or writes to another system, that guarantee ends and your own idempotency takes over. That boundary is the focus of “Kafka replay strategy without duplicate events,” and the lesson is the same: the broker protects its internal pipeline, idempotency protects the outside world.

How do retries interact with idempotency keys?

A retry is only safe if it carries the same idempotency key as the original request. The key is what tells the server “this is the same operation I already asked about,” so the client must generate the key once per logical operation and reuse it across every retry of that operation. Generate a fresh key on retry and you have defeated the entire mechanism, because the server sees two distinct operations.

This places a real requirement on the client. The key belongs to the logical operation, not to the HTTP attempt. The client creates a UUID when it decides to charge the card, then sends that same UUID on the first attempt and on every retry, through every timeout and backoff. Only when the operation logically restarts (the user clicks “pay” again) does it mint a new key.

The interaction with timeouts is where this gets sharp. When a request times out, the client does not know if the server completed the work, so it retries with the same key. If the server finished, it replays the stored response. If the server is still processing, it returns “in progress” and the client backs off. The pairing only works when retry policy and timeout budget are designed together with the key lifetime, which is why I treat retries, timeouts, and idempotency as a single subsystem rather than three independent features.
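On the client side, the rule reduces to where the key is minted. In this sketch the key is created once, outside the retry loop, and every attempt reuses it. The send callable, the InProgress exception, and the backoff parameters are assumptions of the example; a real client would map them onto its HTTP library and the server's actual "in progress" response.

```python
import time
import uuid
from typing import Callable


class InProgress(Exception):
    """Raised by send when the server reports the keyed request is still running."""


def call_with_retries(send: Callable[[str, dict], dict], payload: dict,
                      max_attempts: int = 5, base_backoff_s: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Minted once per logical operation, never inside the loop.
    key = str(uuid.uuid4())

    last_error: Exception = TimeoutError("no attempt made")
    for attempt in range(max_attempts):
        try:
            return send(key, payload)  # same key on every attempt
        except (TimeoutError, InProgress) as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_backoff_s * 2 ** attempt)  # exponential backoff
    raise last_error
```

If the first attempt times out after the server finished the work, the second attempt carries the same key and gets the stored response back. Moving the uuid4 call inside the loop is the one-line change that would break this.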

A checklist for implementing idempotent endpoints

Run this before you call a keyed endpoint production-ready:

The client generates one key per logical operation (a UUID) and reuses it across all retries of that operation.
The server claims the key with a unique constraint or conditional write, so concurrent duplicates cannot both proceed.
The key record and the side effect commit in the same transaction whenever the side effect is a local write.
External side effects (third-party API calls) have an explicit in-progress state and a reconciliation path.
The stored response is returned verbatim on a duplicate, including status code, so retries are transparent.
A key TTL is set that outlasts the worst-case retry and timeout budget, with automatic expiry.
Concurrent requests with the same key get a deterministic answer (replay or 409), never a partial double-apply.
You have tested the crash-between-work-and-store window, not just the happy path.
The endpoint documents the key header and its semantics for clients.
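The crash-window item is the one most often skipped, so here is one way such a test could look. It is a sketch under assumptions: SQLite stands in for the real database, outbox stands in for the local side effect, and the crash_before_store flag is a test hook invented for this example to simulate the process dying between the work and the key write.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Stands in for the process dying mid-request."""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, recipient TEXT NOT NULL)")
    conn.execute("CREATE TABLE idempotency_keys (idem_key TEXT PRIMARY KEY)")
    return conn


def keyed_write(conn: sqlite3.Connection, idem_key: str, recipient: str,
                crash_before_store: bool = False) -> str:
    # "with conn" commits on success and rolls back if an exception escapes,
    # so the two inserts below land together or not at all.
    with conn:
        seen = conn.execute(
            "SELECT 1 FROM idempotency_keys WHERE idem_key = ?", (idem_key,)
        ).fetchone()
        if seen:
            return "replayed"
        conn.execute("INSERT INTO outbox (recipient) VALUES (?)", (recipient,))
        if crash_before_store:
            raise SimulatedCrash()  # the gap between "do this" and "did this happen"
        conn.execute("INSERT INTO idempotency_keys (idem_key) VALUES (?)", (idem_key,))
    return "applied"


def count_outbox(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The test itself is three steps: call keyed_write with the crash flag and confirm the outbox is still empty, retry with the same key and confirm it reports "applied", then retry once more and confirm it reports "replayed" with exactly one outbox row.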

What I’d do differently

The lesson that sticks is the one you learn from a double-charge, or in my case from a keyed write that recorded the idempotency row in one transaction and performed the side effect in another. Under normal load it was invisible. Under a deploy-time restart, a few requests crashed in the gap: the work had run, the completion record had not, and the retry re-ran the work. The bug was not the retry. The bug was that “did this happen” and “do this” were two separate writes.

If I were starting over, I would design idempotency as a property of the operation from the first commit, not a guard bolted on after an incident. Concretely: make the key write and the side effect a single transaction by default, and treat any operation where that is impossible (anything calling an external system) as a flagged special case that needs an explicit in-progress state and reconciliation, reviewed deliberately. I would also size the key TTL against the real retry budget rather than picking a round number, because a TTL shorter than the retry horizon is a silent duplicate window that only shows up under the exact conditions, slow downstreams and aggressive retries, when you can least afford it. Get those two things right, atomic key writes and a budget-aligned TTL, and idempotency stops being a source of incidents and becomes the boring foundation that makes retries safe.

Sources

Stripe, Idempotent requests: docs.stripe.com/api/idempotent_requests
IETF, The Idempotency-Key HTTP Header Field (draft): datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
AWS, Powertools for Lambda, Idempotency: docs.powertools.aws.dev/lambda/python/latest/utilities/idempotency/
Confluent, Exactly-once semantics and message delivery: docs.confluent.io/kafka/design/delivery-semantics.html


Frequently asked questions

What is an idempotency key?

An idempotency key is a unique client-generated token attached to a request so the server can recognize retries of the same logical operation. The first request does the work and stores the result under the key. Later requests with the same key return that stored result instead of repeating the side effect.

How do idempotency keys work?

The client sends a unique key with a request. The server checks a dedup store: if the key is new, it executes the operation and saves the response keyed by that token, atomically. If the key already exists, it returns the stored response without re-running the work, so the side effect happens exactly once.

Where should you store idempotency keys?

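Store keys in a fast, durable store the request path already trusts, usually the same transactional database as the operation, or Redis with a persistence guarantee. Wherever the side effect is a local write, the key record should be written in the same transaction as that write, or the dedup guarantee breaks under crashes. An external store cannot share that transaction, so it needs an in-progress state and reconciliation instead.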

How long should you keep idempotency keys?

Keep keys at least as long as clients realistically retry, commonly 24 hours to a few days, then expire them with a TTL. The window must outlast your longest retry chain and timeout budget. Too short a window reopens the duplicate window; too long a window lets the store grow without bound.

Are idempotency keys the same as exactly-once delivery?

No. Exactly-once delivery is mostly a myth across a network you do not control. Idempotency keys give you exactly-once effect on at-least-once delivery: messages may arrive twice, but the operation applies once. That pairing is what production systems actually ship.
