PodDisruptionBudgets That Actually Protect You
A PodDisruptionBudget keeps your service up during node drains and upgrades, but a wrong value blocks drains or protects nothing. How to set PDBs correctly.
Part of Kubernetes Operations for Production Platforms
A PodDisruptionBudget is what keeps your service running when a node is drained or the cluster is upgraded, but only if you set it correctly. Set it right and Kubernetes will never voluntarily take down more pods than your service can spare. Set minAvailable equal to your replica count and you block node drains forever; set it too low and it protects nothing. The PDB is a small object with a sharp edge: it is either load-bearing or a footgun, depending on one number.
PDBs are also one of the most-skipped items in a deployment checklist, which is why “the cluster upgrade took the service down” remains a common incident. The fix is cheap; the omission is expensive.
Why PodDisruptionBudgets matter
Kubernetes clusters are constantly in motion: nodes get drained for maintenance, the cluster gets upgraded, the autoscaler removes underused nodes. Each of those is a voluntary disruption that evicts the pods on the affected node. Without a PDB, nothing stops Kubernetes from evicting all of your service’s pods at once if they happen to share a node or a drain wave.
A PodDisruptionBudget is how you tell the cluster “you may take some of my pods during maintenance, but never so many that I go down.” It is the contract that makes routine operations safe for your service. This post is part of the Kubernetes operations series.
What does a PodDisruptionBudget actually protect against?
A PDB protects against voluntary disruptions: node drains, cluster upgrades, and other deliberate evictions. It explicitly does not protect against involuntary disruptions such as a node crashing, hardware failing, or a pod being OOM-killed. This distinction is the single most misunderstood thing about PDBs, and getting it wrong leads people to expect protection a PDB cannot give.
The practical takeaway: use PDBs to make drains and upgrades safe, and use replica count plus anti-affinity (spreading pods across nodes and zones) to survive crashes. They solve different halves of availability, and you need both.
How do you set minAvailable on a PodDisruptionBudget?
Set minAvailable to the number of pods your service genuinely needs to keep serving, while leaving headroom for at least one pod to be evicted at a time so drains can actually make progress. The classic safe value for a three-replica deployment is minAvailable: 2, which keeps the service healthy on two pods while letting one pod drain at a time. You can also express the same budget as maxUnavailable, which is often clearer for larger deployments.
The two boundaries to avoid:
| Setting | Effect | Verdict |
|---|---|---|
| minAvailable == replica count | No pod can ever be evicted; drains block forever | Broken |
| minAvailable very low (e.g. 0) | Everything can be evicted at once | Protects nothing |
| minAvailable = replicas − 1 (e.g. 2 of 3) | One pod drains at a time; service stays up | Usually right |
| maxUnavailable as a percentage | Scales with replica count | Good for large deployments |
The reasoning is that a drain needs to evict pods to proceed, so the budget must permit at least one eviction at a time, while still keeping enough pods up to serve. That tension between allowing progress and preserving availability is the whole design of the value.
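The arithmetic behind the table is small enough to sketch. This is an illustration only, assuming every replica is currently healthy and the budget is an absolute minAvailable; the function name `allowed_evictions` is made up for this example and is not a Kubernetes API.

```python
def allowed_evictions(healthy_pods: int, min_available: int) -> int:
    """Pods that can be evicted right now without dropping below min_available."""
    return max(0, healthy_pods - min_available)


# A three-replica deployment with all pods healthy, under three budgets.
for min_available in (3, 0, 2):
    evictions = allowed_evictions(3, min_available)
    print(f"minAvailable={min_available}: {evictions} eviction(s) allowed")

# minAvailable=3: 0 eviction(s) allowed  -> drains block
# minAvailable=0: 3 eviction(s) allowed  -> protects nothing
# minAvailable=2: 1 eviction(s) allowed  -> one pod drains at a time
```

A budget that yields zero here can never let a drain proceed, and one that yields the full replica count offers no protection.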
Why is my node drain stuck or blocked?
A stuck node drain is very often an unsatisfiable PodDisruptionBudget. If minAvailable equals the replica count, Kubernetes can never evict a pod without violating the budget, so the drain blocks indefinitely. The same happens with a single-replica deployment that has a PDB requiring that one pod stay available: there is no way to drain it without breaking the budget.
The fixes are direct: lower minAvailable so at least one eviction is allowed, or add replicas so the service can tolerate losing one. A single-replica service fundamentally cannot satisfy both “stay available” and “be drained,” so if it matters, give it more than one replica. This is also a good prompt to check that your replicas are spread across nodes, because a PDB does you little good if all your pods sit on the node being drained.
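When a drain hangs, one way to find the culprit is to look for budgets that currently allow zero disruptions. The sketch below assumes you have saved the output of `kubectl get pdb -A -o json` and reads the `status.disruptionsAllowed` field of each PDB; the sample data and the names in it (`prod`, `web`, `cache`) are hypothetical.

```python
import json


def blocking_pdbs(pdb_list: dict) -> list[str]:
    """Return namespace/name for every PDB that currently allows no disruptions."""
    blocked = []
    for item in pdb_list.get("items", []):
        status = item.get("status", {})
        if status.get("disruptionsAllowed", 0) == 0:
            meta = item["metadata"]
            blocked.append(f"{meta.get('namespace', 'default')}/{meta['name']}")
    return blocked


# Hypothetical, heavily trimmed sample of `kubectl get pdb -A -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "prod", "name": "web"},
     "status": {"currentHealthy": 3, "desiredHealthy": 2, "disruptionsAllowed": 1}},
    {"metadata": {"namespace": "prod", "name": "cache"},
     "status": {"currentHealthy": 1, "desiredHealthy": 1, "disruptionsAllowed": 0}}
  ]
}
""")

print(blocking_pdbs(sample))  # ['prod/cache']
```

A zero can also be temporary, for example while a replacement pod is still starting, so treat a hit as a lead to investigate rather than proof that the budget is misconfigured.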
minAvailable or maxUnavailable: which should you use?
Use maxUnavailable for larger or variable-replica deployments and minAvailable for small fixed ones. They express the same budget from opposite ends, but maxUnavailable (often as a percentage) stays correct when your replica count changes, while a hardcoded minAvailable can silently become wrong after a scale event. For a service whose replica count moves with autoscaling, a percentage-based maxUnavailable is the more robust choice.
The trap with minAvailable is that it is an absolute number that does not track replica changes. Set minAvailable: 2 on a three-replica service and it is correct; let that service autoscale up to twenty replicas and the same PDB now allows eighteen pods to be drained at once, far more disruption than you intended. The budget did not change, but its meaning did, because the replica count moved underneath it.
maxUnavailable: 25% avoids that drift: it always means “at most a quarter of the deployment’s desired replicas, rounded up to a whole pod,” so it scales with the deployment. For a small, fixed-size critical service, minAvailable set to replicas minus one is clear and fine. For anything that autoscales, prefer the percentage-based maxUnavailable so the protection stays proportional as the service grows and shrinks. Either way, generate it from policy rather than hand-setting numbers that drift.
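To make the drift concrete, here is a rough comparison of the two styles at three and twenty replicas. It assumes all pods are healthy and that a percentage maxUnavailable is rounded up to a whole pod, as the Kubernetes documentation describes; the function names are illustrative only.

```python
def allowed_with_min_available(replicas: int, min_available: int) -> int:
    """Evictions permitted by an absolute minAvailable."""
    return max(0, replicas - min_available)


def allowed_with_max_unavailable_pct(replicas: int, percent: int) -> int:
    """Evictions permitted by a percentage maxUnavailable, rounded up."""
    return (replicas * percent + 99) // 100  # integer ceiling division


for replicas in (3, 20):
    absolute = allowed_with_min_available(replicas, 2)
    proportional = allowed_with_max_unavailable_pct(replicas, 25)
    print(f"{replicas} replicas: minAvailable=2 allows {absolute}, "
          f"maxUnavailable=25% allows {proportional}")

# 3 replicas: minAvailable=2 allows 1, maxUnavailable=25% allows 1
# 20 replicas: minAvailable=2 allows 18, maxUnavailable=25% allows 5
```

Both budgets behave identically at three replicas; only the percentage keeps roughly the same proportion of the service protected after the scale-up.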
A PodDisruptionBudget checklist
For every important deployment:
- A PDB exists (it is part of being production-ready, not optional for real services).
- minAvailable (or maxUnavailable) keeps enough pods up to serve, while allowing at least one eviction.
- It is never set equal to the replica count (that blocks drains forever).
- Single-replica services that need availability have been given more replicas, not just a PDB.
- Replicas are spread across nodes/zones with anti-affinity, so the PDB and the spread reinforce each other.
- You understand it covers voluntary disruptions only; crash survival comes from replicas + spread.
- You tested a node drain in staging and confirmed it proceeds without taking the service down.
How do PDBs interact with the cluster autoscaler?
PodDisruptionBudgets directly shape how the cluster autoscaler removes nodes. When the autoscaler wants to scale down by removing an underused node, it must evict that node’s pods, and it will respect your PDBs while doing so. A PDB that cannot be satisfied can therefore block scale-down, leaving you paying for a node the autoscaler wanted to reclaim.
This is the same blocked-drain dynamic as a manual node drain, just triggered automatically. If a deployment’s PDB does not allow any eviction, the autoscaler cannot drain the node hosting its pods, so that node stays up indefinitely and your cluster does not shrink when it should. The symptom is a cluster that never scales down even when utilization is low, and the cause is often a too-strict PDB somewhere.
The resolution is the same discipline as everywhere else with PDBs: set budgets that permit at least one eviction at a time, spread replicas across nodes so evicting one node never threatens the whole service, and avoid single-replica deployments with availability-requiring PDBs. Get this right and node scale-down and PDBs cooperate: the autoscaler reclaims nodes by draining them a pod at a time, your service stays up throughout, and you stop paying for capacity you are not using.
What I’d do differently
The mistake I have seen most is treating PDBs as a box to tick: copying minAvailable: 1 onto everything. On a two-replica service that leaves a single pod serving during a drain, while on a single-replica service it quietly makes the pod undrainable. The value is not boilerplate; it has to reflect the actual replica count and availability needs of each service.
If I were standardizing this, I would generate the PDB from the deployment’s replica count by policy (for example, always replicas − 1 or a sensible maxUnavailable percentage) rather than hand-setting a number that drifts out of sync when replicas change. And I would test a node drain as part of validating any new service, because a blocked drain discovered during a real cluster upgrade is a far worse time to learn that the PDB was wrong. PDBs are small, and like most small Kubernetes objects, they punish carelessness precisely when you are mid-maintenance.
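One possible shape for that policy, as a minimal sketch rather than a finished tool: a function that derives the PDB from the replica count and refuses to emit one for a single-replica service. The function name `build_pdb`, the `-pdb` name suffix, the `autoscaled` flag, and the 25% figure are all assumptions made for this example. The output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def build_pdb(name: str, match_labels: dict[str, str], replicas: int,
              autoscaled: bool = False) -> dict:
    """Build a policy/v1 PodDisruptionBudget manifest from a simple policy."""
    if replicas < 2:
        # A single replica cannot both stay available and be drained.
        raise ValueError(f"{name}: add replicas before adding a PDB")
    if autoscaled:
        budget = {"maxUnavailable": "25%"}  # stays proportional as replicas change
    else:
        budget = {"minAvailable": replicas - 1}  # one pod may drain at a time
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{name}-pdb"},
        "spec": {**budget, "selector": {"matchLabels": match_labels}},
    }


# Hypothetical three-replica service labelled app=web.
print(json.dumps(build_pdb("web", {"app": "web"}, replicas=3), indent=2))
```

Running this in the same pipeline that renders the deployment means the budget is recomputed whenever the replica count changes, instead of relying on someone remembering to edit it.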
Sources
- Kubernetes, Specifying a Disruption Budget: kubernetes.io/docs/tasks/run-application/configure-pdb
- Kubernetes, Disruptions (voluntary vs involuntary): kubernetes.io/docs/concepts/workloads/pods/disruptions
- Kubernetes, Safely drain a node: kubernetes.io/docs/tasks/administer-cluster/safely-drain-node
Frequently asked questions
What is a PodDisruptionBudget?
A PodDisruptionBudget (PDB) is a Kubernetes object that limits how many pods of an application can be voluntarily disrupted at once, such as during a node drain or cluster upgrade. It tells Kubernetes the minimum number (or maximum unavailable) of pods that must stay running, so a routine maintenance operation cannot take your whole service down.
What does a PodDisruptionBudget actually protect against?
Voluntary disruptions: node drains, cluster upgrades, and other operations that evict pods on purpose. It does not protect against involuntary disruptions like a node crashing or running out of memory. PDBs make planned maintenance safe; they are not a defense against hardware failure.
How do you set minAvailable on a PodDisruptionBudget?
Set it to the number of pods your service genuinely needs to stay healthy, leaving room for at least one pod to be evicted at a time so drains can proceed. With three replicas, minAvailable 2 is common. Setting it equal to the replica count blocks drains entirely; setting it too low protects nothing.
Why is my node drain stuck or blocked?
Often a PodDisruptionBudget that cannot be satisfied. If minAvailable equals the replica count, or a single-replica deployment has a PDB requiring that one pod stay up, the drain can never evict the pod without violating the budget, so it blocks forever. Fix the PDB or add replicas.