An idle Kubernetes platform can be expensive without serving meaningful traffic. The dominant cost is not request volume. The dominant cost is the shape of the baseline: always-on nodes, attached volumes, a load balancer, and scheduler decisions that keep elastic capacity alive after it was created.
The reduction described here moved the cluster from a five-to-six-node operating shape back to the intended four-node baseline: one Talos control plane, three static Talos workers, and zero active autoscale workers. The important part was not deleting two servers. The important part was making sure the scheduler and GitOps state no longer had a reason to create them again.
Starting State
The platform is a Talos Linux Kubernetes cluster in Hetzner Cloud, deployed in Ashburn. The stable baseline is:
- 1 x cpx31 control plane, running Talos and Kubernetes control plane components.
- 3 x cpx21 static workers for normal platform and application workloads.
- 1 x lb11 Hetzner load balancer for ingress.
- 260 GB of persistent volumes across registry, database, monitoring, storage, and application state.
- A cluster-autoscaler node pool named worker-autoscale configured with min=0 and max=5.
    flowchart LR
      subgraph external["External edge"]
        cloudflare["Cloudflare DNS / Access"]
        lb["Hetzner LB11"]
      end
      subgraph cluster["talos-redux Kubernetes cluster"]
        cp["Control plane\n1 x CPX31"]
        subgraph static["Static worker baseline"]
          w1["worker-1\nCPX21"]
          w2["worker-2\nCPX21"]
          w3["worker-3\nCPX21"]
        end
        subgraph elastic["Elastic worker pool"]
          autoscale["worker-autoscale\nmin 0 / max 5"]
        end
        argocd["ArgoCD"]
        registry["Private registry"]
        observability["Prometheus / Grafana / Loki"]
        apps["Application workloads"]
      end
      cloudflare --> lb
      lb --> w1
      lb --> w2
      lb --> w3
      argocd --> apps
      registry --> apps
      autoscale -. "burst jobs only" .-> apps

The live cluster had drifted from that target. Two autoscale workers were running for long periods, and a third was created during remediation when request pressure briefly made a baseline pod unschedulable. The expensive state was therefore not just "autoscaler exists"; it was "baseline workloads are eligible to land on autoscaler capacity, and some requests are large enough that the scheduler asks for more nodes."
    Before:
      control plane: 1 x cpx31
      static workers: 3 x cpx21
      autoscale workers: 2 x cpx21, long-lived
      load balancer: 1 x lb11
      volumes: 260 GB
      result: about $108-$110/month by hcloud API pricing

    After:
      control plane: 1 x cpx31
      static workers: 3 x cpx21
      autoscale workers: 0 x cpx21
      load balancer: 1 x lb11
      volumes: 260 GB
      result: about $84-$86/month by hcloud API pricing
The hcloud API reported Ashburn monthly prices of $11.99 for cpx21, $20.99 for cpx31, and $7.49 for lb11. Removing two long-lived cpx21 autoscale nodes removed about $23.98/month of compute spend. The remaining $84-$86/month is baseline infrastructure: four servers, one load balancer, persistent volumes, and small snapshot overhead.
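The compute delta is plain arithmetic over the three hcloud prices quoted here. The following is a minimal sketch under that assumption. Volume and snapshot charges are not itemized in this article, so the totals cover servers and the load balancer only; the `monthly_cost` helper is illustrative, not part of any tooling described here.

```python
from decimal import Decimal

# Ashburn monthly prices as quoted in the article (USD).
PRICES = {
    "cpx21": Decimal("11.99"),
    "cpx31": Decimal("20.99"),
    "lb11": Decimal("7.49"),
}


def monthly_cost(inventory):
    """Sum the monthly cost for a mapping of resource type -> count."""
    return sum(
        (PRICES[kind] * count for kind, count in inventory.items()),
        Decimal("0"),
    )


# Servers and load balancer only; volumes and snapshots are excluded
# because their unit prices are not given in the article.
before = {"cpx31": 1, "cpx21": 3 + 2, "lb11": 1}  # two long-lived autoscale workers
after = {"cpx31": 1, "cpx21": 3, "lb11": 1}       # static baseline only

savings = monthly_cost(before) - monthly_cost(after)
print(f"before: ${monthly_cost(before)}")
print(f"after:  ${monthly_cost(after)}")
print(f"saved:  ${savings}")
```

The gap between the server-plus-load-balancer total and the reported $84-$86 floor is the share attributed to persistent volumes and snapshot overhead.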
Why The Bill Was High
The system had no meaningful traffic, but Kubernetes cost is not strictly traffic-correlated. A cluster accrues cost when infrastructure exists, not when packets flow. For this cluster, the recurring spend came from four categories:
- Compute baseline: the control plane and three static workers are always on.
- Elastic compute leakage: autoscale nodes were created for scheduling pressure and then remained useful to baseline workloads.
- Persistent storage: volumes remain attached and billed regardless of HTTP traffic.
- Network edge: the load balancer is fixed monthly infrastructure.
The key operational distinction is between utilization and requests. Kubernetes scheduling is driven by declared requests, not by live usage. A pod using 40 MiB can reserve 512 MiB. A node using 2.4 GiB can still be considered full if requested memory is near allocatable memory. The scheduler has to be conservative because requests are the contract a pod makes with the cluster.
    flowchart TD
      pending["Pod is Pending"] --> scheduler["Kubernetes scheduler"]
      scheduler --> requests["Evaluate requests, node selectors, affinities, taints, PV topology"]
      requests --> fit{"Fits static workers?"}
      fit -->|"yes"| static["Schedule on nodepool=worker"]
      fit -->|"no"| autoscaler["cluster-autoscaler evaluates scale-up"]
      autoscaler --> allowed{"Pod can run on autoscale pool?"}
      allowed -->|"yes"| newnode["Create CPX21 autoscale worker"]
      allowed -->|"no"| remain["Remain Pending"]
      newnode --> cost["Idle server can persist if baseline workload lands there"]

That made resource requests the main cost-control surface. CPU and memory requests were not only performance declarations; they were the inputs that decided whether the scheduler could keep the platform inside the static worker baseline.
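The difference between requested and used memory can be made concrete with a small model. This is a sketch only: the node sizes and the `Node`, `fits`, and `place` names are hypothetical, and the real scheduler also weighs CPU, node selectors, affinities, taints, and volume topology.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_mib: int  # memory the node offers to pods
    requested_mib: int    # sum of memory requests already scheduled
    used_mib: int         # live usage; carried only for contrast, never consulted


def fits(node: Node, pod_request_mib: int) -> bool:
    """A pod fits only if its request fits in unrequested allocatable memory."""
    return node.requested_mib + pod_request_mib <= node.allocatable_mib


def place(nodes, pod_request_mib: int) -> str:
    """Return the first static node that fits, or signal a scale-up."""
    for node in nodes:
        if fits(node, pod_request_mib):
            return node.name
    return "scale-up"


# Hypothetical numbers: the node is lightly used but almost fully requested.
worker = Node("worker-1", allocatable_mib=3500, requested_mib=3300, used_mib=2400)

print(place([worker], 512))  # the request does not fit, despite low live usage
print(place([worker], 128))  # a trimmed request fits on the static node
```

In this model, `used_mib` never enters the decision, which is the point: trimming a request changes placement while live usage stays the same.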
GitOps Drift
The first hard blocker was ArgoCD itself. The repository-level Terraform values set a global node selector for ArgoCD components: nodepool=worker. The live Helm release had a more specific controller override: controller.nodeSelector.nodepool=worker-autoscale. That component-specific value won over the global value.
When an autoscale node was drained, the ArgoCD application controller was evicted and could not schedule on the static worker pool because its live StatefulSet still demanded worker-autoscale. The durable fix was to make the controller-specific value explicit in Terraform, not only patch the live StatefulSet.
    controller = {
      replicas = 1
      # Pin the application controller to the static worker pool explicitly,
      # so a component-level override cannot win over the global selector.
      nodeSelector = {
        nodepool = "worker"
      }
      resources = {
        # Request sized for idle steady state; the limit stays high for bursts.
        requests = {
          cpu    = "100m"
          memory = "256Mi"
        }
        limits = {
          memory = "2Gi"
        }
      }
    }
This is the main GitOps lesson: a live patch is only a repair if the source of truth agrees with it. Otherwise the reconciler is doing its job when it reverts the patch.
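The precedence problem can be modeled in a few lines. This is a simplified illustration of the behavior described in this section (a component-specific value winning over the global one), not the Helm chart's actual template logic; `effective_node_selector` and the dictionaries are hypothetical stand-ins for the real values.

```python
def effective_node_selector(values: dict, component: str) -> dict:
    """Resolve a component's nodeSelector: component override, else global."""
    override = values.get(component, {}).get("nodeSelector")
    if override:
        return override
    return values.get("global", {}).get("nodeSelector", {})


# What the repository declared: only a global selector.
git_values = {"global": {"nodeSelector": {"nodepool": "worker"}}}

# What the live release carried: an extra controller-specific override.
live_values = {
    "global": {"nodeSelector": {"nodepool": "worker"}},
    "controller": {"nodeSelector": {"nodepool": "worker-autoscale"}},
}

# The fix: state the controller value explicitly in the source of truth.
fixed_values = {
    "global": {"nodeSelector": {"nodepool": "worker"}},
    "controller": {"nodeSelector": {"nodepool": "worker"}},
}

for label, values in [("git", git_values), ("live", live_values), ("fixed", fixed_values)]:
    print(label, effective_node_selector(values, "controller"))
```

Reading only the global value in Git would suggest the controller belongs on the static pool, while the live release resolves to the autoscale pool. Making the component value explicit removes that ambiguity.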
Request Pressure
After ArgoCD was moved back to the static worker pool, the next problem was pure scheduler pressure. The static worker nodes were close to full by requested memory even though live memory usage was lower. Before trimming, two workers were around 96 percent requested memory and the third was above 80 percent. That left too little room for rescheduling after draining autoscale capacity.
The remediation lowered requests for idle or over-reserved services while leaving limits high enough for bursts. This changed scheduling economics without removing the ability for workloads to use more memory when the kernel and kubelet allow it.
    Workload                         Before               After
    ArgoCD application controller    512Mi request        256Mi request
    hosted Hermes web agent          768Mi request        384Mi request
    Hermes agent gateway             512Mi request        128Mi request
    BigCartBuddy OCR service         512Mi request        128Mi request
    Kubernetes Dashboard             800Mi total          scaled to zero
    Hermes baseline affinities       worker, autoscale    worker only
The Kubernetes Dashboard was scaled to zero because it is an idle admin surface and was reserving approximately 800 MiB across its pods. The dashboard can be restored when needed, but it does not need to participate in the idle baseline.
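The scheduling headroom recovered by these changes can be totaled from the table. This is a small sketch using only the before and after figures listed there; the dashboard figure is the approximate 800 MiB total across its pods, so the result is an estimate rather than an exact measurement.

```python
# Memory requests in MiB, taken from the table: (before, after).
# The dashboard row is its approximate total across pods; scaled to zero means 0.
CHANGES = {
    "ArgoCD application controller": (512, 256),
    "hosted Hermes web agent": (768, 384),
    "Hermes agent gateway": (512, 128),
    "BigCartBuddy OCR service": (512, 128),
    "Kubernetes Dashboard": (800, 0),
}

freed = {name: before - after for name, (before, after) in CHANGES.items()}
total_freed = sum(freed.values())

for name, mib in freed.items():
    print(f"{name}: {mib} MiB")
print(f"total: {total_freed} MiB")
```

That total is requested memory returned to the scheduler, not a reduction in live usage.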
The Hermes baseline deployments were also changed to remove worker-autoscale from their required node affinity. An always-on service should not match the elastic pool. The elastic pool should serve short-lived jobs, burst runners, batch workloads, and other capacity that can disappear without keeping the platform alive.
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: nodepool
                  operator: In
                  values:
                    - worker
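The effect of that affinity change can be checked with a small matcher. This sketch handles only the In operator and assumes the earlier affinity listed both pool labels in a single expression. That shape is an assumption, since only the resulting manifest is shown here. Function names are illustrative.

```python
def matches_term(node_labels: dict, term: dict) -> bool:
    """True if every matchExpression in one nodeSelectorTerm is satisfied."""
    for expr in term.get("matchExpressions", []):
        if expr["operator"] != "In":
            raise NotImplementedError("sketch handles only the In operator")
        if node_labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


def node_eligible(node_labels: dict, terms: list) -> bool:
    """nodeSelectorTerms are ORed: any one matching term is enough."""
    return any(matches_term(node_labels, term) for term in terms)


# Assumed earlier shape: both pools allowed in one expression.
before_terms = [{"matchExpressions": [
    {"key": "nodepool", "operator": "In", "values": ["worker", "worker-autoscale"]},
]}]
# Shape after the change: static pool only.
after_terms = [{"matchExpressions": [
    {"key": "nodepool", "operator": "In", "values": ["worker"]},
]}]

autoscale_node = {"nodepool": "worker-autoscale"}
static_node = {"nodepool": "worker"}

print(node_eligible(autoscale_node, before_terms))  # baseline pod could land on elastic capacity
print(node_eligible(autoscale_node, after_terms))   # elastic pool no longer matches
print(node_eligible(static_node, after_terms))      # static pool still matches
```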
The Worker Failure
During the reduction, one static worker stopped posting kubelet status and moved to NotReady. The registry pod was stuck in ContainerCreating, private image pulls returned temporary 503s from the registry endpoint, and kubelet log requests to that worker timed out.
The sequence mattered:
- The node was cordoned so new work would not be scheduled onto a weak worker.
- A provider-level reboot was attempted first.
- The node remained unhealthy, so a provider hard reset was used.
- Kubernetes reported the node Ready after the reset.
- The node was uncordoned only after kubelet had re-registered.
- The registry, ArgoCD controller, and private-image application rollouts were allowed to complete before continuing autoscale deletion.
That avoided compounding the cost reduction with an availability incident. Private registries are dependency concentrators. If the registry is down, every new pod that needs a private image is vulnerable to ImagePullBackOff. The safe order was registry recovery first, autoscale deletion second.
Drain, Delete, Verify
Autoscale nodes were drained before deletion. Draining gives controllers a chance to recreate pods on valid nodes, forces volume detach/attach flows to complete, and reveals unexpected placement constraints before the cloud server disappears.
    # Evict workloads first so controllers can recreate pods on static workers.
    kubectl drain talos-redux-worker-autoscale-5c851c3b70f45cfd \
      --ignore-daemonsets \
      --delete-emptydir-data \
      --timeout=15m

    kubectl drain talos-redux-worker-autoscale-6dd552a4eaa30c52 \
      --ignore-daemonsets \
      --delete-emptydir-data \
      --timeout=10m

    # Remove cluster membership for the drained nodes.
    kubectl delete node \
      talos-redux-worker-autoscale-5c851c3b70f45cfd \
      talos-redux-worker-autoscale-6dd552a4eaa30c52

    # Remove the billable Hetzner servers.
    hcloud server delete \
      talos-redux-worker-autoscale-5c851c3b70f45cfd \
      talos-redux-worker-autoscale-6dd552a4eaa30c52
Deleting the Kubernetes node object and deleting the Hetzner server are both required. Deleting the node object removes cluster membership. Deleting the cloud server removes the billable resource.
Verification was deliberately delayed. A cluster can look clean immediately after a drain and still recreate autoscale capacity a minute later if one pod remains unschedulable. The final checks were run after a short wait to confirm that the autoscaler did not create another node.
    kubectl get nodes -o wide
    kubectl get pods -A --field-selector=status.phase=Pending -o wide
    kubectl get pods -A -o wide | rg 'ImagePull|ContainerCreating|Pending|CrashLoop|Unknown|Terminating|Error'
    kubectl -n argocd get applications
    hcloud server list
    kubectl top nodes
The final verification state was:
- Four Kubernetes nodes: one control plane and three static workers.
- Four Hetzner servers: the same one control plane and three workers.
- Zero active autoscale servers.
- No Pending pods.
- No matching pod errors for ImagePull, ContainerCreating, CrashLoop, Unknown, or Terminating.
- Every ArgoCD Application reported Synced and Healthy.
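The checklist above can be expressed as a single pass/fail check over collected output. This is a sketch that assumes node names and pod status strings have already been gathered, for example from the kubectl commands earlier in this section. The static node names, pod names, and the `baseline_ok` helper are hypothetical.

```python
# Substrings watched in pod status text, mirroring the rg pattern used for verification.
WATCHED = ("ImagePull", "ContainerCreating", "Pending", "CrashLoop",
           "Unknown", "Terminating", "Error")


def autoscale_nodes(node_names):
    """Names that belong to the elastic pool, judged by the naming pattern."""
    return [name for name in node_names if "worker-autoscale" in name]


def unhealthy_pods(pods):
    """Pods whose status text contains any watched substring."""
    return [(name, status) for name, status in pods
            if any(bad in status for bad in WATCHED)]


def baseline_ok(node_names, pods, expected_nodes=4):
    """True when the cluster matches the four-node baseline with no problem pods."""
    return (len(node_names) == expected_nodes
            and not autoscale_nodes(node_names)
            and not unhealthy_pods(pods))


# Hypothetical sample data.
nodes = ["control-plane-1", "worker-1", "worker-2", "worker-3"]
pods = [("registry-0", "Running"), ("argocd-application-controller-0", "Running")]

print(baseline_ok(nodes, pods))
print(baseline_ok(nodes + ["talos-redux-worker-autoscale-example"], pods))
print(baseline_ok(nodes, pods + [("ocr-0", "ImagePullBackOff")]))
```

Running the same check again after a short wait is what catches an autoscaler that recreates a node for a late unschedulable pod.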
The post-reduction node utilization check showed the tighter baseline: the control plane around 66 percent memory, worker-1 around 65 percent, worker-2 around 95 percent, and worker-3 around 78 percent. That is an acceptable idle/prepared-state posture, but it is not excess capacity. Real traffic or larger jobs should use autoscale again.
What Changed In Git
The durable changes were committed in the platform repository and in the application repository that owns BigCartBuddy's generated manifests.
- talos-redux: 3a63368 chore: keep baseline workloads on static workers
- bigcartbuddy: 32f620a chore: lower idle OCR resource requests
The platform commit did three things:
- Made ArgoCD application-controller placement and requests explicit in Terraform.
- Reduced hosted Hermes web-agent requests.
- Removed autoscale eligibility from the Hermes baseline services.
The application commit reduced BigCartBuddy OCR's idle request from a conservative boot-time reserve to a smaller steady-state reserve. The memory limit stayed high so OCR can still burst when needed.
Operating Model
The target operating model is a static baseline plus explicit elastic capacity:
- Static worker pool: ArgoCD, ingress, registry, databases, monitoring, and low-traffic application frontends.
- Elastic worker pool: build runners, batch jobs, security scans, temporary compute, and burst traffic that can tolerate node creation latency.
- GitOps source of truth: placement and requests must live in Git, not only in live patches.
- Cost alerts: autoscale nodes should be treated as temporary capacity. A long-lived autoscale node is either real demand or a placement/request bug.
- Security posture: Trivy, RBAC/config scanning, security-posture jobs, image updater controls, and cluster-health reporting remain part of the platform baseline.
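The cost-alert rule in this list can be sketched as an age check on autoscale nodes. The six-hour threshold, the sample names, and the timestamps are hypothetical. In practice the creation times would come from the cluster or the cloud provider, and the threshold is a policy choice.

```python
from datetime import datetime, timedelta, timezone


def long_lived_autoscale_nodes(nodes, now, max_age=timedelta(hours=6)):
    """Return (name, age) for autoscale nodes older than max_age, oldest first."""
    stale = [
        (name, now - created)
        for name, created in nodes
        if "worker-autoscale" in name and now - created > max_age
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical sample data with a fixed clock so the result is reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
nodes = [
    ("worker-1", now - timedelta(days=30)),                           # static, ignored
    ("talos-redux-worker-autoscale-aaaa", now - timedelta(hours=2)),  # recent burst
    ("talos-redux-worker-autoscale-bbbb", now - timedelta(days=3)),   # long-lived
]

for name, age in long_lived_autoscale_nodes(nodes, now):
    print(f"{name} has been up for {age}; check for real demand or a placement bug")
```

Static workers are never flagged regardless of age, which matches the intent: only elastic capacity is expected to be temporary.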
The cluster autoscaler remains useful. The reduction did not remove autoscaling. It removed accidental baseline dependence on autoscaling. The next version of the platform should make that distinction impossible to miss.
What This Moves Towards
The direction is a platform that can sit idle cheaply, receive traffic safely, and scale only when there is a concrete reason to scale.
That requires a few explicit rules:
- Baseline services must not match the autoscale node pool.
- Resource requests should describe steady-state need, while limits should describe tolerated burst.
- Autoscale nodes should be observable as an event, not accepted as background noise.
- Stateful workloads need placement contracts that avoid unnecessary volume churn.
- Administrative surfaces that are not needed continuously should be scaled to zero or gated behind an operational runbook.
- Reconciliation state should be checked after live repairs so drift does not silently return.
The end state is not "minimum spend at all costs." The end state is controlled spend: a known idle floor, explicit burst capacity, and enough telemetry to explain any increase. In the current state, the known idle floor is about $84-$86/month. Any autoscale node above that floor should be explainable by a workload, a release, a security scan, or a capacity policy.