Running stateful workloads in Kubernetes sounds great …until you actually try it.
From managing persistent volumes and failover to figuring out how to safely scale your database, the challenges start to pile up fast. And if your team is operating across both VMs and containers, things get even more complex.
In this post, we’ll walk you through:
- Why stateful workloads are hard in Kubernetes
- What the native tools offer (and where they fall short)
- Common mistakes platform teams make
- How modern orchestration tools like open-source Klutch can simplify the process, whether you’re running databases on VMs, in Kubernetes, or both
What makes stateful workloads so hard in Kubernetes?
Kubernetes was designed for stateless workloads. Web apps, microservices, and APIs that can be spun up, scaled out, or torn down without much worry? Perfect fit.
Databases? Message queues? Caches and file stores?
Those are stateful workloads, where persistent data, stable network identity, and guaranteed ordering often matter. And that makes Kubernetes orchestration a lot trickier.
Here’s why:
- Persistent storage is complicated – You need to ensure storage survives pod restarts, rescheduling, and scaling events. Getting the right storage class, access mode, and retention policy across clusters is no small feat.
- Data consistency isn’t a given – If multiple pods access the same volume or replication isn’t handled properly, you risk data corruption or service downtime.
- High availability and failover are non-trivial – Manual recovery processes or improperly tuned operators can lead to minutes (or hours) of downtime if a node goes down.
- You can’t just “scale it” like a stateless app – Databases often require careful vertical scaling or cluster-aware sharding …something Kubernetes doesn’t manage for you out of the box.
What Kubernetes gives you (and what it doesn’t)
Kubernetes does offer native tools to help run stateful services:
StatefulSets
Maintain sticky identities for pods and predictable persistent volume claims (PVCs). Great for stateful pods, but you still need to manually handle upgrades, backups, and failover.
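As a minimal sketch, here’s roughly what that looks like (the Postgres image, names, and sizes are placeholders; a real deployment also needs authentication, tuning, and backups on top):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres            # headless Service that gives each pod a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16       # placeholder image and version
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # one PVC per pod, kept across restarts and rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi          # placeholder size
```

Each replica gets its own claim (data-postgres-0, data-postgres-1, …) that outlives the pod, but replication, upgrades, backups, and failover are still on you.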
Persistent Volumes and Storage Classes
Allow pods to connect to long-lived storage backends. But provisioning and choosing the right access modes and reclaim policies can get complex quickly.
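For example, a StorageClass that keeps the underlying volume around after the claim is deleted, plus a claim that uses it (the provisioner is a placeholder; in practice you’d point at your cluster’s CSI driver):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-retain
provisioner: kubernetes.io/no-provisioner   # placeholder; use your cluster's CSI driver
reclaimPolicy: Retain                       # keep the underlying volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  storageClassName: db-retain
  accessModes:
    - ReadWriteOnce      # single-node writer; shared (RWX) access needs a suitable filesystem
  resources:
    requests:
      storage: 20Gi
```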
Operators
Custom controllers that automate lifecycle tasks like backups or failover. Powerful, but often limited to a specific technology or deployment model, and hard to scale across teams.
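With an operator installed, a database is usually requested through a custom resource. The manifest below is purely illustrative; the actual kind, API group, and fields differ from operator to operator:

```yaml
# Illustrative only: kind, apiVersion, and fields vary per operator
apiVersion: example.operators.dev/v1
kind: PostgresCluster
metadata:
  name: orders-db
spec:
  version: "16"
  replicas: 3                  # operator manages replication and leader election
  storage:
    size: 50Gi
  backups:
    schedule: "0 2 * * *"      # nightly backups handled by the operator
```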
Despite these tools, many platform teams still struggle with:
- Creating consistent database environments
- Managing mixed VM + Kubernetes workloads
- Providing self-service without sacrificing control
That’s where orchestration layers come in.
Why most teams still struggle with stateful services
Even with the right primitives in place, teams hit blockers:
- Ops overload: Infra teams are stuck provisioning, patching, and restoring databases manually, often across environments.
- Lack of consistency: Different teams use different tools, versions, and patterns for deploying stateful services.
- Developer frustration: It’s not clear how to request a database, caching service, or message broker, where to find logs, or how to get observability.
- Compliance headaches: Managing backups, failover regions, and data retention policies across clusters is complex and risky.
Stateful workloads need more than pods and PVCs; they need orchestration.
Modern approaches to stateful orchestration
To address this, teams have started building or adopting solutions that offer:
- Declarative database provisioning (e.g. PostgreSQLInstance manifests; see the sketch after this list)
- Predefined lifecycle automation (e.g. backups, restores, failovers)
- Standardized service discovery and credential management
- Integration with secrets managers, monitoring stacks, and CI/CD
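To make the first point concrete, a declarative provisioning request can look roughly like this; the API group and field names here are illustrative rather than any specific product’s schema:

```yaml
apiVersion: example.platform.dev/v1   # illustrative API group
kind: PostgreSQLInstance
metadata:
  name: checkout-db
  namespace: team-payments
spec:
  version: "16"
  plan: small                                  # size/HA tier defined by the platform team
  backup:
    retentionDays: 14
  credentialsSecret: checkout-db-credentials   # Secret the application reads at runtime
```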
You can piece this together using:
- Operators (like a8s PostgreSQL, Crunchy Postgres, or OpenSearch Operator)
- Helm charts and CI/CD automation
- Infrastructure-as-code tools like Terraform
But managing this at scale (and especially in multi-cluster or hybrid environments!) is painful.
That’s where tools like Klutch come in.
How Klutch simplifies stateful service orchestration
Klutch is an open-source control plane for data services that lets you:
Standardize orchestration across environments
Run databases on VMs or in Kubernetes pods. Klutch abstracts the platform layer and exposes a consistent interface.
Automate complex lifecycle tasks
Provision new databases, scale instances, restore from backups, or rotate credentials, all declaratively.
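As a rough illustration of what a declarative lifecycle task means in practice, a restore could be requested with a manifest along these lines. The kind, API group, and fields below are placeholders, not Klutch’s actual CRDs; check the Klutch documentation for the real resource names:

```yaml
# Placeholders throughout: not Klutch's actual CRD names or fields
apiVersion: example.platform.dev/v1
kind: Restore
metadata:
  name: checkout-db-restore
  namespace: team-payments
spec:
  instanceRef:
    name: checkout-db                    # the database instance to restore into
  backupName: checkout-db-nightly-0042   # hypothetical backup identifier
```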
Integrate with your platform stack
Plug into your GitOps workflows, secrets manager, observability tools, and platform APIs. No more one-off scripts.
Enable self-service (with guardrails via CRDs)
Developers can request services without ops intervention. Operators retain full control over versions, regions, and policies.
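One common way to wire up those guardrails is plain Kubernetes RBAC: developers get namespace-scoped rights to create and read claims, while the cluster-scoped definitions behind them stay with the platform team. The claim group and resource names below are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: db-requester
  namespace: team-payments
rules:
  - apiGroups: ["example.platform.dev"]   # illustrative claim API group
    resources: ["postgresqlinstances"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: db-requester
  namespace: team-payments
subjects:
  - kind: Group
    name: team-payments-developers        # maps to your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: db-requester
  apiGroup: rbac.authorization.k8s.io
```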
Whether you’re migrating legacy workloads to Kubernetes or building a platform that supports both VMs and containers, Klutch helps you avoid snowflake infrastructure.
TL;DR: it’s time to tame your stateful workloads
Kubernetes is great for stateless apps, but managing databases and other stateful services takes more than PVCs and hope.
Quick recap:
- Stateful workloads introduce real orchestration challenges
- Kubernetes provides some building blocks, but not a full solution
- Operators and Helm can help, but don’t scale well across teams or environments
- Tools like Klutch provide consistent, automated lifecycle management for data services on VMs or K8s
Ready to simplify how your team handles stateful services?
Explore Klutch
Or check out the Klutch GitHub project to get started today.
