Amazon EKS Auto Mode Adds a Managed Data Plane to Kubernetes

Alex Kestner and Nana Janashia · AWS Developers · Thursday, May 7, 2026 · 5 min read

Alex Kestner argues that Amazon EKS Auto Mode extends EKS management from the Kubernetes control plane into the data plane where workloads run, infrastructure is provisioned, and scaling and networking decisions are made. He presents the service as a way for teams to create what AWS calls a production-grade cluster through a console click or API call, while retaining Kubernetes-native controls over instance types, node pools, storage, and networking. Nana Janashia frames the central concern as whether automation reduces control; Kestner’s answer is that Auto Mode is opinionated, not closed.

EKS Auto Mode shifts the managed boundary from control plane to data plane

Alex Kestner says the hardest part of getting started with Kubernetes is often not the platform itself but the number of operational decisions required before a team can deploy an application. Nana Janashia describes bare-bones Kubernetes as fully working and useful; Kestner agrees it is “absolutely useful.” But that usefulness still leaves teams to choose and configure observability, ingress, security, permissions, and networking.

Kestner frames Amazon EKS Auto Mode as AWS moving beyond the original EKS managed-control-plane model. EKS already handled the Kubernetes control plane: the required, undifferentiated part of running a cluster. Customers, he says, kept running into the harder surrounding work: where workloads actually run, how they use EC2 instance types, and how networking lets workloads communicate with each other and with end users.

Auto Mode adds a managed data plane. Kestner defines that as the part of the cluster where applications run, including both the in-cluster software needed to integrate with AWS services and other essentials, and the EC2 infrastructure that hosts workloads. AWS’ position is that both pieces can be operated on behalf of customers, so teams can deploy applications rather than operate Kubernetes itself.

Kestner says Auto Mode was intentionally designed so a team can use the AWS console or EKS APIs “with a single API call or a click in the console” to create what AWS calls a production-grade cluster, without first touching half a dozen other AWS services.
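As a rough illustration of how compact that setup is, here is a minimal eksctl cluster config that enables Auto Mode. This is a sketch, not something shown in the discussion: the cluster name and region are placeholders, and the autoModeConfig stanza follows eksctl's documented support for Auto Mode.

```yaml
# cluster.yaml -- hypothetical name and region
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-auto
  region: us-east-1

autoModeConfig:
  enabled: true
  # Optionally select one or both of the built-in node pools
  nodePools: ["general-purpose", "system"]
```

Running `eksctl create cluster -f cluster.yaml` would then provision the control plane and the managed data plane together, without the operator touching the surrounding AWS services directly.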

The product is opinionated, but not closed

Nana Janashia raises the obvious objection: some engineering teams will see automation and worry about losing control, visibility, or flexibility. Alex Kestner answers that those teams are still intended users.

The design goal, he says, is a fast default path with the ability to override AWS’ opinions when real-world use cases require it. The customization surface he describes is Kubernetes-native: standard Kubernetes APIs and custom resources. Teams can configure which instance types they want, whether to use Spot, Graviton, or GPU-accelerated instances, and how storage should be configured for stateful workloads.
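Storage is a concrete example of that Kubernetes-native surface. A minimal sketch of a StorageClass for stateful workloads, with a hypothetical name and assuming Auto Mode's managed EBS provisioner (`ebs.csi.eks.amazonaws.com`):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted                      # hypothetical name
provisioner: ebs.csi.eks.amazonaws.com     # Auto Mode's managed EBS provisioner
volumeBindingMode: WaitForFirstConsumer    # provision once a pod is scheduled
parameters:
  type: gp3
  encrypted: "true"
```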

Auto-scaling is the clearest example of that balance. Auto Mode is built using the open-source Karpenter project. Kestner explains that Auto Mode clusters can include one or both of two preconfigured node pools. A node pool, in Karpenter terms, is a custom resource used to configure auto-scaling behavior and the compute infrastructure it provisions.

AWS supplies what Kestner calls a general-purpose node pool configuration, but users can create their own. That is where teams can fine-tune the scaling policy, cost optimization behavior, and the instance types they want to include or exclude. The promised bargain is not “no configuration ever”; it is that a usable production baseline exists before a team starts customizing.
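A custom node pool is expressed as a standard Karpenter NodePool resource. The sketch below uses a hypothetical name and illustrative values; the `eks.amazonaws.com/*` label keys follow Auto Mode's labeling conventions, and `nodeClassRef` points at Auto Mode's built-in `default` NodeClass:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-graviton          # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default          # Auto Mode's built-in NodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]     # Spot capacity only
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]    # Graviton
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "m", "r"]   # include these families, exclude the rest
  limits:
    cpu: "100"                 # cap the total vCPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m       # how long a node may sit before consolidation
```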

Cost optimization is treated as a default behavior, not a month-two cleanup job

Alex Kestner says cost control is a common place where teams get surprised after the first cloud bill arrives. Auto Mode’s cost optimization is described as continuous and automatic rather than a task someone remembers to do periodically.

When a new workload enters the cluster and there is not enough capacity for it, Auto Mode evaluates the CPU, memory, and other requirements of that workload. Kestner says it then looks for the most cost-effective instance that can meet those requirements. In parallel, it continues looking for ways to reconfigure the cluster’s compute infrastructure to reduce cost.
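Those requirements come from the workload's own resource requests, so a pod that declares them accurately gives Auto Mode something concrete to optimize against. A minimal sketch, with a hypothetical workload name and a stand-in image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                    # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: nginx:stable  # stand-in image
          resources:
            requests:          # what Auto Mode sizes instances against
              cpu: "500m"
              memory: 512Mi
```

If no existing node can fit those requests, Auto Mode's job is to launch the cheapest instance type that can.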

The on-screen summary reduces that behavior to three phrases: dynamic scaling, workload consolidation, and no overprovisioning. Kestner’s explanation is narrower and more operational: Auto Mode tries to launch just enough capacity for pending work and keeps reconsidering the shape of the compute fleet in the background.

For small teams, Nana Janashia asks whether that means a startup can deploy on EKS Auto Mode without an operations person and focus on its application or AI model. Kestner says that is “absolutely” the vision. His argument is that AWS’ EKS team has operated tens of thousands of Kubernetes clusters over seven to eight years, and customers can delegate that operational burden rather than hire scarce Kubernetes specialists themselves.

AI workloads split into inference and training

Nana Janashia introduces the AI section by citing a statistic she found surprising: more than 80% of AI workloads run on Kubernetes.


The discussion separates AI workloads into two categories: inference and model training. The distinction matters because the infrastructure pattern is different.

For inference, the emphasis is on burstiness. Users arrive unpredictably, make requests against a model, and create demand spikes that do not follow a neat pattern. Auto Mode’s reactive scaling is described as the relevant mechanism: when activity spikes, it examines the pods that will serve inference requests and creates the needed capacity as quickly as possible.
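The discussion covers the node-level half of that loop; the pod-level half is typically expressed with a standard HorizontalPodAutoscaler, whose newly created pods go Pending and in turn trigger Auto Mode to launch capacity. A minimal sketch, with hypothetical names and an illustrative threshold:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference              # hypothetical names throughout
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference
  minReplicas: 2
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```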

Speed matters because the instances involved are expensive, often costing dollars per hour rather than cents. The discussion also points to a specific optimization launched the previous fall: Auto Mode downloads the layers of large container images in parallel and unpacks them in parallel as well. That is meant to reduce the time before a newly launched instance is ready to handle requests, which is especially important when images contain large models or data and can reach many tens of gigabytes.

Training workloads are described differently. They are closer to batch processing: a job needs a defined amount of capacity, and that capacity may already have been reserved or purchased in advance. For that case, Auto Mode can effectively turn off its usual auto-scaling behavior and keep a fixed number of specified instances provisioned so they are ready for the next training job.
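The discussion does not show how that is configured; one hedged sketch is a dedicated Karpenter NodePool that pins an assumed accelerated instance family and blocks voluntary disruption, so provisioned nodes are not consolidated away between jobs (keeping them warm with placeholder pods is omitted here):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: training-reserved      # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      requirements:
        - key: eks.amazonaws.com/instance-family
          operator: In
          values: ["p5"]       # assumed accelerated family
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: Never    # never consolidate these nodes
    budgets:
      - nodes: "0"             # block all voluntary disruption
```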

Customer feedback is pulling Auto Mode toward more infrastructure customization

The later customer-feedback discussion returns to the same tension raised earlier by Nana Janashia: how to preserve a highly managed experience while still letting customers customize the environment. Early feedback after launch focused on giving customers more control over the infrastructure Auto Mode provides.

One example is networking for larger EKS clusters. Customers may want node traffic and pod traffic on separate subnets, either for security reasons or because of IP address availability. The discussion says separating those paths lets customers scale networking infrastructure independently and segment traffic into different logical groups.

AWS responded by adding support for that pattern, allowing customers to customize Auto Mode-launched infrastructure in ways they were used to from standard EKS, while retaining the managed behavior of Auto Mode. The discussion says this will remain an area of investment “for the foreseeable future,” because adding control is how Auto Mode can meet more use cases without abandoning its managed model.
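Assuming this refers to Auto Mode's NodeClass custom resource, the split-subnet pattern might look like the sketch below. The `podSubnetSelectorTerms` field name and the tag scheme are assumptions, not confirmed by the discussion, and required fields such as the node IAM role are omitted:

```yaml
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: split-network          # hypothetical name
spec:
  subnetSelectorTerms:         # subnets for node ENIs
    - tags:
        network-tier: nodes    # assumed tagging scheme
  podSubnetSelectorTerms:      # separate subnets for pod IPs (assumed field name)
    - tags:
        network-tier: pods
  securityGroupSelectorTerms:
    - tags:
        kubernetes.io/cluster/my-cluster: owned   # assumed tag
```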
