Introducing cdk8s+: Intent-driven APIs for Kubernetes objects
At AWS, we’ve been exploring new approaches to making it easier to define Kubernetes applications. Last month, we announced the alpha release of cdk8s, an open-source project that enables you to use general-purpose programming languages to synthesize Kubernetes manifests.
Today, I would like to tell you about cdk8s+ (cdk8s-plus), which we believe is the natural next step for this project.
cdk8s+ is a library built on top of cdk8s. It is a rich, intent-based class library for using the core Kubernetes API. It includes hand-crafted constructs that map to native Kubernetes objects and expose a richer API with reduced complexity.
To give you an idea of what I mean, here is how you’d define a `Deployment` and expose it on port 8000 via a `Service`:
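In TypeScript, a sketch using the alpha cdk8s+ API looks something like this (option names such as `podSpecTemplate`, the `expose()` signature, and the image name are illustrative and may differ between alpha releases):

```typescript
import * as kplus from 'cdk8s-plus';

const deployment = new kplus.Deployment(this, 'MyApp', {
  spec: {
    replicas: 3,
    podSpecTemplate: {
      containers: [
        new kplus.Container({
          image: 'my-app',
          port: 8080, // the internal port the container listens on
        }),
      ],
    },
  },
});

// expose the deployment to the cluster on port 8000
deployment.expose(8000);
```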
Notice how we didn’t have to configure any selectors, nor did we have to specify the internal port used by the container when exposing our deployment. The snippet above will generate the following YAML manifest:
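The synthesized output has roughly this shape (resource names and the `cdk8s.deployment` label value are auto-generated by cdk8s+ and are illustrative here):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chart-myapp-deployment-d0b2bc55
spec:
  replicas: 3
  selector:
    matchLabels:
      cdk8s.deployment: ChartMyAppD0B2BC55
  template:
    metadata:
      labels:
        cdk8s.deployment: ChartMyAppD0B2BC55
    spec:
      containers:
        - name: main
          image: my-app
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: chart-myapp-service-c8e9fb1a
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 8080
  selector:
    cdk8s.deployment: ChartMyAppD0B2BC55
```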
Later on, we will dive deeper into the API and the considerations we made building it.
cdk8s+ is in alpha
Note: the library is in very early stages of development. As such, it may lack substantial features and may introduce breaking changes between updates. Use it with care and at your own discretion.
All breaking and non-breaking changes will be published in the CHANGELOG.
Getting started
Head over to our GitHub repo and try it out. You’ll find documentation for all the available constructs, as well as a full API spec. We would love to hear what you think is missing, and if you so choose, actively participate in the development.
The library is currently available for TypeScript and Python, with more languages coming soon. Also note that the generated manifests are completely agnostic to the cloud provider you are using; cdk8s+ produces pure Kubernetes files that can be applied to any cluster.
Diving deep
This blog will show you how to deploy a real-world Kubernetes application using cdk8s+. To get a complete picture of its benefits, we will develop the application in three different ways: first by directly authoring a YAML manifest, then by using a programming language (TypeScript) and cdk8s to generate a manifest, and finally by using the rich API provided by cdk8s+.
Before we start, let’s introduce a few guiding principles that will help us navigate through the different approaches.
- Desired State: We’d like our application definition to be solely based on a desired state configuration. This is necessary in order to apply infrastructure-as-code best practices, as well as enable GitOps workflows.
- Don’t Repeat Yourself (DRY): Avoid having to repeat any value or definition in multiple locations. This makes our desired state much less sensitive to change.
- Boilerplate: Information that can be inferred, should be inferred. Having to repeatedly apply common configurations makes our application overly complex and more error prone.
- Cognitive Load: Ideally, we should be able to write our application without exactly remembering how to configure each resource. We want the tools to guide us.
- Reusability: Once our application is done, we’d like for it to be easy to share our work with others.
We will go back to these guidelines throughout this post, and see how each approach addresses them.
Okay, we are now ready to get started. First, let’s describe our application:
Construct Catalog Search
Those of you familiar with the constructs ecosystem might have already encountered awscdk.io. It’s a website for discovering constructs, maintained as an open-source project at https://github.com/construct-catalog/catalog. Today, the catalog simply posts a tweet every time a new CDK construct is published, and then uses Twitter itself as somewhat of a search engine.
If you’re looking for information on how to publish your own construct library, check out Publishing Modules.
We’d like the catalog to provide a “real” search experience with filtering and aggregation capabilities. To do that, every time a new library is published, we are going to index an event to an Elasticsearch cluster, using an Amazon SQS queue in the middle. In addition, we will expose an endpoint that will perform a free text ES query.
So, our application has two components:
- Query Server: An http server accepting requests and performing Elasticsearch queries. (query.js)
- Indexer Worker: A long running poller process that fetches messages from the queue and indexes them to Elasticsearch. (indexer.js)
As inputs, our app will accept `QUEUE_URL` and `ELASTICSEARCH_ENDPOINT` environment variables.
Note that the Elasticsearch cluster is actually created with cdk8s as well using a CRD. The code is available here: elasticsearch.ts.
Assuming the application code has already been written, we now want to deploy it to a Kubernetes cluster.
Construct Catalog Search: Using YAML
As we mentioned, we will first write pure Kubernetes YAML.
To get my application code inside a container, I will try to embed it inside a `ConfigMap`, and later configure my pod to use that `ConfigMap`.
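A first attempt might look like this (the problem is immediately visible in the `data` field):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: query-config-map
data:
  query.js: |
    // ...the entire contents of query.js would have to be
    // copied here by hand, and kept in sync with the source file
```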
As you can see, I’ve hit my first snag: how do I get my code from query.js to the manifest file?
Kubectl to the rescue
`kubectl` has native support for creating `ConfigMap` data from files:
```
❯ kubectl create configmap query-config-map --from-file=./query.js
configmap/query-config-map created
```
The next step is to create a `Deployment` that deploys our query server.
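Here is a sketch of what I wrote (the image and volume names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: query-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: query
    spec:
      containers:
        - name: query
          image: node:12.16.3-stretch
          command: [node, query.js]
          workingDir: /root
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: query-app-volume
              mountPath: /root
      volumes:
        - name: query-app-volume
          configMap:
            name: query-configmap
```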
So far, I defined a `Deployment` with a replica count of 3 and specified a pod template. My pods will include a `ConfigMap`-based `Volume` that will be mounted to `/root`.
Let’s apply it to the cluster and see what happens.
```
❯ kubectl apply -f manifest.yaml
error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec;
```
Whoops, of course, I forgot to apply selectors so that the deployment will be able to find its pods.
This does beg the question: how do I make sure to keep the selectors in sync with the labels? Also, since the pod spec is defined in the scope of a deployment, it makes sense for the deployment to simply select all the pods it created.
Okay, let’s add selectors and re-apply:
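With illustrative label names, the addition looks like this; note that `matchLabels` has to be kept in sync with the template labels by hand:

```yaml
spec:
  replicas: 3
  selector:
    matchLabels:
      app: query
  template:
    metadata:
      labels:
        app: query
```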
```
❯ kubectl apply -f manifest.yaml
deployment.apps/query-deployment created
```
Looks okay, let’s check out our pods:
```
❯ kubectl get -A pods
NAMESPACE   NAME                                READY   STATUS              RESTARTS   AGE
default     query-deployment-6576f6f795-4kttz   0/1     ContainerCreating   0          2m3s
default     query-deployment-6576f6f795-dqmwd   0/1     ContainerCreating   0          2m3s
default     query-deployment-6576f6f795-f2lqr   0/1     ContainerCreating   0          2m3s
```
We see 3 pods indeed, but for some reason they have been stuck in `ContainerCreating` status for a long time. Let’s inspect one of them:
```
❯ kubectl describe pod query-deployment-6576f6f795-4kttz
....
....
Events:
  Type     Reason       Age                From                         Message
  ----     ------       ----               ----                         -------
  Normal   Scheduled    61s                default-scheduler            Successfully assigned default/query-deployment-6576f6f795-4kttz to kind-control-plane
  Warning  FailedMount  29s (x7 over 61s)  kubelet, kind-control-plane  MountVolume.SetUp failed for volume "query-app-volume" : configmap "query-configmap" not found
```
Uh oh, did you spot the typo? We used `query-configmap` instead of `query-config-map` as the `ConfigMap` name.
This begs another question: since my `ConfigMap` is created out of band, how do I keep these values in sync?
Okay, let’s fix that and reapply:
```
❯ kubectl apply -f manifest.yaml && sleep 10 && kubectl get -A pods
deployment.apps/query-deployment configured
NAMESPACE   NAME                                READY   STATUS    RESTARTS   AGE
default     query-deployment-6d95544db6-57rz2   1/1     Running   0          18s
default     query-deployment-6d95544db6-bz4qx   1/1     Running   0          15s
default     query-deployment-6d95544db6-qf6br   1/1     Running   0          16s
```
Cool, all seems to be in order!
The final step is to expose the pods as a service, so that they can be queried through a single network address. For the sake of simplicity, I’ll use a `ClusterIP` service, which is also the Kubernetes default.
One more question: how do I make sure the selector the service uses matches the labels of the pods? The same goes for the target port: it has to be the same as the container port.
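A sketch of that service (label names are the illustrative ones used above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: query-service
spec:
  type: ClusterIP
  selector:
    app: query        # must match the pod labels by hand
  ports:
    - port: 8000
      targetPort: 8000  # must match the container port by hand
```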
```
❯ kubectl apply -f manifest.yaml
service/query-service created
deployment.apps/query-deployment unchanged
```
If I now port-forward 8000 on my machine, I should get a response from the pods.
Great, the query application is working. All that’s left to do is add the indexer. The indexer specification is basically the same as the query, except it is not exposed by a service. Eventually, we end up with this full manifest file:
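Reconstructed from the steps above (images, endpoint values, and queue URL are illustrative), the full manifest looks roughly like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: query-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: query
  template:
    metadata:
      labels:
        app: query
    spec:
      containers:
        - name: query
          image: node:12.16.3-stretch
          command: [node, query.js]
          workingDir: /root
          ports:
            - containerPort: 8000
          env:
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my-es-cluster.example.com
          volumeMounts:
            - name: query-app-volume
              mountPath: /root
      volumes:
        - name: query-app-volume
          configMap:
            name: query-config-map
---
apiVersion: v1
kind: Service
metadata:
  name: query-service
spec:
  type: ClusterIP
  selector:
    app: query
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: indexer-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: indexer
  template:
    metadata:
      labels:
        app: indexer
    spec:
      containers:
        - name: indexer
          image: node:12.16.3-stretch
          command: [node, indexer.js]
          workingDir: /root
          env:
            - name: QUEUE_URL
              value: https://sqs.us-east-1.amazonaws.com/111111111111/catalog-queue
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my-es-cluster.example.com
          volumeMounts:
            - name: indexer-app-volume
              mountPath: /root
      volumes:
        - name: indexer-app-volume
          configMap:
            name: indexer-config-map
```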
Also, we have two auxiliary commands we need to run before applying this manifest:
```
❯ kubectl create configmap query-config-map --from-file=./query.js
❯ kubectl create configmap indexer-config-map --from-file=./indexer.js
```
Let’s recap, and specifically, focus on our guiding principles:
- ❌ Desired State: Unfortunately, we were unable to define our application solely using a desired state YAML manifest. We had to resort to external imperative `kubectl` commands.
- ❌ Don’t Repeat Yourself (DRY): We’ve seen several occurrences of having to duplicate and match values across multiple locations in the manifest.
- ❌ Boilerplate: We had to explicitly apply selectors to the pod template and the deployment, even though the template is configured in the scope of the deployment and could implicitly infer the selection labels. The same goes for pod spec volumes.
- ❌ Cognitive Load: Even a simple application such as ours required rather extensive Kubernetes skills. We had to know what selectors are, how to create config maps with `kubectl`, and how to mount them as volumes onto the container.
- ❌ Reusability: Since deploying our app involves running custom `kubectl` commands, sharing it with others becomes tricky. We need to come up with a non-standard packaging and distribution mechanism. Also, our application cannot accept any configuration values, since we don’t have the ability to dynamically generate a manifest.
Construct Catalog Search: Using cdk8s
Next up, we explore the possibility of authoring a manifest file using a general-purpose programming language. This is enabled by cdk8s, and it truly opens the world of programming languages to infrastructure definitions. We already know exactly what we need to do, so let’s write down the entire application:
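A sketch of that program in TypeScript follows. The `./imports/k8s` module stands for the API objects that `cdk8s import` generates from the Kubernetes spec; image and resource names are illustrative, and the indexer deployment (which follows the same pattern) is elided for brevity:

```typescript
import * as fs from 'fs';
import { Construct } from 'constructs';
import { App, Chart } from 'cdk8s';
import * as k8s from './imports/k8s';

class CatalogSearch extends Chart {
  constructor(scope: Construct, name: string) {
    super(scope, name);

    // constants defined once and reused, keeping the definition DRY
    const queryLabels = { app: 'query' };
    const queryPort = 8000;
    const configMapName = 'query-config-map';

    // no more external kubectl command: read the source file directly
    new k8s.ConfigMap(this, 'QueryConfigMap', {
      metadata: { name: configMapName },
      data: { 'query.js': fs.readFileSync(`${__dirname}/query.js`, 'utf-8') },
    });

    new k8s.Deployment(this, 'QueryDeployment', {
      spec: {
        replicas: 3,
        selector: { matchLabels: queryLabels }, // still boilerplate we must remember
        template: {
          metadata: { labels: queryLabels },
          spec: {
            containers: [{
              name: 'query',
              image: 'node:12.16.3-stretch',
              command: ['node', 'query.js'],
              workingDir: '/root',
              ports: [{ containerPort: queryPort }],
              volumeMounts: [{ name: 'query-app-volume', mountPath: '/root' }],
            }],
            volumes: [{ name: 'query-app-volume', configMap: { name: configMapName } }],
          },
        },
      },
    });

    new k8s.Service(this, 'QueryService', {
      spec: {
        type: 'ClusterIP',
        selector: queryLabels,
        ports: [{ port: queryPort }],
      },
    });

    // ...the indexer ConfigMap and Deployment follow the same pattern
  }
}

const app = new App();
new CatalogSearch(app, 'catalog-search');
app.synth();
```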
The API itself is basically a mirror of the YAML definition, but since we are now writing code, let’s see where we stand with our guiding principles:
- ✅ Desired State: We no longer need any external `kubectl` commands. Getting our application code into the manifest is done by simply using `fs.readFileSync`.
- ✅ Don’t Repeat Yourself (DRY): Any duplicate value is defined once, as a constant, and is reused where needed.
- ❌ Boilerplate: Unfortunately, we still need to remember to apply selectors and configure pod spec volumes, even though this information can be inferred.
- ❌ Cognitive Load: We haven’t solved this problem. We still require the same set of Kubernetes skills to write this application.
- ✅ Reusability: We have two options here: 1) publish a self-contained YAML manifest generated by running `cdk8s synth`, or 2) publish an npm library that may or may not accept configuration values, and delegate the manifest generation to our users. Both ways are standard and simple.
In our next approach, I’ll show you how to address the two remaining principles through an approach called Intent-driven Design. By focusing on user intent, rather than on system mechanics, we can perform many operations on the user’s behalf, thus greatly reducing cognitive load and boilerplate definitions.
Construct Catalog Search: Using cdk8s+
Just like before, I’ll start by creating a `ConfigMap` that will contain our source code.
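With the alpha cdk8s+ API, that sketch looks like (construct IDs are illustrative):

```typescript
import * as kplus from 'cdk8s-plus';

const queryConfigMap = new kplus.ConfigMap(this, 'QueryConfigMap');
```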
A quick look at the API the `kplus.ConfigMap` construct offers reveals the `addFile()` method. This conveys our intent of embedding a file in a `ConfigMap`, and essentially replaces the external `kubectl` command we used before.
Let’s use it:

```typescript
queryConfigMap.addFile(`${__dirname}/query.js`);
```
I now need to create a `Volume` from that `ConfigMap`, so I use the `Volume.fromConfigMap()` function:
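A sketch, assuming the alpha cdk8s+ API:

```typescript
const queryVolume = kplus.Volume.fromConfigMap(queryConfigMap);
```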
All I need to do now is create the container and use its `mount()` method.
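A sketch of both steps (the container option names and image are illustrative of the alpha API):

```typescript
const queryContainer = new kplus.Container({
  image: 'node:12.16.3-stretch',
  command: ['node', 'query.js'],
  port: 8000,
  workingDir: '/root',
});

// mount the ConfigMap-backed volume onto the container
queryContainer.mount('/root', queryVolume);
```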
Next up, I’ll create a `Deployment` that will deploy 3 instances of this container. Just like before, we create a deployment:
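A sketch, assuming the alpha cdk8s+ option names (e.g. `podSpecTemplate`):

```typescript
const queryDeployment = new kplus.Deployment(this, 'QueryDeployment', {
  spec: {
    replicas: 3,
    podSpecTemplate: {
      containers: [queryContainer],
    },
  },
});
```

Notice that no selectors or labels appear anywhere in this definition.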
But this `Deployment` is a bit different from its cdk8s counterpart: it’s based on user intent. To understand what this means, let’s look at an excerpt from the manifest that cdk8s+ will synthesize for this deployment:
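The excerpt has roughly this shape (the label value is auto-generated and illustrative):

```yaml
spec:
  replicas: 3
  selector:
    matchLabels:
      cdk8s.deployment: CatalogSearchQueryDeployment012345AB
  template:
    metadata:
      labels:
        cdk8s.deployment: CatalogSearchQueryDeployment012345AB
```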
You can see that the `cdk8s.deployment` selection label was automatically added. This is the `Deployment` construct interpreting our intent: we want this deployment to create and control the pods defined by the `template` property.
We now want to expose these 3 pods (i.e., the deployment) through a single network address. The `Deployment` construct offers an API to do just that:
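A sketch (the exact `expose()` signature may differ between alpha releases):

```typescript
queryDeployment.expose(8000);
```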
Again, notice what we didn’t have to do:
- We didn’t have to specify any selectors.
- We didn’t have to specify the container port.
Internally, this method will create a `Service` of type `ClusterIP` and apply the correct selectors and ports. This is possible because the deployment already has all this information, and cdk8s+ implicitly uses it on my behalf. If I add the indexer deployment, the full cdk8s+ application definition looks like so:
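A sketch of the complete program, assuming the alpha cdk8s+ API (option names, images, and construct IDs are illustrative; the `QUEUE_URL` and `ELASTICSEARCH_ENDPOINT` environment configuration is omitted for brevity):

```typescript
import { Construct } from 'constructs';
import { App, Chart } from 'cdk8s';
import * as kplus from 'cdk8s-plus';

class CatalogSearch extends Chart {
  constructor(scope: Construct, name: string) {
    super(scope, name);

    // query server: 3 replicas, exposed via a service on port 8000
    const queryConfigMap = new kplus.ConfigMap(this, 'QueryConfigMap');
    queryConfigMap.addFile(`${__dirname}/query.js`);

    const queryContainer = new kplus.Container({
      image: 'node:12.16.3-stretch',
      command: ['node', 'query.js'],
      port: 8000,
      workingDir: '/root',
    });
    queryContainer.mount('/root', kplus.Volume.fromConfigMap(queryConfigMap));

    new kplus.Deployment(this, 'QueryDeployment', {
      spec: { replicas: 3, podSpecTemplate: { containers: [queryContainer] } },
    }).expose(8000);

    // indexer worker: same shape, but never exposed by a service
    const indexerConfigMap = new kplus.ConfigMap(this, 'IndexerConfigMap');
    indexerConfigMap.addFile(`${__dirname}/indexer.js`);

    const indexerContainer = new kplus.Container({
      image: 'node:12.16.3-stretch',
      command: ['node', 'indexer.js'],
      workingDir: '/root',
    });
    indexerContainer.mount('/root', kplus.Volume.fromConfigMap(indexerConfigMap));

    new kplus.Deployment(this, 'IndexerDeployment', {
      spec: { replicas: 1, podSpecTemplate: { containers: [indexerContainer] } },
    });
  }
}

const app = new App();
new CatalogSearch(app, 'catalog-search');
app.synth();
```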
Going back to our guiding principles, let’s see where we stand:
- ✅ Desired State: Nothing has changed here; we still don’t need any external `kubectl` commands. In fact, we don’t even need to explicitly use `readFileSync`, since cdk8s+ will do that for us.
- ✅ Don’t Repeat Yourself (DRY): Still good; our use of a programming language eliminates this issue.
- ✅ Boilerplate: This code embodies the minimal amount of configuration needed to correctly deploy our application. All redundant information, such as selectors and pod spec volumes, is implicitly inferred.
- ✅ Cognitive Load: We managed to greatly reduce the cognitive load since we were guided by intent-based APIs. These APIs alleviate some of the skills needed to interact with Kubernetes resources.
- ✅ Reusability: Same as before, we either publish an npm package or a generated YAML manifest (or both).
Summary
We started with a multitude of issues that arise from the limitations of YAML. We saw many of those issues disappear when we used cdk8s to rewrite our YAML definition in a programming language. We also saw that simply using a programming language was not enough, as it still carried a rather high cognitive load on the developer. We then started using the intent based APIs provided by cdk8s+ and saw much of that load go away. Here is a recap of how well each approach addressed our guiding principles:
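In table form:

| Guiding principle | YAML | cdk8s | cdk8s+ |
| --- | --- | --- | --- |
| Desired State | ❌ | ✅ | ✅ |
| Don’t Repeat Yourself (DRY) | ❌ | ✅ | ✅ |
| Boilerplate | ❌ | ❌ | ✅ |
| Cognitive Load | ❌ | ❌ | ✅ |
| Reusability | ❌ | ✅ | ✅ |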
Head over to our GitHub repo to try cdk8s+. We want to hear about as many use cases as possible and to develop the library alongside the community. We also invite you to join the discussion on our Slack channel and on Twitter (#cdk8s, #cdk8s+).
Happy coding!