Enhancing Spinnaker deployment for dynamic AWS account registration

This post was written by Manabu McCloskey, Gaurav Dhamija, Nima Kaviani, Siddhi Shah, Kevin Kidd, Brandon Leach, and Shrirang Moghe.

Multi-account Amazon Web Services (AWS) environments are a recommended best practice through which AWS customers can have clear separation of concerns across teams and applications where rapid innovation, flexible security controls, and varied adoption of business processes are required.

As AWS customer workloads grow in size, multi-account AWS environments become a core part of their enablement process for teams across the organization. This growth, however, means that delivery engineering teams must adapt to an increasing number of requests for additions, deletions, and updates to the different AWS accounts. These demands, coupled with how quickly and effectively the changes can be realized by the underlying CI/CD systems used to roll out software, define the speed at which an organization can innovate and deliver results.

For AWS customers using Spinnaker to continuously deploy their software systems, the process of updating AWS account information turned out to be a complex and time-consuming task. The process involved updating AWS account information in Spinnaker configuration files and then manually restarting the Spinnaker components that use these configuration files. In this article, we’ll describe how we designed and built a new interface and open source plugin for Spinnaker to dynamically manage a large number of AWS accounts.

Overview

Depending on the number of AWS accounts in a Spinnaker deployment, the process of validating AWS cloud resource information and updating system states could take up to 20 minutes. Multiplying that by the number of accounts that would potentially change in a given day or month would result in a significant amount of wasted engineering time across the organization.

In previous blog posts, we have highlighted enhancements made to Spinnaker by enabling deployment of Lambda functions via Spinnaker pipelines, and providing support for managed delivery in Spinnaker. In this blog post, we will highlight how a collaboration between AWS, Armory (an AWS strategic partner and one of the major contributors to Spinnaker), and Autodesk (an AWS customer heavily using Spinnaker) resulted in improving the process of adding, removing, and modifying cloud provider accounts in Spinnaker.

Through this effort, in addition to using configuration files, cloud provider accounts can be modified via third-party account sources (for example, web services managed by the organization but external to Spinnaker). They can also be modified dynamically, without stopping or altering the behavior of Spinnaker, and become available within seconds for use throughout the deployment process.

Here, we’ll highlight the design and implementation process behind this feature, along with the installation and configuration steps required to make it available in a given Spinnaker deployment, particularly when using AWS accounts.

Design choices

The overarching goal of these efforts is to allow dynamic registration of cloud provider accounts within Spinnaker to meet the scalability requirements of AWS customers. These customers may have thousands of AWS accounts, which may be managed as part of an external configuration store outside the Spinnaker ecosystem.

In collaboration with Autodesk and Armory, we uncovered existing limitations to Spinnaker that can hinder productivity for AWS customers with a large number of AWS accounts. The scenario under which dynamic management of AWS accounts in Spinnaker works involves the following steps:

Customer has an automation pipeline that provisions AWS accounts as needed.
Customer has a mechanism to supply account information through a web service that is in charge of configuration management of the accounts.
The configuration management system is treated as the source of truth for supplying account information.

To address these limitations, we collaborated with Armory and Autodesk to evaluate three options.

Use Kubernetes secrets to synchronize account information

In this approach, Custom Resource Definitions (CRDs) are generated with references to Kubernetes secrets and placed in a Git repository. The Spinnaker operator ensures that secrets are mounted to each clouddriver pod. Within each pod, clouddriver monitors a file generated from the secrets and synchronizes accounts when it detects changes.

Here, the source of truth is the Git repository, and account changes propagate faster than they would by restarting the deployment. This approach, however, adds complexity because it introduces more moving pieces and relies on Kubernetes secrets. (Note that Kubernetes secrets are eventually updated, but timing is not guaranteed.)
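The file-monitoring step in this option can be illustrated with a minimal Python sketch. It is not clouddriver code: the class name AccountFileWatcher, the JSON file format, and the hash-based comparison are assumptions made for illustration only.

```python
import hashlib
import json
from pathlib import Path


class AccountFileWatcher:
    """Hypothetical watcher that detects changes to a mounted accounts file.

    Changes are detected by comparing SHA-256 digests of the file content
    between polls, so a rewrite with identical content is not a change.
    """

    def __init__(self, path):
        self.path = Path(path)
        self._last_digest = None

    def poll(self):
        """Return the parsed file if it changed since the last poll, else None."""
        data = self.path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest == self._last_digest:
            return None
        self._last_digest = digest
        return json.loads(data)
```

A caller would invoke poll() on a timer and synchronize accounts whenever it returns a value. Because mounted secrets are updated eventually rather than immediately, the delay before a change is seen is the poll interval plus the secret propagation time.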

Push/publish account updates

With this approach, the configuration management system responsible for managing AWS accounts pushes events, such as account creation and deletion, to a Spinnaker endpoint. Once Spinnaker receives an event, it updates the account information in a persistent storage layer, then all clouddriver instances are notified to reload accounts.

This approach is simpler, but it requires a Pub/Sub mechanism in order to work. In the Spinnaker world, this means Redis must be used; however, not all Spinnaker installations use Redis. Some use SQL instead of Redis to store cloud infrastructure details. Additionally, this approach means that new endpoints must be created in two microservices.
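The flow of this option (receive an event, persist it, notify every clouddriver instance) can be sketched in Python as follows. Everything here is a hypothetical stand-in: AccountStore replaces the persistent storage layer with a dict, AccountEventBus replaces Redis Pub/Sub with an in-process list of callbacks, and the event fields type, name, and accountId are assumed for illustration.

```python
class AccountStore:
    """Stand-in for the persistent storage layer (an in-memory dict here)."""

    def __init__(self):
        self._accounts = {}

    def put(self, name, details):
        self._accounts[name] = details

    def remove(self, name):
        self._accounts.pop(name, None)

    def snapshot(self):
        return dict(self._accounts)


class AccountEventBus:
    """Stand-in for a Pub/Sub channel: calls every subscriber on publish."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        for callback in self._subscribers:
            callback(event)


class ClouddriverInstance:
    """Hypothetical instance that reloads all accounts when notified."""

    def __init__(self, store):
        self._store = store
        self.accounts = {}

    def reload(self, _event):
        self.accounts = self._store.snapshot()


def handle_account_event(event, store, bus):
    """Persist the change first, then notify all subscribed instances."""
    if event["type"] == "created":
        store.put(event["name"], {"accountId": event["accountId"]})
    elif event["type"] == "deleted":
        store.remove(event["name"])
    else:
        raise ValueError(f"unsupported event type: {event['type']}")
    bus.publish(event)
```

Persisting before publishing means an instance that reloads on notification always reads the updated state.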

Lazily load accounts

With this approach, a new interface is introduced to clouddriver, along with mechanisms for reloading accounts. This interface defines mechanisms to poll the cloud provider credentials from an external source. It also allows for custom logic to execute when retrieving credentials.

For our use case, clouddriver periodically reaches out to the configuration management system to update its cloud provider accounts. Also, if a pipeline attempts to use an account that’s not yet been synchronized, clouddriver reaches out to the configuration management system to synchronize its account information.
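The two behaviors described here, periodic refresh and on-demand lookup of an unknown account, can be sketched in Python. This is an illustration under assumptions, not the clouddriver interface: LazyAccountRepository, fetch_all, fetch_one, and refresh_seconds are hypothetical names, and the periodic refresh is checked on access rather than run on a background thread.

```python
import time


class LazyAccountRepository:
    """Hypothetical cache of accounts backed by an external source.

    fetch_all() returns a mapping of every known account, and
    fetch_one(name) returns a single account or None. Both callables
    stand in for calls to the configuration management system.
    """

    def __init__(self, fetch_all, fetch_one, refresh_seconds=60.0,
                 clock=time.monotonic):
        self._fetch_all = fetch_all
        self._fetch_one = fetch_one
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._accounts = {}
        self._last_refresh = None

    def _refresh_if_stale(self):
        now = self._clock()
        if (self._last_refresh is None
                or now - self._last_refresh >= self._refresh_seconds):
            self._accounts = dict(self._fetch_all())
            self._last_refresh = now

    def get(self, name):
        """Return the account, fetching it on demand if it is not cached."""
        self._refresh_if_stale()
        if name not in self._accounts:
            account = self._fetch_one(name)
            if account is not None:
                self._accounts[name] = account
        return self._accounts.get(name)
```

With this shape, a pipeline that references a newly provisioned account triggers a single targeted lookup instead of waiting for the next full refresh.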

Implementation

After a few weeks of discussion, we chose the third approach because it was abstract enough to allow company-specific business logic to be implemented without code changes in clouddriver. After the decision was made, an RFC to implement the new interface was introduced and accepted through a joint effort by Armory and AWS. Once the RFC was accepted, Armory implemented the new interface and other changes needed for the interface.

Once the new interface changes were merged, we made a series of pull requests (PRs) to update AWS and Amazon Elastic Container Service (Amazon ECS) credentials handling in clouddriver.

Next, we built a Spinnaker plugin that extends the clouddriver microservice with functionality to consume account information from the configuration management system and supply account details to Spinnaker’s AWS providers. This process involves:

Defining a schema for data exchange between the plugin and the external configuration management system so that the system can interface with Spinnaker’s clouddriver.
Introducing interfaces in Spinnaker core to enable dynamic account registration, particularly to manage (add, update, or remove) accounts without having to restart any of Spinnaker’s microservices.
Allowing clouddriver to receive account credential information from multiple credential repositories.
Ensuring a highly available clouddriver setup for the solution to be resilient to failures and capable of addressing updates to account information within the clouddriver microservice.
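To illustrate what receiving accounts from multiple credential repositories can look like, here is a small Python sketch that merges several account sources into one view. The function name and the precedence rule (earlier sources win on a name collision) are assumptions for illustration; the source does not state how clouddriver resolves conflicts.

```python
def merge_account_sources(*sources):
    """Merge account mappings from several sources into one dict.

    Assumption: when two sources define the same account name, the
    source listed first takes precedence.
    """
    merged = {}
    for source in sources:
        for name, account in source.items():
            # setdefault keeps the entry from the earliest source.
            merged.setdefault(name, account)
    return merged
```

For example, passing the statically configured accounts first and the dynamically loaded accounts second keeps static definitions authoritative while still exposing every dynamic account.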

The next section demonstrates the AWS account registration plugin that takes advantage of the new interface.

Using the interface

The Spinnaker account management plugin for AWS is open source and available on GitHub. Detailed information on how to set it up can be found in the repository.

The plugin requires a RESTful endpoint that it can query to retrieve account information, such as account number and features to enable/disable. We’ll use an example HTTP server available in the Git repository to demonstrate the functionality of this plugin.

Plugin in action

In clouddriver, cloud provider accounts can be queried by issuing a GET call to the /credentials endpoint. In your clouddriver instance, you should see the AWS accounts configured in your static configuration file.

# curl localhost:7002/credentials | jq
[
  {
    "accountId": "123",
    "accountType": "ecs",
    "challengeDestructiveActions": false,
    "cloudProvider": "ecs",
    "environment": "static-account1",
    "name": "static-account1-ecs",
    "permissions": {},
    "primaryAccount": false,
    "requiredGroupMembership": [],
    "type": "ecs"
  }
...
]

Adding accounts

When we make the example HTTP server return a JSON payload, as shown in the following code example, we are telling the plugin to add AWS and Amazon ECS accounts named dynamic-account1:

{
  "SpinnakerAccounts": [
    {
      "AccountId": "1234",
      "SpinnakerAccountName": "dynamic-account1",
      "Regions": [
        "us-west-2"
      ],
      "SpinnakerStatus": "ACTIVE",
      "SpinnakerAssumeRole": "role/spinnakerManaged",
      "SpinnakerProviders": [
          "ec2",
          "ecs"
      ],
...
}

When you issue a GET call to the /credentials endpoint, you will see new AWS and Amazon ECS accounts available to use.

# curl localhost:7002/credentials | jq
[
  {
    "accountId": "1234",
    "accountType": "ecs",
    "challengeDestructiveActions": false,
    "cloudProvider": "ecs",
    "environment": "dynamic-account1",
    "name": "dynamic-account1-ecs",
    "permissions": {},
    "primaryAccount": false,
    "requiredGroupMembership": [],
    "type": "ecs"
  },
  {
    "accountId": "1234",
    "accountType": "dynamic-account1",
    "challengeDestructiveActions": false,
    "cloudProvider": "aws",
    "environment": "dynamic-account1",
    "name": "dynamic-account1",
    "permissions": {},
    "primaryAccount": false,
    "requiredGroupMembership": [],
    "type": "aws"
  },
  ...
]
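The relationship between the payload and the resulting accounts can be sketched in Python. The naming rule below (the ec2 provider yields an aws account with the plain name, and the ecs provider yields an ecs account with an -ecs suffix) is inferred from the sample output above; the function itself is hypothetical and is not the plugin's code.

```python
def expand_accounts(payload):
    """Expand payload entries into per-provider account summaries.

    Only the ec2 and ecs providers shown in the article are handled;
    other provider values are ignored in this sketch. Entries marked
    SUSPENDED are skipped.
    """
    result = []
    for entry in payload.get("SpinnakerAccounts", []):
        if entry.get("SpinnakerStatus") == "SUSPENDED":
            continue
        base = entry["SpinnakerAccountName"]
        account_id = entry["AccountId"]
        for provider in entry.get("SpinnakerProviders", []):
            if provider == "ec2":
                result.append(
                    {"name": base, "type": "aws", "accountId": account_id})
            elif provider == "ecs":
                result.append(
                    {"name": f"{base}-ecs", "type": "ecs",
                     "accountId": account_id})
    return result
```

Applied to the sample payload, this yields one aws account named dynamic-account1 and one ecs account named dynamic-account1-ecs, both with account ID 1234.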

Account removal

Similarly, you can mark an account for removal by setting its SpinnakerStatus field to SUSPENDED.

{
  "SpinnakerAccounts": [
    {
      "AccountId": "1234",
      "SpinnakerAccountName": "dynamic-account1",
      "Regions": [
        "us-west-2"
      ],
      "SpinnakerStatus": "SUSPENDED",
...
}

Once you make the change and let the plugin sync, you should see that the accounts named dynamic-account1* are removed. You can also modify accounts by updating the SpinnakerProviders field. For example, you can add lambda to the field, and the plugin will enable AWS Lambda features for this AWS account.
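One way to reason about a sync is as a comparison between the accounts currently registered and the accounts the remote payload asks for. The Python sketch below shows that comparison under assumptions: desired_accounts and diff_accounts are hypothetical helpers, and any status other than SUSPENDED is treated as active.

```python
def desired_accounts(payload):
    """Map account name to payload entry, leaving out SUSPENDED entries."""
    return {
        entry["SpinnakerAccountName"]: entry
        for entry in payload.get("SpinnakerAccounts", [])
        if entry.get("SpinnakerStatus") != "SUSPENDED"
    }


def diff_accounts(current, desired):
    """Return (added, removed, updated) account names, each sorted."""
    added = sorted(desired.keys() - current.keys())
    removed = sorted(current.keys() - desired.keys())
    updated = sorted(
        name for name in current.keys() & desired.keys()
        if current[name] != desired[name]
    )
    return added, removed, updated
```

Under this model, setting SpinnakerStatus to SUSPENDED moves an account into the removed list, and adding lambda to SpinnakerProviders moves it into the updated list.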

Conclusion and future scope

In this blog post, we outlined a collaboration involving Autodesk, Armory, and AWS, through which we designed and built a new interface and a plugin for Spinnaker to dynamically manage a large number of AWS accounts (Amazon Elastic Compute Cloud (Amazon EC2), Amazon ECS, and Lambda) in Spinnaker without having to restart Spinnaker microservices.

This plugin will help customers to sync periodically with a configured remote host to update Spinnaker AWS and Amazon ECS accounts. It also will support on-demand account loading and AWS Identity and Access Management (IAM) authentication when used with API Gateway.

With this added functionality, AWS customers can automate the creation and registration of AWS accounts with Spinnaker to fully automate the launch of new regions for products. This functionality will help increase engineering efficiency and allow AWS customers to deliver changes (features and fixes) in a fast and safe manner.

Get involved

The Spinnaker project and the plugin described in this article are open source projects. We would love for you to join the community and start contributing. Join us on Slack and GitHub.

Brandon Leach

Brandon Leach is a software architect at Autodesk and a longtime member of the Spinnaker community. His background includes building software delivery automation, data and identity platforms, and large-scale distributed systems engineering. His current focus areas are service quality, engineering productivity, and enabling Autodesk’s transformation to a platform company.

Shrirang Moghe

Shrirang Moghe spends most of his time in the Autodesk Cloud, working to make Autodesk Cloud products available, reliable, and delightful for their customers, and even better than their desktop ancestors. He has spent more than 17 years designing CAD systems and workflows. Talk to him about regionalization, FedRAMP, collaboration, cloud security, or your CAD/construction automation needs.