How CyberArk Implements Feature Flags with AWS AppConfig

Written by Ran Isenberg, Principal Architect at CyberArk

Feature flags are a powerful tool that allows you to change software behavior. In addition, feature flags can improve your CI/CD pipeline by enabling capabilities such as A/B testing, which makes them an enabler of DevOps and a crucial part of any CI/CD pipeline. However, feature flagging can become complicated if not done correctly, so best practices and appropriate tools are required.

In this blog post, you will learn how and why CyberArk uses feature flags with AWS AppConfig and AWS Lambda Powertools, and best practices for working with feature flags. In a nutshell, AWS AppConfig enables CyberArk’s teams to move faster and deliver more value to their customers with improved confidence.

While the examples describe the implementation with Lambda functions, other SaaS implementations, such as those based on Amazon Elastic Compute Cloud (Amazon EC2) or containers, can also use the proposed solution.

Business Challenges

CyberArk is the global leader in Identity Security. It was founded in 1999 and currently employs over 2700 people across the globe. CyberArk products have focused heavily on SaaS and cloud-native solutions in recent years.

CyberArk’s service teams continually develop new features to provide better value to customers. CyberArk’s SaaS services use configuration to define their behavior at runtime. A common practice is to use environment variables or static configuration files that define the service behavior. Feature flags are a subset of service configuration. However, a static configuration can change only when the service is redeployed with the new configuration values. The redeployment has to go through the service CI/CD pipeline, which takes time and reduces the company’s flexibility to make changes quickly. A faster and more dynamic approach is required.

The process of releasing and deploying new features to production introduces several challenges:

How to change service behavior without redeploying it.
How to build an automatic feature flags release process.
How to increase confidence in feature quality and the feature release process.
How to roll back/quickly disable a feature with zero downtime in case of production errors.

The proposed solution will provide answers to these challenges, but first, let’s review the solution requirements.

CyberArk’s Feature Flags Solution Requirements

The selected feature flags solution is required to:

Provide an application programming interface (API) to get a feature flag by name.
Support feature A/B testing: enable features for some customers but disable them for others, and provide a different experience for different customers and users (for example, admin vs. non-admin users).
Maintain high performance even for a large number of feature flags.
Be fully managed, highly available, and support auto scaling.
Support canary deployments of new configurations with automatic rollback in case of errors.
Have FedRAMP High ATO certification.

Solution Overview

The solution overview can be broken down into four parts:

How to develop and write feature flags configurations.
How to store and deploy the feature flags.
How to evaluate feature flags at runtime.
How to apply feature flags best practices.

The feature flags solution uses JSON configuration files stored in AWS AppConfig. A dedicated CI/CD pipeline deploys them as dynamic configuration, and the services evaluate them at runtime with the AWS Lambda Powertools feature flags utility.

AWS AppConfig satisfies solution requirements 3 through 6, while AWS Lambda Powertools satisfies requirements 1 and 2.

The AWS Lambda Powertools feature flags utility can also be used in services that are not based on AWS Lambda functions.

Figure 1. Diagram that explains communication between AWS Lambda, AWS AppConfig, and the CI/CD pipeline.

How to Develop and Write Feature Flags

Before diving deep into the solution details, let’s first understand how to define and store the feature flags configuration.

Feature flags change service behavior, and a naïve implementation of feature flags may look like this:

def my_func():
    # Evaluate the flag at runtime, then branch to the matching logic.
    feature_flag: bool = evaluate_feature_flag()
    if feature_flag:
        handle_new_feature_logic()
    else:
        handle_regular_logic()

When the feature_flag variable evaluates to ‘True,’ the code calls handle_new_feature_logic() and runs the new business logic. When it evaluates to ‘False,’ the regular business logic runs. Essentially, changing the evaluated value of the feature flag changes the service behavior.

Static Configuration is Not Good Enough

Changing the service behavior is reduced to changing the configuration.

One common way of defining such configuration is with static configuration: configuration files (for example, JSON files) or environment variables bundled with the service. While this is simple to implement, the main downside is that any change in service behavior, that is, any configuration change, requires a complete service redeployment or code push. This triggers the service CI/CD pipeline, which can take a long time.

Figure 2. Diagram showing how a CI/CD pipeline pushes static configuration data, which can take a lot of time.
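
As a minimal sketch of the static approach, the flag below is read from an environment variable that is set when the service is deployed. The variable name NEW_FEATURE_ENABLED and the helper is_feature_enabled are hypothetical and only illustrate the idea. Changing the value still means redeploying the service.

```python
import os

# Hypothetical variable name, set at deploy time as part of the service bundle.
FLAG_ENV_VAR = "NEW_FEATURE_ENABLED"


def is_feature_enabled(env=None) -> bool:
    """Read a static feature flag; anything other than 'true' counts as disabled."""
    env = os.environ if env is None else env
    return env.get(FLAG_ENV_VAR, "false").strip().lower() == "true"
```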

Dynamic configurations, on the other hand, allow you to make changes quickly.

Figure 3. Diagram showing dynamic configuration, which can be updated at runtime and is much faster.

The dedicated pipeline is much faster than the service pipeline as it has zero logic other than deploying the new configuration to AWS.

In addition, no service redeployment is required. The service fetches the configuration at runtime with an API call and behaves according to the configuration. When a new configuration version is deployed, the service fetches the new configuration and alters its behavior.
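
The runtime fetch can be sketched roughly as follows. This is an illustration under stated assumptions, not CyberArk’s implementation: the class name DynamicConfig, the injected fetch callable (a stand-in for the configuration API call), and the short-lived cache with its max_age_seconds value are all hypothetical.

```python
import json
import time
from typing import Any, Callable, Dict, Optional


class DynamicConfig:
    """Fetches a JSON configuration at runtime and re-fetches it once the cached copy expires."""

    def __init__(
        self,
        fetch: Callable[[], str],
        max_age_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # stand-in for the configuration API call
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._cached: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, Any]:
        """Return the current configuration, refreshing it when the cache is stale."""
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._max_age:
            self._cached = json.loads(self._fetch())
            self._fetched_at = now
        return self._cached
```

With this shape, a newly deployed configuration version is picked up on the first call after the cached copy expires, without restarting or redeploying the service.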

How to Store and Deploy the Feature Flags

CyberArk services use a dedicated CI/CD pipeline, separate from the main service pipeline, that takes a JSON configuration file and deploys it to AWS as an AWS AppConfig configuration. The deployment follows the process defined in the AWS AppConfig documentation.

Each service is defined as an AWS AppConfig application, and each account (‘dev,’ ‘test,’ ‘production’) is defined as an AppConfig environment. A service typically defines one configuration, but the service team can decide otherwise. A service can consist of several microservices, each with a different configuration under the ‘main’ service AppConfig application.

The dedicated configuration CI/CD pipeline leverages AWS AppConfig capabilities such as:

Schema validators – prevent uploading malformed configurations that would break your service at runtime.
Canary deployment of new configurations.
Automatic rollback during configuration deployment when an Amazon CloudWatch alarm triggers.
Configuration versioning – each configuration has a version, and AWS AppConfig provides visibility of all the previous versions.
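
To illustrate the kind of check a schema validator performs, here is a minimal sketch of a structural check that a pipeline could run before upload. It is not the AWS AppConfig validator itself. The function name validate_feature_flags is hypothetical, and the document layout it assumes (a boolean ‘default’ per feature plus optional ‘rules’ with ‘when_match’ and ‘conditions’) is a simplified version of the layout used by the AWS Lambda Powertools feature flags utility.

```python
import json
from typing import Any, List


def validate_feature_flags(raw_json: str) -> List[str]:
    """Return a list of problems found in a feature flags document (empty means valid)."""
    try:
        config: Any = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be an object keyed by feature name"]

    errors: List[str] = []
    for name, feature in config.items():
        if not isinstance(feature, dict) or not isinstance(feature.get("default"), bool):
            errors.append(f"{name}: 'default' must be a boolean")
            continue
        rules = feature.get("rules", {})
        if not isinstance(rules, dict):
            errors.append(f"{name}: 'rules' must be an object")
            continue
        for rule_name, rule in rules.items():
            if not isinstance(rule, dict) or not isinstance(rule.get("when_match"), bool):
                errors.append(f"{name}/{rule_name}: 'when_match' must be a boolean")
                continue
            conditions = rule.get("conditions")
            if not isinstance(conditions, list) or not conditions:
                errors.append(f"{name}/{rule_name}: 'conditions' must be a non-empty list")
                continue
            for cond in conditions:
                # Every condition needs the three keys the rule engine reads.
                if not isinstance(cond, dict) or not {"action", "key", "value"} <= set(cond):
                    errors.append(f"{name}/{rule_name}: each condition needs action, key, value")
    return errors
```

A pipeline step could fail the deployment whenever the returned list is not empty, so that a malformed document never reaches the service.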

How to Evaluate Feature Flags

AWS Lambda Powertools is “A suite of utilities for AWS Lambda functions to ease adopting best practices such as tracing, structured logging, custom metrics, idempotency, batching, and more.”

CyberArk uses the AWS Lambda Powertools Python feature flags utility to evaluate feature flags at runtime and to implement A/B testing with its runtime rule engine.

The AWS Lambda Powertools feature flags utility enables the service to:

Fetch JSON-based configuration stored in AWS AppConfig.
Store the JSON configuration in an in-memory cache to reduce frequent calls to AWS AppConfig and reduce total cost.
Use a simple API to evaluate a feature flag by name.
Implement A/B testing with its internal rule engine. The rule engine evaluates feature flags as ‘True’ for some customers but ‘False’ for others, so the feature flag value changes according to the session context.
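
The idea behind context-based evaluation can be sketched as follows. This is a simplified, independent illustration, not the Powertools implementation: the evaluate function, the FEATURES document, the tenant_id context key, and the two supported actions are assumptions chosen for the example.

```python
from typing import Any, Dict

# Hypothetical configuration: disabled by default, enabled for a single tenant.
FEATURES: Dict[str, Any] = {
    "new_feature": {
        "default": False,
        "rules": {
            "enable for beta tenant": {
                "when_match": True,
                "conditions": [
                    {"action": "EQUALS", "key": "tenant_id", "value": "tenant-a"}
                ],
            }
        },
    }
}

# Only two comparison actions are sketched here.
ACTIONS = {
    "EQUALS": lambda actual, expected: actual == expected,
    "STARTSWITH": lambda actual, expected: isinstance(actual, str) and actual.startswith(expected),
}


def evaluate(features: Dict[str, Any], name: str, context: Dict[str, Any], default: bool) -> bool:
    """Return the flag value for this context; the first rule whose conditions all match wins."""
    feature = features.get(name)
    if feature is None:
        return default  # unknown flag: fall back to the caller's default
    for rule in feature.get("rules", {}).values():
        matched = all(
            cond["action"] in ACTIONS
            and ACTIONS[cond["action"]](context.get(cond["key"]), cond["value"])
            for cond in rule["conditions"]
        )
        if matched:
            return rule["when_match"]
    return feature["default"]
```

Calling evaluate(FEATURES, "new_feature", {"tenant_id": "tenant-a"}, default=False) enables the feature for that tenant only, while every other tenant receives the flag’s default value.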

Feature Flags Best Practices

This section describes feature flags best practices that span several aspects of a feature release: testing, releasing, deploying, and retiring.

Test

Developers usually focus their tests on the obvious use case: they enable the feature and test the new logic surrounding it, verifying that the business logic is handled correctly and that the side effects are as expected. If they are thorough, they also ensure that code coverage remains as high as possible.

However, it is also critical to verify that the feature’s logic does not run when the value of the feature flag is False. This might seem obvious, but running a feature’s logic when its flag is set to False can have severe consequences. Bugs or unhandled edge cases can cause this.

The easiest way to create these two test variations is to mock the configuration returned from AWS AppConfig: the first test uses a mock configuration with the feature flag set to True, while the second test mocks it to False. Mocking makes the tests consistent, so their results don’t change when the actual configuration in your account changes.
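
A minimal sketch of the two test variations, using Python’s standard unittest.mock module, is shown below. The AppConfigClient wrapper and the handle_request function are hypothetical stand-ins for the code that loads and uses the flags.

```python
from unittest import mock


class AppConfigClient:
    """Hypothetical wrapper around the call that loads flags from AWS AppConfig."""

    def get_flags(self) -> dict:
        raise RuntimeError("unit tests should not reach the real configuration store")


def handle_request(client: AppConfigClient) -> str:
    """Run the new logic only when the flag is enabled."""
    flags = client.get_flags()
    if flags.get("new_feature", False):
        return "new logic"
    return "regular logic"


def test_new_logic_runs_when_flag_enabled() -> None:
    with mock.patch.object(AppConfigClient, "get_flags", return_value={"new_feature": True}):
        assert handle_request(AppConfigClient()) == "new logic"


def test_new_logic_skipped_when_flag_disabled() -> None:
    with mock.patch.object(AppConfigClient, "get_flags", return_value={"new_feature": False}):
        assert handle_request(AppConfigClient()) == "regular logic"
```

Because both tests patch the fetch call, neither depends on the configuration currently deployed in any account.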

Release/Deploy

Assuming you don’t deploy straight to production, it is wise to first deploy the feature flag as ‘disabled’ to all environments other than ‘dev.’ When you are ready to test and debug, enable the feature flag in at least one environment that simulates a real production environment, such as ‘staging.’ This might cause E2E tests to fail if your mocked tests missed some edge cases. In that case, add the missing tests and continue the release.

When you are confident that the feature is ready for production, deploy to production with a canary deployment strategy.

Plan to Retire

Feature flags are powerful and addictive. However, the more you add, the more your code complexity and testing overhead increase. You are not bound to those feature flags forever, though. It’s okay, and recommended, to ‘give them the ax’ once they reach maturity and stability. The development team should schedule a monthly one-hour meeting to review the current state of the feature flags and decide which ones can be retired.

Here are some rules of thumb for selecting a proper retirement candidate:

The feature has been deployed to 100% of customers for ‘X’ weeks. Use common sense to define ‘X.’
The feature has been stable for ‘X’ weeks — no known issues/bugs.
Customer feedback is positive, and there are no open issues.
The feature is not expected to undergo any refactors/additions.
Your product team does not use it or no longer believes it is required.
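
These rules of thumb can be turned into a simple checklist function for the monthly review. The sketch below is only an illustration: the FlagStatus record and its fields are hypothetical, and customer feedback is reduced to a count of open issues.

```python
from dataclasses import dataclass


@dataclass
class FlagStatus:
    """Hypothetical record a team might keep for each feature flag."""

    name: str
    rollout_percent: int
    weeks_at_full_rollout: int
    weeks_without_known_issues: int
    open_customer_issues: int
    planned_changes: bool
    product_still_needs_flag: bool


def is_retirement_candidate(flag: FlagStatus, min_weeks: int) -> bool:
    """Apply the rules of thumb; min_weeks is the team's chosen 'X'."""
    return (
        flag.rollout_percent == 100
        and flag.weeks_at_full_rollout >= min_weeks
        and flag.weeks_without_known_issues >= min_weeks
        and flag.open_customer_issues == 0
        and not flag.planned_changes
        and not flag.product_still_needs_flag
    )
```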

Note that AWS AppConfig feature flags natively let engineers mark a flag as “short-term” and include an optional target deprecation date for that flag. You can then search and filter on short-term flags and prioritize them for cleanup.

Summary

AWS AppConfig has increased CyberArk’s confidence in releasing new features by incorporating feature flags into its CI/CD pipelines and services. AWS AppConfig provides feature flags deployment with schema validation, automatic rollback in case of errors, and canary deployments. In addition, AWS Lambda Powertools allowed CyberArk’s services to implement A/B testing and fetch dynamic configurations with a simple API.

AWS Cloud Operations & Migrations Blog