AWS Cloud Operations & Migrations Blog

Leveraging AWS CloudFormation to create an immutable infrastructure at Nubank

Bruno Halley Schaefer, software engineer, Nubank
Hugo Carvalho, senior solutions architect, AWS
Marcelo Nunes, senior technical account manager, AWS Enterprise Support Team

 

Nubank, a Brazilian company that is one of the world’s largest independent digital banks, is innovatively transforming Latin America’s financial landscape by providing transparent, simple, and efficient services. The company fights complexity to empower people, give them back control of their finances, and redefine their relationships with money.

Nubank was born in the cloud and AWS has supported them on their journey since day one.

Nubank’s engineering team embraced functional programming ideas and immutability from the start. They apply immutability concepts to their microservices developed using Clojure and their data management and persistence using Datomic, and they handle their infrastructure with the help of AWS CloudFormation.

Overview

Before cloud computing became a widely available reality, building a server infrastructure meant dealing with expensive physical servers. Replacing those physical servers was so costly and time-consuming that the most practical approach was to apply any necessary changes to the servers running in production. This is the nature of a mutable infrastructure, and all those in-place modifications eventually lead to critical problems like inconsistency, unreliability, and increasing complexity.

One of the biggest challenges in building infrastructure today is predictability. However, thanks to virtualization and cloud computing, it’s now possible to have new deployment workflows that can help companies address this challenge. One of those new workflows is based on the core idea of immutable infrastructure, in which a running server is never modified in place. Instead, any change is deployed by completely replacing the server with a new instance that contains all the necessary changes.
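The replace-instead-of-modify idea can be illustrated with a small Python sketch. It is only an analogy, not Nubank's tooling: the `Server` type, its fields, and the `roll_out` helper are hypothetical names introduced for this example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Server:
    """Hypothetical, simplified model of a server that cannot be modified in place."""
    instance_id: str
    image: str


def roll_out(old: Server, new_instance_id: str, new_image: str) -> Server:
    """Return a replacement server; the old one is left untouched."""
    return replace(old, instance_id=new_instance_id, image=new_image)


blue = Server(instance_id="i-blue", image="app-v1")
green = roll_out(blue, new_instance_id="i-green", new_image="app-v2")

try:
    blue.image = "app-v2"  # in-place modification is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the old value is never changed, it stays available for comparison or rollback until it is explicitly discarded.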

Walkthrough

AWS CloudFormation provides a common language for you to describe and provision all the infrastructure resources in your cloud environment. It allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts.

Infrastructure as code

Before we detail Nubank’s approach to immutable infrastructure, it’s essential to describe a fundamental aspect of how Nubank handles infrastructure: Everything has a code representation in the form of definition files.

The following is a simplified example of a definition. This definition contains the code representation for a Datomic transactor running on an Amazon EC2 Auto Scaling group.

 

{:name              :ronaldo-datomic
 :squad             :platform
 :environments      {:staging #{:s0 :s1 :s2}}
 :workload          {:type :generic-legacy, :multiplier 1}
 :storage           {:type          :dynamodb
                     :resource-name "ronaldo-datomic"}
 :memory            {:jvm-xmx          "2500M"
                     :jvm-xms          "2500M"
                     :object-cache-max "1g"
                     :memory-index-max "512m"}
 :write-concurrency 5}

 

These definitions are Nubank’s source of truth, and their infrastructure should always reflect this source of truth.

Immutable infrastructure at Nubank

To achieve an immutable infrastructure, Nubank constantly creates and destroys cloud resources. Always following a blue-green pattern, they first create new resources containing all the modifications they wish to deploy and then drop the old resources after they verify that everything works as intended.
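The order of operations in this blue-green pattern can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `create_stack`, `is_healthy`, and `delete_stack` callables are hypothetical placeholders for whatever provisioning and verification steps are actually used, and the stack names are made up.

```python
from typing import Callable


def blue_green_deploy(
    blue: str,
    green: str,
    create_stack: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    delete_stack: Callable[[str], None],
) -> str:
    """Create green, verify it, then drop blue. Returns the surviving stack."""
    create_stack(green)
    if is_healthy(green):
        # The new resources work as intended, so the old ones can go.
        delete_stack(blue)
        return green
    # Verification failed: discard the new resources and keep the old ones.
    delete_stack(green)
    return blue


# Usage with in-memory stand-ins for real infrastructure calls.
stacks = {"service-blue"}
survivor = blue_green_deploy(
    blue="service-blue",
    green="service-green",
    create_stack=stacks.add,
    is_healthy=lambda name: True,
    delete_stack=stacks.discard,
)
```

The old resources are only removed after verification succeeds, so a failed check leaves the previous deployment untouched.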

The following diagram shows the standard blue-green pattern followed by Nubank:

 

AWS CloudFormation helps Nubank throughout this entire process and is at the core of their immutable infrastructure and blue-green process.

This service provisions resources in a safe, repeatable manner, allowing you to build and rebuild your infrastructure and applications without having to perform manual actions or write custom scripts. AWS CloudFormation takes care of determining the correct operations to perform when managing your stack and rolls back changes automatically if errors are detected.

To take full advantage of AWS CloudFormation, Nubank uses Nimbus, one of its custom-built tools. Written in Clojure, Nimbus abstracts and automates Nubank’s interactions with AWS and translates Nubank definition files into AWS CloudFormation stacks.
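To give a rough idea of what translating a definition into a template might involve, here is a small Python sketch. It is not Nimbus: the `definition_to_template` function, the fixed group size, the parameter names, and the mapping from definition keys to template properties are all assumptions made for illustration. Only the CloudFormation resource type and property names are real.

```python
import json


def definition_to_template(definition: dict) -> dict:
    """Build a minimal CloudFormation template (as a dict) from a definition."""
    name = definition["name"]
    # Logical IDs must be alphanumeric, so "ronaldo-datomic" becomes "RonaldoDatomicGroup".
    logical_id = "".join(part.capitalize() for part in name.split("-")) + "Group"
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Illustrative stack for {name}",
        "Parameters": {
            # Hypothetical inputs; a real tool would resolve these itself.
            "LaunchTemplateId": {"Type": "String"},
            "LaunchTemplateVersion": {"Type": "String"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Resources": {
            logical_id: {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": "1",  # assumed size, not taken from the definition
                    "MaxSize": "1",
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "LaunchTemplateId"},
                        "Version": {"Ref": "LaunchTemplateVersion"},
                    },
                    "Tags": [
                        {
                            "Key": "squad",
                            "Value": definition["squad"],
                            "PropagateAtLaunch": True,
                        }
                    ],
                },
            }
        },
    }


template = definition_to_template({"name": "ronaldo-datomic", "squad": "platform"})
template_body = json.dumps(template, indent=2)
```

The resulting JSON string is the kind of template body that can be uploaded to AWS CloudFormation to create a stack.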

The following diagram shows how Nimbus interacts with AWS CloudFormation to create infrastructure resources:

After Nubank uploads a template, AWS CloudFormation takes care of creating the resources while respecting their interdependencies and enforcing an all-or-nothing operation: either all resources are created successfully, or none of them are created at all.
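The two properties described here, dependency-aware ordering and all-or-nothing creation, can be modeled with a toy Python sketch. This is not how AWS CloudFormation is implemented; the `create_all_or_nothing` function, the resource names, and the `create`/`delete` callables are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def create_all_or_nothing(
    dependencies: Dict[str, Set[str]],
    create: Callable[[str], None],
    delete: Callable[[str], None],
) -> List[str]:
    """Create resources in dependency order; undo everything if any step fails.

    `dependencies` maps each resource to the set of resources it depends on.
    Returns the creation order on success, or an empty list after a rollback.
    """
    created: List[str] = []
    try:
        for resource in TopologicalSorter(dependencies).static_order():
            create(resource)
            created.append(resource)
    except Exception:
        # Roll back in reverse order so dependents are removed first.
        for resource in reversed(created):
            delete(resource)
        return []
    return created


dependencies = {
    "vpc": set(),
    "subnet": {"vpc"},
    "launch_template": set(),
    "asg": {"subnet", "launch_template"},
}
order = create_all_or_nothing(dependencies, create=lambda r: None, delete=lambda r: None)
```

In the usage above, `vpc` is always created before `subnet`, and `asg` comes last because it depends, directly or indirectly, on every other resource.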

Before deleting the old AWS CloudFormation stacks (and with them the associated cloud resources), Nubank’s engineering team closely monitors the overall health of the new resources as the load on them gradually increases. To monitor their systems, Nubank relies heavily on Grafana dashboards and OpsGenie alerts, all built on top of Prometheus metrics.

These alerts are also part of Nubank’s infrastructure as code and therefore are also represented using Nubank definition files. The following example is the definition of an alert that aims to capture sudden spikes in Kafka consumer latency:

 

{:name :kafka-average-consume-latency-time-ms-too-high
 :squad :runtime
 :environments {:prod    #nu/prototypes-for [:prod :sharded+global+monitoring]
                :staging #nu/prototypes-for [:staging :sharded+global+monitoring]
                :test    #nu/prototypes-for [:test :nu+mobile]}
 :expr ["avg(kafka_network_total_time_ms_fetch_consumer_95thpercentile) by (environment, prototype, stack_id) > " :threshold]
 :threshold 600
 :default-filter-labels [:squad]
 :for-minutes 10
 :alertmanager-labels {:severity "warning"}
 :annotations {:stack    "{{ $labels.environment }}-{{ $labels.prototype }}-{{ $labels.stack_id }}"
               :instance "average for all instances"
               :value    "{{ $value }}"}}
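As a rough illustration of how such a definition could map onto a Prometheus alerting rule, consider the Python sketch below. The translation itself is an assumption: the `render_alert_rule` function is hypothetical, the definition is represented as a plain dict, and EDN keyword references such as `:threshold` are written as strings starting with a colon. Keys whose meaning the definition does not make clear, such as `:default-filter-labels` and `:environments`, are left out.

```python
def render_alert_rule(definition: dict) -> dict:
    """Turn an alert definition into the fields of a Prometheus alerting rule."""
    # Join the expression parts, replacing ":key" references with definition values.
    expr = "".join(
        str(definition[part[1:]]) if part.startswith(":") else part
        for part in definition["expr"]
    )
    return {
        "alert": definition["name"],
        "expr": expr,
        "for": f"{definition['for-minutes']}m",
        "labels": dict(definition["alertmanager-labels"]),
        "annotations": dict(definition["annotations"]),
    }


alert_definition = {
    "name": "kafka-average-consume-latency-time-ms-too-high",
    "expr": [
        "avg(kafka_network_total_time_ms_fetch_consumer_95thpercentile) "
        "by (environment, prototype, stack_id) > ",
        ":threshold",
    ],
    "threshold": 600,
    "for-minutes": 10,
    "alertmanager-labels": {"severity": "warning"},
    "annotations": {
        "instance": "average for all instances",
        "value": "{{ $value }}",
    },
}
rule = render_alert_rule(alert_definition)
```

Under this reading, the alert fires when the averaged 95th-percentile fetch time stays above the threshold of 600 for 10 minutes.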

 

Conclusion

Nubank uses AWS CloudFormation extensively and currently has more than 3,000 stacks that manage thousands of cloud resources across multiple AWS Regions. Nubank builds AWS CloudFormation stacks to handle simple scenarios, such as provisioning a single Amazon EC2 instance, but also to handle the complex and interdependent infrastructure that supports Kafka and Kubernetes clusters.

The following diagram shows the AWS CloudFormation template for one of Nubank’s Kubernetes clusters:

Nubank’s engineering team is working to make Nimbus part of a fully automated process, eliminating the need for human interaction through a command-line interface (CLI). This level of automation allows new cloud resources to be provisioned as soon as a Nubank definition file is modified and exported to Amazon S3. It also rolls back the deployment automatically if unexpected and potentially dangerous behavior is detected afterward.

By leveraging AWS CloudFormation, Nubank can create deployment units of complex infrastructure resources while keeping the overall management complexity in check.

 

About the Authors

Bruno Halley Schaefer is a software engineer at Nubank. As a member of the platform team, he helps design automation tools to support Nubank in a hyper-growth scenario.

 

 

 

Hugo Carvalho is a senior solutions architect at AWS who specializes in helping startups at different maturity levels build sustainable and highly scalable platforms in the cloud. With more than seven years of experience in technology and tech team management, Hugo has helped many companies to define, implement, and evolve tech solutions for different market segments.

 

 

Marcelo Nunes is an AWS senior technical account manager who has supported Nubank since 2017. With more than 20 years of experience in technology and as a member of the AWS Enterprise Support Team since 2015, he has helped many companies on their AWS journey.