Fail fast but safely – how Old Mutual is using Developer Sandboxes for real digital innovation
This is a guest post co-authored with Kershnee Ballack and Wilkister Wechuli from Old Mutual Limited
Old Mutual Limited (OML) is a pan-African financial services group that offers financial solutions to retail and corporate customers across 14 African countries. Its purpose is to help customers thrive by enabling them to achieve their lifetime financial goals, all while investing their funds in ways that will create a positive future for them, their families, their communities, and broader society. Within OML, the MyOldMutual division applies customer-focused design thinking, data management, cloud engineering, agile and lean development methodologies, and continuous delivery practices in its deliveries.
“In MyOldMutual we believe the cloud can help us know our customers personally and empower them to completely control their financial futures through self-service digital channels. Similarly, for our advisers, we believe the cloud can make it even easier for them to do business with us, and at a lower cost. However, we need a cloud-skilled workforce to deliver an improved and cost-effective experience for our customers and advisers. By upskilling our people, we aimed to reduce the learning curve ramp-up and bolster our Builders’ confidence to build in the cloud. We didn’t have the luxury of waiting for the complete up-skilling of our Builders – we needed people who can function in AWS while continuing to learn AWS. This meant finding and maintaining a balance between experimentation, just-in-time learning, and delivering value,” said Kershnee Ballack, IT Executive for MyOldMutual.
Getting Builders excited about AWS Services
Learning to build enterprise-grade solutions using AWS services is one thing, but sparking interest among Builders (software engineers, designers, and quality engineers) is something else. In Platform Engineering, Old Mutual Limited believes Builder morale is essential to a quick and successful digital transformation. Magic happens when Builders are challenged to learn and to use newly acquired knowledge to deliver value. Sam Elmalak introduces the Developer Sandbox concept in his re:Invent talk to answer the question, “What about the places where you can experiment and build things?”
Why do we need Developer Sandboxes and what are they?
Developer Sandboxes are essential for empowering users to innovate, manage costs, and operate within guardrails. Furthermore, they build confidence in the people operating the environment. For OML, AWS Developer Sandboxes are a team enablement mechanism that equips Builders with hands-on know-how to build confidently in the AWS cloud. Developer Sandboxes are isolated AWS environments in which Builders learn about or test AWS-native concepts without impacting production or non-production workloads. Unlike the non-production and production environments, they have no connectivity to the data center. By isolating Developer Sandboxes from other environments, we protect our mission-critical infrastructure from any possible harm caused by experimentation. Moreover, because Sandbox accounts mirror production environments, Builders can use them to build, test, and validate concepts and patterns before deploying them in the non-production (and production) environments.
Using AWS Budgets to govern cost in Developer Sandboxes
Considering the unique pay-as-you-go nature of pricing for each AWS service, we must track the usage costs in each Developer Sandbox. As a principle, we choose to alert early (starting at 50% of the budgeted amount) and often (in 25% increments thereafter). When they first access their Sandboxes, Builders use AWS Service Catalog to provision an AWS Budget that sends them the generated alerts. We have found that empowering Builders to provision their own budget (rather than automating it as part of the AWS account creation) creates Cost Shock awareness inside the Sandbox. In our weekly hands-on workshops, there is often banter about building without budget breaches.
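To make the "alert early and often" principle concrete, here is a minimal Python sketch that derives the alert thresholds and shapes them like the `NotificationsWithSubscribers` parameter of the AWS Budgets `CreateBudget` API. It is an illustration only, not our Service Catalog product: the 100% upper bound, the email address, and the function names are assumptions made for the example.

```python
from typing import Any


def budget_alert_thresholds(start: float = 50.0, step: float = 25.0,
                            stop: float = 100.0) -> list[float]:
    """Return alert thresholds as percentages of the budgeted amount.

    start and step follow the principle above; stop is an assumed upper bound.
    """
    thresholds = []
    current = start
    while current <= stop:
        thresholds.append(current)
        current += step
    return thresholds


def build_notifications(email: str, thresholds: list[float]) -> list[dict[str, Any]]:
    """Build one email notification per threshold on actual spend."""
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for threshold in thresholds
    ]


# Hypothetical Builder address, used only for illustration.
notifications = build_notifications("builder@example.com", budget_alert_thresholds())
print([n["Notification"]["Threshold"] for n in notifications])  # [50.0, 75.0, 100.0]
```

A list like this could be passed to the Budgets API alongside the budget definition itself. Keeping the thresholds in one small function makes it easy to change the starting point or increment for every Sandbox at once.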
Figure 1: Service Catalog Product for an account cost budget
Enforcing preventative guardrails in Developer Sandboxes
Aside from cost, the security governance of Sandbox accounts was front and center when we began the Sandbox journey. There were trivial concerns that we addressed quickly: for example, a multi-factor authentication (MFA) device is mandatory for Sandbox access. There were also several less trivial things that we didn’t want to be possible inside of the Sandbox because they didn’t make sense. We don’t want to create a shadow IT environment. It is important to limit, as far as possible, anyone’s ability to use their Sandbox for development work that could turn into long-running infrastructure. We believe that the Sandboxes should be a scratchpad where resource types and sizes are restricted, and where resources are automatically cleaned up on behalf of the developer several times each month. As Platform Engineering, we initially asked ourselves a few questions that considered cost, security, and operational support:
- How do we make sure that Builders aren’t allowed to purchase domain names in Route 53?
- How do we make sure that Builders can’t subscribe to products on AWS Marketplace? As a principle, we want to help our Builders learn to build using native AWS services.
- In each Sandbox, how do we make sure that the root user can’t access the account?
- How do we make sure that Builders can only launch general purpose Amazon Elastic Compute Cloud (EC2) instances that don’t have over 1 GB of memory?
- How do we make sure that Builders can only create general purpose database instances on Amazon Relational Database Service (RDS) that don’t have over 1 GB of memory?
- How do we make sure that our Builders make a habit of encrypting the volumes they create?
- How do we make sure that Builders can only create resources in a specific region?
- How do we prevent Builders from creating an S3 bucket if the bucket doesn’t have the DataClassification tag key?
Using Service Control Policies (SCPs), AWS Organizations, and Customizations for AWS Control Tower, we answered these questions and more. The Customizations for AWS Control Tower solution extends the capability of an AWS Control Tower deployment. It allows the operator to attach SCPs to organizational units (OUs) or to individual AWS accounts within an AWS Organization. All of the questions we initially asked ourselves could be expressed as statements in an SCP, and that SCP could be attached to the Sandbox OU where all of the Sandboxes live.
Figure 2: Sample SCP restricting EC2 and RDS family and size
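As an illustration of how some of the questions above translate into SCP statements, the following Python sketch assembles a policy document as JSON. It is a hedged example under stated assumptions, not our production SCP: the allowed instance types and database class are assumed picks that fit the 1 GB memory limit, the `Sid` values and helper names are hypothetical, and the Region and S3 tagging rules are left out for brevity.

```python
import json
from typing import Any, Optional

# Assumed allow-lists: burstable general purpose sizes with at most 1 GiB of memory.
ALLOWED_EC2_TYPES = ["t3.nano", "t3.micro"]
ALLOWED_RDS_CLASSES = ["db.t3.micro"]


def deny(sid: str, action: str, resource: str = "*",
         condition: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a single Deny statement, optionally scoped by a condition."""
    statement: dict[str, Any] = {
        "Sid": sid,
        "Effect": "Deny",
        "Action": action,
        "Resource": resource,
    }
    if condition:
        statement["Condition"] = condition
    return statement


def build_sandbox_scp() -> dict[str, Any]:
    """Assemble an illustrative SCP covering several of the guardrail questions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            deny("DenyDomainRegistration", "route53domains:RegisterDomain"),
            deny("DenyMarketplaceSubscribe", "aws-marketplace:Subscribe"),
            deny("DenyRootUser", "*",
                 condition={"StringLike": {"aws:PrincipalArn": "arn:aws:iam::*:root"}}),
            deny("DenyLargeEc2Instances", "ec2:RunInstances",
                 resource="arn:aws:ec2:*:*:instance/*",
                 condition={"StringNotEquals": {"ec2:InstanceType": ALLOWED_EC2_TYPES}}),
            deny("DenyLargeRdsInstances", "rds:CreateDBInstance",
                 condition={"StringNotEquals": {"rds:DatabaseClass": ALLOWED_RDS_CLASSES}}),
            deny("DenyUnencryptedVolumes", "ec2:CreateVolume",
                 condition={"Bool": {"ec2:Encrypted": "false"}}),
        ],
    }


policy = build_sandbox_scp()
print(json.dumps(policy, indent=2))
```

Generating the document from code keeps the allow-lists in one place, so tightening or relaxing a size limit is a one-line change before the policy is attached to the Sandbox OU.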
Enforcing security oversight in Developer Sandboxes
We maintain security dashboards of all of the Sandboxes from data aggregated into a designated security-tooling account. The account is accessed by some members of the Platform Engineering team. As a principle, we want to publish data from this account into a dashboard and have no humans log in to the account. These dashboards aggregate findings from AWS Security Hub, AWS Identity and Access Management (IAM) Access Analyzer, and AWS Config across all Sandboxes. AWS Security Hub is our most relied-upon service to give us a bird’s-eye view of our Sandboxes. It alerts us to any AWS Foundational Security Best Practices or CIS Benchmark violations found in the Sandboxes. As a principle, we stick to the technology vendor mainline as far as possible. It made sense for us to build on top of the recommended and elective guardrails that shipped with AWS Control Tower for those questions where preventative controls are non-trivial.
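A dashboard of this kind ultimately reduces findings to counts per Sandbox. The sketch below shows one possible way to summarise findings that use the AWS Security Finding Format field names (`AwsAccountId`, `Severity.Label`, `Compliance.Status`). The sample findings and the function name are hypothetical, and this is not a description of how our dashboards are actually built.

```python
from collections import Counter
from typing import Any


def summarise_findings(findings: list[dict[str, Any]]) -> Counter:
    """Count failed compliance findings per (account ID, severity label)."""
    summary: Counter = Counter()
    for finding in findings:
        # Only failed checks are of interest for the bird's-eye view.
        if finding.get("Compliance", {}).get("Status") != "FAILED":
            continue
        severity = finding.get("Severity", {}).get("Label", "UNKNOWN")
        summary[(finding["AwsAccountId"], severity)] += 1
    return summary


# Hypothetical findings from two Sandbox accounts, for illustration only.
sample_findings = [
    {"AwsAccountId": "111111111111", "Severity": {"Label": "HIGH"},
     "Compliance": {"Status": "FAILED"}},
    {"AwsAccountId": "111111111111", "Severity": {"Label": "HIGH"},
     "Compliance": {"Status": "FAILED"}},
    {"AwsAccountId": "222222222222", "Severity": {"Label": "LOW"},
     "Compliance": {"Status": "PASSED"}},
]

for (account_id, severity), count in sorted(summarise_findings(sample_findings).items()):
    print(account_id, severity, count)
```

Summaries like this can be published to a dashboard without anyone logging in to the security-tooling account, which matches the principle stated above.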
We have an SCP that prevents S3 buckets from being created if the DataClassification tag key is missing. And, for those S3 buckets created in the Sandbox environment, Builders can expect to have them deleted at least once per sprint. Even with all of these measures in place, we still feel the need to use Amazon Macie to give us confidence that there is no restricted data inside of S3 buckets.
Boosting the morale of our Builders
As Platform Engineering, the best way for us to scale is empowering all Builders to self-serve. More specifically, we empower Builders through weekly hands-on workshops for squads where Builders get hands-on experience in AWS services in their individual Sandboxes. The workshops let Builders find enabling technologies in AWS. This helps Builders with real-world business problems, and they can build and break solutions without impacting the production systems.
Seymour Ewers, Lead Test Analyst, said “Sandbox accounts make you feel safe; it is a safe place to make mistakes and figure things out. This is like a free training session – there is no real cost involved on our side in terms of time and resources; everything is done for us by Old Mutual as part of our weekly schedule. Other classes or tutorials, we may have to pay someone to explain something to use in a classroom environment or 1-on-1.”
Furthermore, the major objective for running these weekly workshops is to build confidence in AWS operations for all Builders. “Before if you asked me if I was comfortable scratching around in AWS and going deeper into the rabbit hole in terms of things you know, I wouldn’t have been comfortable. I would have watched other people run with the deployments, now I can follow. Creating a stack is not something that scares me anymore. I’m past the stage of not being confident; I want to get to a point where I can open the bonnet and have a look around. I am happy that we’re driving the car, and everything works, now I can see what happens when I add three more pieces into the engine” added Seymour.
The weekly workshops have helped make sure that we’re all speaking the same language as Builders. Revathi Pachigolla, Data Analytics Specialist, said, “Once we deploy some of the pipelines might fail, we had little knowledge or information on troubleshooting them before. Now we have some knowledge on how we can troubleshoot the pipelines.” Mogamat Hoosain, Senior IT Tester, also said, “It makes you feel like you are on the same page as developers, sometimes the developers get technical. As much as we try to connect the dots and understand what they are saying, with this training we get a lot more perspective on what they are doing. We are beginning to speak the same language.”
An unanticipated by-product has been the workshops becoming a team-building mechanism. Mogamat says “Usually, we are in our own streams all the time, we all have meetings throughout the day within our own streams. We’re not always in the same meetings as a team, this is almost like a team building. It really is up-skilling as a team, and I see it as team building. This is the space where we get to learn and interact together as a team, there aren’t other sessions we have for this. Collaboration and interaction amongst the testing team is what these sessions has brought.” And Seymour further said “I feel like as the testers we are now in a group, the communication has been improved. Previously, we weren’t speaking to each other. After these sessions the communication and collaboration has improved between the testers, and now we know who is working on which features and on which platform” added Revathi. The key for us, is to make the workshops and learning fun so it doesn’t feel like “work”. “It’s two hours, and it doesn’t feel like it’s work-related. The fact is that it will benefit us by learning new stuff, and I have the bonus of spending time with the testing team.”
How are we taking this forward?
As OML, we’re still on the journey to exit our data centers into AWS – many teams and groups across the OML group still don’t have Developer Sandboxes. We’ll continue onboarding teams onto Developer Sandboxes as part of quarterly cohorts. We want to tailor the weekly workshop content to each team’s migration and build objectives for that quarter, thereby shortening the loop between learning and hands-on implementation.
We’re gamifying the up-skilling journey, with a leaderboard tracking the IP generated (championing knowledge sharing) and certifications achieved at both the team and individual level. We want more stories like that of Timothy Maruti, our DevOps Lead based in Nairobi, who said, “When I made up the decision to go for my AWS DevOps certification mid this year, I felt am not good enough because I felt I am missing the required experience to sit for it. I had decided to put on hold my certification till 2022 to give myself room to gain more experience.” Timothy managed to get his certification in November 2021, and he shared the following: “You might not know how you have made me a better professional, and I feel more excited than ever.”