AWS Cloud Operations Blog

VMware’s Cloud Journey: AWS Account Management at Scale

This post was co-authored with Thiru Bhat, Director CSO-CM, Office of the CTO, VMware

VMware has been developing virtualization software since 1998. Headquartered in Palo Alto, California, the company is known for its application modernization, cloud, networking, security, and digital workspace offerings. VMware needs a thorough, all-encompassing approach to make sure corporate controls and standards are rigorously followed in the public cloud. That means maintaining compliance, applying adequate security measures, and optimizing cost management. VMware has designed controls that span all aspects of cloud operations. These controls let the company meet its compliance obligations, strengthen its security posture, and optimize cloud costs.

VMware’s Office of the CTO (OCTO) operates CloudGate, an internally developed service that provisions access and manages the lifecycle of workloads across public clouds such as AWS, Azure, and GCP. CloudGate includes a common framework for account lifecycle management across the enterprise, which simplifies onboarding and management. Today, VMware has more than 38,000 employees who use tens of thousands of AWS accounts to deliver software solutions to their customers. In this post, we discuss how VMware uses AWS Organizations, with all features enabled, to eliminate one of its main pain points: closing AWS accounts at scale. We’ll show how VMware used AWS APIs to manage AWS account creation and closure across this large fleet. We also share the practices VMware follows so that others can adopt or create a similar process.

1/ How VMware manages the AWS account lifecycle

With a fleet of this size, handling daily account creation, deletion, and changes is challenging. VMware’s OCTO team adopted the standard AWS account creation process documented by AWS, and CloudGate provisions new AWS accounts using that process. While account creation worked well, the standard AWS account closure process was manual and time-consuming. It required human intervention and did not scale.

VMware found that the most common reasons to close accounts were: 1/ a change of ownership due to personnel changes, reorganization, or a merger or acquisition; 2/ the AWS account was no longer required; and 3/ test or temporary AWS accounts created for demos or training were no longer in use.

To address this pain point, the VMware team tried a few approaches. First, they tried repurposing unneeded AWS accounts for other uses, but that proved too onerous. Next, they used UI automation with Selenium to close AWS accounts programmatically, but this was unreliable and broke often. In Q1 2022, AWS released the CloseAccount API, which closed the gap. The VMware OCTO team then built an automated process to close AWS accounts using this API. However, the CloseAccount API has a prerequisite, which we discuss in the next section.

2/ AWS Organizations all features and why it’s critical

To illustrate the importance of all features in AWS Organizations, we first provide more background on CloudGate. In addition to account lifecycle management, compliance, and governance, CloudGate supports several organizations and management accounts, with several thousand member AWS accounts across those organizations. It also supports these capabilities across other cloud service providers. VMware’s security operations center monitors all AWS accounts 24×7 and alerts account and workload owners to malicious or suspicious activity. Finally, CloudGate supports VMware’s global chargeback process, which allocates AWS spend to the cost center of the business unit consuming those services.

The VMware team adopted AWS Organizations, the AWS service that provides the foundational capability for multi-account cloud governance on AWS. All VMware management accounts were originally set up with the ‘consolidated billing’ feature set from AWS Organizations.

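AWS Organizations has two feature sets:

  • Consolidated billing: shared billing across all member accounts, paid through the management account.
  • All features: consolidated billing plus advanced governance capabilities such as service control policies (SCPs), tag policies, and integration with other AWS services through trusted access.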

The VMware team migrated to ‘all features’ to use these governance capabilities. All features is also a prerequisite for the CloseAccount API, which let them programmatically close member accounts in their organizations. VMware followed a process similar to the one in the ‘Simplified multi-account governance with AWS Organizations all features’ blog post to migrate their organizations to ‘all features.’
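Because CloseAccount depends on all features, an automation job can check the feature set before it attempts any closure. The following is a minimal sketch, not VMware’s code. The function names are hypothetical, and it assumes org_client is a boto3 Organizations client running with management-account credentials. It relies on DescribeOrganization, which reports the feature set as ALL or CONSOLIDATED_BILLING.

```python
def has_all_features(org_client) -> bool:
    """True if the organization uses the 'all features' feature set.

    DescribeOrganization reports FeatureSet as "ALL" or "CONSOLIDATED_BILLING".
    """
    org = org_client.describe_organization()["Organization"]
    return org["FeatureSet"] == "ALL"


def require_all_features(org_client) -> None:
    """Fail fast before any CloseAccount call if all features is not enabled."""
    if not has_all_features(org_client):
        raise RuntimeError(
            "Organization uses consolidated billing only; "
            "migrate to all features before automating account closure"
        )
```

In a Jenkins closure job, this might run as `require_all_features(boto3.client("organizations"))` before the closure step.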
The VMware team also took the following additional safeguards during the ‘all features’ migration:

  • They started with their smallest organization and migrated progressively larger organizations in later steps. This required close collaboration between the VMware OCTO team and the AWS Enterprise Support and Technical Account Manager teams.
  • VMware evaluated the standard migration, in which member accounts must accept handshake invitations before the organization moves to all features. Given the number of member accounts across their organizations, however, the team chose the assisted migration option available to Enterprise Support customers. This option let VMware manage the process centrally without requiring access to member accounts.
  • The team monitored and evaluated workload and account security both during and after the migration.
  • Monitoring also ensured that the CloudGate service functioned correctly, without impact on users, and, importantly, that no AWS accounts accidentally opted out of the process.
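VMware used the assisted migration. Teams that follow the standard migration instead need to track the per-account handshakes. The sketch below illustrates how, and the function names in it are hypothetical. It assumes handshakes come from the Organizations ListHandshakesForOrganization operation, that member approvals carry the action APPROVE_ALL_FEATURES, and that each such handshake lists the member as a party of type ACCOUNT. Accounts whose handshake was declined, canceled, or expired would likely need follow-up before the migration can be finalized.

```python
from collections import Counter
from typing import Iterable


def summarize_approvals(handshakes: Iterable[dict]) -> Counter:
    """Count member APPROVE_ALL_FEATURES handshakes by state (OPEN, ACCEPTED, ...)."""
    return Counter(
        h["State"] for h in handshakes if h.get("Action") == "APPROVE_ALL_FEATURES"
    )


def accounts_needing_attention(handshakes: Iterable[dict]) -> list:
    """Return member account IDs whose approval handshake will not complete."""
    flagged = []
    for h in handshakes:
        if h.get("Action") != "APPROVE_ALL_FEATURES":
            continue
        if h["State"] in {"DECLINED", "CANCELED", "EXPIRED"}:
            # The ACCOUNT party identifies the member account on the handshake.
            flagged += [p["Id"] for p in h.get("Parties", []) if p["Type"] == "ACCOUNT"]
    return sorted(flagged)


def fetch_handshakes(org_client) -> list:
    """Collect all handshakes; org_client is a boto3 Organizations client."""
    pages = org_client.get_paginator("list_handshakes_for_organization").paginate()
    return [h for page in pages for h in page["Handshakes"]]
```

Keeping the summary functions separate from the API call makes it easy to run them on a saved snapshot of handshakes during and after the migration.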

3/ How VMware uses AWS APIs to manage AWS account creation and closure at scale

Once the required governance approvals are in place, the VMware OCTO team uses the AWS Organizations CreateAccount and CloseAccount APIs to automate AWS account creation and closure. The team uses Jenkins, the open-source automation server, to run jobs that create or close AWS accounts without human intervention. We share code fragments below.

Code fragment in Python for Account Creation:


from time import sleep


def create_new_acct(org_client, alias_owner_id, BU, cost_center, trust_role_name):
    aws_account_name = "{} {} {}".format(cost_center, BU, alias_owner_id)
    existing_emails = []  # root emails that CreateAccount rejected as already in use
    while True:
        # generate_root_email_address is a helper that is not shown in this fragment
        email_address = generate_root_email_address(
            org_client, alias_owner_id, existing_emails
        )
        response_org = org_client.create_account(
            Email=email_address,
            AccountName=aws_account_name,
            RoleName=trust_role_name,
            IamUserAccessToBilling="ALLOW",
        )
        status_id = response_org["CreateAccountStatus"]["Id"]
        print("Account being created within AWS Organization . . .")
        # CreateAccount is asynchronous; poll until the request leaves IN_PROGRESS
        while True:
            sleep(5)
            response_acct_status = org_client.describe_create_account_status(
                CreateAccountRequestId=status_id
            )
            acct_status = response_acct_status["CreateAccountStatus"]["State"]
            if acct_status != "IN_PROGRESS":
                break
            print("Account creation still in progress . . .")
        if acct_status == "SUCCEEDED":
            print("Account created")
            break
        failure_reason = response_acct_status["CreateAccountStatus"]["FailureReason"]
        if failure_reason == "EMAIL_ALREADY_EXISTS":
            # Remember the rejected address and retry with a newly generated one
            existing_emails.append(email_address)
            continue
        print("Account creation failed")
        print(failure_reason)
        raise Exception("Account creation failed: {}".format(failure_reason))
    acct_id = response_acct_status["CreateAccountStatus"]["AccountId"]
    # Wait for the payerRole to be created in the new account
    sleep(45)
    print("Account creation completed. New Account id is: {}".format(acct_id))
    print("The account email address is: {}".format(email_address))
    return (acct_id, email_address, aws_account_name)
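The fragment calls generate_root_email_address, which is not shown. The following is a hypothetical version of such a helper, not VMware’s implementation. It assumes a distribution list that accepts plus-addressed mail, and the address scheme and domain (aws-<alias>+<n>@example.com) are placeholders. It skips addresses already used by accounts in the organization, found through the ListAccounts paginator. It also skips addresses that CreateAccount previously rejected with EMAIL_ALREADY_EXISTS, which can happen when the address belongs to an AWS account outside the organization.

```python
def generate_root_email_address(org_client, alias_owner_id, existing_emails,
                                domain="example.com", max_attempts=1000):
    """Hypothetical helper: return an unused plus-addressed root email.

    org_client is a boto3 Organizations client; the naming scheme and domain
    are placeholders for a real distribution list.
    """
    # Addresses already rejected by CreateAccount (compared case-insensitively).
    in_use = {e.lower() for e in existing_emails}
    # Addresses already used as root emails by accounts in this organization.
    for page in org_client.get_paginator("list_accounts").paginate():
        in_use.update(acct["Email"].lower() for acct in page["Accounts"])

    for n in range(1, max_attempts + 1):
        candidate = f"aws-{alias_owner_id}+{n}@{domain}"
        if candidate.lower() not in in_use:
            return candidate
    raise RuntimeError(f"No unused root email found for {alias_owner_id}")
```

Because every address routes to the same distribution list, the root mailbox stays reachable when individual owners leave.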

Code fragment in Python for Closing an Account:

 
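import logging
import time

import boto3


def close_aws_account(args):
    account_id = args[""]
    logging.info(f"Closing aws account {account_id}")
    client = boto3.client("organizations")
    # get_account_status is a helper not shown here; it returns the account's
    # Status (for example via client.describe_account(AccountId=...)["Account"]["Status"])
    status = get_account_status(account_id=account_id, client=client)
    if status == "SUSPENDED":
        logging.warning(f"Account {account_id} already closed: {status}")
        return
    client.close_account(AccountId=account_id)
    # Closure is asynchronous (PENDING_CLOSURE, then SUSPENDED); poll until closed
    while status != "SUSPENDED":
        time.sleep(5)
        status = get_account_status(account_id=account_id, client=client)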
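The closure loop above polls until the account reaches SUSPENDED and has no upper bound, so a stuck request could leave a Jenkins job hanging. A bounded wait is one alternative. This is a generic sketch with hypothetical names, and the 900-second default timeout is an arbitrary placeholder, not a documented closure time. The sleep and clock parameters are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], target: str = "SUSPENDED",
                    timeout_s: float = 900, interval_s: float = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_status() until it returns target, or raise after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Status still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

With the fragment’s helper, a call might look like `wait_for_status(lambda: get_account_status(account_id=account_id, client=client))`. A timeout can then fail the job visibly instead of blocking the queue.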

4/ Best practices for account management workflows at VMware

For any enterprise, it’s imperative to maintain an audit trail and governance during AWS account creation and closure. The VMware OCTO team developed the following guidance for its lifecycle processes.

AWS Account Creation:

  1. Place each new AWS account in an organizational unit (OU) for the specific team, either a new OU or an existing one.
  2. The requestor of the account specifies the OU and account names.
  3. For a new OU, define the list of approvers (for example, the engineering owner and the finance owner).
  4. The process requires approval from the requestor’s manager.
  5. It then requires approval from finance or from an additional management workflow, as appropriate.
  6. Audit trails are recorded so that requestors and approvers can still be tracked if the requestor leaves the company or moves to another business unit.
  7. Alternate contacts (security, operations, and billing) are added as additional points of contact for the account.
  8. Use an email distribution list as the account’s root email address so that it is not affected by personnel turnover.
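Several of the creation steps above can be automated after CreateAccount returns. The following sketch illustrates the idea under stated assumptions and is not VMware’s workflow. The names onboard_account and Contact are hypothetical, and the tag keys (requestor, approver-<role>) are placeholders. It assumes a boto3 organizations client and a boto3 account client running in the management account. Calling PutAlternateContact for member accounts also requires trusted access for AWS Account Management. The sketch uses ListParents and MoveAccount to place the account in its OU, TagResource to record the requestor and approvers, and PutAlternateContact to add contacts.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One alternate contact, as passed to PutAlternateContact."""
    name: str
    title: str
    email: str
    phone: str


def onboard_account(org_client, account_client, account_id, ou_id,
                    requestor, approvers, contacts):
    """Hypothetical post-creation step: OU placement, audit tags, alternate contacts.

    approvers maps an approver role (e.g. "engineering") to a user name;
    contacts maps "SECURITY", "OPERATIONS", or "BILLING" to a Contact.
    """
    # Move the account from its current parent (usually the root) into the team OU.
    parent_id = org_client.list_parents(ChildId=account_id)["Parents"][0]["Id"]
    if parent_id != ou_id:
        org_client.move_account(
            AccountId=account_id,
            SourceParentId=parent_id,
            DestinationParentId=ou_id,
        )

    # Record the requestor and approvers on the account as a lightweight audit trail.
    tags = [{"Key": "requestor", "Value": requestor}]
    tags += [{"Key": f"approver-{role}", "Value": user}
             for role, user in approvers.items()]
    org_client.tag_resource(ResourceId=account_id, Tags=tags)

    # Add alternate contacts for the member account from the management account.
    for contact_type, contact in contacts.items():
        account_client.put_alternate_contact(
            AccountId=account_id,
            AlternateContactType=contact_type,
            Name=contact.name,
            Title=contact.title,
            EmailAddress=contact.email,
            PhoneNumber=contact.phone,
        )
```

Storing approvers as account tags keeps the trail with the account itself, so it survives when the original requestor changes roles.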

AWS Account Closure:

  1. The account requestor is responsible for decommissioning workloads before requesting account closure.
  2. The AWS account requestor approves the account closure, or the VMware team seeks approval if the requestor is no longer with the company.
  3. Other users of the AWS account are contacted for approval before closure to make sure their workloads have been shut down.
  4. The team reviews AWS billing, AWS CloudTrail, and AWS Resource Groups to gather insight into the account’s current workloads.
  5. If the account closure is requested by a user other than the account requestor, the organization owners cross-check and validate the account’s usage.
  6. An audit trail of closure approvals helps reduce requests to reopen accounts.
  7. AWS provides a 90-day post-closure period during which a closed AWS account can be reopened. However, with a large number of accounts, it is advisable not to reopen closed accounts.
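Parts of the closure review can also be checked automatically before CloseAccount is called. This is a minimal sketch with hypothetical names. It assumes a boto3 Cost Explorer (ce) client in the management account, and it treats recent spend at or above a placeholder threshold of 1.0 as a sign of active workloads. It also compares recorded approvals with a required set of approver roles. It does not replace the CloudTrail and Resource Groups review or the human approvals described above.

```python
from datetime import date, timedelta


def recent_spend(ce_client, account_id, today, days=30):
    """Sum the account's UnblendedCost over the last `days` days (End is exclusive)."""
    start = today - timedelta(days=days)
    resp = ce_client.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": today.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": [account_id]}},
    )
    # Short windows return only a few periods, so NextPageToken is not handled here.
    return sum(float(r["Total"]["UnblendedCost"]["Amount"])
               for r in resp["ResultsByTime"])


def ready_to_close(ce_client, account_id, approvals, required_approvers,
                   today=None, spend_threshold=1.0):
    """Return (ok, reasons); the threshold and approver roles are placeholders."""
    today = today or date.today()
    reasons = []
    missing = sorted(set(required_approvers) - set(approvals))
    if missing:
        reasons.append("missing approvals: " + ", ".join(missing))
    spend = recent_spend(ce_client, account_id, today)
    if spend >= spend_threshold:
        reasons.append(f"recent spend of {spend:.2f} suggests active workloads")
    return (not reasons, reasons)
```

Returning the reasons, not just a boolean, gives the closure job something concrete to record in the approval audit trail.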

Conclusion

In this blog post, we shared how VMware migrated to the AWS Organizations all features set and uses the account APIs to automate the AWS account lifecycle. We also shared VMware’s best practices for AWS account creation and closure. VMware uses AWS Organizations to enforce corporate controls for compliance, security, and cost management across its AWS account fleet. Centrally managed access and governance reduce the attack surface of cloud resources. With multi-account governance in place through AWS Organizations, moving AWS accounts between organizations during mergers or acquisitions has also become a seamless process. The methodologies described in this post can help any enterprise that manages AWS accounts at scale.

About the Authors


Thiru Bhat

Thiru Bhat is an engineering director at VMware. Thiru’s team builds and operates the CloudGate and CloudGate Governance services, which manage access, lifecycle, and governance for public cloud accounts at scale.


Satya Pattanaik

Satya Pattanaik is a Sr. Solutions Architect at AWS. He helps ISVs build scalable and resilient applications on the AWS Cloud. Before joining AWS, he played a significant role in the growth and success of enterprise segments. Outside of work, he spends time learning “how to cook a flavorful BBQ” and trying out new recipes.


Rami Kandah

Rami Kandah is a Sr. Technical Account Manager at AWS focused on ISV customers. He has been helping ISVs with cost optimization, resiliency and networking on AWS Cloud. Prior to joining AWS, he spent over a decade helping customers with enterprise networking solutions.


Scott Webber

Scott Webber is a Principal Customer Solutions Manager at AWS supporting the ISV industry segment. He currently supports Broadcom/VMware. Scott has more than 25 years of experience in the technology domain.