AWS Cloud Operations Blog
Why a Cloud Operating Model?
This thought leadership post highlights an innovative approach to cloud operations excellence and Well-Architected goals. It walks through how MuleSoft carried out this approach, covering its:
- Challenge
- Innovation
- Journey
- Implementation of the Cloud Operating Model
Challenge
Whether companies are migrating to the cloud or are cloud-native, executives face the challenge of controlling costs and continuously improving their organization’s use of the cloud.
MuleSoft is no different. In April 2022, MuleSoft found that its cloud computing costs were over budget and that it needed to reduce spending within six months. Furthermore, the market was shifting its emphasis from revenue growth to profitability, which added pressure. In addition, as cloud infrastructure keeps expanding, leaders are rethinking roles and responsibilities in this fast-evolving space. MuleSoft needed to push more decision-making and accountability to the edge.
MuleSoft built a Cloud Oversight program across all its products and development teams to implement proper oversight of its AWS cloud usage and governance structure, bringing a projected $12M budget overrun in line with the year’s budget. MuleSoft’s innovative approach to cloud operations, which builds frameworks, processes, and visualizations, is a good example for leaders of how to set their organizations up for operational success.
Innovation
Based on MuleSoft’s journey, this post discusses how MuleSoft developed an innovative cloud operating model to reduce its costs and bring decision-making and accountability to engineering teams. To build the cloud operating model, MuleSoft consulted its stakeholders, relied on its engineering methods, and tapped into its AWS resources. MuleSoft also reviewed AWS tools, such as Trusted Advisor, and tools from other vendors. The model that addresses the cost and decision-making goals is built on MuleSoft’s Cloud Oversight Engineering Framework and AWS Well-Architected concepts.
MuleSoft’s Cloud Oversight Engineering Framework starts with the business goals and leads to KPIs. See Figure 1. The framework brings in data that supports the goals, enriches that data, and provides recommendations to the engineering teams. As teams work through the recommendations, the KPIs improve. In this case, being well-architected was a goal, so MuleSoft’s Well-Architected maturity improves.
Figure 1. Cloud Oversight Engineering Framework
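The loop in Figure 1 (goal, data, recommendations, KPI) can be illustrated with a minimal sketch. It assumes a single cost goal and a KPI defined as the projected overrun against budget. The `Recommendation` class, its field names, and all numbers are hypothetical and are not taken from MuleSoft’s system.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """A hypothetical recommendation delivered to one engineering team."""
    team: str
    action: str
    estimated_monthly_savings: float  # assumed field, in USD
    done: bool = False


def projected_overrun(monthly_spend, monthly_budget, recommendations):
    """KPI: spend above budget after subtracting savings from completed work."""
    realized = sum(r.estimated_monthly_savings for r in recommendations if r.done)
    return max(0.0, monthly_spend - realized - monthly_budget)


# Illustrative numbers only.
recs = [
    Recommendation("team-a", "rightsize idle instances", 300.0),
    Recommendation("team-b", "delete unattached volumes", 200.0),
]

before = projected_overrun(1500.0, 1000.0, recs)
recs[0].done = True  # team-a completes its recommendation
after_one = projected_overrun(1500.0, 1000.0, recs)
recs[1].done = True  # team-b completes its recommendation
after_all = projected_overrun(1500.0, 1000.0, recs)

print(before, after_one, after_all)
```

Each completed recommendation moves the KPI toward the goal, which is the feedback cycle the framework describes.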
MuleSoft’s Operating Model supports cloud cost control, oversees more than 3,000 EKS clusters and 400,000 EC2 instances, and can accommodate MuleSoft’s growing fleet. More than 100 MuleSoft teams, including engineers, managers, and executives, share cloud infrastructure data on consumption status, accountability, and actionable intelligence.
Journey
In 2022, MuleSoft brought its cloud costs within budget. MuleSoft learned a number of important lessons in its initial cost reduction efforts. Teams want to do the right thing, but other priorities leave them little time. First, infrastructure information must be accurate. Second, recommendations must be actionable and doable. Third, information and recommendations must be easily accessible.
Beyond cost reduction, MuleSoft wanted to make gains in efficiency, security, and sustainability. In some cases, this meant shifting left. Throughout the process, MuleSoft worked extensively with stakeholders in its cloud operations. See Figure 2. MuleSoft also engaged its AWS account team and their experts. Along the way, MuleSoft learned about its own needs and about the AWS expertise it could leverage.
The AWS expertise that MuleSoft tapped into included the Well-Architected Framework and cloud operations modeling. MuleSoft also incorporated AWS APIs such as the Automated Enterprise Discount Plan (EDP), which pre-calculates contract-discounted prices for each AWS service. MuleSoft also participated in the Trusted Advisor API beta program, which provides Well-Architected pillar recommendations. (AWS Trusted Advisor API will be available soon.)
Figure 2. MuleSoft Oversight Engineering and Stakeholders
Cloud operations can be difficult to scale, and cloud best practices can be difficult to institute. The MuleSoft teams use various cloud operations services, such as:
- Amazon CloudWatch for monitoring AWS resources,
- AWS Systems Manager for patching more than 375,000 EC2 instances,
- Trusted Advisor APIs for Well-Architected recommendations for all pillars,
- Sustainability APIs for carbon identification and reduction opportunities, and
- AWS Shield Advanced and Amazon GuardDuty for securing deployed platforms/infrastructure.
AWS Well-Architected not only gave rise to a set of goals, but provided benefits that drove the MuleSoft teams towards greater compliance and operational excellence.
Implementation of the Cloud Operating Model
MuleSoft anchored its cloud operating model on a lake house that provides the data and a display portal that presents recommended actions. See Figure 3. The Oversight Lake House, built on an event-driven architecture, ingests and enriches the data and recommendations, and the Cloud Central portal displays them to each team.
Figure 3. MuleSoft’s Implementation of the Cloud Oversight Engineering Framework
Based on MuleSoft’s goals, the Oversight Lake House ingests relevant data and recommendations from sources such as the AWS Cost and Usage Report (CUR) and the Trusted Advisor API. It also ingests MuleSoft-specific product, environment, and team data, and other company-specific data such as compliance and security data. The Lake House enriches and processes the data, and parses it out for teams.
To maintain consistency, accountability, compliance, and ownership among its teams, the MuleSoft Oversight team developed a self-serve Cloud Central portal. See Figure 4. The motivation behind Cloud Central is to provide education and enablement by delivering personalized recommendations to each team, spanning all Well-Architected pillars for its platform/infrastructure. Cloud Central serves as the single pane of glass for both managers and team members. MuleSoft teams will continue to mature by having access to recommendations and benchmarks measured against Well-Architected. See Figure 5.
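The ingest, enrich, and parse-out steps described above can be sketched in a few lines. This is a simplified illustration, not MuleSoft’s implementation: the row layout, the `ownership` mapping, and the function names are all hypothetical, and a real CUR export has many more columns.

```python
from collections import defaultdict

# Hypothetical, simplified cost rows standing in for CUR line items.
cost_rows = [
    {"resource_id": "i-001", "service": "AmazonEC2", "cost_usd": 120.0},
    {"resource_id": "i-002", "service": "AmazonEC2", "cost_usd": 80.0},
    {"resource_id": "vol-9", "service": "AmazonEC2", "cost_usd": 15.0},
]

# Hypothetical company-specific data: which team and environment own a resource.
ownership = {
    "i-001": {"team": "payments", "environment": "prod"},
    "i-002": {"team": "search", "environment": "dev"},
}


def enrich(rows, owners):
    """Attach team and environment to each cost row; flag unowned resources."""
    default = {"team": "unassigned", "environment": "unknown"}
    return [{**row, **owners.get(row["resource_id"], default)} for row in rows]


def route_by_team(rows):
    """Parse enriched rows out into one list per team."""
    per_team = defaultdict(list)
    for row in rows:
        per_team[row["team"]].append(row)
    return dict(per_team)


def team_totals(per_team):
    """Sum cost per team, as a portal view might display it."""
    return {team: sum(r["cost_usd"] for r in rows) for team, rows in per_team.items()}


per_team = route_by_team(enrich(cost_rows, ownership))
totals = team_totals(per_team)
print(totals)
```

Routing unowned resources to an explicit `unassigned` bucket, rather than dropping them, keeps gaps in the ownership data visible. That choice reflects the lesson that infrastructure information must be accurate.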
Figure 4. Cloud Central Portal
Figure 5. Well-Architected Pillars and AWS Technologies
As each engineering team pursues its recommendations, MuleSoft moves closer to its Well-Architected goals.
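One way to picture a per-team benchmark against the Well-Architected pillars is a simple scorecard. The sketch below assumes a score defined as the fraction of a team’s recommendations that have been resolved in each pillar. That scoring rule, the tuple layout, and the sample data are assumptions for illustration and are not MuleSoft’s actual metric.

```python
# The six AWS Well-Architected pillars.
PILLARS = (
    "operational_excellence",
    "security",
    "reliability",
    "performance_efficiency",
    "cost_optimization",
    "sustainability",
)

# Hypothetical recommendation records: (team, pillar, resolved).
recommendations = [
    ("search", "security", True),
    ("search", "security", False),
    ("search", "cost_optimization", True),
    ("payments", "reliability", False),
]


def pillar_scores(records, team):
    """Return the resolved fraction per pillar for one team.

    Pillars with no recommendations for the team are omitted,
    since there is nothing to measure against.
    """
    scores = {}
    for pillar in PILLARS:
        outcomes = [done for t, p, done in records if t == team and p == pillar]
        if outcomes:
            scores[pillar] = sum(outcomes) / len(outcomes)
    return scores


search_scores = pillar_scores(recommendations, "search")
payments_scores = pillar_scores(recommendations, "payments")
print(search_scores)
print(payments_scores)
```

As a team resolves more recommendations, its per-pillar fractions rise, which gives managers and team members a shared view of progress.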
Conclusion
As organizations strive for operational fitness in their cloud infrastructure, it is important to embrace innovative approaches. MuleSoft has built a cloud operating model that includes Well-Architected goals. The operating model includes an event-driven Lake House that prepares and enriches data, producing actionable recommendations for each engineering team. Personalized cloud infrastructure information and these recommendations are brought to each team through the Cloud Central portal. The portal drives decision-making and accountability to the edges. With each cycle, MuleSoft’s overall cloud maturity increases. See Figure 6.
Figure 6. Increasing Well-Architected Maturity
The formula for successful cloud operations and continued improvement comes down to three takeaways:
- Understand your goals and work toward solving your technical/business challenges.
- Develop a straightforward model, and an implementation of it, that the organization can accept and buy into.
- Provide leadership and support to make this happen.
Using its engineering methods, MuleSoft has built a system that is ready to be incorporated into engineering, support, security, and platform teams. The architecture of the oversight system allows it to inform engineering teams and leadership, and to grow along with the organization’s needs. For example, as MuleSoft transitions from selling cores to selling by transactions, the change generates new data sets that power a new set of KPIs. The system is modular, so it can accommodate new uses such as this one.