AWS Cloud Enterprise Strategy Blog
Common Responsibilities for Your Cloud Center of Excellence
“The right thing to do and the hard thing to do are usually the same.” ― Steve Maraboli
I recently introduced the topic of a Cloud Center of Excellence (CCoE), which is the team of people responsible for developing the cloud best practices, governance, and frameworks that the rest of the organization can leverage to transform your business using the cloud. Its creation is the fifth of seven best practices that I’m writing about in the Enterprise Cloud Journey.
Your organization’s CCoE should start small and grow as it adds value to the business. Organizations that do this well set metrics or KPIs for the CCoE and measure progress against them. The metrics I’ve seen range from IT resource utilization, to the number of releases each day, week, or month (a sign of increasing agility), to the number of projects the CCoE is influencing. Couple these with a customer-service-centric approach, and other business units will want to work with your CCoE because they find it valuable and a pleasure to work with.
My last post dove into how to staff your CCoE. Here, I’ll go over some of the common considerations that I’ve found among the organizations that are successful at growing their CCoE. This post should function as a way to get you thinking about the right things to task your CCoE with, rather than an exhaustive list, and concludes with some additional resources you can leverage for more details. Remember to start small: you only need to solve for the issues you face in your current projects rather than needing to boil the ocean. You can experiment, measure, and learn as you go.
Identity management: How do you want to map roles and permissions in your cloud environment to the roles and responsibilities that you already have in your organization? What services and features are you comfortable leveraging in what environments? How do you want to integrate with your Active Directory and/or single-sign-on (SSO) platform? AWS’s IAM platform, for example, provides fine-grained access across all of the different AWS services. This level of granularity is new for a lot of enterprises, and gives you the opportunity to think through what roles in your organization should have access to what resources/services in what environments.
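One way to start thinking through the question of which roles should have access to which services in which environments is to write the mapping down as data before encoding it in IAM policies. The following is only a minimal sketch: the role names, environment names, and service names are hypothetical, and the deny-by-default lookup is an assumption about how you might want unknown combinations to behave, not a description of how IAM itself evaluates policies.

```python
# Hypothetical roles, environments, and services, for illustration only.
ACCESS_MATRIX = {
    ("developer", "development"): {"ec2", "s3", "rds"},
    ("developer", "production"): {"s3"},
    ("operations", "production"): {"ec2", "s3", "rds"},
}


def is_allowed(role, environment, service):
    """Return True only if the (role, environment) pair explicitly lists the service.

    Unknown role/environment pairs fall through to an empty set, so the
    default answer is "no access".
    """
    return service in ACCESS_MATRIX.get((role, environment), set())


if __name__ == "__main__":
    print(is_allowed("developer", "development", "ec2"))
    print(is_allowed("developer", "production", "ec2"))
```

A table like this is easy to review with business units before anyone writes a policy document.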
Account and cost management: Do you want to map accounts to business units and cost centers so you can logically separate your IT services and/or understand business-unit-specific costs? While the business units may be accountable for the costs associated with their consumption, it’s much easier to centralize cost optimization across a larger portfolio of resources. Your CCoE should think about how to stagger Reserved Instance (RI) purchases so that it can remain flexible with the business, and look at some of the tools (e.g., CloudHealth, Cloudability) that are available to help you with this.
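To make the idea of business-unit-specific costs concrete, here is a small sketch that rolls billing line items up by business unit using an account-to-business-unit mapping. The record layout (`account_id`, `cost`), the account IDs, and the `"unassigned"` bucket are all assumptions made for illustration; real billing exports have their own formats.

```python
from collections import defaultdict


def costs_by_business_unit(line_items, account_owners):
    """Sum line-item costs per business unit.

    line_items: iterable of dicts with hypothetical keys "account_id" and "cost".
    account_owners: dict mapping an account ID to a business unit name.
    Accounts missing from the mapping are grouped under "unassigned" so that
    untracked spend stays visible.
    """
    totals = defaultdict(float)
    for item in line_items:
        unit = account_owners.get(item["account_id"], "unassigned")
        totals[unit] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    owners = {"111": "marketing", "222": "engineering"}
    items = [
        {"account_id": "111", "cost": 120.0},
        {"account_id": "222", "cost": 300.0},
        {"account_id": "111", "cost": 30.0},
        {"account_id": "333", "cost": 5.0},
    ]
    print(costs_by_business_unit(items, owners))
```

Keeping an explicit bucket for unmapped accounts is a design choice: it surfaces spend that nobody has claimed yet.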
Asset management/tagging: What kind of information do you want to track for each of the resources that you provision? Some examples I’ve seen include budget code/cost center, business unit, environment (e.g., test, staging, production), and owner. When I was at Dow Jones, one of the first growing pains we hit was having our bill escalate as more developers started to experiment. In the course of a few hours, we addressed this by tagging each instance launched in our development VPC as a development instance and writing a script to stop those instances on nights and weekends. This was the first piece of what became a fairly sophisticated tagging and automation library that allowed us to manage our environment as we scaled. I’ve seen many other customers do the same thing, increasingly taking action based on tags in their production environment as they mature toward highly available architectures and “disaster indifference.” (Credit to Wilf Russell from Nike for coining this phrase.)
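The selection logic behind a nights-and-weekends script like the one described above might look something like the following. This is not the original Dow Jones script, only a minimal sketch under stated assumptions: the tag key and value, the 07:00 to 19:00 definition of working hours, and the shape of the instance records are all hypothetical. The sketch only decides which instance IDs to stop; in practice those IDs would then be passed to an EC2 API call (for example, boto3's `stop_instances`).

```python
from datetime import datetime

# Hypothetical tag key/value; the tagging scheme here is an assumption.
DEV_TAG_KEY = "environment"
DEV_TAG_VALUE = "development"

# Assumed working hours: 07:00 up to (but not including) 19:00.
WORKDAY_START_HOUR = 7
WORKDAY_END_HOUR = 19


def is_off_hours(now):
    """Return True on weekends and outside the assumed working hours."""
    is_weekend = now.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    is_night = now.hour < WORKDAY_START_HOUR or now.hour >= WORKDAY_END_HOUR
    return is_weekend or is_night


def instances_to_stop(instances, now):
    """Return IDs of running, development-tagged instances during off hours.

    instances: list of dicts with hypothetical keys "id", "state", and "tags".
    """
    if not is_off_hours(now):
        return []
    return [
        inst["id"]
        for inst in instances
        if inst["state"] == "running"
        and inst["tags"].get(DEV_TAG_KEY) == DEV_TAG_VALUE
    ]


if __name__ == "__main__":
    inventory = [
        {"id": "i-dev1", "state": "running", "tags": {"environment": "development"}},
        {"id": "i-prod1", "state": "running", "tags": {"environment": "production"}},
    ]
    # The resulting IDs would be handed to the EC2 API to actually stop them.
    print(instances_to_stop(inventory, datetime.now()))
```

Separating the decision from the API call keeps the rule easy to test and easy to extend with more tags later.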
Reference architectures: How can you build security and governance into your environment from the very beginning, and rely on automation to keep it up to date? If you can find and define commonalities in the tools and approaches you use across your applications you can begin to automate the installation, patching, and governance of them. You may want one reference architecture across the whole enterprise that still gives business units flexibility to add in what they need in an automated way. Alternatively, you might want multiple reference architectures for different classes or tiers of applications. Most likely you’ll end up with something in between, but regardless, consider how to automate more of this over time so business units can think less about the underlying infrastructure and more about their applications.
Over time, as the CCoE learns, it can become increasingly prescriptive with more business units and strike the right balance between giving them freedom to innovate and providing guardrails for consistency. Some other considerations that I don’t cover here include defining an automation strategy, exploring a hybrid architecture, providing continuous delivery capabilities that enable business units to move more quickly and run what they build, defining data governance practices, and implementing dashboards that give transparency to the metrics/KPIs that are important to your business.
The AWS Cloud Adoption Framework contains a number of perspectives that give prescriptive guidance to help you think through these best practices (and more). You can also leverage AWS Trusted Advisor to proactively identify cost, performance, security, and fault-tolerance optimizations. Last but not least, you can leverage the AWS Well-Architected Framework to benchmark the work your CCoE does against the best practices we’ve seen across the AWS customer base.
What else has your CCoE done to help your business focus on what matters? I’d love to hear about it!
Keep building,
–Stephen
@stephenorban
orbans@amazon.com
Note: “Create a cloud center of excellence” is the fifth of seven best practices I’m writing about in my new Enterprise Cloud Journey series. The remaining six are: provide executive support, educate staff, create a culture of experimentation, engage partners, implement a hybrid architecture, and implement a cloud-first policy. Stay tuned for more on each of these.