Well, good morning, everybody. You're a brave crew out here bright and early on a Wednesday morning, 8:30, I know at least when I look down at the conference schedule whenever I'm at a conference, 8:30 is always one of those tough ones, especially in Vegas, I imagine, to get up in the morning to make it here. So really appreciate you showing up and hope your re:Invent is going well. As the slide says, my name's Tod Golding. I'm part of a team at AWS called the SaaS Factory, and that team works with- We've been like last five or six years, been working across all kinds of domains and sort of problems and helping customers and partners who are in any stage of building and delivering a SaaS solution on top of AWS. And certainly across all the time working in that, you know, I've always said to organizations, "There's no like one-size-fits-all solution to SaaS for people." Everybody comes to us, and yes, we have like strategies and approaches we're gonna take with them, but we're always sort of finding out from each solution like, what's the right sort of flavor of SaaS for your business. And I certainly would expect for any of you as architects or builders in the SaaS space to have that same mindset, right? You're not really, you're not looking for some one way to do this. You're looking for the combination of things that are gonna be valuable to you. So, you know, we sort of resisted maybe a little bit along the way the idea of ever sort of having any sort of like here, like standard way to show somebody to do thing. At the same time, it became very clear that there were themes and there were strategies and there were patterns in there that were at least good reference patterns to be able to say, "Hey, from the menu of things I ought to be thinking about, these seem like pretty good strategies. These seem like pretty good approaches." 
And I ought to be- I at least wanna sort of articulate what some of those are and put them down on paper somewhere so that- For each of you, as you're going through and you're looking at what should I do in my particular solution, you can have some kind of framework for how to think about these things and where should I be poking my mind, what questions should I be asking, what are those, what's the thought process that should surround this? And again, they'll vary by stack, by technology, by domain. But still a lot of the themes that we're gonna talk about here, a lot of the patterns are pretty global concepts and should apply pretty universally to a lot of environments. And I would imagine this is going to evolve a fair amount. It's definitely, we're not like Gang of Four kind of patterns and we've got every little pattern name with a perfect name and a perfect sort of set of criteria for it. We're nowhere near that. We're way above that. But still, there's something here worth sharing. And so for me, I hope that's why you're here today. I hope that's what you're expecting to get out of this. A lot of talks I do were like, get in the weeds, show you how to build a specific solution, and this talk is intentionally more about the breadth of the SaaS space. Like what are all the possibilities of the SaaS space? Yes, we'll look at architecture, but this is a 300-level session. We're not gonna crack open the IDE. I'm not gonna be doing any coding in here today. So hopefully that fits why you're here. I totally understand if that doesn't fit and you like wanna go find a better session, I won't be offended. But we have to start any patterns discussion with like my classic sort of disclaimer, and I've already kind of teased this already from the beginning here, which is, there is no blueprint for SaaS, right? 
I have all kinds of people who come to me internally and externally and say, "Hey, I've got a customer, they're ready to move to SaaS, they wanna build a SaaS solution on AWS. Can you go get whatever that blueprint or that stuff is you have in the back room and give it to us so that I can give it to the customer and then they can run it through whatever little factory or machine they have and out the other side they will suddenly be SaaS?" And I would actually say there's even businesses that come to me and think of SaaS this way, like, "We're not SaaS now, can you show us the buttons and levers and dials we have to just turn so we can now be SaaS?" And obviously, when we think about SaaS, we wanna think of it more as a composable sort of experience. We wanna think of this more as driven by data and driven by parameters. So for me, what I've done kind of at the outer edge of this discussion of patterns is tried to give you a sense of how I would categorize the patterns, knowing this isn't inclusive of every single pattern. What are some of the big building blocks of this patterns discussion? And one of them is this high-level notion of management. Here essentially, what I'm saying is, there's a group of patterns that are all about how to manage and operate your business, and management and operations, as you'll see across everything we talk about with SaaS, is essential to this. In fact, so many people wanna talk about the application piece of SaaS, and this management and operations piece kind of comes later. In reality, metrics and analytics isn't just about going and grabbing Datadog or AppDynamics or New Relic and saying, "Now we have metrics and analytics." When I'm talking about metrics and analytics here, it's tenant-aware, tier-aware, multi-tenant-aware metrics and analytics tooling.
Like if you imagine, when we build a multi-tenant environment and we're gonna put all of our tenants potentially into one shared environment and then we have to know what it's doing, how it's working, when it's breaking, when it's scaling correctly, when it's not scaling correctly, the only way you get those insights is investing heavily in metrics and analytics. And so we'll talk a little bit about this. I've done entire talks just on that area so we won't be able to go deep here. But at a minimum, you gotta know that in that management area there's a lot you ought to be thinking about in terms of what you ought to be doing with your solution. Of course, billing is here: how are we gonna integrate with billing providers, what kind of billing models are we using, are we pay-for-use or consumption or like features, what are we using to drive billing, and how does that whole billing piece connect to the onboarding experience of our customers and its life cycle. And then this term "provisioning", which a lot of people will think of as, yes, you provision, you configure things. You'll see as we go through this talk, there's a lot that happens here when you bring new tenants on board. As we start to have more interesting and more diverse models we're trying to support, you'll find out that a whole lot of your code ends up being around this multi-tenant provisioning story. In fact, if you go look at our reference architectures, and I'll give you a QR code at the end of this so you can go look at our reference architectures for this, you'll see that like 30% of the code in those reference architectures is around onboarding and provisioning and all the sort of DevOpsy bits you have to do that are unique DevOps bits. They're not just spin up your environment, they're spin up your environment based on all these multi-tenant criteria. The other area, the other grouping of patterns, is certainly application.
This is the sweet spot, the area everybody knows that we talk about, right? Like we have to think about how we isolate one tenant from another tenant. What are all the different ways we can do that? All that varies based on the stack you're on, the deployment model, all kinds of variations drive that. So we'll look at what the different flavors of isolation are that you ought to be thinking about. And then just generally, how are you partitioning workloads? How are you partitioning data? How are you separating data? Are you siloing data? Are you pooling data? You'll get into all these concepts. And then the deployment footprint of your application is one where there's all kinds of variations. You're gonna have different tiers. We have different mechanisms for deciding how we're gonna deploy somebody and what the deployment patterns are. We'll look at different deployment models. And then routing is always an interesting one to me because people will say, "Yes, there's routing and there's networking, yes, I need a load balancer, I need whatever I need in front." No, what you're gonna find out is, the more exotic you get with the deployment footprint of your application, the more complicated routing gets. How do I get a subdomain? How does that subdomain get to this specific tenant workload? It gets interesting really fast. And then the last one is just sort of core here and it's this idea of tenancy, like somewhere we've gotta actually get tenancy introduced into our environment. How do we introduce it via identity? How do we connect users to tenants? That'll be a bigger topic that we'll talk about here. How do I orchestrate all the stuff that has to happen to get a tenant into the system? A huge area to look at. I actually have to have tenants themselves here. And then tiering. I did a chalk talk on this, if any of you were in my tiering chalk talk. Tiering sort of spans all of this, and is actually a huge architectural tool, not just a business tool.
So there's patterns around tiering, both around throttling, around packaging, around deployment. So this is like the highest-level list I could come up with to say, "Here's all the things, the sort of menu of things that you ought to be thinking about." And to me the exciting part of being a SaaS architect is, it's now my job to go out to the business and ask all the right questions to figure out which flavors of these different things fit for my particular solution, right? So for me part of the problem here is knowing the patterns. The other part of the problem is knowing the right questions to ask. Where are we at with this business now? How much of a hurry are we in? Are we migrating? Well, then there's a whole bunch of questions I might have around what the migration strategy is gonna be. How are we segmenting our customers? What might be the value boundaries between these different customers and what they might want and not want? Do we have specific domain requirements or compliance or SLA kind of issues that we have to think about in this space? There's like literally 40 questions here and almost none of them are technology questions, but they have a huge impact on the patterns that I end up choosing. So to me, you kinda have to embrace this. As a SaaS architect, you have to say, "This is the goodness of what I do, and I'm good at stitching that all together for you and I'll help you figure it out." And as a SaaS architect, it's my job to present these options to the business, to let them know what they can or can't do in their environment, and potentially even give them new options that they're not thinking about. Okay, now at the outermost level of any discussion of patterns for SaaS is this core concept of what I've called "the two halves of SaaS". And we started talking about this about two years ago. We've been doing it for a while.
In fact, a gentleman was up here talking to me about this for a while, and we finally started branding it, which is to say, there's a control plane piece of your SaaS environment. Now, this control plane concept gets used a lot in the tech industry, and here for SaaS, control plane really represents all those horizontal, shared services I have to have: billing, onboarding, metrics and analytics. What are all those things that help me manage and operate and scale my multi-tenant environment? None of the application lives here. The only thing that lives here are all those things that are common to all of your tenants. In fact, the control plane itself is not multi-tenant, it's just one set of services managing all of your multi-tenant bits. But breaking this out and having this live on its own ends up being quite valuable. It can be managed separately, it can be deployed separately, versioned separately. It also identifies one of the key issues which is, a lot of people don't think about those control plane services. They don't start there. They start somewhere else and then they're like, "Oh, yeah, guess we'll do that, and we'll go build one of those, we'll go build one of those." But they don't end up really ever building it or thinking about it like a control plane. The other half of this is what I call the application plane. Control plane and data plane are the two terms that you'll hear together all the time. I didn't like calling this a data plane. I didn't feel like this was just about data. This is about the nature of my SaaS application. Here's where all the multi-tenant capabilities of my system are. This is where my microservices are. This is where I'm gonna implement isolation to separate tenant resources. This is where I'm gonna do data partitioning. This is where all the crazy deployment models come up. What are the different ways my tenant workloads are gonna be deployed here?
And of course we have to have connectivity between these two, right? We have to decide how the control plane's going to configure and set up and operate the application plane and provision resources in that plane. And then the application plane has to surface all these metrics and analytics and all the billing data and all the other stuff back to the control plane. But if you said to me, "I'm gonna go tomorrow and build a brand new SaaS application, where would I start?" I'd say, "Start with your control plane." You don't have to go build the entire control plane but start with a control plane. Start with how you're gonna get a tenant introduced into the environment. How are you gonna do onboarding, how's identity gonna work? Build those core control plane pieces and build this foundation of the control plane and application plane concept. And the nice part of this is, this applies to every solution. This is one of those areas where at the conceptual level I could use any technology in these two halves. I would also say that by having a separate control plane and an app plane, it gets people more comfortable with the idea that I might pick a different technology for the control plane than I might for the app plane. Like I see a lot of people now leaning into serverless for the control plane because they like the scaling, the "pay for what you consume" model, and they figure all the pieces of the control plane aren't running all the time. Great place for that, but they might run containers in their app plane for example. Now, just to drive home this point of control plane versus app plane, just 'cause it's one of those soapbox kind of topics for me. I will say that people go away and they build the app plane first, right? And they say, "Wait, we've got this awesome app plane, got all these great multi-tenant microservices, we've got isolation, we've done all these things."
And I'll say to them, "That's awesome, you did a great job. You're not SaaS. You're a multi-tenant application but you're not SaaS. You can't function and operate and run as a SaaS business until you have a control plane." The control plane is the thing that you're really going after when you're building a SaaS business, which is, it's gonna be the center of scale and agility and innovation. It's the thing that lets you move fast and be more nimble and have insights into everything that's going on. So it's a subtle point but it's an important one, and it's a tool that makes it clear why the control plane and the app plane are here. Now, I also think that it's really important, if you're gonna talk about patterns here, to be really clear about one terminology problem that I've run into for a very long time across the SaaS space, which is this word "multi-tenant". Everybody uses it, but that term has all kinds of baggage attached to it because at AWS, there's tons of services that are running as multi-tenant services, right? Because they share infrastructure, and that notion of shared infrastructure means you're multi-tenant. But if I just run my solution on a multi-tenant piece of infrastructure, I'm not SaaS at all, I'm just sharing the infrastructure. And so we tend to map SaaS to multi-tenant and then we map multi-tenant to mean, SaaS has to mean you're sharing infrastructure. SaaS does not have to mean you're sharing infrastructure. In fact, lots of solutions have dedicated infrastructure for tenants. They're still running the exact same software for everybody but some piece of the system might be dedicated or not. So we can't just use that word "multi-tenant" to describe SaaS. So instead, we need new terminology that cuts across all these patterns, and certainly one of the words we introduced was "silo".
And so if you look across our content, you'll see this word "silo", and by the way you'll see the word "pool", which I'm gonna mention in a minute here. And silo means a resource is dedicated to a tenant. In this case an entire stack is dedicated to a tenant. But the key takeaway here is that a siloed stack doesn't mean, "Oh, and each tenant's kind of run on their own version and we're all doing little customizations for them." No, that stack is managed, operated, deployed the same way. Everybody gets the same thing, it's just dedicated. Now, on the pool side is the classic notion of this, everything's shared, right? A resource is shared by tenants, it's considered a pooled resource, right? And for me, using silo and pool gets me away from the word "multi-tenant". Here I just say that any SaaS environment, if it's built the right way, is multi-tenant. And then under the hood of that solution I'm determining, does this resource need to be shared, dedicated, or whatever I have to do? And I describe those resources as being siloed or pooled. And of course I include bridge here, and bridge is just here to indicate the fact that some things could be siloed and some pooled. That word doesn't show up a lot because really we're just gonna focus on which resources are siloed and which are pooled. But hopefully, even though that's a bit of a sidetrack, you can see how important that is to how we talk about SaaS and how we describe patterns. And it gets us away from this word "multi-tenant". In fact, you'll never hear me say "single-tenant", like a lot of people describe systems as single-tenant. I don't think we need the word. Silo and pool kind of covers it for me, and then I describe my whole environment as multi-tenant using siloed and pooled resources. Okay, now that we are done with that outer edge, let's actually talk about what's happening inside that control plane and what are some of the patterns inside the control plane.
And I'm gonna start where I would start with any environment which is, I would talk about onboarding because onboarding is at the center of like bringing all of multitenancy to life inside of your control plane. It orchestrates lots of the moving parts of your control plane, and the entry point to onboarding, and I'm showing these as Lambda, they could be any technology here that you want. In fact, if you look at our reference architectures, you'll see this pattern repeated in a slightly different variations across those reference architectures. I come in and I wanna onboard a tenant. That could be some internal tool I use to say, "Go onboard a tenant." It could be some self-service thing where a tenant signed up. I don't really care if it's B2C or B2B, it doesn't really matter. When a new tenant comes in, they hit this registration service and I need a separate service that coordinates and orchestrates this 'cause as you'll see, there's lots of moving parts to this and I need that registration service to keep track of the state and the, "Are you succeeding? Are you failing? Where are we at in that process?" The first thing we're gonna do from that registration service is create a tenant. That's first, the most basic thing. Like give them a global unique identifier, give them a- Get their name, figure out what tier they are, basic sort of operations. But as we go forward, you'll find out, during onboarding, we also configure lots of more data in here, typically. Routing, identity configuration, lots of other- This is the central configuration of everything that you had, policies and everything you have about a tenant. And it needs to be its own service. We also have to create a user. We got a brand new user coming in, that user's gonna (indistinct). This first user's usually what we call the tenant administrator. They're coming in, they've gotta get created as an identity in our system and this is where you've gotta create some binding to an identity system. 
Ping, Auth0, whatever. I'm gonna show you Cognito here, but the themes that are here will apply to any one of these identity providers, right? Which is, somewhere in here, I've gotta pick some storage, like here I've picked user pools. That's a Cognito way of just saying, "Here's a way to group tenants and configure tenants to have separate authentication experiences." Happens to work well for SaaS, but if you go to Okta or Ping, they have their own sort of grouping constructs as well. And the most important thing in here is custom claims. Very simple concept: a property and a value, put inside your identity. But I'm gonna shove into those properties your tenant ID, your tenant role, your tenant tier at a minimum. I might put a few other things about your tenant in there as well, and by putting those into the custom claims, you'll see as we get further downstream here, now my authentication experience is not just authenticating the user, it's giving me back tenant context. When I authenticate, it gives me a token back that I can use downstream and it'll have a huge impact on the way that your microservices and your isolation and your data partitioning get implemented. In fact, I have a migration talk I do where I just say, "Please, get this piece right because if you get this right and you flow that context through the rest of your system right, it's gonna make the rest of the build of the system easier." You might also have isolation policies that need to get built here. Cool, that timer is wrong. I'm just checking the timer here. It's going the opposite direction instead of counting down. So it threw me off. No, we're all good. I thought I had a lot less time left than I do. (chuckling) So as part of this process, we'll create these isolation policies that need to be created, and this will connect back to the isolation patterns we'll look at later. And then I also have to connect to billing somewhere.
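To make the custom claims idea a bit more concrete, here's a minimal sketch in Python of the attributes you might attach to the tenant administrator's identity at onboarding time. Treat it as an illustration under assumptions, not the reference architecture's code: the `custom:` prefix follows the way Cognito names custom attributes, but `TenantAdmin`, `build_custom_claims`, and the attribute names after the prefix are hypothetical, and no identity provider is actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantAdmin:
    """The first user created for a tenant during onboarding (hypothetical type)."""
    email: str
    tenant_id: str
    role: str
    tier: str


def build_custom_claims(user: TenantAdmin) -> dict[str, str]:
    """Return the attributes to store on the identity when the user is created.

    The attribute names are assumptions for this sketch. The point is only
    that tenant ID, role, and tier travel with the identity, so tokens issued
    at sign-in carry tenant context.
    """
    return {
        "email": user.email,
        "custom:tenant_id": user.tenant_id,
        "custom:tenant_role": user.role,
        "custom:tenant_tier": user.tier,
    }


admin = TenantAdmin("admin@example.com", "tenant-1234", "TenantAdmin", "basic")
claims = build_custom_claims(admin)
```

In a real system this dictionary would be handed to whatever create-user call your identity provider exposes, so that every token issued for this user comes back with the tenant context already in it.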
We're gonna bill somebody somehow here. Maybe with freemium we don't do this, but in general, we're gonna bill somebody here and we've gotta go out and connect this to a billing provider. And then the final step of this is this thing I mentioned before which is provisioning. And we're gonna get into this in more detail, but the idea here is, any infrastructure that needs to be created on a per tenant basis, or configured on a per tenant basis, is also part of the onboarding here. The key part of this whole thing is, you can see there's lots of moving parts, and once this is all up, I know how to bill my customer, I've provisioned any infrastructure they have, I've set up any policies or routing as part of their tenant management, I've got identity all working for them. When you've achieved this, you've made a huge step inside of your control plane. And by the way, just go back here real quick. These services, billing management, these are all control plane services running inside that environment. Microservices in that environment. Now, if you look at this provisioning experience and we say what's happening inside this provisioning experience and we think about this being tier-based, meaning like, I've got basic, advanced, premium tiers and I offer them different experiences, that provisioning piece of my onboarding process is gonna have a lot more to do. So here, if I come in and I register as a tenant and I hit that registration service, yes, I'm gonna go do all that stuff I said I was gonna do, but I'm also going to go out and hit that provisioning service and provision a bunch of dedicated infrastructure. And the reason I highlight this is because to me, it turns the DevOps story a little bit on its head. We're used to thinking of CI/CD as: a developer checks something in, we build something, then the infrastructure automation runs, provisions the environment, and we're done with DevOps until the next configuration change or the next deployment.
Well, you're gonna still do all that, but now what happens when your system's up and running, you're at runtime and somebody onboards, that onboarding is at runtime going to provision even more infrastructure dynamically. And this, if you're self-service onboarding, you're gonna have people signing up and triggering all this DevOps onboarding. So now you have runtime triggered DevOps that may be triggered by a source that's not you. That's sort of unusual to people. And you have to think about, "Well, now if I wanna go build that onboarding", the bar for how it works, the robustness of it, the efficacy of it become way higher than it would be normally, and tends to be a lot of code there because of that. And just to see this as an example, imagine I had an environment with basic tier tenants and that basic tier tenant was running in a pooled model. So sharing infrastructure. And so when I provision this environment, I provisioned everything. I'm just showing two microservices, you can imagine VPCs and availability zones and all the other things that would have to be here. But these are provisioned once and now my tenants as they come on board, if I onboard another tenant, it's more just about configuring them to be able to consume this shared resource because there's- And there might be a little bit of network routing I have to configure, but I'm not like redeploying anything major in the sequence. But I also have a premium tier tenant. So I call them platinum tier here, a platinum tier tenant, and this platinum tier tenant has siloed resources. I've decided they're gonna get their own copy of these order and product services. Same version as everybody else, but their own copy. Might have done this for a noisy neighbor reason, might have done it for a compliance reason, I might have done it just because these platinum tier tenants demand that these service run in a silo. There's all kinds of reasons I might do it, but they're here. 
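The basic versus platinum example above boils down to a tier-driven decision inside the provisioning service. Here's a minimal Python sketch of that decision, assuming a hypothetical `plan_provisioning` function and made-up action strings. It only builds a plan; in a real environment the silo branch would kick off a pipeline or infrastructure automation.

```python
from dataclasses import dataclass

# Assumption for this sketch: only the platinum tier gets siloed services.
SILOED_TIERS = {"platinum"}


@dataclass(frozen=True)
class ProvisioningPlan:
    tenant_id: str
    mode: str  # "silo" or "pool"
    actions: tuple[str, ...]


def plan_provisioning(tenant_id: str, tier: str) -> ProvisioningPlan:
    """Decide what onboarding has to provision for a tenant of this tier."""
    if tier in SILOED_TIERS:
        # Dedicated copies of the same service versions everybody else runs.
        return ProvisioningPlan(
            tenant_id,
            "silo",
            (
                f"deploy dedicated order service for {tenant_id}",
                f"deploy dedicated product service for {tenant_id}",
                f"configure routing for {tenant_id}",
            ),
        )
    # Pooled tenants reuse services that were provisioned once, up front.
    return ProvisioningPlan(
        tenant_id,
        "pool",
        (
            f"register {tenant_id} with shared order and product services",
            f"configure routing for {tenant_id}",
        ),
    )


basic_plan = plan_provisioning("tenant-1", "basic")
platinum_plan = plan_provisioning("tenant-2", "platinum")
```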
But now that means when I onboard another platinum tier tenant, I'm kicking off, here in this example, a code pipeline and a ton of code to get all of these bits brought up and running every single time a tenant onboards. And to me this is a very overlooked area of the SaaS build, this whole SaaS and DevOps story. In fact we had a chalk talk on that here. I think there's another version of that chalk talk that's gonna run. The other piece I wanna talk about is this notion of tenant-aware identity, right? When we talk about identity in SaaS environments, it isn't just, "Yes, I've got identity for the user." Like I said earlier, we gotta connect this to tenant context. So if we look at a flow here for tenant-aware identity, I hit a web app, that web app in classic fashion redirects to the identity provider, and the identity provider returns some tokens. This drawing could be a drawing for any application here that's authenticating. There's nothing special about this. What's special about this though is that these tokens that are coming back, and this is just OIDC, are coming back with the extra tenant context that we put in the custom claims in the onboarding step. So now with that in there, I pass this token through, and this is just a classic implementation, and it goes through to the first application service. Now all that tenant context is right there at the fingertips of my application. It doesn't have to go somewhere else to get that context. It's all embedded in the way that I shared the authorization and authentication information with that service. I've seen it and I've built it myself. I've seen people build a centralized service where every time you're in an app service you go get that context, and then that service becomes a huge bottleneck for everything. And then you add caching to that and it just becomes this huge thing.
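Here's a rough Python sketch of what "tenant context at the fingertips of the app service" can look like: the service reads tenant ID, role, and tier straight out of the token's payload rather than calling a central lookup service. The claim names and helper functions are assumptions for illustration. Also, and this matters, the sketch does not verify the token's signature; a real service must validate the JWT (signature, issuer, expiry) with a proper library before trusting any claim in it.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    role: str
    tier: str


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding, so restore it first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def tenant_context_from_jwt(token: str) -> TenantContext:
    """Read tenant context from a JWT payload.

    WARNING: no signature verification happens here; this is illustration only.
    The claim names are hypothetical.
    """
    _header, payload, _signature = token.split(".")
    claims = json.loads(_b64url_decode(payload))
    return TenantContext(
        tenant_id=claims["custom:tenant_id"],
        role=claims["custom:tenant_role"],
        tier=claims["custom:tenant_tier"],
    )


def make_demo_token(claims: dict) -> str:
    """Build an unsigned, JWT-shaped token so the example is self-contained."""
    header = _b64url_encode(json.dumps({"alg": "none"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    return f"{header}.{payload}.demo-signature"


token = make_demo_token({
    "custom:tenant_id": "tenant-1234",
    "custom:tenant_role": "TenantAdmin",
    "custom:tenant_tier": "basic",
})
context = tenant_context_from_jwt(token)

# Calls to other services forward the same token unchanged.
downstream_headers = {"Authorization": f"Bearer {token}"}
```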
You wanna pass as much of that tenant context directly into these app services as you can. And then now, as it calls other services downstream, it just passes that same token through and that token's now available to any downstream service. So we're really- This is all about- Like, these are really simple concepts but you can imagine how much simpler they make building a multi-tenant SaaS environment if you get them all right. Just to give you a sense of what that might look like in a flow, this is straight out of our EKS SaaS reference architecture. In this case we happen to use subdomain as a way to identify tenants, that's just a friendly name of the tenant, it's not their tenant identifier. And they come in, they hit that subdomain. And now, because we're using Cognito user pools, each tenant has their own user pool in the model we use, and we'll talk about the trade offs of a shared user pool versus standalone user pools. I have to know which user pool to authenticate you against. So I talked earlier about the fact that that tenant management service in your control plane would have some other data. Well, guess what? We shoved your user pool data into that as well so that I could resolve you as a tenant, first of all from your origin coming from that subdomain to what tenant identifier you are, and then from a tenant identifier I could say, "Which user pool is the user pool I need to authenticate you against?" App ID here, it's another remnant of Cognito that you need for authentication. But now that I have that piece of information, I can share that back with the app. Just happened in this case, have a little authorization library that sort of cache and holds that authorization information to make this a little simpler. And now I can actually go and hit that user pool that is your user pool. 
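The resolution step in that flow, subdomain to tenant identifier to user pool and app ID, can be sketched in Python like this. It's a minimal sketch with an in-memory dictionary standing in for the tenant management service; the domain, IDs, and function names are all made up for illustration and aren't taken from the EKS reference architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantAuthConfig:
    tenant_id: str
    user_pool_id: str
    app_client_id: str


# Stand-in for data the tenant management service stored during onboarding.
TENANTS_BY_SUBDOMAIN = {
    "acme": TenantAuthConfig("tenant-1234", "pool-acme", "client-acme"),
    "globex": TenantAuthConfig("tenant-5678", "pool-globex", "client-globex"),
}


def subdomain_of(host: str, base_domain: str) -> str:
    """Extract the tenant's friendly name from the request host."""
    host = host.lower().split(":")[0]  # drop any port
    suffix = "." + base_domain
    if not host.endswith(suffix):
        raise ValueError(f"{host!r} is not a subdomain of {base_domain!r}")
    return host[: -len(suffix)]


def resolve_auth_config(host: str, base_domain: str = "saas.example.com") -> TenantAuthConfig:
    """Map the origin subdomain to the user pool to authenticate against."""
    subdomain = subdomain_of(host, base_domain)
    try:
        return TENANTS_BY_SUBDOMAIN[subdomain]
    except KeyError:
        raise LookupError(f"unknown tenant subdomain {subdomain!r}") from None


config = resolve_auth_config("acme.saas.example.com")
```

Note that the subdomain is only the lookup key. The tenant identifier that ends up in tokens comes from the stored configuration, which keeps the friendly name and the real tenant ID separate.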
Imagine there's, you know, a hundred user pools for a hundred tenants. This is hitting the one user pool that is yours, that returns a code, that code gets exchanged for a JWT, straight OAuth flow here, nothing magical about this. And then that JWT gets injected downstream to a service. This could have been any service in our system. So pretty straightforward process. The sole downside of this is that the tenant management service needs to sit in the middle of this OAuth flow. I don't really like that a ton, but it's the trade-off of this user pool per tenant model. Now, I did talk about the fact that you could have multiple user pools. I get this question all the time from AWS people, like, "Hey, I'm doing SaaS, should I use multiple user pools or should I use one global user pool? What are the pros and cons?" And my answer always is, "It depends." So I did create an "it depends" slide for you. Here are the criteria that we use to decide which way to go. Obviously, if I do user pool per tenant, I get to have separate authentication policies attached to each user pool. So imagine saying, "Do I want MFA or not?" Like, "What's password expiration? What are the password policies themselves?" I can offer that to individual tenants and I can even surface that in my SaaS application and say, "You choose whether you want MFA or you don't want MFA." And I can actually make that a feature I almost sell to the customer to say, "Hey, if you're in our advanced tier, we'll let you customize your authentication experience", and it's just a feature straight out of the user pool for me. The downside of this is of course, anytime we do anything per tenant, there's the scaling question, right? Like, do you really wanna have 10,000 user pools? Is that a logical way to do this and are you gonna hit limits of AWS? The number of user pools is constrained here, right?
So you can't just say, "I want user pool per tenant", and not think about that scale question as part of this. The other part of this is the part you saw in that authentication flow which is like, "Now I have to resolve which user pool you are going to map to", which makes, puts some service of my own in between the user pool and the authentication process. So now I have a point of failure in that process that I kind of wish wasn't there. The other alternative here is to say, "All users get one user pool, will use groups or other things to identify which tenant they belong to." But now I just use the scale of Cognito to like hold however many tenants I need to hold, user- Like, user pools can hold, I don't know what the upper limit is but lots of tenants so you're probably gonna be okay there. But now because I've done that, the policies are shared, whatever my MFA policy is, it's everybody's policy. I can't make that a per tenant kind of decision. But I do get here, like, I don't have to resolve users to- Sorry, tenants to individual user pools so that OAuth flow is a little better and some of those bits work better. So, always useful if you're thinking about that. And even if you're not using Cognito, I would bet, if you looked at Ping and Auth0 and Okta and everybody else, they have some sort of set of trade offs around this as well that you'd wanna consider. Another piece of the control plane that's really not all that exotic is user management but it- But the reason I include this slide is because there's two flavors of user management inside the control plane, and sometimes people miss that nuance, which is, yes, the tenant one is the one we've talked about. You get in, you get a tenant user, we create that user, by the way, then the tenants can create additional users via that user management in your system. But the person that, or the role that really gets left out of this is the system admin. That's you, the SaaS provider. 
Like we are the SaaS provider, we have admin people who need to log in and get into our console and see all tenants, who tenants are active, which tenants are active, what metrics and analytics. Like, you're gonna have a custom dashboard, you're gonna OAuth into that dashboard and you have to figure out where does- What's the identity story for that? Is it same thing as your others, we're gonna create a user pool that is a separate user pool for our admin users. Like, you might have a different sort of set of criteria for what that admin experience ought to be. But make sure you realize that these are two entirely separate identity sort of roles within your control plane. Now, I also said as part of onboarding in the control plane, we have to set up the billing as part of this and it's really hard to do patterns around billing because some people have internal billing providers, some people use external ones. In general, like I try to steer people towards a third party billing solution, just 'cause if you can avoid building that yourself, great. But I understand in some corporations, like there's- It's part of some bigger enterprise story and you have to sort of work with that. But I've stuck with the third party sort of model here. It's great. Lots of good examples of partner solutions that are out there that provide billing. And really the flow of this is pretty straightforward. You've gotta go to that provider and set up your account. And the biggest part of this is, you gotta configure what plans you have. Like every single one of them have very different ways of describing how you're gonna bill and what the billable unit is and how are you- Is it consumption and what is it? You have to configure all those plans ahead of time. Those plans typically then correlate to some, the tiers of your tenants. 
And once I have that set up, now back in my application, you'll see that during the onboarding process I have to essentially go create that customer in that environment. So all the way back to onboarding when I said, "You're hitting this billing experience", you're gonna have some billing service inside of your control plane that implements this integration, and during onboarding it's gotta create the customer in that provider. And then once that's all set up, now we're just down to, what's the billable unit you're tracking on? Like are you tracking consumption or features or number of users? And you have to instrument your application to send that activity data to the billing provider, and then once they get the activity, they can generate the bill for you. Now, sometimes people will put an aggregation service between themselves and the billing provider because they are- Well, they wanna sort of aggregate or pre-aggregate or even capture some of that billing data on their own and then decide, "Once we've aggregated it to a certain level, we'll then send it to a billing provider as a separate step." It's kind of your choice how you wanna do that and how- It depends how much data you're sending to that billing provider as well. Now, the other thing I said was that metrics is in your control plane. It's just way too big of a topic for us to really go way into this pattern. But just know that in your control plane, I'm expecting your app builders inside your SaaS solution to be instrumenting all across that solution with activity and events that are meaningful to your business, that are partly meaningful to product owners, partly meaningful to executives and meaningful to operations and to architects. Like, how are people- How are these tenants consuming the resources of our system? How is it scaling? Which features are they using?
How do those features and their consumption of those features correlate to events and activity and scale within our system? It's just an area that is way underinvested in SaaS environments. The good news is, like the aggregation of that data is the easy part of this story. There's so many tools that are just ready off the shelf. I just put like Redshift and Firehose here. I put an Elasticsearch stack. There's all kinds of good tools that know how to ingest that data, warehouse it and let you do good analytics against it. The harder part of this problem is, you've gotta go out and actually get that data. Some sources are system events, like you can just go get CloudWatch and like AWS is gonna give you all kinds of good data that you can capture here. But the harder part of this is, you've gotta go get your specific application events from your domain that are meaningful and represent meaningful metrics for your environment. And that means you gotta get builders to invest and be willing to surface this data. And usually, if you can get enough momentum and enough people start instrumenting this data, it creates a flywheel, then people start asking for it and it actually starts getting in backlogs and it starts landing in places where people care enough about it that it shows up. Awesome. So that is the control plane in a very quick pass through it. But you can kind of see at a high level all the different roles and patterns that happen inside the control plane. Now let's go over to the application plane, right? Let's go over there where we actually see like, what does it mean to write multi-tenant application services? 'Cause so far most of what we did on this control plane was enabling multi-tenancy, but it wasn't actually about applying multi-tenancy. And everything in the application plane to me starts with deployment models.
When you're sitting down to decide what it means to build your SaaS application, you have to figure out what flavors of tenants you wanna support. And don't just assume, everybody's gonna be pooled. We're all pooling all the resources for all the tenants and that will be our model and we're done. If you really look and talk to your sort of business and you find out what they do, you're almost always gonna find like, well, parts of the system for some tenants need to be siloed for certain things or- And some tenants are gonna actually require entirely dedicated stacks based on the amount they're willing to pay and what their expectations are. You would like to tease out as much of that as you can. You won't get it all on day one. But the more effort you put into sort of teasing that information out, the more you can start to point your architecture in the right direction and start to pre-think about the deployment models. But I would also say, I would expect as an architect to build an approach where if something's siloed today, it might be pooled tomorrow, or pooled today, it might be siloed tomorrow. So how can I give myself the best way to sort of move the parts around to achieve what I want? But at the simplest level, you know, we need to have a pattern that we call full stack silo. Full stack silo is just- Everything's dedicated, tenants entirely get their own resources. And the other simple version of this is full stack pool, like, these where you could sort of see the foreshadowing of these coming already, which is, you know- And again, both multi-tenant environments. So don't presume that the one on the left- People will, this is where that word, that ugly word, "single-tenant" will come in that makes me sort of a little nervous, right? Like I don't- That isn't a still, that is still a multi-tenant environment. I'm still operating it all with a control plane and so on. 
Now, the interesting thing is, people will see themselves as one of those or the other, where, "Oh, we're a full stack silo business. That's what we do. All customers are full stack silo." No. Like, you don't wanna be in the business of being full stack silo for everybody 'cause you're probably not gonna achieve all the goodness of SaaS. Now, if you only have 10 tenants, it's still maybe an okay way to go. But if you're trying to scale the business, everybody being full stack silo is probably not gonna be a great way to go. So what we usually see people doing is, the sort of tiering starts to come in. Premium tier tenants are gonna write a big enough check. We're gonna let some of them be full stack silo, but then we're gonna run everybody else as basic tier tenants in a pool. And guess what? This is where that whole provisioning story I said gets exotic. Imagine you are supporting both of these. Well, now all of your provisioning, all of your routing, all the things you're doing during the setup and onboarding of a tenant in that control plane have to enable this to be done all at runtime. "Oh, I need to provision a new environment for this premium tier tenant. I've gotta go set all that up." A lot of heavy lifting to do there. Now we'll just look real quickly at a couple examples of this. At AWS, account per tenant is often a full stack silo model that you'll see, and it's an entirely valid model, but it comes with lots of caveats, like account limits, and some things can't be configured and set up in a fully automated way. The number of accounts you're gonna use becomes an issue. Cross-account access is there. So I'm not trying to discourage anybody from doing this, but I would want you to pick it with a very carefully chosen set of parameters, and don't presume that, "Yeah, we're account per tenant, it's gonna work for everything we wanna do." There are absolutely caveats. Now, the other variation here is VPC per tenant.
At least this puts me now all in one account, this gives me a little better way to manage. I don't have to worry about cross-account access here and I still get all the goodness of silo. Yes, an account is maybe a more absolute notion of silo, but with a VPC, I certainly can put pretty hard boundaries around tenant environments and be pretty comfortable. And of course, just to beat a dead horse here a little bit, they still all use a control plane. They all get the same version, they all get deployed through the same mechanism, they all get operated the same way. And that to me is what keeps them multi-tenant. If somebody tomorrow comes and says, "Yeah, we're doing account per tenant, this one customer, we're gonna do this one thing, we're gonna..." Or they're not going to the latest version or like... That's where this all starts to fall down if you start to head down that path, and it's tempting for the business to do that. Yeah, full stack pool, really straightforward. VPC, put all my tenants in the same VPC, no big deal. They're all just sharing these resources. We'll get into what it means though when they share those resources 'cause that's where it gets more exotic. I can then also go to other technologies like Lambda or anything else, and just like in this case, put all my tenants in one cluster and run them in a cluster for EKS, or in Lambda, just have a set of shared Lambda functions that are all running in a multi-tenant model. The pooled story is a much more straightforward model. But the model I like to advocate here and the one I'm pushing hard for people to think about, 'cause if you remember at the beginning of this, I said, "Give yourself options, be prepared to move, be nimble in this space around silo and pool."
I really want you to think about this notion of mixed mode deployment, which is to say, "On a case by case basis, I'm gonna look at all of the resources being consumed by a microservice and make silo and pool decisions based on the things that are best for that microservice." So right here I've got an order microservice, and for some reason, like noisy neighbor, something made me say the compute is gonna need to be separate for these, but the storage, turns out, is fine, it can be pooled here, right? And now I'm getting the best of both worlds. I'm picking silo and pool at a really fine-grained level. And I might be doing this not even for a business reason. I might be doing this for an operations reason inside, like, "Order's just always getting hammered, we're having trouble scaling it, we're gonna silo order just for our own survival." Or there might be a business reason to do this. Now, product service has pool and pool, we've decided there's nothing here that is a problem. But now over for my invoice service, I'm siloed- I'm sorry, pooled for the compute. But something about the data in invoice said I need to silo the data for the invoice service. And then I put queues in here. Just say, "Hey, the invoice is going out." That means we're shipping or whatever. I only put queues in here to make the point that the silo-pool decision isn't just limited to storage and compute. Like, I gotta decide for queues as well. Should the queue be siloed, pooled, what might make me choose one way or the other. And then it just goes on to another service. So the big takeaway of this is, this whole silo-pool story isn't just full stack pool, full stack silo, it's all the way down to the microservice level that you're making these silo-pool decisions. And you might be migrating between them. Like today this is siloed, tomorrow it's pooled, vice versa. And that migration, which I get asked about a lot, is not easy sometimes.
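One way to picture those per-microservice decisions is as a small piece of configuration that onboarding and routing code can consult. The sketch below is only an illustration under assumptions: the `DEPLOYMENT` table, the service names, and the resource naming scheme are hypothetical, chosen to mirror the order, product, and invoice example.

```python
from enum import Enum


class Mode(Enum):
    SILO = "silo"  # one dedicated resource per tenant
    POOL = "pool"  # one resource shared by all tenants


# Hypothetical per-service, per-resource choices mirroring the example.
DEPLOYMENT = {
    "order":   {"compute": Mode.SILO, "storage": Mode.POOL},
    "product": {"compute": Mode.POOL, "storage": Mode.POOL},
    "invoice": {"compute": Mode.POOL, "storage": Mode.SILO},
}


def resource_name(service: str, resource: str, tenant_id: str) -> str:
    """Return the (illustrative) name of the resource a tenant is routed to."""
    mode = DEPLOYMENT[service][resource]
    if mode is Mode.SILO:
        # Siloed: the tenant identifier is part of the resource identity.
        return f"{service}-{resource}-{tenant_id}"
    # Pooled: every tenant lands on the same shared resource.
    return f"{service}-{resource}-pooled"
```

Keeping the choice in data rather than in code paths is what makes a later silo-to-pool change a configuration and migration exercise rather than a rewrite, although the data migration itself is still real work.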
So I'm not saying these are just switches you throw and it all happens, but this at least gives you and the business more knobs and dials inside your architecture. Now, one other model I wanted to call out here is what I call pod-based deployment. I see more and more customers doing pod-based deployment. So they'll create a pool and they'll create a pod for that pool and then they'll provision certain tenants into that pod, and they'll sort of break up these tenants and they'll use the pod as a unit of scale. They like the fact that the blast radius and the deployment radius are contained to the pod. So this is sometimes appealing to them. It does mean that onboarding is more complicated. It does mean that management and operations get a little more complicated, but still it's a pretty compelling story. This really shows up a lot in multi-region environments where somebody will do pod-based deployment and now when we wanna go to another region, that region will just have that pod running in that region, but we're still centrally managing and operating it. Now, I had to pick a stack to show some of this coming to life. The problem is that would be an hour-long presentation if I showed it to you for serverless and for ECS and for EKS and for EC2. It would be tough. I picked EKS, no bias here, just the one I did. But here now, the point is that on each one of these stacks, what it means to implement silo or pool looks a little different. And even there, there are multiple variations of it. So in this case, like if I want silo inside of EKS, I could use an entire cluster for every tenant. Kind of a heavyweight solution, but it's certainly an option. Another model for this, which is probably more common and more popular, and certainly in our reference architecture, is this idea of a namespace per tenant. So same old Multi-AZ, VPCs, everything that we normally have for just a good HA environment here.
But now within that cluster I have a separate tenant namespace for each tenant. So they're running in a siloed model, but they're still in the same EKS cluster, and they're entirely separate deployments. And now I use namespaces and other EKS constructs to enforce the boundary between these different compute resources. Now, pool is the simple one always in this story. I just put the services into the pool model. They're all running in the same cluster and they're pooled. And again, the story with pooled is always what's happening inside the microservice. Whereas with silo, it's like, "How do I get you to the right silo?" With pool, everybody's coming into one spot. So it tends to be a lot more straightforward. Okay, so that's kind of deployment models. Now let's go a little further in and say, "Great, we're inside here, we've deployed, but how do we implement isolation?" And across everybody, internally and externally, this is a top-two source of confusion. This is where deployment model and isolation get confused. People say, "You're full stack silo, that's your isolation model." Full stack silo is not an isolation model. Full stack silo is a deployment model. It describes the footprint of it, but I could have two full stack silos running, and one could still potentially cross a boundary and talk to the other one. So you have to think of isolation as the extra layer you put onto whichever deployment model you have to ensure that one tenant can't access the resources of another. Yes, account per tenant is gonna be an easier isolation model than a pooled compute resource. So some are harder, but you wanna separate isolation out of that. And there's some really common isolation patterns. One is the easiest one here, which is what I just call full stack isolation. If I've got an entire stack, if I'm in a VPC, I'm just gonna control ingress, egress. Really straightforward. Or with an account, the boundary of the account becomes the boundary here.
Much more straightforward model. Now as we go more fine-grained here, I also have what I call resource level isolation. Now, I have certain resources inside of my environment, a database, a DynamoDB table, a queue or something that is dedicated to a tenant. Well, now I have to describe isolation at that resource boundary. And the good news is on that side, usually IAM and other tools will let me describe the isolation footprint of a resource as well. Where it gets hardest- Oops, sorry, I forgot to draw that animation around resource. Where it gets hardest is when the tenant's data or resources are inside the resource itself. So the easiest example of this is, if I'm in a pooled resource, like a database, and the items in the database are all side by side, co-mingled with one another, now what's my unit of isolation? It has to be an item-level unit of isolation. And this is where sometimes we have great answers and sometimes you are gonna have to come up with the answers here, but you still have to have isolation here. But the more things are pooled, the more things sit side by side with one another, the harder it gets to figure out how to isolate them and how to implement a good isolation strategy. Now, just quickly looking at resource level isolation, you can sort of see that isolation boundary. I've got compute, I've got storage, a tenant's consuming it. Really sort of natural boundary around that. But where that resource level isolation gets interesting as it plays out on top of AWS is, I have very specific tools I can use to implement that resource isolation that work out really nicely. So for example, here in this compute model where I'm running EC2, and I'm deploying these EC2 instances, when I deploy an EC2 instance, I can attach an instance profile, and that instance profile has a set of IAM policies.
And because I know these are siloed and I know they belong to tenant one, I can attach an instance profile that says exactly what tenant one can touch and what they can't touch. So when tenant one goes to touch their data, they're constrained to only things that belong to them. Where this gets more concrete is, imagine I spin up a tenant two environment and now tenant one tries to access tenant two. I don't care what's running in the code, I don't care what the developers have written in the (indistinct). The instance profile is gonna prevent any cross-tenant access. And that to me is the best case scenario for isolation because somebody else is ensuring that that's enforced and developers can't just cross that boundary. This includes trying to get to data or some other resource that may not be there. So this is, again, all deployment time though, right? When I deploy these, the policies get attached there. Same thing applies to Lambda. When I deploy Lambda, I attach an execution role, and that execution role constrains the scope for the life of that Lambda. And that prevents cross-tenant access. So if you're doing siloed, nice story, works great. The problem is we also wanna do pool. And we wanna do pool a lot when we can. So now we have to think about item level isolation. And imagine I've got tenant one, tenant two here, and they're trying to access some table and that table has co-mingled data inside of it. Well, what do I do here now? And this is where we get what's called runtime-enforced isolation. Runtime-enforced isolation basically says, "Your code, your builders, your frameworks, your libraries, whatever you're doing, it's gonna have to play a role in figuring out how to apply isolation." So here, if I look at it, three tenants are coming into an environment that is a pooled environment and they're gonna access some pooled data, and I have some compute running for these.
The compute that's running for these, now it can't be deployed with an instance profile that constrains it down to an individual tenant. It has to be deployed with a profile that allows it to touch any tenant's data, right? In this case, at least tenants one through three. 'Cause at any moment in time this compute could be processing data from any one of those tenants. So now the code running in that compute has to go out and acquire a tenant context. I went one step ahead, sorry about that. It has to go out to Cognito, in this example. It's gonna go out and look at- We have this thing called the token vending machine. If you've never seen it, go check it out. That goes out, takes the tenant context, says, "Oh, you're tenant two, I'm processing the request for tenant two. Here's an IAM policy for tenant two, assume role for that, give me credentials back." And now those credentials are used to access resources. And those credentials will prevent you from crossing a boundary to another tenant. Awesome, but requires compliance of tenants. And this is where we say, build libraries, build mechanisms, depends on which stack you're in. Serverless, we use layers and other tools, we'll use like wrappers and so on around things. I wanna try to sit around this in a way that the developer just gets these credentials and almost doesn't know they're there. Sidecars is an example where we're doing this. We have an EKS workshop here where sidecars and Mesh are being used to inject these credentials. So you're gonna be creative there if you can. And then if you just look at how this plays out, we chose Dynamo 'cause Dynamo has an easy way to do this, but just know that your mileage may vary as you move service to service. But here you can see I have a policy, and that policy constrains me to tenant one for this example. And now when I try to go access data, I'll be constrained in the DynamoDB table to only tenant one data.
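For DynamoDB specifically, the tenant-scoped policy is commonly expressed with the `dynamodb:LeadingKeys` IAM condition key. Here is a minimal sketch of a function that builds such a policy document. It assumes, and this is an assumption rather than something shown on the slide, that the table's partition key value is exactly the tenant identifier; the function name and the list of actions are illustrative too.

```python
import json


def tenant_scoped_policy(table_arn: str, tenant_id: str) -> dict:
    """Build an IAM policy document limiting access to one tenant's items.

    Assumes the table's partition key holds the tenant identifier, so the
    dynamodb:LeadingKeys condition restricts access to that tenant's items.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Query",
                ],
                "Resource": table_arn,
                "Condition": {
                    # Every partition key touched must equal this tenant's ID.
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [tenant_id]
                    }
                },
            }
        ],
    }


def policy_json(table_arn: str, tenant_id: str) -> str:
    """Serialize the policy, e.g. for use as a session policy."""
    return json.dumps(tenant_scoped_policy(table_arn, tenant_id))
```

A token vending machine could pass the serialized document as the session policy (the `Policy` parameter) on an STS `AssumeRole` call, so the credentials that come back are narrowed to the current tenant. How the actual token vending machine sample wires this up may differ from this sketch.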
The other thing I'm gonna look at here, inside these microservices, is how we throttle experiences, right? We also wanna be sure that if we've got these tiered experiences, the advanced tier tenant isn't somehow being affected by the basic tenant. And what do we do here? Well, here, the API Gateway's our friend, we'll use the API Gateway. You could put other gateways here and do interesting things with throttling as well. But with this API Gateway, essentially it lets me attach API keys to tiers so I can have basic, advanced, premium, whatever tiers. And for each one of those keys, I can define a usage plan. And that usage plan actually defines what the throttling attributes are for that particular tier. So now when a call comes into the API Gateway, I can go out to the Lambda authorizer. The Lambda authorizer's just some code you can configure inside of the API Gateway to process each request as it comes in, extract your tenant token, and then apply that usage plan... Oops, went one too many. Apply that usage plan to throttle the downstream experience here. And there's really good examples of this, and our serverless SaaS workshop does an awesome job with this if you wanna see how this works. But the key here is, obviously you never just wanna let, especially in a multi-tenant environment, everybody just consume whatever they wanna consume. And then I also said that we wanted to simplify the implementation of these microservices. And we wanna sort of take the multi-tenant detail away from- Like, I don't want developers writing like, "How do I log with multi-tenant context?" Or, "How do I go get credentials from the token vending machine?" I wanna write the libraries and all the other bits to standardize how all that works, so that builders are mostly just focusing most of their time on writing the functionality of their code and relying on these libraries to deal with a lot of the multi-tenant complexity.
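As a rough illustration of the kind of shared helper meant here, the sketch below pulls the tenant identifier out of a JWT and stamps it onto a log message, so service code never has to open the token itself. The claim name `custom:tenantId` and both function names are hypothetical, and the sketch deliberately does not verify the token's signature: it assumes verification already happened upstream, for example in an authorizer.

```python
import base64
import json


def tenant_from_jwt(token: str, claim: str = "custom:tenantId") -> str:
    """Read the tenant identifier from a JWT payload.

    NOTE: this does NOT verify the signature; it assumes the token was
    already validated before reaching the service.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims[claim]


def format_log(token: str, message: str) -> str:
    """Prefix a log message with the tenant context taken from the token."""
    return f"[tenant={tenant_from_jwt(token)}] {message}"
```

With something like this in a shared library, an order service would only call `format_log(token, "order created")` and hand the result to its logger, and every log line would carry tenant context in the same shape.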
I've gotta- This is a serverless example that happens to use Lambda layers. Lambda layers just let you have a separate library that gets pulled into your Lambda. It's just a way to sort of deploy a set of shared libraries. If this were EKS, this could be modules or, you know, any other sort of shareable library constructs you want. But here are these two services, have to do things like log, they have to record metrics, they have to get tenant context to go get data, they have to do all kinds of interesting things here. So I put all kinds of helpers into these layers to like, "Oh, you wanna log?" The order service just says "log" and it passes that tenant context in and it's the job of the logging manager to get the tenant out of the JWT token, figure out what it is, figure out how it should be injected into the log stream so that our logs automatically have tenant context in them. All right? That's much better putting that code there than putting it in everybody's code to go, "Go get the token, go crack the token open, go get the tenant ID." I just wanna take all those basic, repeatable steps out of this. And this is especially valuable for that isolation story where the token vending machine is going out and doing this. If I can put that token vending machine in a library and somebody can get that tenant context easily, super helpful. Okay, the last category we wanna talk about here is data partitioning. And with data partitioning, this silo and pool story, once again, comes forward with us, right? It comes forward in almost all of these contexts, which is, we're gonna store data and we're definitely gonna make silo and pool decisions about data. Compliance might make us silo data, noisy neighbor might make us silo data, other data might naturally pool. I generally will try to pool data if I can first. I'll also try to sometimes stay with schemaless data if I can over there 'cause migration can be a little bit easier for me in that space. 
But ultimately, how you actually implement silo and pool is wildly different across every sort of storage technology, all right? If you're doing silo in RDS versus silo in DynamoDB versus silo in Redshift, those are like three blog posts from our team to tell you like what silo looks like in each one of those. And by the way, you might have three options for silo to choose from in one technology, you might have five in another. So the hard part of this talk is, I can't really go into the depth of every single one of those storage technologies to tell you how silo and pool play out on those technologies. But I will tell you the overriding sort of themes, which are noisy neighbor, performance, isolation, are all sort of part of the reasons that might drive you towards one of these versus the other. Also, migration should be part of that story. Like, as you're deploying new versions of code and new capabilities that require updates to the data, you wanna think about how easily that data will change and morph as new features are pushed out. Now, I did RDS, just a simple example here, but you can already guess, you know, a separate database or a separate instance is gonna be my siloed story here. Pooled model, I'm gonna use a foreign key and I'm gonna put the tenant identifier in the foreign key, and I've got pooled here. Nothing super exotic. But just for wild contrast, I wanted to go to something that was on the extreme other end. Well, now let's see what it means to partition data in S3. And there's a whole other blog post on partitioning SaaS data with S3 that talks about, should you use a bucket per tenant? When should you use a prefix per tenant as a strategy here? Why might you use tags for this, or even endpoints as a strategy? So these are just one sort of flavor of the factors here. And again, scale is in this equation, even for a per-service sort of strategy.
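To ground the pooled relational model mentioned above, here is a small runnable sketch that uses Python's built-in `sqlite3` as a stand-in for RDS. The table and column names are hypothetical. The one idea it is meant to show is that every row carries the tenant identifier and every query filters on it.

```python
import sqlite3


def open_pooled_db() -> sqlite3.Connection:
    """Create an in-memory database with one table shared by all tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE orders (
            tenant_id TEXT NOT NULL,   -- which tenant owns this row
            order_id  TEXT NOT NULL,
            total     REAL NOT NULL,
            PRIMARY KEY (tenant_id, order_id)
        )
        """
    )
    return conn


def add_order(conn: sqlite3.Connection, tenant_id: str, order_id: str, total: float) -> None:
    """Insert an order tagged with its owning tenant."""
    conn.execute(
        "INSERT INTO orders (tenant_id, order_id, total) VALUES (?, ?, ?)",
        (tenant_id, order_id, total),
    )


def orders_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Return only the rows that belong to the given tenant."""
    cursor = conn.execute(
        "SELECT order_id, total FROM orders WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return cursor.fetchall()
```

Note that the filter here is applied by application code, which is partitioning rather than isolation. Something else, such as scoped credentials or a database-level policy, still has to enforce that one tenant cannot read another tenant's rows.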
So data partitioning is a huge part of the pattern story, but it is highly variable based on the individual services that you consume. The thing I do wanna do on the data side of this though is, be sure that as you're thinking about data partitioning, you come back to this service by service mentality I talked about earlier. When I said, "Make your silo-pool choices based on the realities of your business, the realities of an individual service," I want you to make your data partitioning decisions the same way. Like, I've seen some organizations who are like, "We thought hard about our data partitioning and we've settled on RDS, we're all RDS. So every microservice is RDS." And I'm like, "No." Like, "You might be choosing RDS for one service, you might be choosing S3 for another. You might be choosing S3 and DynamoDB for another one." You wanna look at each individual service. Because we are building microservices and the data's supposed to be autonomous, I should be able to pick whatever storage is the right storage for that particular microservice. Okay, a few takeaways here. Hopefully it was very clear from the beginning that there's no blueprint for all this stuff, right? You have to sort of find the mix of things that are the right things. And that's why I put this talk together, honestly, was to say, "Look, here's a landscape of possibilities you ought to be thinking about." And you ought to be thinking about developing the set of questions you can ask your business to find out which combinations of these practices are gonna be best for your environment. And then hopefully it's very clear that when we say something's multi-tenant, that doesn't equate to shared infrastructure here. Multi-tenancy describes our SaaS environment, and then we can have siloed and pooled resources inside of it.
And then to me, this whole control plane is the center of a whole lot of the agility and the innovation and the manageability of your SaaS experience. Be sure you're, you know, getting your business and getting your teams and everyone to focus on the importance of building a good, robust control plane. And then of course, I think I said this across the whole talk: it varies by stack, by service. This talk could probably be a six-hour series or something if I were to attach all the services and the stacks to that story. And then, silo and pool really go down to the microservice level. I think we kind of hit that one hard. And then I want you to always be thinking about tiering, performance, noisy neighbor, all of these attributes, even after the system is up and running, and asking yourself, "Hey, is there a different deployment model or a different strategy we ought to be applying here? Could we recompose this architecture slightly differently and refine it?" And don't think of your system as static. Think of these as all knobs and dials and opportunities to refine your environment and make it better. And obviously, for me, this is the fun of being an architect. Like, the reason I've been doing SaaS for as long as I've been in SaaS is just because it's different every day, and I just happen to enjoy the fact that it's different every day, so hopefully you will as well. Just highlighting, it's Wednesday, so I think there are still plenty of sessions left. Here's a list of breakout sessions that are going on that are SaaS-related. We also have great workshops going around serverless SaaS, microservices, EKS. You can also self-service, go get those and run them on your own. There's a series of chalk talks and a builder session that's out there. And then most importantly, the QR codes I promised at the beginning. You'll see specifically, one of these takes you to our reference architecture.
So I'm gonna leave that slide up particularly long just so everybody can get a picture. Hope this is valuable to you. Hope you'll go back and have impact, and make sure you fill out your survey. But thanks so much for joining the session. (applause)