Operating Lambda: Understanding event-driven architecture – Part 1
In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses event-driven architectures and how these relate to serverless applications.
Part 1 covers the benefits of the event-driven paradigm and how it can improve throughput, scale and extensibility, while also reducing complexity and the overall amount of code in an application.
Event-driven architectures have grown in popularity because they help address some of the inherent challenges in building the complex systems commonly used in modern organizations. This approach promotes the use of microservices, which are small, specialized services performing a narrow set of functions. A well-designed, Lambda-based application is compatible with the principles of microservice architectures.
How Lambda fits into the event-driven paradigm
Lambda is an on-demand compute service that runs custom code in response to events. Most AWS services generate events, and many can act as an event source for Lambda. Within Lambda, your code is stored in a code deployment package and contains an event handler. All interaction with the code occurs through the Lambda API and there is no direct invocation of functions from outside of the service. The main purpose of Lambda functions is to process events.
Unlike traditional servers, Lambda functions do not run constantly. When a function is triggered by an event, this is called an invocation. Lambda functions are purposefully limited to 15 minutes in duration but on average, across all AWS customers, most invocations only last for less than a second. In some intensive compute operations, it may take several minutes to process a single event but in the majority of cases the duration is brief.
An event triggering a Lambda function could be almost anything, from an HTTP request via Amazon API Gateway, a schedule managed by an Amazon EventBridge rule, or an Amazon S3 notification. Even the simplest Lambda-based application uses at least one event.
The event itself is a JSON object that contains information about what happened. Events are facts about a change in the system state, they are immutable, and the time when they happen is significant. The first parameter of every Lambda handler contains the event. An event could be custom-generated from another microservice, such as new order generated in an ecommerce application:
The event may also be generated by an AWS service, such as Amazon SQS when a new message is available in a queue:
In either case, the event is passed to the Lambda function as the first parameter in the Lambda handler:
- The code outside of the handler, also known as “INIT” code, is run before the handler. This is used for tasks like importing libraries or declaring and initializing global objects.
- The handler itself is a function that takes the event object. Regardless of runtime used in the Lambda function, the event is a JSON object.
For smaller applications, the difference between event-driven and request-driven applications may not be clear. As your applications develop more functionality and handle more traffic, this becomes more apparent. Request-driven applications typically use directed commands to coordinate downstream functions to complete an activity and are often tightly coupled. Event-driven applications create events that are observable by other services and systems, but the event producer is unaware of which consumers, if any, are listening. Typically, these are loosely coupled.
Most Lambda-based applications use a combination of AWS services for durably storing data and integrating with other system and services. In these applications, Lambda acts as glue between the services, providing business logic to transform data as it moves between services.
Building Lambda-based applications follows many of the best practices of building any event-based architecture. A number of development approaches have emerged to help developers create event-driven systems. Event storming, which is an interactive approach to domain-driven design (DDD), is one popular methodology. As you explore the events in your workload, you can group these as bounded contexts to develop the boundaries of the microservices in your application.
The benefits of event-driven architectures
Replacing polling and webhooks with events
Many traditional architectures frequently use polling and webhook mechanisms to communicate state between different components. Polling can be highly inefficient for fetching updates since there is a lag between new data becoming available and synchronization with downstream services. Webhooks are not always supported by other microservices that you want to integrate with. They may also require custom authorization and authentication configurations. In both cases, these integration methods are challenging to scale on-demand without additional work by development teams.
Both of these mechanisms can be replaced by events, which can be filtered, routed, and pushed downstream to consuming microservices. This approach can result in less bandwidth consumption, CPU utilization, and potentially lower cost. These architectures can reduce complexity, since each functional unit is smaller and there is often less code.
Event-driven architectures can also make it easier to design near-real-time systems, helping organizations move away from batch-based processing. Events are generated at the time when state in the application changes, so the custom code of a microservice should be designed to handle the processing of a single event. Since scaling is handled by the Lambda service, this architecture can handle significant increases in traffic without changing custom code. As events scale up, so does the compute layer that processes events.
Microservices enable developers and architects to decompose complex workflows. For example, an ecommerce monolith may be broken down into order acceptance and payment processes with separate inventory, fulfillment and accounting services. What may be complex to manage and orchestrate in a monolith becomes a series of decoupled services that communicate asynchronously with event messages.
This approach also makes it possible to assemble services that process data at different rates. In this case, an order acceptance microservice can store high volumes of incoming orders by buffering the messages in an Amazon SQS queue.
A payment processing service, which is typically slower due to the complexity of handling payments, can take a steady stream of messages from the SQS queue. It can orchestrate complex retry and error handling logic using AWS Step Functions, and coordinate active payment workflows for hundreds of thousands of orders.
Improving scalability and extensibility
Microservices generate events that are typically published to messaging services like Amazon SNS and SQS. These behave like an elastic buffer between microservices and help handle scaling when traffic increases. Services like EventBridge can then filter and route messages depending upon the content of the event, as defined in rules. As a result, event-based applications can be more scalable and offer greater redundancy than monolithic applications.
This system is also highly extensible, allowing other teams to extend features and add functionality without impacting the order processing and payment processing microservices. By publishing events using EventBridge, this application integrates with existing systems, such as the inventory microservice, but also enables any future application to integrate as an event consumer. Producers of events have no knowledge of event consumers, which can help simplify the microservice logic.
Trade-offs of event-driven architectures
Unlike monolithic applications, which may process everything within the same memory space on a single device, event-driven applications communicate across networks. This design introduces variable latency. While it’s possible to engineer applications to minimize latency, monolithic applications can almost always be optimized for lower latency at the expense of scalability and availability.
The serverless services in AWS are highly available, meaning that they operate in more than one Availability Zone in a Region. In the event of a service disruption, services automatically fail over to alternative Availability Zones and retry transactions. As a result, instead of a transaction failing, it may be completed successfully but with higher latency.
Workloads that require consistent low-latency performance, such as high-frequency trading applications in banks or submillisecond robotics automation in warehouses, are not good candidates for event-driven architecture.
An event represents a change in state. With many events flowing through different services in an architecture at any given point of time, such workloads are often eventually consistent. This makes it more complex to process transactions, handle duplicates, or determine the exact overall state of a system.
Some workloads are not well suited for event-driven architecture, due to the need for ACID properties. However, many workloads contain a combination of requirements that are eventually consistent (for example, total orders in the current hour) or strongly consistent (for example, current inventory). For those features needing strong data consistency, there are architecture patterns to support this.
Event-based architectures are designed around individual events instead of large batches of data. Generally, workflows are designed to manage the steps of an individual event or execution flow instead of operating on multiple events simultaneously. Real-time event processing is preferred to batch processing in event-driven systems, replacing a batch with many small incremental updates. While this can make workloads more available and scalable, it also makes it more challenging for events to have awareness of other events.
Returning values to callers
In many cases, event-based applications are asynchronous. This means that caller services do not wait for requests from other services before continuing with other work. This is a fundamental characteristic of event-driven architectures that enables scalability and flexibility. This means that passing return values or the result of a workflow is often more complex than in synchronous execution flows.
Most Lambda invocations in productions systems are asynchronous, responding to events from services like S3 or SQS. In these cases, the success or failure of processing an event is often more important than returning a value. Features such as dead letter queues (DLQs) in Lambda are provided to ensure you can identify and retry failed events, without needing to notify the caller.
For interactive workloads, such as web and mobile applications, the end user usually expects to receive a return value or a current status of a transaction. For these workloads, there are several design patterns that can provide rich eventing back to the caller. However, these implementations are more complex than using a traditional asynchronous return value.
Debugging across services and functions
Debugging event-driven systems is also different to solving problems with a monolithic application. With different systems and services passing events, it is often not possible to record and reproduce the exact state of multiple services when an error occurs. Since each service and function invocation has separate log files, it can be more complicated to determine what happened to a specific event that caused an error.
Event-driven architectures have grown in popularity in modern organizations. This approach promotes the use of microservices, which can be designed as Lambda-based applications. This post discusses the benefits of the event-driven approach, along with the trade-offs involved.
Part 2 of this series discusses design principles and the best practices for developing Lambda-based applications.