AWS Architecture Blog
Building Multi-partner integration on AWS using Event-Driven Architecture
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.
Summary
Finserv MARKETS enables customers to buy financial services products such as credit cards, loans, insurance, and investments from various partners. Finserv MARKETS integrates with a large number of partners in real time to provide services to customers.
Each partner exposes its own APIs with their own semantics, which makes integration a challenge. Latency and failures are further concerns. We’ll show how Event-Driven Architecture (EDA) on AWS, combined with reactive design, addresses these challenges.
Challenges with multi-partner integration
- Latency – Making API calls to multiple partners increases latency of a service even in the scenario in which the calls are made in parallel.
- Timeouts – If a partner’s APIs time out, this could impact overall service performance and availability.
- Failures – A partner API failure could lead to the failure or major performance degradation of the overall service.
- Customer Experience – Preserving the customer experience when depending on partner integration is a challenge, considering these technical issues.
Conceptual solution
To build this multi-partner platform, we discussed using a traditional command-driven synchronous architecture. We also considered adopting an Event-Driven Architecture (EDA). Building our solution on EDA enabled us to deliver a consistent customer experience while addressing failures and performance issues from partner APIs.
The following diagram shows a conceptual view of the solution:
The key components of the solution are:
1. User Interface: A single-page application running in the user’s browser. For example, the UI makes an API call to the business service with the required parameters to calculate insurance premiums, and receives a unique identifier in return. This identifier lets the UI stay reactive and display the responses from partners as they arrive.
2. Business Service: A microservice which provides APIs for:
• The user interface to submit and generate events for request for partner offer
• A callback to submit the response from the partner integration service in a reactive manner
3. Event Bus: The event bus infrastructure enables transportation, routing, and delivery of events to the right services. The business service raises a set of events to the event bus for a quote request. There is one event for each partner that is listened to by the reactive services.
4. Reactive Services: Services that consume events and call the partner integration services, which in turn call the partner APIs. On receiving the partner response, a reactive service calls the callback API on the business service. These services are organized by product domain (for example, Motor Insurance).
5. Partner Integration Service: This service is responsible for integration with partners. The service translates the canonical request to partner-specific API calls. The service implements partner-specific security and error handling. There is one partner integration service per partner.
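The interaction between these five components can be sketched in a few lines of Python. This is a minimal in-memory illustration, not our production code: the class names, the "quote-request" topic, the partner names, and the premium formulas are all hypothetical, and a real event bus delivers events asynchronously rather than inside the publish call.

```python
import uuid
from collections import defaultdict


class EventBus:
    """In-memory stand-in for the event bus: routes events to topic subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class BusinessService:
    """Accepts quote requests, raises one event per partner, collects callbacks."""

    def __init__(self, bus, partners):
        self._bus = bus
        self._partners = partners
        self._responses = defaultdict(dict)  # request_id -> {partner: quote}

    def request_quotes(self, params):
        request_id = str(uuid.uuid4())
        for partner in self._partners:  # one event for each partner
            self._bus.publish(
                "quote-request",
                {"request_id": request_id, "partner": partner, "params": params},
            )
        return request_id  # the UI uses this identifier to fetch responses

    def callback(self, request_id, partner, quote):
        """Called by a reactive service once the partner has responded."""
        self._responses[request_id][partner] = quote

    def poll(self, request_id):
        """Return the partner responses received so far for this request."""
        return dict(self._responses.get(request_id, {}))


def make_reactive_service(partner, integration, business_service):
    """Build an event handler that calls one partner integration service."""

    def handle(event):
        if event["partner"] != partner:
            return  # this event is addressed to another partner
        quote = integration(event["params"])
        business_service.callback(event["request_id"], partner, quote)

    return handle


# Usage with two hypothetical partners and made-up premium formulas.
bus = EventBus()
service = BusinessService(bus, ["partner_a", "partner_b"])
bus.subscribe("quote-request", make_reactive_service(
    "partner_a", lambda p: p["sum_insured"] // 50, service))
bus.subscribe("quote-request", make_reactive_service(
    "partner_b", lambda p: p["sum_insured"] // 40, service))

request_id = service.request_quotes({"sum_insured": 500000})
quotes = service.poll(request_id)
```

Because the business service only hands out an identifier and collects callbacks, a slow or failing partner never blocks the response to the UI.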
Realizing this solution on AWS
Amazon Web Services (AWS) offers cloud-native services enabling us to realize this solution.
We used the following services to implement this architecture:
User Interface
We built the user interface in Angular and host it on Apache web servers running on Amazon Elastic Container Service (Amazon ECS). Elastic Load Balancing (ELB) distributes traffic evenly across the containers. Running in containers gives us the flexibility and scalability we need.
Business Service
The business service is a microservice built using Spring Boot and running on Amazon ECS containers. It uses ELB for service load balancing. The choice of Spring Boot and Amazon ECS gives flexibility and scalability through cluster auto scaling.
Data Store
We use a polyglot architecture for data storage. Depending on the product journey and the data lifecycle, we choose either Amazon Aurora Postgres or Amazon ElastiCache (Redis OSS). This gives us the right mix of performance and durability for each business use case.
Event Bus
We evaluated Amazon Kinesis and Amazon Simple Notification Service (Amazon SNS) and concluded that, for our volume and use cases, Amazon SNS offers the right capability. We implemented this by defining topics, for example, a four-wheeler insurance quote. The reactive services for each partner integration subscribe to this topic; each one calls its partner service and a quote is generated. For downstream functionality where only the selected partner’s API needs to be called (for example, insurance policy issuance or credit card application submission), we chose Amazon Simple Queue Service (Amazon SQS). Amazon SQS provides simple queuing for asynchronous processing.
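As an illustration of raising a quote event on an SNS topic, the sketch below builds the arguments for a boto3-style `publish` call. The topic ARN, the payload fields, and the `product` message attribute are assumptions made for the example, not our actual event schema. The client is passed in so the same function works with a real `boto3.client("sns")` or with a test double.

```python
import json


def build_quote_event(topic_arn, request_id, product, params):
    """Build keyword arguments for an SNS publish call.

    The payload shape and the "product" attribute are hypothetical.
    """
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps({"request_id": request_id, "params": params}),
        "MessageAttributes": {
            # Attributes can be used by subscribers or filter policies.
            "product": {"DataType": "String", "StringValue": product},
        },
    }


def publish_quote_request(sns_client, topic_arn, request_id, product, params):
    """Publish the event using any client that exposes publish(**kwargs)."""
    return sns_client.publish(
        **build_quote_event(topic_arn, request_id, product, params)
    )


# With boto3 installed and credentials configured, usage would look like:
#   import boto3
#   sns = boto3.client("sns")
#   publish_quote_request(sns, "<topic ARN>", "req-1", "four-wheeler",
#                         {"sum_insured": 500000})
```

Each reactive service subscribed to the topic then receives its own copy of the message, which is what lets partner calls proceed independently of one another.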
Reactive Services
These services are built using two different technologies: Spring Boot microservices running in Amazon ECS containers, and AWS Lambda functions. We chose Spring Boot because of our team’s familiarity with the technology; however, our plan is to move completely to Lambda functions for all asynchronous reactive services.
Partner Integration Service
Partner integration services provide the abstraction layer between the partner API and the canonical API, and are called synchronously by reactive services. In some cases, the error is passed back to the reactive service, which decides whether to retry. In other cases (for example, a policy issuance API), retry is built into the partner integration service using an exponential backoff strategy.
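A retry with exponential backoff can be sketched as follows. The exception type, the attempt count, and the delay values are illustrative assumptions rather than the settings we use; the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time


class PartnerApiError(Exception):
    """Hypothetical error raised when a partner call fails transiently."""


def call_with_backoff(call, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call `call()` and retry on PartnerApiError, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except PartnerApiError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle the failure
            # Waits of 0.5s, 1s, 2s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

In practice a small random jitter is often added to each delay so that many retrying callers do not hit the partner at the same moment.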
Partner API tracking
Partner API tracking lets us track partner requests and proactively address failures. We use Amazon Elasticsearch Service and Kibana for tracking. We can implement a circuit breaker pattern to take any partner offline for a period of time should failures reach a given threshold.
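The circuit breaker idea can be illustrated with a small per-partner counter. This is a simplified sketch under stated assumptions: the threshold and cooldown values are made up, failures are counted consecutively, and there is no half-open probing state as in fuller implementations of the pattern.

```python
import time


class CircuitBreaker:
    """Stops calls to a partner after repeated failures, for a cooldown period."""

    def __init__(self, failure_threshold=5, cooldown_seconds=60.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Cooldown elapsed: close the breaker and allow traffic again.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self):
        self._failures = 0

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()  # open: partner is taken offline
```

A reactive service would check `allow_request()` before calling the partner integration service and skip the partner while the breaker is open.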
How this solution addresses multi-partner integration
This solution is built on the foundation of Event Driven Architecture with reactive design and is able to address the following challenges:
Latency
Because the user interface polls for responses asynchronously, it receives each partner response as soon as it arrives. The latency the user experiences is therefore determined by the fastest partner API rather than the slowest.
Timeout
Timeouts are set in the UI for polling, so if a partner API doesn’t respond, we time it out without any degradation. These timeouts are set based on the established user experience benchmarks. For example, how long do we let a user wait to see all insurance quotes before degrading their experience?
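The polling-with-timeout behavior can be sketched as a loop with a deadline. Our UI is an Angular application, so this Python version only illustrates the logic; the function name, the timeout, and the polling interval are assumptions. The `clock` and `sleep` parameters exist so the loop can be exercised without real waiting.

```python
import time


def poll_for_quotes(fetch, expected_partners, timeout_seconds=10.0,
                    interval_seconds=0.5, clock=time.monotonic,
                    sleep=time.sleep):
    """Poll until every partner has responded or the deadline passes.

    `fetch` returns a dict of the partner quotes received so far. Whatever has
    arrived by the deadline is returned, so slow partners are simply left out.
    """
    deadline = clock() + timeout_seconds
    quotes = {}
    while True:
        quotes.update(fetch())
        all_arrived = set(expected_partners) <= set(quotes)
        if all_arrived or clock() >= deadline:
            return quotes
        sleep(interval_seconds)
```

A partner that never responds costs the user at most the timeout, and every quote that did arrive in time is still shown.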
Failures
If a partner API fails, polling for that partner simply times out in the UI without affecting the other partners’ responses. Our API tracking can also enable a circuit breaker pattern to take a partner offline in the event of persistent failures.
Customer Experience
The solution gives a consistent experience to the user. Users can make their choice from the partner quotes/offers in a reactive way as received, rather than waiting for all offers to be shown. This design meets our customer requirements, derived from our Voice of Customer (VoC) study sessions.
Conclusion
In the digital space, APIs are the most common mechanism for system integration. Building a solution that is scalable, resilient, and provides the best customer experience is challenging. Event-Driven Architecture with reactive design addresses these issues. We process over 5,000 requests every day in the Insurance and Credit domains. We’ve been able to achieve the required availability of over 99.9%, while maintaining a positive customer experience on this platform.