Address Modernization Tradeoffs with Lake House Architecture
Many organizations are modernizing their applications to reduce costs and become more efficient. They must adapt to modern application requirements that provide 24×7 global access. The ability to scale up or down quickly to meet demand and process a large volume of data is critical. This is challenging while maintaining strict performance and availability. For many companies, modernization includes decomposing a monolith application into a set of independently developed, deployed, and managed microservices. The decoupled nature of a microservices environment allows each service to evolve agilely and independently. While there are many benefits for moving to a microservices-based architecture, there can be some tradeoffs. As your application monolith evolves into independent microservices, you must consider the implications to your data architecture.
In this blog post we will provide example use cases, and show how Lake House Architecture on AWS can streamline your microservices architecture. A Lake house architecture embraces the decentralized nature of microservices by facilitating data movement. These transfers can be between data stores, from data stores to data lake, and from data lake to data stores (Figure 1).
Health and wellness application challenges
Our fictitious health and wellness customer has an application architecture comprised of several microservices backed by purpose-built data stores. User profiles, assessments, surveys, fitness plans, health preferences, and insurance claims are maintained in an Amazon Aurora MySQL-Compatible relational database. The event service monitors the number of steps walked, sleep pattern, pulse rate, and other behavioral data in Amazon DynamoDB, a NoSQL database (Figure 2).
With this microservices architecture, it’s common to have data spread across various data stores. This is because each microservice uses a purpose-built data store suited to its usage patterns and performance requirements. While this provides agility, it also presents challenges to deriving needed insights.
Here are four challenges that different users might face:
- As a health practitioner, how do I efficiently combine the data from multiple data stores to give personalized recommendations that improve patient outcomes?
- As a sales and marketing professional, how do I get a 360 view of my customer, when data lives in multiple data stores? Profile and fitness data are in a relational data store, but important behavioral and clickstream data are in NoSQL data stores. It’s hard for me to run targeted marketing campaigns, which can lead to revenue loss.
- As a product owner, how do I optimize healthcare costs when designing wellbeing programs for patients?
- As a health coach, how do I find patients and help them with their wellness goals?
Our remaining subsections highlight AWS Lake House Architecture capabilities and features that allow data movement and the integration of purpose-built data stores.
1. Patient care use case
In this scenario, a health practitioner is interested in historical patient data to estimate the likelihood of a future outcome. To get the necessary insights and identify patterns, the health practitioner needs event data from Amazon DynamoDB and patient profile data from Aurora MySQL-Compatible. Our health practitioner will use Amazon Athena to run an ad-hoc analysis across these data stores.
Amazon Athena provides an interactive query service for both structured and unstructured data. The federated query functionality in Amazon Athena helps with running SQL queries across data stored in relational, NoSQL, and custom data sources. Amazon Athena uses Lambda-based data source connectors to run federated queries. Figure 3 illustrates the federated query architecture.
The patient care team could use an Amazon Athena federated query to find out if a patient needs urgent care. It is able to detect anomalies in the combined datasets from claims processing, device data, and electronic health record (HER) as show in Figure 4.
Healthcare data from various sources, including EHRs and genetic data, helps improve personalized care. Machine learning (ML) is able to harness big data and perform predictive analytics. This creates opportunities for researchers to develop personalized treatments for various diseases, including cancer and depression.
To achieve this, you must move all the related data into a centralized repository such as an Amazon S3 data lake. For specific use cases, you also must move the data between the purpose-built data stores. Finally, you must build an ML solution that can predict the outcome. Amazon Redshift ML, combined with its federated query processing capabilities enables data analysts and database developers to create a platform to detect patterns (Figure 5). With this platform, health practitioners are able to make more accurate, data-driven decisions.
2. Sales and marketing use case
To run marketing campaigns, the sales and marketing team must search customer data from a relational database, with event data in a non-relational data store. We will move the data from Aurora MySQL-Compatible and Amazon DynamoDB to Amazon Elasticsearch Service (ES) to meet this requirement.
AWS Database Migration Service (DMS) helps move the change data from Aurora MySQL-Compatible to Amazon ES using Change Data Capture (CDC). AWS Lambda could be used to move the change data from DynamoDB streams to Amazon ES, as shown in Figure 6.
The sales and marketing team can now run targeted marketing campaigns by querying data from Amazon Elasticsearch Service, see Figure 7. They can improve sales operations by visualizing data with Amazon QuickSight.
3. Healthcare product owner use case
In this scenario, the product owner must define the care delivery value chain. They must develop process maps for patient activity and estimate the cost of patient care. They must analyze these datasets by building business intelligence (BI) reporting and dashboards with a tool like Amazon QuickSight. Amazon Redshift, a cloud scale data warehousing platform, is a good choice for this. Figure 8 illustrates this pattern.
The product owners can use integrated business intelligence reports with Amazon Redshift to analyze their data. This way they can make more accurate and appropriate decisions, see Figure 9.
4. Health coach use case
In this scenario, the health coach must find a patient based on certain criteria. They would then send personalized communication to connect with the patient to ensure they are following the proposed health plan. This proactive approach contributes to a positive patient outcome. It can also reduce healthcare costs incurred by insurance companies.
To be able to search patient records with multiple data points, it is important to move data from Amazon DynamoDB to Amazon ES. This also will provide a fast and personalized search experience. The health coaches can be notified and will have the information they need to provide guidance to their patients. Figure 10 illustrates this pattern.
The health coaches can use Elasticsearch to search users based on specific criteria. This helps them with counseling and other health plans, as shown in Figure 11.
In this post, we highlight how Lake House Architecture on AWS helps with the challenges and tradeoffs of modernization. A Lake House architecture on AWS can help streamline the movement of data between the microservices data stores. This offers new capabilities for various analytics use cases.
For further reading on architectural patterns, and walkthroughs for building Lake House Architecture, see the following resources:
- Build a Simplified ETL and Live Data Query Solution using Redshift Federated Query
- Creating a source to Lakehouse data replication pipe using Apache Hudi, AWS Glue, AWS DMS, and Amazon Redshift
- Effective data lakes using AWS Lake Formation, Part 1: Getting started with governed tables (multipart blog)