This Guidance shows how you can build an ultra-low latency online feature store using Amazon ElastiCache for Redis, a fully managed Redis service from AWS, and Feast, an open-source feature store framework. The online store serves machine learning (ML) features for real-time inference with sub-millisecond read latency. This Guidance covers a sample use case based on a real-time loan approval application that makes online predictions using a customer credit scoring model.
Architecture Diagram
[Architecture diagram description]
Step 1
Set up data infrastructure to deploy Amazon Redshift, an Amazon Simple Storage Service (Amazon S3) bucket containing zip code and credit history parquet files, and AWS Identity and Access Management (IAM) roles. Additionally, set up policies for Amazon Redshift to access Amazon S3, and create an Amazon Redshift table that can query the parquet files.
Step 2
Deploy Feast infrastructure.
Step 3
Create a feature store repository, and configure Amazon ElastiCache as the online feature store and Amazon Redshift as the offline feature store. Create feature definitions.
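A repository configuration along these lines wires the two stores together. This is a minimal sketch: the project name, endpoints, cluster identifier, S3 path, and IAM role ARN are all placeholders, and exact keys can vary by Feast version.

```yaml
# feature_store.yaml -- all identifiers below are placeholders
project: credit_scoring
registry: data/registry.db
provider: aws
online_store:
  type: redis
  # ElastiCache for Redis endpoint; ssl=true enables in-transit encryption
  connection_string: "my-redis.xxxxxx.use1.cache.amazonaws.com:6379,ssl=true"
offline_store:
  type: redshift
  cluster_id: my-redshift-cluster
  region: us-east-1
  database: dev
  user: awsuser
  s3_staging_location: s3://my-bucket/staging
  iam_role: arn:aws:iam::123456789012:role/redshift-s3-access
```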
Step 4
Register the feature definitions and the underlying infrastructure into a Feast registry using the Feast SDK.
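A sketch of what the registered feature definitions might look like. The entity, view, and table names are illustrative, and the API shown follows recent Feast releases (Field/dtype-style schemas); in a real repo these definitions live at module level in a definitions file.

```python
# Hypothetical Feast feature definitions for the credit-scoring example.
def define_features():
    from datetime import timedelta

    from feast import Entity, FeatureView, Field
    from feast.infra.offline_stores.redshift_source import RedshiftSource
    from feast.types import Float32, Int64

    # Entity key used to look up feature vectors
    zipcode = Entity(name="zipcode", join_keys=["zipcode"])

    # Offline source: the Redshift table created over the parquet files
    source = RedshiftSource(
        name="zipcode_features_source",
        table="zipcode_features",
        timestamp_field="event_timestamp",
    )

    zipcode_features = FeatureView(
        name="zipcode_features",
        entities=[zipcode],
        ttl=timedelta(days=365),
        schema=[
            Field(name="population", dtype=Int64),
            Field(name="city_rank", dtype=Float32),
        ],
        source=source,
    )
    return [zipcode, zipcode_features]

# Registration into the Feast registry is a single apply() call:
#   from feast import FeatureStore
#   FeatureStore(repo_path=".").apply(define_features())
```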
Step 5
Generate training data by joining historical labels with features retrieved from Feast. The point-in-time correct features from Feast enrich the historical data, producing a feature DataFrame for training.
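The join can be sketched as follows. The entity DataFrame carries the entity keys, event timestamps, and label; the feature references (`zipcode_features:...`) are hypothetical names matching the sample use case, and the Feast call assumes an initialized repository.

```python
# Sketch of generating a training dataset via a point-in-time join.
from datetime import datetime

import pandas as pd

# Labels plus entity keys and event timestamps (illustrative values)
entity_df = pd.DataFrame(
    {
        "zipcode": [76104, 70380, 97039],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        "loan_approved": [1, 0, 1],  # label column
    }
)

def build_training_frame(entity_df: pd.DataFrame) -> pd.DataFrame:
    """Enrich labels with historical features from the offline store."""
    from feast import FeatureStore  # requires an initialized Feast repo

    store = FeatureStore(repo_path=".")
    return store.get_historical_features(
        entity_df=entity_df,
        features=[
            "zipcode_features:population",
            "zipcode_features:city_rank",
        ],
    ).to_df()
```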
Step 6
Train the ML model using the training dataset and a model trainer.
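A minimal stand-in for the trainer step, fit on a synthetic feature frame. A real pipeline would train on the Feast-enriched DataFrame with proper validation; the column names and the threshold-based label here are purely illustrative.

```python
# Toy credit-scoring trainer: logistic regression on synthetic features.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500
train = pd.DataFrame(
    {
        "population": rng.integers(1_000, 100_000, n),
        "city_rank": rng.random(n),
        "credit_score": rng.integers(300, 850, n),
    }
)
# Synthetic label loosely tied to credit score
train["loan_approved"] = (train["credit_score"] > 600).astype(int)

features = ["population", "city_rank", "credit_score"]
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(train[features], train["loan_approved"])
accuracy = model.score(train[features], train["loan_approved"])
```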
Step 7
Ingest batch features into the ElastiCache online feature store. These online features are used to make online predictions with the trained model.
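In Feast, this ingestion step is called materialization: `materialize()` copies feature values for a time window from the offline store (Amazon Redshift) into the online store (ElastiCache). The backfill window below is illustrative, and the call assumes an initialized repository.

```python
# Sketch of loading (materializing) batch features into the online store.
from datetime import datetime, timedelta, timezone

end = datetime.now(timezone.utc)
start = end - timedelta(days=7)  # backfill window is illustrative

def materialize_window(repo_path: str = ".") -> None:
    from feast import FeatureStore  # requires an initialized Feast repo

    FeatureStore(repo_path=repo_path).materialize(
        start_date=start, end_date=end
    )
```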
Step 8
Read the feature vector from ElastiCache to make online predictions.
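The low-latency read at prediction time can be sketched as below. The feature references and entity key are hypothetical names carried over from the sample use case; `get_online_features()` fetches the feature vector from ElastiCache and assumes an initialized repository.

```python
# Sketch of an online feature-vector read at prediction time.
def read_feature_vector(zipcode: int, repo_path: str = ".") -> dict:
    from feast import FeatureStore  # requires an initialized Feast repo

    store = FeatureStore(repo_path=repo_path)
    return store.get_online_features(
        features=[
            "zipcode_features:population",
            "zipcode_features:city_rank",
        ],
        entity_rows=[{"zipcode": zipcode}],
    ).to_dict()

# The returned dict maps feature names to value lists, which can be
# assembled into a model input row for the trained classifier.
```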
Step 9
Use AWS Key Management Service (AWS KMS) to encrypt ElastiCache data at rest.
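Encryption can be enabled when the replication group is created, as sketched below with boto3. The group name and node type are placeholders; `AtRestEncryptionEnabled` with a customer-managed `KmsKeyId` covers encryption at rest, and `TransitEncryptionEnabled` covers encryption in transit.

```python
# Sketch: create an ElastiCache for Redis replication group with
# at-rest (AWS KMS) and in-transit encryption enabled.
def create_encrypted_redis(kms_key_id: str) -> dict:
    import boto3

    client = boto3.client("elasticache")
    return client.create_replication_group(
        ReplicationGroupId="feast-online-store",
        ReplicationGroupDescription="Feast online feature store",
        Engine="redis",
        CacheNodeType="cache.r6g.large",
        NumCacheClusters=2,
        AtRestEncryptionEnabled=True,
        KmsKeyId=kms_key_id,  # customer-managed AWS KMS key ARN
        TransitEncryptionEnabled=True,
    )
```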
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon CloudWatch enhances operational excellence for ElastiCache by providing comprehensive monitoring, logging, and automation capabilities. It tracks ElastiCache metrics like CPU utilization, memory usage, network traffic, command statistics, and cache hit/miss ratios, enabling proactive performance management. CloudWatch Logs integration allows centralized log analysis, simplifying troubleshooting. CloudWatch alarms invoke automated actions, such as scaling ElastiCache clusters for optimal performance during traffic spikes while reducing costs during lulls.
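One such alarm might watch engine CPU, the kind of signal that can drive automated scaling actions. The alarm name and threshold below are illustrative.

```python
# Sketch: CloudWatch alarm on ElastiCache engine CPU utilization.
def put_cpu_alarm(cluster_id: str) -> None:
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(
        AlarmName=f"{cluster_id}-engine-cpu-high",
        Namespace="AWS/ElastiCache",
        MetricName="EngineCPUUtilization",
        Dimensions=[{"Name": "CacheClusterId", "Value": cluster_id}],
        Statistic="Average",
        Period=60,               # one-minute samples
        EvaluationPeriods=5,     # sustained for five minutes
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
    )
```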
Security
Scoping IAM policies to the minimum required permissions limits unauthorized access to resources: ElastiCache is granted only the permissions it needs to operate. ElastiCache offers encryption in transit and at rest, while AWS KMS lets you create, manage, and control access to the customer-managed encryption keys used to protect data, eliminating key management overhead.
Reliability
ElastiCache auto scaling ensures reliable performance by dynamically adjusting Redis cluster capacity (shards and replicas) based on utilization metrics. CloudWatch continuously monitors key metrics and raises alarms so issues can be detected and mitigated proactively. Auto scaling handles traffic spikes by launching additional nodes, preventing overload and maintaining consistent performance. Nodes can be distributed across Availability Zones, enhancing redundancy against outages.
Performance Efficiency
ElastiCache auto scaling dynamically provisions and right-sizes Redis clusters based on demand for optimal resource utilization. During traffic spikes, auto scaling launches additional nodes to handle increased loads, preventing overloads and maintaining low latency.
ElastiCache features such as in-memory architecture, data structures, transactions, scripting, and clustering are optimized for high throughput and low latency operations, making it ideal for performance-critical workloads. Horizontal scaling and read replicas further boost throughput and response times.
Redis Cluster Mode shards data across multiple nodes, distributing memory and workload for improved parallelization and linear throughput scaling. Sharding maximizes memory utilization by overcoming single-node limits, while locally executing commands on shards minimizes network hops.
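The sharding scheme itself is simple enough to illustrate: each key hashes to one of 16,384 slots via CRC16 (the XModem variant), and the slots are divided among the shards. A `{hash tag}` in the key name forces related keys onto the same slot, and therefore the same shard.

```python
# Illustration of Redis Cluster key-to-slot mapping.
def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XModem: poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16384 hash slots, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # non-empty tag only
            key = key[start + 1 : end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing a hash tag land on the same slot (and the same shard):
same_shard = key_slot("{user1000}.following") == key_slot("{user1000}.followers")
```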
Cost Optimization
Auto scaling optimizes ElastiCache costs by automatically adjusting cluster capacity based on utilization metrics. During low traffic periods, it scales in by terminating unnecessary nodes, preventing overprovisioning and reducing operational expenses. Conversely, it launches additional nodes during traffic spikes, helping to ensure sufficient capacity without incurring excess costs. This elasticity eliminates the need for manual capacity management and helps ensure clusters are right-sized to workload demands, running only the required resources.
Sustainability
ElastiCache allows right-sizing caches to match application requirements, improving infrastructure efficiency and preventing resource waste.
The availability of multiple AWS Regions enables deploying ElastiCache clusters closer to end users, reducing network latency and data transfer and leading to lower energy consumption and emissions from reduced network usage.
Implementation Resources
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
Build an ultra-low latency online feature store for real-time inferencing using Amazon ElastiCache for Redis
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.