AWS Cloud Financial Management
re:Invent 2024 Cost Optimization highlights that you were not expecting
With re:Invent 2024 in the books and over 50 launch announcements, here are the four I am most excited about. The overarching theme of these launches is leveraging automation to optimize costs and improve efficiency for customers.
Amazon Bedrock Intelligent Prompt Routing and Prompt Caching
Amazon Bedrock Intelligent Prompt Routing and prompt caching give developers tools to significantly reduce the operational costs of generative AI applications while maintaining, or even improving, performance and response quality. You can use these releases alongside the new Amazon Nova foundation models, which deliver frontier intelligence and industry-leading price-performance, with support for text and multimodal intelligence, multimodal fine-tuning, and high-quality image and video generation. You can further optimize the costs of your generative AI application by using a model from the Amazon Nova family.
Intelligent Prompt Routing
With Amazon Bedrock Intelligent Prompt Routing, simple queries are handled by smaller, faster, and more cost-effective models, while complex queries are routed to more capable models. Using advanced prompt matching and model understanding techniques, the router intelligently directs requests between different models in the same family (e.g., Claude 3.5 Sonnet and Claude 3 Haiku) based on the complexity of the prompt. This feature can reduce costs by up to 30% without compromising accuracy. During the preview, there is no additional cost for using a prompt router beyond the cost of the selected model.
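If you want to see what this looks like in code, the sketch below uses the Bedrock Converse API via boto3 and passes a prompt router ARN where a model ID would normally go. The router ARN, Region, and prompt are placeholders rather than values from this announcement; use a prompt router configured in your own account.

```python
import boto3

# Bedrock Runtime client in a Region where the prompt router preview is available.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN for a prompt router that routes between models in the same family
# (for example, Claude 3.5 Sonnet and Claude 3 Haiku). Replace with your router's ARN.
PROMPT_ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/anthropic.claude:1"
)

response = bedrock_runtime.converse(
    modelId=PROMPT_ROUTER_ARN,  # pass the router ARN where a model ID would normally go
    messages=[
        {"role": "user", "content": [{"text": "Summarize our refund policy in two sentences."}]}
    ],
)

# The response has the same shape as a normal Converse call; the router decides
# which underlying model served the request.
print(response["output"]["message"]["content"][0]["text"])
```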
Prompt Caching
With prompt caching, Amazon Bedrock reduces redundant processing by caching frequently used context in prompts across multiple model invocations. Prompt caching can reduce costs by up to 90% and decrease latency by up to 85% for supported models. Follow the getting started instructions here to request access to the preview.
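As a rough illustration, the sketch below marks a cache checkpoint in a Converse API request so the long, shared portion of the prompt can be reused across invocations instead of being reprocessed each time. The model ID and the cachePoint block shape are assumptions based on the documented Converse API pattern; confirm the supported models and request format for the preview before relying on them.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# A long, reusable document that many requests share -- this is the part worth caching.
with open("product_manual.txt") as f:
    long_context = f.read()

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative; use a caching-supported model
    messages=[
        {
            "role": "user",
            "content": [
                {"text": long_context},
                # Cache checkpoint: the content before this block is cached and reused
                # on subsequent invocations, so it is not re-processed each time.
                {"cachePoint": {"type": "default"}},
                {"text": "Which section covers warranty claims?"},
            ],
        }
    ],
)

print(response["output"]["message"]["content"][0]["text"])
```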
Intelligent-Tiering for file storage with Amazon FSx for OpenZFS
Amazon FSx for OpenZFS introduced the Intelligent-Tiering storage class, a brand-new way for you to reduce costs for cloud-based network-attached storage (NAS). With no upfront costs or commitments, it is ideal for workloads with fluctuating storage needs or large datasets. The new storage class brings S3-like pricing and elasticity to file storage by combining elastic, intelligently tiered, high-performance storage with powerful NAS capabilities such as quotas, snapshots, clones, data compression, and replication.
FSx for OpenZFS Intelligent-Tiering automatically scales storage up and down, eliminating unutilized space and the need for capacity planning. This fully managed file system intelligently moves data between three storage tiers based on access patterns:
- Frequent Access: For data accessed within the last 30 days.
- Infrequent Access: For data not accessed for 30-90 days, at a 44% cost reduction from Frequent Access.
- Archive Instant Access: For data not accessed for 90+ days, at a 65% cost reduction from Infrequent Access.
All data remains instantly retrievable regardless of tier. The new storage class is priced 85% lower than the existing SSD storage class and 20% lower than traditional HDD-based on-premises deployments.
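Because the discounts are quoted tier over tier, they compound. Here is a quick sketch of the relative per-GB-month prices implied by the percentages above (relative values only; actual rates vary by Region):

```python
# Relative storage cost per GB-month, normalized so Frequent Access = 1.00.
frequent = 1.00
infrequent = frequent * (1 - 0.44)   # 44% cheaper than Frequent Access   -> 0.56
archive = infrequent * (1 - 0.65)    # 65% cheaper than Infrequent Access -> ~0.20

print(f"Infrequent Access: {infrequent:.2f}x Frequent Access")
print(f"Archive Instant Access: {archive:.2f}x Frequent Access "
      f"(~{(1 - archive) * 100:.0f}% below Frequent Access)")
```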
Amazon SageMaker introduces Scale Down to Zero
You can now scale Amazon SageMaker Inference endpoints down to zero, aligning compute resource usage more closely with actual needs and potentially reducing costs during times of low demand. Previously, SageMaker inference endpoints maintained a minimum number of instances even during low- or no-traffic periods, incurring unnecessary costs. With this feature enabled, endpoints scale down to zero instances during periods of inactivity, completely eliminating compute costs when the endpoint is not in use. Scale down to zero is supported only when using inference components. For more information on inference components, see Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker.
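Under the hood, scale down to zero relies on Application Auto Scaling managing the inference component's copy count: setting the minimum capacity to 0 lets SageMaker release all instances when traffic stops. Below is a minimal boto3 sketch with a hypothetical inference component name; scaling back out from zero typically also requires a scaling policy and a CloudWatch alarm on incoming request activity, which is omitted here.

```python
import boto3

aas = boto3.client("application-autoscaling")

# Hypothetical inference component name -- replace with your own.
inference_component = "my-llm-inference-component"

# Allow the inference component's copy count to scale down to zero copies
# (and therefore zero instances) when there is no traffic.
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=f"inference-component/{inference_component}",
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=0,
    MaxCapacity=4,
)
```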
Cost Savings for Various Scenarios
- Predictable Traffic Patterns: Automatically scales down to zero during known low-usage periods, eliminating the need for manual endpoint management.
- Sporadic or Variable Traffic: Offers significant cost savings for applications with inconsistent usage patterns.
- Development and Testing: Prevents unwanted charges for temporary endpoints used in model development and experimentation by automatically scaling to zero when not in use.
Organizations can significantly reduce inference costs while maintaining the flexibility and performance required for their machine learning applications.
Amazon Elastic Compute Cloud Auto Scaling Target Tracking scaling policies
With Target Tracking scaling policies, Amazon EC2 Auto Scaling now automatically adapts to the unique usage patterns of individual applications, optimizing the balance between cost and performance. For applications with volatile demand patterns, such as client-serving APIs, live streaming services, or e-commerce websites, Target Tracking reduces the time to detect and respond to changing demand. This new feature allows scaling based on high-resolution Amazon CloudWatch metrics, enabling faster detection of and response to changing demand patterns.
Target Tracking reduces idle resources by responding more quickly to changes in demand while helping maintain higher utilization of Amazon EC2 instances. The self-tuning capability saves time and effort by automatically adjusting to application-specific patterns, potentially reducing over-provisioning. With these capabilities, the new highly responsive scaling policies enable organizations to fine-tune their Auto Scaling groups for better cost optimization while maintaining application performance and availability.
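As a rough sketch, the snippet below creates a Target Tracking policy on a customized CloudWatch metric. The Auto Scaling group name, metric namespace, and target value are illustrative, and the metric is assumed to be published at high resolution so the policy can detect demand changes faster; check the EC2 Auto Scaling documentation for the exact options your configuration needs.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical Auto Scaling group and custom metric -- replace with your own.
# The custom metric is assumed to be published to CloudWatch at high resolution
# (sub-minute granularity), which is what lets the policy react more quickly.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-api-asg",
    PolicyName="target-tracking-requests-per-instance",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "CustomizedMetricSpecification": {
            "MetricName": "RequestsPerInstance",
            "Namespace": "MyApp",
            "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "web-api-asg"}],
            "Statistic": "Average",
        },
        "TargetValue": 100.0,  # aim for roughly 100 requests per instance
    },
)
```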
Looking for more AWS cost announcements from re:Invent 2024?
There were ten Cloud Financial Management announcements this year. Read our latest blog, "2024 re:Invent announcement recap for AWS Cloud Financial Management services," and check out the video, 2024 re:Invent CFM reCap | The Keys to AWS Optimization | S11 E9, to learn all about them and see where they can be used in your organization.