AWS Architecture Blog
Optimizing your AWS Infrastructure for Sustainability, Part II: Storage
In Part I of this series, we introduced you to strategies to optimize the compute layer of your AWS architecture for sustainability. We provided you with success criteria, metrics, and architectural patterns to help you improve resource and energy efficiency of your AWS workloads.
This blog post focuses on the storage layer of your AWS infrastructure and provides recommendations that you can use to store your data sustainably.
Optimizing the storage layer of your AWS infrastructure
Managing your data lifecycle and using different storage tiers are key components to optimizing storage for sustainability. When you consider different storage mechanisms, remember that you’re introducing a trade-off between resource efficiency, access latency, and reliability. This means you’ll need to select your management pattern accordingly.
Reducing idle resources and maximizing utilization
Storing and accessing data efficiently, together with reducing idle storage resources, results in a more efficient and sustainable architecture. Amazon CloudWatch offers storage metrics that you can use to assess storage improvements, as listed in the following table.
Service | Metric | Source |
Amazon Simple Storage Service (Amazon S3) | BucketSizeBytes | Metrics and dimensions |
Amazon S3 | S3 Object Access | Logging requests using server access logging |
Amazon Elastic Block Store (Amazon EBS) | VolumeIdleTime | Amazon EBS metrics |
Amazon Elastic File System (Amazon EFS) | StorageBytes | Amazon CloudWatch metrics for Amazon EFS |
Amazon FSx for Lustre | FreeDataStorageCapacity | Monitoring Amazon FSx for Lustre |
Amazon FSx for Windows File Server | FreeStorageCapacity | Monitoring with Amazon CloudWatch |
Amazon FSx for NetApp ONTAP | StorageCapacity / StorageUsed | File system metrics |
Amazon FSx for OpenZFS | StorageCapacity / UsedStorageCapacity | Monitoring with Amazon CloudWatch |
You can monitor these metrics with the architecture shown in Figure 1. CloudWatch provides a unified view of your resource metrics.
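As a rough illustration of reading one of the metrics in the table, the following Python sketch builds the request parameters for the `BucketSizeBytes` metric and computes how much a bucket grew over the queried window. The function names `bucket_size_query` and `growth_bytes` are hypothetical helpers invented for this example, and the sketch assumes you pass the parameters to the CloudWatch `GetMetricStatistics` API yourself.

```python
from datetime import datetime, timedelta, timezone


def bucket_size_query(bucket_name, days=14, storage_type="StandardStorage", now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics (hypothetical helper)."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket_name},
            {"Name": "StorageType", "Value": storage_type},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # S3 storage metrics are reported once per day
        "Statistics": ["Average"],
    }


def growth_bytes(datapoints):
    """Return the change between the oldest and newest datapoint averages."""
    ordered = sorted(datapoints, key=lambda point: point["Timestamp"])
    if len(ordered) < 2:
        return 0
    return ordered[-1]["Average"] - ordered[0]["Average"]
```

With boto3 installed and credentials configured, you could call `boto3.client("cloudwatch").get_metric_statistics(**bucket_size_query("example-bucket"))` and pass the returned `Datapoints` list to `growth_bytes`. The bucket name here is a placeholder.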
In the following sections, we present four concepts to reduce idle resources and maximize utilization for your AWS storage layer.
Analyze data access patterns and use storage tiers
Choosing the right storage tier after analyzing data access patterns gives you more sustainable storage options in the cloud.
- By storing less volatile data on technologies designed for efficient long-term storage, you will optimize your storage footprint. More specifically, you’ll reduce the impact you have on the lifetime of storage resources by storing slow-changing or unchanging data on magnetic storage, as opposed to solid state memory. For archiving data or storing slow-changing data, consider using Amazon EFS Infrequent Access, Amazon EBS Cold HDD volumes, and Amazon S3 Glacier.
- To store your data efficiently throughout its lifetime, create an Amazon S3 Lifecycle configuration that automatically transfers objects to a different storage class based on your pre-defined rules. The Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs blog post shows you how to create custom object expiry rules for Amazon S3 based on the last accessed date of the object.
- For data with unknown or changing access patterns, use Amazon S3 Intelligent-Tiering to monitor access patterns and move objects among tiers automatically. In general, you have to make a trade-off between resource efficiency, access latency, and reliability when considering these storage mechanisms. Figure 2 shows an overview of data access patterns for Amazon S3 and the resulting storage tier. For example, in S3 One Zone-IA, energy and server capacity are reduced, because data is stored only within one Availability Zone.
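The lifecycle approach described above can be sketched as a small Python helper that builds an S3 Lifecycle configuration. This is a minimal sketch under assumptions: the function name, the default day thresholds, and the choice of `STANDARD_IA` followed by `GLACIER` are illustrative, not recommendations from this post. The 30-day floor reflects our understanding of the S3 minimum age for transitions to Standard-IA, so verify it against the current S3 documentation.

```python
def build_lifecycle_configuration(prefix, rule_id, ia_after_days=30,
                                  glacier_after_days=90, expire_after_days=None):
    """Build an S3 Lifecycle configuration dict (hypothetical helper).

    Objects under `prefix` move to STANDARD_IA, then to GLACIER, and are
    optionally deleted after `expire_after_days`.
    """
    if ia_after_days < 30:
        # Assumption: S3 requires objects to be at least 30 days old
        # before they can transition to STANDARD_IA.
        raise ValueError("ia_after_days must be at least 30")
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")

    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        if expire_after_days <= glacier_after_days:
            raise ValueError("expire_after_days must be greater than glacier_after_days")
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}
```

The returned dictionary has the shape expected by the boto3 S3 client call `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. Keeping the builder separate from the API call makes it possible to review and test the rules before applying them to a bucket.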
Use columnar data formats and compression
Columnar data formats like Parquet and ORC require less storage capacity compared to row-based formats like CSV and JSON.
- Parquet consumes up to six times less storage in Amazon S3 compared to text formats. This is because of features such as column-wise compression, different encodings, or compression based on data type, as shown in the Top 10 Performance Tuning Tips for Amazon Athena blog post.
- You can improve performance and reduce query costs of Amazon Athena by 30–90 percent by compressing, partitioning, and converting your data into columnar formats. Columnar data formats and compression reduce the amount of data scanned.
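One way to convert existing data into a compressed, partitioned columnar format is an Athena CREATE TABLE AS SELECT (CTAS) statement. The following Python sketch only assembles the SQL text. The helper name, the table and column names, and the choice of Snappy compression are assumptions made for illustration.

```python
def build_parquet_ctas(source_table, target_table, external_location,
                       columns, partition_columns=()):
    """Assemble an Athena CTAS statement that writes Parquet (hypothetical helper).

    Identifiers are interpolated directly, so only pass trusted names.
    """
    # Athena expects partition columns to come last in the SELECT list.
    select_list = ", ".join(list(columns) + list(partition_columns))
    properties = [
        "format = 'PARQUET'",
        "parquet_compression = 'SNAPPY'",
        f"external_location = '{external_location}'",
    ]
    if partition_columns:
        quoted = ", ".join(f"'{name}'" for name in partition_columns)
        properties.append(f"partitioned_by = ARRAY[{quoted}]")
    props = ",\n  ".join(properties)
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n  {props}\n)\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


if __name__ == "__main__":
    print(build_parquet_ctas(
        "raw_events", "events_parquet",
        "s3://example-bucket/events_parquet/",
        ["user_id", "event_type"], ["event_date"],
    ))
```

You would run the resulting statement in the Athena console or through the Athena API. Partitioning by a column that queries commonly filter on, such as a date, lets Athena skip partitions that a query does not need.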
Reduce unused storage resources
Right size or delete unused storage volumes
As shown in the Cost Optimization on AWS video, right-sizing storage by data type and usage reduces your associated costs by up to 50 percent.
- A straightforward way to reduce unused storage resources is to delete unattached EBS volumes. If you might need to restore the volume quickly later on, create an Amazon EBS snapshot before you delete it.
- You can also use Amazon Data Lifecycle Manager to retain and delete EBS snapshots and Amazon EBS-backed Amazon Machine Images (AMIs) automatically. This further reduces the storage footprint of stale resources.
- To avoid over-provisioning volumes, see the Automating Amazon EBS Volume-resizing blog post. It demonstrates an automated workflow that can expand a volume every time it reaches a capacity threshold. These Amazon EBS elastic volumes extend a volume when needed, as shown in the Amazon EBS Update blog post.
- Another way to optimize block storage is to identify volumes that are underutilized and downsize them. Or you can change the volume type, as shown in the AWS Storage Optimization whitepaper.
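To find candidates for deletion, you can list volumes that are in the `available` state, which means they are not attached to any instance. The following Python sketch summarizes such volumes from an EC2 `DescribeVolumes` response. The names `UNATTACHED_FILTER` and `summarize_unattached` are hypothetical, and the sketch deliberately stops short of deleting anything.

```python
# Filter for the EC2 DescribeVolumes API: volumes in the "available"
# state are not attached to any instance.
UNATTACHED_FILTER = [{"Name": "status", "Values": ["available"]}]


def summarize_unattached(volumes):
    """Return IDs and total size of unattached volumes (hypothetical helper).

    `volumes` is the "Volumes" list from a DescribeVolumes response.
    """
    unattached = [
        volume for volume in volumes
        if volume.get("State") == "available" and not volume.get("Attachments")
    ]
    return {
        "volume_ids": [volume["VolumeId"] for volume in unattached],
        "total_gib": sum(volume["Size"] for volume in unattached),
    }


if __name__ == "__main__":
    sample = [
        {"VolumeId": "vol-aaa", "Size": 100, "State": "available", "Attachments": []},
        {"VolumeId": "vol-bbb", "Size": 50, "State": "in-use",
         "Attachments": [{"InstanceId": "i-123"}]},
    ]
    print(summarize_unattached(sample))
```

With boto3, the input could come from `boto3.client("ec2").describe_volumes(Filters=UNATTACHED_FILTER)["Volumes"]`. For accounts with many volumes, use a paginator so that you see every page of results. Review the list and take snapshots where needed before you delete any volume.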
Modify the retention period of CloudWatch Logs
By default, log data in CloudWatch Logs is kept indefinitely and never expires. You can adjust the retention policy for each log group to be between one day and 10 years. If compliance requirements call for longer retention, export the log data to Amazon S3 and use archival storage such as Amazon S3 Glacier.
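CloudWatch Logs accepts only specific retention values, so a required retention period has to be rounded up to the next allowed value. The following Python sketch shows one way to do that. The helper name is hypothetical, and the list of allowed values reflects our understanding of the `PutRetentionPolicy` API at the time of writing, so check it against the current API reference.

```python
from bisect import bisect_left

# Assumption: retention values (in days) accepted by CloudWatch Logs.
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def choose_retention_days(required_days):
    """Return the smallest allowed retention that covers `required_days`."""
    if required_days < 1:
        raise ValueError("required_days must be at least 1")
    index = bisect_left(ALLOWED_RETENTION_DAYS, required_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("required_days exceeds the longest allowed retention")
    return ALLOWED_RETENTION_DAYS[index]
```

With boto3, you could apply the result with `boto3.client("logs").put_retention_policy(logGroupName="/example/app", retentionInDays=choose_retention_days(45))`. The log group name is a placeholder. Rounding up rather than down avoids deleting log data earlier than your requirement allows.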
Deduplicate data
Large datasets often have redundant data, which increases your storage footprint.
- By turning on data deduplication for your Amazon FSx for Windows File Server, you will optimize data storage. For general-purpose file shares, storage space can be reduced by 50–60 percent through deduplication.
- If you have datasets residing in Amazon S3, you can automatically get rid of duplicates by using the FindMatches transform provided by AWS Lake Formation. See the Integrate and deduplicate datasets using AWS Lake Formation FindMatches blog post for more information on how to set it up.
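To get a first estimate of how much redundant data a dataset contains, you can group objects by a content hash. Note that this is a simpler technique than the ones named above: it finds only byte-for-byte identical objects, whereas FindMatches is described as matching records that are similar but not identical. The function name and the sample data in this Python sketch are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(objects):
    """Group byte-identical objects by SHA-256 digest (hypothetical helper).

    `objects` is an iterable of (key, content_bytes) pairs. Returns a dict
    mapping each digest with more than one key to its keys, plus the number
    of bytes that keeping a single copy per group would free.
    """
    keys_by_digest = defaultdict(list)
    size_by_digest = {}
    for key, content in objects:
        digest = hashlib.sha256(content).hexdigest()
        keys_by_digest[digest].append(key)
        size_by_digest[digest] = len(content)

    duplicates = {
        digest: keys for digest, keys in keys_by_digest.items() if len(keys) > 1
    }
    reclaimable_bytes = sum(
        size_by_digest[digest] * (len(keys) - 1)
        for digest, keys in duplicates.items()
    )
    return duplicates, reclaimable_bytes


if __name__ == "__main__":
    sample = [
        ("2023/report.csv", b"x,y\n1,2\n"),
        ("backup/report.csv", b"x,y\n1,2\n"),
        ("2023/other.csv", b"a,b\n3,4\n5,6\n"),
    ]
    groups, freed = find_exact_duplicates(sample)
    print(list(groups.values()), freed)
```

This sketch holds each object in memory, so for large objects you would hash the content in chunks instead.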
Conclusion
In this blog post, we discussed data storage techniques to increase your storage efficiency. These include right-sizing storage volumes; choosing storage tiers depending on different data access patterns; and compressing and converting data.
These techniques allow you to optimize your AWS infrastructure for environmental sustainability.
This blog post is the second post in the series. You can find the first part of the series linked in the following section. In the next part of this blog post series, we will show you how you can optimize the networking part of your IT infrastructure for sustainability in the cloud!
Other blog posts in this series
- Optimizing your AWS Infrastructure for Sustainability, Part I: Compute
- Optimizing your AWS Infrastructure for Sustainability, Part III: Networking