Optimizing your AWS Infrastructure for Sustainability, Part II: Storage

In Part I of this series, we introduced you to strategies to optimize the compute layer of your AWS architecture for sustainability. We provided you with success criteria, metrics, and architectural patterns to help you improve resource and energy efficiency of your AWS workloads.

This blog post focuses on the storage layer of your AWS infrastructure and provides recommendations that you can use to store your data sustainably.

Optimizing the storage layer of your AWS infrastructure

Managing your data lifecycle and using different storage tiers are key components of optimizing storage for sustainability. When you consider different storage mechanisms, remember that you’re introducing a trade-off between resource efficiency, access latency, and reliability. This means you’ll need to select your management pattern accordingly.

Reducing idle resources and maximizing utilization

Storing and accessing data efficiently, and reducing idle storage resources, results in a more efficient and sustainable architecture. Amazon CloudWatch offers storage metrics that you can use to assess storage improvements, as listed in the following table.

Service	Metric	Source
Amazon Simple Storage Service (Amazon S3)	BucketSizeBytes	Metrics and dimensions
Amazon Simple Storage Service (Amazon S3)	S3 Object Access	Logging requests using server access logging
Amazon Elastic Block Store (Amazon EBS)	VolumeIdleTime	Amazon EBS metrics
Amazon Elastic File System (Amazon EFS)	StorageBytes	Amazon CloudWatch metrics for Amazon EFS
Amazon FSx for Lustre	FreeDataStorageCapacity	Monitoring Amazon FSx for Lustre
Amazon FSx for Windows File Server	FreeStorageCapacity	Monitoring with Amazon CloudWatch
Amazon FSx for NetApp ONTAP	StorageCapacity / StorageUsed	File system metrics
Amazon FSx for OpenZFS	StorageCapacity / UsedStorageCapacity	Monitoring with Amazon CloudWatch

You can monitor these metrics with the architecture shown in Figure 1. CloudWatch provides a unified view of your resource metrics.

Figure 1. CloudWatch for monitoring your storage resources
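
As a rough illustration of how two of the metrics in the table might be used, the following Python sketch builds request parameters for the CloudWatch GetMetricStatistics API and summarizes Amazon EBS VolumeIdleTime datapoints. The helper names (bucket_size_request, idle_fraction), the bucket name, and the sample datapoints are hypothetical. The boto3 call appears only in a comment, so the block runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def bucket_size_request(bucket, days=14, now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    BucketSizeBytes is a daily storage metric, so a one-day period is used.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,
        "Statistics": ["Average"],
    }


def idle_fraction(datapoints, period_seconds):
    """Share of time a volume was idle, from VolumeIdleTime 'Sum' datapoints.

    Each datapoint is assumed to hold the idle seconds seen in one period.
    Returns None when there is nothing to summarize.
    """
    if not datapoints:
        return None
    idle_seconds = sum(point["Sum"] for point in datapoints)
    return idle_seconds / (period_seconds * len(datapoints))


# Assumed usage with boto3 (not executed here):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   response = cloudwatch.get_metric_statistics(
#       **bucket_size_request("example-bucket"))

# Made-up datapoints for three 5-minute periods.
sample = [{"Sum": 280.0}, {"Sum": 300.0}, {"Sum": 290.0}]
print(idle_fraction(sample, period_seconds=300))
```

A volume whose idle fraction stays close to 1.0 over several weeks is a candidate for downsizing, snapshotting, or deletion. The threshold you act on is a judgment call for your workload.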

In the following sections, we present four concepts to reduce idle resources and maximize utilization for your AWS storage layer.

Analyze data access patterns and use storage tiers

Choosing the right storage tier after analyzing data access patterns gives you more sustainable storage options in the cloud.

By storing less volatile data on technologies designed for efficient long-term storage, you will optimize your storage footprint. More specifically, you’ll reduce the impact you have on the lifetime of storage resources by storing slow-changing or unchanging data on magnetic storage, as opposed to solid state memory. For archiving data or storing slow-changing data, consider using Amazon EFS Infrequent Access, Amazon EBS Cold HDD volumes, and Amazon S3 Glacier.
To store your data efficiently throughout its lifetime, create an Amazon S3 Lifecycle configuration that automatically transfers objects to a different storage class based on your pre-defined rules. The Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs blog post shows you how to create custom object expiry rules for Amazon S3 based on the last accessed date of the object.
For data with unknown or changing access patterns, use Amazon S3 Intelligent-Tiering to monitor access patterns and move objects among tiers automatically. In general, you have to make a trade-off between resource efficiency, access latency, and reliability when considering these storage mechanisms. Figure 2 shows an overview of data access patterns for Amazon S3 and the resulting storage tier. For example, in S3 One Zone-IA, energy and server capacity are reduced, because data is stored only within one Availability Zone.

Figure 2. Data access patterns for Amazon S3
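
To make the lifecycle idea more concrete, here is a minimal Python sketch that builds a dictionary shaped like an Amazon S3 Lifecycle configuration. The function name, the prefix, and the day thresholds are illustrative assumptions, not recommendations. The 30-day check reflects the documented minimum for transitions to S3 Standard-IA as we understand it, so verify it against the current Amazon S3 documentation.

```python
def lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90,
                            expire_after_days=None):
    """Return a dict shaped like an S3 LifecycleConfiguration.

    Objects under `prefix` move to Standard-IA, then to S3 Glacier, and are
    optionally expired. All thresholds are hypothetical defaults.
    """
    if ia_after_days < 30:
        raise ValueError("Standard-IA transitions need at least 30 days")
    if glacier_after_days <= ia_after_days:
        raise ValueError("archive transition must follow the IA transition")

    rule = {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Assumed usage with boto3 (not executed here):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=lifecycle_configuration("logs/"))

config = lifecycle_configuration("logs/", expire_after_days=365)
print(config["Rules"][0]["ID"])
```

Keeping the rule in code makes the thresholds reviewable alongside the rest of your infrastructure. For data with unknown access patterns, S3 Intelligent-Tiering may be a better fit than fixed day counts.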

Use columnar data formats and compression

Columnar data formats like Parquet and ORC require less storage capacity compared to row-based formats like CSV and JSON.

Parquet consumes up to six times less storage in Amazon S3 compared to text formats. This is because of features such as column-wise compression, different encodings, or compression based on data type, as shown in the Top 10 Performance Tuning Tips for Amazon Athena blog post.
You can improve performance and reduce query costs of Amazon Athena by 30–90 percent by compressing, partitioning, and converting your data into columnar formats. Using columnar data formats and compression reduces the amount of data scanned.
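
The following toy Python sketch illustrates one reason column-wise layouts compress well: values from the same column sit next to each other, so simple encodings such as run-length encoding become effective. This is not how Parquet or ORC are implemented. The sample rows and the helper names (to_columns, run_length_encode) are made up for illustration.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical records as they would appear in a row-based format.
rows = [
    {"region": "eu-west-1", "status": "ok", "latency_ms": 12},
    {"region": "eu-west-1", "status": "ok", "latency_ms": 15},
    {"region": "eu-west-1", "status": "ok", "latency_ms": 11},
    {"region": "us-east-1", "status": "ok", "latency_ms": 40},
]

columns = to_columns(rows)

# Four region strings shrink to two (value, count) pairs.
encoded_region = run_length_encode(columns["region"])
print(encoded_region)
```

In a row-based file the repeated region values are interleaved with other fields, so this kind of encoding has far less to work with.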

Reduce unused storage resources

Right-size or delete unused storage volumes

As shown in the Cost Optimization on AWS video, right-sizing storage by data type and usage reduces your associated costs by up to 50 percent.

A straightforward way to reduce unused storage resources is to delete unattached EBS volumes. If you might need to restore a volume later, create an Amazon EBS snapshot before you delete it.
You can also use Amazon Data Lifecycle Manager to retain and delete EBS snapshots and Amazon EBS-backed Amazon Machine Images (AMIs) automatically. This further reduces the storage footprint of stale resources.
To avoid over-provisioning volumes, see the Automating Amazon EBS Volume-resizing blog post. It demonstrates an automated workflow that expands a volume every time it reaches a capacity threshold. The Amazon EBS Elastic Volumes feature lets you extend a volume when needed, as shown in the Amazon EBS Update blog post.
Another way to optimize block storage is to identify volumes that are underutilized and downsize them. Or you can change the volume type, as shown in the AWS Storage Optimization whitepaper.
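
As a sketch of how unattached volumes could be identified, the following Python helpers filter a list shaped like the Volumes field of an Amazon EC2 DescribeVolumes response. The helper names and the sample volumes are hypothetical, and the boto3 call is shown only in a comment.

```python
def unattached_volumes(volumes):
    """Return volumes in the 'available' state that have no attachments."""
    return [
        volume for volume in volumes
        if volume.get("State") == "available" and not volume.get("Attachments")
    ]


def reclaimable_gib(volumes):
    """Total provisioned size (GiB) of the unattached volumes."""
    return sum(volume["Size"] for volume in unattached_volumes(volumes))


# Assumed usage with boto3 (not executed here):
#   import boto3
#   volumes = boto3.client("ec2").describe_volumes()["Volumes"]

# Made-up sample data in the same shape as the API response.
volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 100,
     "Attachments": [{"InstanceId": "i-123"}]},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 500,
     "Attachments": []},
    {"VolumeId": "vol-ccc", "State": "available", "Size": 20,
     "Attachments": []},
]

for volume in unattached_volumes(volumes):
    print(volume["VolumeId"], volume["Size"])
print(reclaimable_gib(volumes))
```

Treat the result as a review list rather than a deletion list. Some unattached volumes are kept deliberately, and a snapshot before deletion keeps the data recoverable.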

Modify the retention period of CloudWatch Logs

By default, CloudWatch Logs are kept indefinitely and never expire. You can adjust the retention policy for each log group to be between one day and 10 years. If you need to keep log data longer for compliance reasons, export it to Amazon S3 and use archival storage such as Amazon S3 Glacier.
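
A minimal Python sketch of a retention audit follows. It finds log groups that never expire and maps a desired retention period to a value the API accepts. The list of allowed values reflects the PutRetentionPolicy documentation as we recall it and should be verified before use. The helper names and sample log groups are hypothetical.

```python
import bisect

# Retention values (days) assumed to be accepted by PutRetentionPolicy.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def nearest_retention_days(requested_days):
    """Smallest allowed retention that is at least `requested_days`."""
    if requested_days < 1:
        raise ValueError("retention must be at least one day")
    index = bisect.bisect_left(ALLOWED_RETENTION_DAYS, requested_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("retention exceeds the 10-year maximum")
    return ALLOWED_RETENTION_DAYS[index]


def groups_without_retention(log_groups):
    """Names of log groups with no retentionInDays, which never expire."""
    return [
        group["logGroupName"] for group in log_groups
        if "retentionInDays" not in group
    ]


# Assumed usage with boto3 (not executed here):
#   import boto3
#   logs = boto3.client("logs")
#   logs.put_retention_policy(
#       logGroupName="/example/app",
#       retentionInDays=nearest_retention_days(45))

# Made-up sample shaped like the logGroups field of DescribeLogGroups.
log_groups = [
    {"logGroupName": "/example/app", "retentionInDays": 30},
    {"logGroupName": "/example/batch"},
]
print(groups_without_retention(log_groups))
print(nearest_retention_days(45))
```

Rounding up rather than down avoids deleting logs earlier than intended when the requested period is not an accepted value.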

Deduplicate data

Large datasets often have redundant data, which increases your storage footprint.

By turning on data deduplication for your Amazon FSx for Windows File Server, you will optimize data storage. For general-purpose file shares, storage space can be reduced by 50–60 percent through deduplication.
If you have datasets residing in Amazon S3, you can automatically get rid of duplicates by using the FindMatches transform provided by AWS Lake Formation. See the Integrate and deduplicate datasets using AWS Lake Formation FindMatches blog post for more information on how to set it up.
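
FindMatches uses machine learning to find records that match even when they are not identical. As a much simpler stand-in, the following Python sketch detects only byte-identical duplicates by grouping objects on a SHA-256 content hash. It is not the FindMatches algorithm. The function name and the in-memory sample objects are hypothetical.

```python
import hashlib


def find_exact_duplicates(objects):
    """Group object keys by content hash and keep groups with duplicates.

    `objects` maps an object key to its bytes. Returns {digest: [keys]}.
    """
    keys_by_digest = {}
    for key, body in objects.items():
        digest = hashlib.sha256(body).hexdigest()
        keys_by_digest.setdefault(digest, []).append(key)
    return {
        digest: keys for digest, keys in keys_by_digest.items()
        if len(keys) > 1
    }


# Made-up objects standing in for data read from a bucket.
objects = {
    "2021/report.csv": b"id,value\n1,10\n",
    "backup/report-copy.csv": b"id,value\n1,10\n",
    "2021/other.csv": b"id,value\n2,20\n",
}

for keys in find_exact_duplicates(objects).values():
    print(keys)
```

Exact hashing misses near-duplicates such as records with different spellings or formatting, which is the case FindMatches is designed for.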

Conclusion

In this blog post, we discussed data storage techniques to increase your storage efficiency. These include choosing storage tiers based on data access patterns; compressing data and converting it to columnar formats; right-sizing or deleting unused storage volumes; adjusting log retention; and deduplicating data.

These techniques allow you to optimize your AWS infrastructure for environmental sustainability.

This blog post is the second post in the series. You can find the first part of the series linked in the following section. In the next part of this blog post series, we will show you how you can optimize the networking part of your IT infrastructure for sustainability in the cloud!

Related information

AWS Architecture Monthly magazine, Sustainability issue, August 2021

AWS Architecture Blog