AWS Storage Blog
Category: Analytics
Run queries up to 9x faster using Trino with Amazon S3 Select on Amazon EMR
UPDATE (7/25/2024): Use Amazon Athena, S3 Object Lambda, or client-side filtering to optimize querying your data in Amazon S3. Customers building data lakes continue to innovate in the ways that they store and access their data. For these customers, performance is critical, particularly when they are accessing large amounts of data. For […]
How TMAP Mobility transferred 2.4 PB of Hadoop data using AWS DataSync
Launched in 2002, TMAP Mobility is Korea’s leading mobility platform, with 20 million registered users and 14 million monthly active users. TMAP provides navigation services based on a wide range of real-time traffic information and data. Previously, the Data Intelligence group at TMAP Mobility operated a mobility-data platform based on a Hadoop Distributed File System […]
Using presigned URLs to identify per-requester usage of Amazon S3
Many software-as-a-service (SaaS) product offerings have a pay-as-you-go pricing model, charging customers only for the resources consumed. However, pay-as-you-go pricing is only viable when you can accurately track each customer’s use of resources, such as compute capacity, storage, and networking bandwidth. Without this data, SaaS providers do not have visibility into resource consumption of […]
Restore data from Amazon S3 Glacier storage classes starting with partial object keys
When managing data storage, it is important to optimize for cost by storing data in the most cost-effective manner based on how often data is used or accessed. For many enterprises, this means using some form of cold storage or archiving for data that is less frequently accessed or used while keeping more frequently used […]
Optimize storage costs by analyzing API operations on Amazon S3
The demand for data storage has increased with the advent of a fast-paced data environment – creating, sharing, and replicating data at a large scale. Most organizations are looking for the optimal way to store their data cost-effectively, giving them everything they need from their data but without breaking the bank. Cloud storage provides flexible […]
How Simon Data reduced encryption costs by using Amazon S3 Bucket Keys on existing objects
As more organizations look to operate faster and at scale, they need ways to meet critical compliance requirements and improve data security. Encryption is a critical component of a defense-in-depth strategy, and when used correctly, can provide an additional layer of protection above basic access control. However, workloads that access millions or billions […]
Collecting, archiving, and retrieving surveillance footage with AWS
Video feeds and still images from judiciary locations are considered critical forms of evidence in a court of law. These locations can be police stations and government offices or even civil locations of importance like banks and hospitals. As governments, particularly in smart cities, rely upon video surveillance, it is critical to design a cost […]
Point-in-time restore for Amazon S3 buckets
Enterprises store increasing quantities of object data for use cases like data lakes, document management systems, and media libraries. Performing point-in-time restores for large datasets can be challenging, as existing approaches that rely on a full restore from backup are time-consuming and expensive. Alternatively, restoring individual objects to previous versions is prone to errors and delays the restore […]
Monitoring and reporting Amazon FSx user access events using Splunk
UPDATE 9/8/2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. Monitoring end-user activity and access to data is core to any modern data security strategy. As customers migrate workloads to the cloud, logging end-user accesses of customer data is a key component of internal security policies and is required to meet […]
Manage and analyze your data at scale using Amazon S3 Inventory and Amazon Athena
Object storage gives you virtually unlimited scale, which helps you grow your business without being concerned with managing the infrastructure to support your data. Managing millions to billions of objects in Amazon S3 can be difficult, inefficient, and time-consuming if you don’t take steps to automate the management of this data at scale. Data […]