AWS Storage Blog

Category: Analytics

Consolidate and query Amazon S3 Inventory reports for Region-wide object-level visibility

Organizations around the world store billions of objects and files representing terabytes to petabytes of data. Data is often owned by different teams, departments, or business units, spanning multiple locations. As the number of datastores, locations, and owners grows, you need a way to cost-effectively maintain visibility into important characteristics of your data, including based […]

Identify cold objects for archiving to Amazon S3 Glacier storage classes

Update (02/13/2024): Consider Amazon S3 Lifecycle transition fees that are charged based on the total number of objects being transitioned, the destination storage class (listed on the Amazon S3 pricing page), as well as the additional metadata charges applied. You can use the S3 pricing calculator to estimate the total upfront and monthly costs by […]

Migrate on-premises data to AWS for insightful visualizations

When migrating data from on premises, customers seek a data store that is scalable, durable, and cost effective. Equally important, BI must support modern, interactive, and fast dashboards that can scale seamlessly to tens of thousands of users while providing the ability to create meaningful data visualizations for analysis. Visualization of on-premises business analytics […]

Disabling ACLs for existing Amazon S3 workloads with information in S3 server access logs and AWS CloudTrail

Access control lists (ACLs) are permission sets that define user access and the operations users can perform on specific resources. Amazon S3 was launched in 2006 with ACLs as its first authorization mechanism. Since 2011, Amazon S3 has also supported AWS Identity and Access Management (IAM) policies for managing access to S3 buckets, and recommends using […]

Maximizing price performance for big data workloads using Amazon EBS

Since the emergence of big data over a decade ago, Hadoop – an open-source framework that is used to efficiently store and process large datasets – has been crucial in storing, analyzing, and reducing that data to provide value for enterprises. Hadoop lets you store structured, partially structured, or unstructured data of any kind across […]

Simplify and scale access management to shared datasets with cross-account Amazon S3 Access Points

In today’s interconnected and data-centric world, businesses must have access to the right data for data-driven decision-making, ultimately driving better business results. Collecting all the relevant data takes time and capital as it requires setting up data ingestion pipelines, hiring analysts to validate and interpret the data, and incorporating data insights that influence important […]

Isima.io optimizes price performance for OLAP workloads using Amazon EBS

Isima.io, a unified analytics startup founded in 2016, aims to accelerate analytics outcomes for organizations. Isima.io does this by combining multiple data management disciplines – including Enterprise Service Bus (ESB), Extract-Transform-Load (ETL), Enterprise Data Warehouse (EDW), and Business Intelligence (BI) – into one hyper-converged system. IT teams can only win by building differentiated, agile data apps. The […]

Simplify archiving Amazon EBS Snapshots and monitor progress using a live Amazon CloudWatch dashboard

Data protection is top of mind for our customers, and having a data backup strategy is critical to ensure compliance, disaster recovery readiness, and business continuity. As customers experience exponential business growth, their data storage needs grow as well, and data retention can become very costly. In order to meet compliance requirements for data retention […]

Run queries up to 9x faster using Trino with Amazon S3 Select on Amazon EMR

Update (7/25/2024): Use Amazon Athena, S3 Object Lambda, or client-side filtering to query your data in Amazon S3. Customers building data lakes continue to innovate in the ways that they store and access their data. For these customers, performance is critical, particularly when they are accessing large amounts of data. For example, […]

How TMAP Mobility transferred 2.4 PB of Hadoop data using AWS DataSync

Launched in 2002, TMAP Mobility is Korea’s leading mobility platform, with 20 million registered users and 14 million monthly active users. TMAP provides navigation services based on a wide range of real-time traffic information and data. Previously, the Data Intelligence group at TMAP Mobility operated a mobility-data platform based on a Hadoop Distributed File System […]