AWS Storage Blog
Category: AWS Glue
How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3
In today’s data-driven business landscape, organizations are increasingly relying on massive data lakes to store, process, and analyze vast amounts of information. However, as these data repositories grow to petabyte scale, a key challenge for businesses is implementing transactional capabilities on their data lakes efficiently. The sheer volume of data requires immense computational power and […]
How Delhivery migrated 500 TB of data across AWS Regions using Amazon S3 Replication
Delhivery is one of the largest third-party logistics providers in India. It fulfills millions of packages every day, servicing over 18,000 pin codes in India, powered by more than 20 automated sort centers, 90 warehouses, and over 2,800 delivery centers. Data is at the core of Delhivery’s business. In anticipation of potential regulatory […]
Derive insights from AWS DataSync task reports using AWS Glue, Amazon Athena, and Amazon QuickSight
Update (10/30/2024): On October 30, 2024, AWS DataSync launched Enhanced mode tasks, prompting updates to this blog. Updates include a new step in the “Step 2: Populate Glue catalog with task reports data using a Glue crawler” section and detailed information on the new capabilities in “Updated steps for working with task reports of new […]
Use generative AI to query your Amazon S3 data lake for insights
Businesses store large volumes of data in their data lakes and rely on this data to extract insights and make important business decisions. However, business stakeholders sometimes lack the technical skills required to run complex queries against their data lakes. Instead, they rely on data scientists or analysts to build reports and dashboards or to […]
Siemens builds Datalake2Go on AWS to analyze disparate data globally
Siemens is a technology company focused on industry, infrastructure, transport, and healthcare. From resource-efficient factories, resilient supply chains, and smart buildings and grids, to cleaner and more comfortable transportation and advanced healthcare, the company creates technology with purpose, adding real value for its customers. Siemens technology is everywhere, supporting the critical infrastructure and vital industries […]
Migrate on-premises data to AWS for insightful visualizations
When migrating data from on premises, customers seek a data store that is scalable, durable, and cost effective. Equally important, BI must support modern, interactive, and fast dashboards that can scale to tens of thousands of users seamlessly while providing the ability to create meaningful data visualizations for analysis. Visualization of on-premises business analytics […]
Visualizing usage of Provisioned IOPS volumes on Amazon EBS for analysis
Organizations are always looking to right-size cloud infrastructure and optimize for cost. Historically, one of the areas where it has been difficult to right-size at scale is Provisioned IOPS volumes on Amazon EBS, as optimization usually required third-party tools. The recently announced AWS Compute Optimizer assists in solving that problem, as it helps customers optimize compute resources […]
Query Amazon S3 Analytics data with Amazon Athena
I recently had a customer explain that they were aware of the benefits of various Amazon S3 storage classes, like S3 Standard, S3 Standard-Infrequent Access, and S3 One Zone-Infrequent Access, but they were not sure which tiers and lifecycle rules to apply to optimize their storage. This customer, and others like them, have multiple buckets and various […]