AWS Big Data Blog

Exploring new ETL and ELT capabilities for Amazon Redshift from the AWS Glue Studio visual editor

In a modern data architecture, unified analytics enable you to access the data you need, whether it’s stored in a data lake or a data warehouse. In particular, we have observed an increasing number of customers who combine and integrate their data into an Amazon Redshift data warehouse to analyze large volumes of data at scale and […]

Get maximum value out of your cloud data warehouse with Amazon Redshift

Every day, customers are challenged with how to manage their growing data volumes and operational costs to unlock the value of data for timely insights and innovation, while maintaining consistent performance. According to the 2022 IDC Global DataSphere report, data creation, consumption, and storage are predicted to grow to 175 zettabytes by 2025. As data […]

Automate discovery of data relationships using ML and Amazon Neptune graph technology

Data mesh is a new approach to data management. Companies across industries are using a data mesh to decentralize data management to improve data agility and get value from data. However, when a data producer shares data products on a data mesh self-serve web portal, it’s neither intuitive nor easy for a data consumer to […]

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

Many customers run big data workloads such as extract, transform, and load (ETL) on Apache Hive to create a data warehouse on Hadoop. Apache Hive has performed pretty well for a long time. But with advancements in infrastructure such as cloud computing and multicore machines with large RAM, Apache Spark started to gain visibility by […]

Alexa Smart Properties creates value for hospitality, senior living, and healthcare properties with Amazon QuickSight Embedded

This is a guest post by Preet Jassi from Alexa Smart Properties. Alexa Smart Properties (ASP) is powered by a set of technologies that property owners, property managers, and third-party solution providers can use to deploy and manage Alexa-enabled devices at scale. Alexa can simplify tasks like playing music, controlling lights, or communicating with on-site […]

Configure SAML federation for Amazon OpenSearch Serverless with AWS IAM Identity Center

Amazon OpenSearch Serverless is a serverless option of Amazon OpenSearch Service that makes it easy for you to run large-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters. It automatically provisions and scales the underlying resources to deliver fast data ingestion and query responses for even the most demanding and […]

How CyberSolutions built a scalable data pipeline using Amazon EMR Serverless and the AWS Data Lab

This post is co-written by Constantin Scoarță and Horațiu Măiereanu from CyberSolutions Tech. CyberSolutions is one of the leading ecommerce enablers in Germany. We design, implement, maintain, and optimize award-winning ecommerce platforms end to end. Our solutions are based on best-in-class software like SAP Hybris and Adobe Experience Manager, and complemented by unique services that […]

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast […]

SANS Institute uses Amazon QuickSight to drive transformational security awareness maturity within organizations

This is a guest post by Carl Marrelli from SANS Institute. The SANS Institute is a world leader in cybersecurity training and certification. For over 30 years, SANS has worked with leading organizations to help ensure security across their organization, as well as with individual IT professionals who want to build and grow their security […]

Reference guide to build inventory management and forecasting solutions on AWS

Inventory management is a critical function for any business that deals with physical products. The primary challenge businesses face with inventory management is balancing the cost of holding inventory with the need to ensure that products are available when customers demand them. The consequences of poor inventory management can be severe. Overstocking can lead to […]