AWS Big Data Blog

Build your data pipeline in your AWS modern data platform using AWS Lake Formation, AWS Glue, and dbt Core

dbt has established itself as one of the most popular tools in the modern data stack, and is aiming to bring analytics engineering to everyone. The dbt tool makes it easy to develop and implement complex data processing pipelines, with mostly SQL, and it provides developers with a simple interface to create, test, document, evolve, […]

Read More

Amazon QuickSight 1-click public embedding

Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards, and share these with tens of thousands of users, either directly within a QuickSight application, or embedded in web apps and portals. QuickSight Enterprise Edition now supports 1-click public embedding, a feature […]

Read More

Introducing AWS Glue Auto Scaling: Automatically resize serverless computing resources for lower cost with optimized Apache Spark

Data created in the cloud is growing fast in recent days, so scalability is a key factor in distributed data processing. Many customers benefit from the scalability of the AWS Glue serverless Spark runtime. Today, we’re pleased to announce the release of AWS Glue Auto Scaling, which helps you scale your AWS Glue Spark jobs […]

Read More

Enhance analytics with Google Trends data using AWS Glue, Amazon Athena, and Amazon QuickSight

In today’s market, business success often lies in the ability to glean accurate insights and predictions from data. However, data scientists and analysts often find that the data they have at their disposal isn’t enough to help them make accurate predictions for their use cases. A variety of factors might alter an outcome and should […]

Read More

Scale Amazon Redshift to meet high throughput query requirements

Many enterprise customers have demanding query throughput requirements for their data warehouses. Some may be able to address these requirements through horizontally or vertically scaling a single cluster. Others may have a short duration where they need extra capacity to handle peaks that can be addressed through Amazon Redshift concurrency scaling. However, enterprises with consistently […]

Read More

Amazon Redshift continues its price-performance leadership

Data is a strategic asset. Getting timely value from data requires high-performance systems that can deliver performance at scale while keeping costs low. Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. We continue to add new capabilities to […]

Read More

Automate notifications on Slack for Amazon Redshift query monitoring rule violations

In this post, we walk you through how to set up automatic notifications of query monitoring rule (QMR) violations in Amazon Redshift to a Slack channel, so that Amazon Redshift users can take timely action. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze your […]

Read More

Share data securely across Regions using Amazon Redshift data sharing

Today’s global, data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This requires you to seamlessly share and consume live, consistent data as a single source of truth without copying the data, regardless of where LOB users are located. Amazon […]

Read More

Write prepared data directly into JDBC-supported destinations using AWS Glue DataBrew

AWS Glue DataBrew offers over 250 pre-built transformations to automate data preparation tasks (such as filtering anomalies, standardizing formats, and correcting invalid values) that would otherwise require days or weeks writing hand-coded transformations. You can now write cleaned and normalized data directly into JDBC-supported databases and data warehouses without having to move large amounts of […]

Read More

Develop and test AWS Glue version 3.0 jobs locally using a Docker container

AWS Glue is a fully managed serverless service that allows you to process data coming through different data sources at scale. You can use AWS Glue jobs for various use cases such as data ingestion, preprocessing, enrichment, and data integration from different data sources. AWS Glue version 3.0, the latest version of AWS Glue Spark […]

Read More