AWS Big Data Blog
Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog
AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of […]
Deep dive on Amazon MSK tiered storage
In the first post of the series, we described some core concepts of Apache Kafka cluster sizing and the best practices for optimizing the performance and cost of your Kafka workload. This post explains how the underlying infrastructure affects Kafka performance when you use Amazon Managed Streaming for Apache Kafka (Amazon MSK) tiered storage. We […]
How SumUp made digital analytics more accessible using AWS Glue
This is a guest blog post by Mira Daniels and Sean Whitfield from SumUp. SumUp is a leading global financial technology company driven by the purpose of leveling the playing field for small businesses. Founded in 2012, SumUp is the financial partner for more than 4 million small merchants in over 35 markets worldwide, helping […]
Introducing in-place version upgrades with Amazon MWAA
Today, AWS is announcing the availability of in-place version upgrades for Amazon Managed Workflows for Apache Airflow (Amazon MWAA). This enhancement allows you to seamlessly upgrade your existing Apache Airflow version 2.x environments to newer available versions while retaining the workflow run history and environment configurations. You can now take advantage of the latest capabilities […]
Advanced patterns with AWS SDK for pandas on AWS Glue for Ray
September 2023: This post was reviewed and updated with a new dataset and related code blocks and images. AWS SDK for pandas is a popular Python library among data scientists, data engineers, and developers. It simplifies interaction between AWS data and analytics services and pandas DataFrames. It allows easy integration and data movement between 22 […]
Enable complex row-level security in embedded dashboards for non-provisioned users in Amazon QuickSight with OR-based tags
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards, and share these with tens of thousands of users, both within QuickSight and embedded in your software as a service (SaaS) applications. QuickSight Enterprise edition started supporting nested conditions within row-level security […]
BWH Hotels scales enterprise business intelligence adoption while reducing costs with Amazon QuickSight
This is a guest post by Joseph Landucci from BWH Hotels. In their own words, “BWH Hotels is a leading, global hospitality enterprise comprised of three hotel companies, including WorldHotels, Best Western Hotels & Resorts and SureStay Hotels. Our mission is to deliver trusted guest experiences, drive hotel success and foster a caring, inclusive culture that respects […]
Migrate from Google BigQuery to Amazon Redshift using AWS Glue and Custom Auto Loader Framework
Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytic workloads. Customers are looking for tools that make it easier to migrate from other data warehouses, such as Google BigQuery, to Amazon Redshift to […]
Real-time inference using deep learning within Amazon Managed Service for Apache Flink
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Apache Flink is a framework and distributed processing engine for stateful computations over data streams. Amazon Managed Service for Apache Flink is a fully managed service that […]
Configure Amazon OpenSearch Service for high availability
Amazon OpenSearch Service is a fully open-source search and analytics engine that securely unlocks real-time search, monitoring, and analysis of business and operational data for use cases like recommendation engines, ecommerce sites, and catalog search. To be successful in your business, you need your systems to be highly available and performant, minimizing downtime and avoiding […]