AWS Big Data Blog
Introducing in-place version upgrades with Amazon MWAA
Today, AWS is announcing the availability of in-place version upgrades for Amazon Managed Workflows for Apache Airflow (Amazon MWAA). This enhancement allows you to seamlessly upgrade your existing Apache Airflow version 2.x environments to newer available versions while retaining the workflow run history and environment configurations. You can now take advantage of the latest capabilities […]
Advanced patterns with AWS SDK for pandas on AWS Glue for Ray
AWS SDK for pandas is a popular Python library among data scientists, data engineers, and developers. It simplifies interaction between AWS data and analytics services and pandas DataFrames. It allows easy integration and data movement between 22 types of data stores, including Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, and Amazon OpenSearch […]
Enable complex row-level security in embedded dashboards for non-provisioned users in Amazon QuickSight with OR-based tags
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards, and share these with tens of thousands of users, both within QuickSight and embedded in your software as a service (SaaS) applications. QuickSight Enterprise edition started supporting nested conditions within row-level security […]
BWH Hotels scales enterprise business intelligence adoption while reducing costs with Amazon QuickSight
This is a guest post by Joseph Landucci from BWH Hotels. In their own words, “BWH Hotels is a leading, global hospitality enterprise comprised of three hotel companies, including WorldHotels, Best Western Hotels & Resorts and SureStay Hotels. Our mission is to deliver trusted guest experiences, drive hotel success and foster a caring, inclusive culture that respects […]
Migrate from Google BigQuery to Amazon Redshift using AWS Glue and Custom Auto Loader Framework
Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytic workloads. Customers are looking for tools that make it easier to migrate from other data warehouses, such as Google BigQuery, to Amazon Redshift to […]
Real-time inference using deep learning within Amazon Kinesis Data Analytics for Apache Flink
Apache Flink is a framework and distributed processing engine for stateful computations over data streams. Amazon Kinesis Data Analytics for Apache Flink is a fully managed service that enables you to use an Apache Flink application to process streaming data. The Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning. […]
Configure Amazon OpenSearch Service for high availability
Amazon OpenSearch Service is a fully open-source search and analytics engine that securely unlocks real-time search, monitoring, and analysis of business and operational data for use cases like recommendation engines, ecommerce sites, and catalog search. To be successful in your business, you need your systems to be highly available and performant, minimizing downtime and avoiding […]
Trakstar unlocks new analytical opportunities for its HR customers with Amazon QuickSight
This is a guest post by Brian Kasen and Rebecca McAlpine from Trakstar, now a part of Mitratech. Trakstar, now a part of Mitratech, is a human resources (HR) software company that serves customers from small businesses and educational institutions to large enterprises, globally. Trakstar supercharges employee performance around pivotal moments in talent development. Our […]
Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB
Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained a lot of traction as the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets. Data lakes are not transactional by default; however, there […]
Automate alerting and reporting for AWS Glue job resource usage
Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small or large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset. Many organizations today are using AWS Glue to build ETL pipelines that bring data […]