AWS Big Data Blog

Category: Analytics

Solution Architecture for the blog post

Simplify and optimize Python package management for AWS Glue PySpark jobs with AWS CodeArtifact

Data engineers use various Python packages to meet their data processing requirements while building data pipelines with AWS Glue PySpark Jobs. Languages like Python and Scala are commonly used in data pipeline development. Developers can take advantage of their open-source packages or even customize their own to make it easier and faster to perform use […]

Build a multilingual dashboard with Amazon Athena and Amazon QuickSight

Amazon QuickSight is a serverless business intelligence (BI) service used by organizations of any size to make better data-driven decisions. QuickSight dashboards can also be embedded into SaaS apps and web portals to provide interactive dashboards, natural language query or data analysis capabilities to app users seamlessly. The QuickSight Demo Central contains many dashboards, feature showcase […]

Introduction to Amazon QuickSight ML Insights

Amazon QuickSight was launched in November 2016 as a fast, cloud-powered business analytics service to build visualizations, perform ad hoc analysis, and quickly get business insights from a variety of data sources. In 2018, ML Insights for QuickSight (Enterprise Edition) was announced to add machine learning (ML)-powered forecasting and anomaly detection with a few clicks. […]

A serverless operational data lake for retail with AWS Glue, Amazon Kinesis Data Streams, Amazon DynamoDB, and Amazon QuickSight

Do you want to reduce stockouts at stores? Do you want to improve order delivery timelines? Do you want to provide your customers with accurate product availability, down to the millisecond? A retail operational data lake can help you transform the customer experience by providing deeper insights into a variety of operational aspects of your […]

Automate your validated dataset deployment using Amazon QuickSight and AWS CloudFormation

A lot of the power behind business intelligence (BI) and data visualization tools such as Amazon QuickSight comes from the ability to work interactively with data through a GUI. Report authors create dashboards using GUI-based tools, then in just a few clicks can share the dashboards with business users and decision-makers. This workflow empowers authors […]

Trigger an AWS Glue DataBrew job based on an event generated from another DataBrew job

Organizations today have continuous incoming data, and analyzing this data in a timely fashion is becoming a common requirement for data analytics and machine learning (ML) use cases. As part of this, you need clean data in order to gain insights that can enable enterprises to get the most out of their data for business […]

Integrate AWS Glue Schema Registry with the AWS Glue Data Catalog to enable effective schema enforcement in streaming analytics use cases

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Metadata is an integral part of data management and governance. The AWS Glue Data Catalog can provide a uniform repository to store and share metadata. The main […]

Supercharging Dream11’s Data Highway with Amazon Redshift RA3 clusters

This is a guest post by Dhanraj Gaikwad, Principal Engineer on Dream11 Data Engineering team. Dream11 is the world’s largest fantasy sports platform, with over 120 million users playing fantasy cricket, football, kabaddi, basketball, hockey, volleyball, handball, rugby, futsal, American football, and baseball. Dream11 is the flagship brand of Dream Sports, India’s leading Sports Technology […]

Visualize MongoDB data from Amazon QuickSight using Amazon Athena Federated Query

In this post, you will learn how to use Amazon Athena Federated Query to connect a MongoDB database to Amazon QuickSight in order to build dashboards and visualizations. Amazon Athena is a serverless interactive query service, based on Presto, that provides full ANSI SQL support to query a variety of standard data formats, including CSV, […]

Synchronize your AWS Glue Studio Visual Jobs to different environments 

June 2023: This post was reviewed and updated for accuracy. AWS Glue has become a popular option for integrating data from disparate data sources due to its ability to integrate large volumes of data using distributed data processing frameworks. Many customers use AWS Glue to build data lakes and data warehouses. Data engineers who prefer […]