AWS Big Data Blog

Build a data lake using Amazon Kinesis Data Streams for Amazon DynamoDB and Apache Hudi

Amazon DynamoDB helps you capture high-velocity data such as clickstream data to form customized user profiles and online order transaction data to develop customer order fulfillment applications, improve customer satisfaction, and get insights into sales revenue to create a promotional offer for the customer. It’s essential to store these data points in a centralized data […]


Amazon EMR 2020 year in review

Tens of thousands of customers use Amazon EMR to run big data analytics applications on Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto at scale. Amazon EMR automates the provisioning and scaling of these frameworks, and delivers high performance at low cost with optimized runtimes and support for a wide range […]


Effective data lakes using AWS Lake Formation, Part 1: Getting started with governed tables

Thousands of customers are building their data lakes on Amazon Simple Storage Service (Amazon S3). You can use AWS Lake Formation to build your data lakes easily—in a matter of days as opposed to months. However, there are still some difficult challenges to address with your data lakes: Supporting streaming updates and deletes in your data […]


Monitor data quality in your data lake using PyDeequ and AWS Glue

In our previous post, we introduced PyDeequ, an open-source Python wrapper over Deequ, which enables you to write unit tests on your data to ensure data quality. The use case we ran through was on static, historical data, but most datasets are dynamic, so how can you quantify how your data is changing and detect […]


Run usage analytics on Amazon QuickSight using AWS CloudTrail

Amazon QuickSight is a cloud-native BI service that allows end users to create and publish dashboards in minutes, without provisioning any servers or requiring complex licensing. You can view these dashboards on the QuickSight product console or embed them into applications and websites. After you deploy a dashboard, it’s important to assess how they and […]


Retaining data streams up to one year with Amazon Kinesis Data Streams

Streaming data is used extensively for use cases like sharing data between applications, streaming ETL (extract, transform, and load), real-time analytics, processing data from internet of things (IoT) devices, application monitoring, fraud detection, live leaderboards, and more. Typically, data streams are stored for short durations of time before being loaded into a permanent data store […]


Create a custom data connector to Slack’s Member Analytics API in Amazon QuickSight with Amazon Athena Federated Query

Amazon QuickSight recently added support for Amazon Athena Federated Query, which allows you to query data in place from various data sources. With this capability, QuickSight can extend support to query additional data sources like Amazon CloudWatch Logs, Amazon DynamoDB, and Amazon DocumentDB (with MongoDB compatibility) via their existing Amazon Athena data source. You […]


Integrating Datadog data with AWS using Amazon AppFlow for intelligent monitoring

Infrastructure and operation teams are often challenged with getting a full view into their IT environments to do monitoring and troubleshooting. New monitoring technologies are needed to provide an integrated view of all components of an IT infrastructure and application system. Datadog provides intelligent application and service monitoring by bringing together data from servers, databases, […]


Building an administrative console in Amazon QuickSight to analyze usage metrics

Given the scalability of Amazon QuickSight to hundreds and thousands of users, a common use case is to monitor QuickSight group and user activities, analyze the utilization of dashboards, and identify usage patterns of an individual user and dashboard. With timely access to interactive usage metrics, business intelligence (BI) administrators and data team leads can […]


Getting started with Trace Analytics in Amazon Elasticsearch Service

Updated May 11, 2021. See the release notes below for more details. Trace Analytics is now available for Amazon Elasticsearch Service (Amazon ES) domains running versions 7.9 or later. Developers and IT Ops teams can use this feature to troubleshoot performance and availability issues in their distributed applications. It provides end-to-end insights that aren’t possible […]
