AWS Big Data Blog

Category: AWS Big Data

Scale your cloud data warehouse and reduce costs with the new Amazon Redshift RA3 nodes with managed storage

One of our favorite things about working on Amazon Redshift, the cloud data warehouse service at AWS, is the inspiring stories from customers about how they’re using data to gain business insights. Many of our recent engagements have been with customers upgrading to the new instance type, Amazon Redshift RA3 with managed storage. In this […]

Read More

Enhancing customer safety by leveraging the scalable, secure, and cost-optimized Toyota Connected Data Lake

Toyota Motor Corporation (TMC), a global automotive manufacturer, has made “connected cars” a core priority as part of its broader transformation from an auto company to a mobility company. In recent years, TMC and its affiliate technology and big data company, Toyota Connected, have developed an array of new technologies to provide connected services that […]

Read More

Optimize Python ETL by extending Pandas with AWS Data Wrangler

Developing extract, transform, and load (ETL) data pipelines is one of the most time-consuming steps to keep data lakes, data warehouses, and databases up to date and ready to provide business insights. You can categorize these pipelines into distributed and non-distributed, and the choice of one or the other depends on the amount of data […]

Read More

Integrating the MongoDB Cloud with Amazon Kinesis Data Firehose

With the release of Kinesis Data Firehose HTTP endpoint delivery, you can now stream your data through Amazon Kinesis or directly push data to Kinesis Data Firehose and configure it to deliver data to MongoDB Atlas. You can also configure Kinesis Data Firehose to transform the data before delivering it to its destination. You don’t have to write applications and manage resources to read data and push to MongoDB. It’s all managed by AWS, making it easier to estimate costs for your data based on your data volume. In this post, we discuss how to integrate Kinesis Data Firehose and MongoDB Cloud and demonstrate how to stream data from your source to MongoDB Atlas.

Read More

Creating customized Vega visualizations in Amazon Elasticsearch Service

This post shows how to implement Vega visualizations included in Kibana, which is part of Amazon Elasticsearch Service (Amazon ES), using a real-world clickstream data sample. Vega visualizations are an integrated scripting mechanism of Kibana to perform on-the-fly computations on raw data to generate D3.js visualizations. For this post, we use a fully automated setup using AWS CloudFormation to show how to build a customized histogram for a web analytics use case. This example implements an ad hoc map-reduce like aggregation of the underlying data for a histogram.

Read More

Monitor and Optimize Analytic Workloads on Amazon EMR with Prometheus and Grafana

This post discusses installing and configuring Prometheus and Grafana on an Amazon Elastic Compute Cloud (Amazon EC2) instance, configuring an EMR cluster to emit metrics that Prometheus can scrape from the cluster, and using the Grafana dashboards to analyze the metrics for a workload on the EMR cluster and optimize it. Additionally, we also cover how Prometheus can push alerts to the Alertmanager, and configuring Amazon SNS to send email notifications.

Read More

Vortexa delivers real-time insights on Amazon MSK with Lenses.io

This post discusses how Vortexa harnesses the power of Apache Kafka to improve real-time data accuracy and accelerate time-to-market by using a combination of Lenses.io for greater observability and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to create clusters on demand.

Read More

Build a distributed big data reconciliation engine using Amazon EMR and Amazon Athena

This is a guest post by Sara Miller, Head of Data Management and Data Lake, Direct Energy; and Zhouyi Liu, Senior AWS Developer, Direct Energy. Enterprise companies like Direct Energy migrate on-premises data warehouses and services to AWS to achieve fully manageable digital transformation of their organization. Freedom from traditional data warehouse constraints frees up […]

Read More

Embed multi-tenant analytics in applications with Amazon QuickSight

Amazon QuickSight recently introduced four new features—embedded authoring, namespaces for multi-tenancy, custom user permissions, and account-level customizations—that, with existing dashboard embedding and API capabilities available in the Enterprise Edition, allow you to integrate advanced dashboarding and analytics capabilities within SaaS applications. Developers and independent software vendors (ISVs) who build these applications can now offer embedded, […]

Read More