AWS Big Data Blog

How Viasat scaled their big data applications by migrating to Amazon EMR

This post is co-written with Manoj Gundawar from Viasat. Viasat is a satellite internet service provider based in Carlsbad, CA, with operations across the United States and worldwide. Viasat’s ambition is to be the first truly global, scalable, broadband service provider with a mission to deliver connections that can change the world. Viasat operates across […]

Continuous monitoring with Sumo Logic using Amazon Kinesis Data Firehose HTTP endpoints

Amazon Kinesis Data Firehose streams data to AWS destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and Amazon OpenSearch Service (successor to Amazon Elasticsearch Service). Additionally, Kinesis Data Firehose supports destinations to third-party partners. This ability to send data to third-party partners is a vital feature for customers who already use these […]

Build and orchestrate ETL pipelines using Amazon Athena and AWS Step Functions

Extract, transform, and load (ETL) is the process of reading source data, applying transformation rules to this data, and loading it into the target structures. ETL is performed for various reasons. Sometimes ETL helps align source data to target data structures, whereas other times ETL is done to derive business value by cleansing, standardizing, combining, […]

Implement a slowly changing dimension in Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. A star schema is a database organization structure optimized for use in a data warehouse. In a star schema, a dimension is a structure that categorizes the facts and measures in order to enable you to answer business questions. The attributes (or […]

Prepare, transform, and orchestrate your data using AWS Glue DataBrew, AWS Glue ETL, and AWS Step Functions

Data volumes in organizations are increasing at an unprecedented rate, exploding from terabytes to petabytes and in some cases exabytes. As data volume increases, it attracts more and more users and applications to use the data in many different ways, a phenomenon sometimes referred to as data gravity. As data gravity increases, we need to find tools and […]

WeatherBug reduced ETL latency to 30 times faster using Amazon Redshift Spectrum

This post is co-written with data engineers Anton Morozov and James Phillips from WeatherBug. WeatherBug is a brand owned by GroundTruth, based in New York City, that provides location-based advertising solutions to businesses. WeatherBug consists of a mobile app reporting live and forecast data on hyperlocal weather to consumer users. The WeatherBug Data Engineering team […]

Automate your Amazon Redshift performance tuning with automatic table optimization

Amazon Redshift is a cloud data warehouse database that provides fast, consistent performance running complex analytical queries on huge datasets scaling into petabytes and even exabytes with Amazon Redshift Spectrum. Although Amazon Redshift has excellent query performance out of the box, with up to three times better price performance than other cloud data warehouses, you […]

Query your Amazon MSK topics interactively using Amazon Kinesis Data Analytics Studio

Amazon Kinesis Data Analytics Studio makes it easy to analyze streaming data in real time and build stream processing applications powered by Apache Flink using standard SQL, Python, and Scala. With a few clicks on the AWS Management Console, you can launch a serverless notebook to query data streams and get results in seconds. Kinesis […]

Authorize SparkSQL data manipulation on Amazon EMR using Apache Ranger

With Amazon EMR 5.32, Amazon EMR introduced Apache Ranger 2.0 support, which allows you to enable authorization and audit capabilities for Apache Spark, Amazon Simple Storage Service (Amazon S3), and Apache Hive. It also enabled authorization audits to be logged in Amazon CloudWatch. However, although you could control Apache Spark writes to Amazon S3 with […]

Use Amazon Athena and Amazon QuickSight in a cross-account environment

Many AWS customers use a multi-account strategy to host applications for different departments within the same company. However, you might deploy services like Amazon QuickSight using a single-account approach, which raises challenges when you need to use QuickSight in combination with Amazon Athena to build reports and dashboards. With the recently announced built-in support for cross-account […]