AWS Big Data Blog

How NortonLifelock built a serverless architecture for real-time analysis of their VPN usage metrics

This post presents a reference architecture and optimization strategies for building serverless data analytics solutions on AWS using Amazon Kinesis Data Analytics. In addition, this post shows the design approach that the engineering team at NortonLifeLock took to build out an operational analytics platform that processes usage data for their VPN services, consuming petabytes of […]

Read More

Configure Amazon EMR Studio and Amazon EKS to run notebooks with Amazon EMR on EKS

Amazon EMR on Amazon EKS provides a deployment option for Amazon EMR that allows you to run analytics workloads on Amazon Elastic Kubernetes Service (Amazon EKS). This is an attractive option because it allows you to run applications on a common pool of resources without having to provision infrastructure. In addition, you can use Amazon […]

Read More

Integral Ad Science secures self-service data lake using AWS Lake Formation

This post is co-written with Mat Sharpe, Technical Lead, AWS & Systems Engineering from Integral Ad Science. Integral Ad Science (IAS) is a global leader in digital media quality. The company’s mission is to be the global benchmark for trust and transparency in digital media quality for the world’s leading brands, publishers, and platforms. IAS […]

Read More

Accelerate your data warehouse migration to Amazon Redshift – Part 4

This is the fourth in a series of posts. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and potentially reduce your overall cost to migrate to Amazon Redshift. Check out the previous posts in this series: Accelerate your […]

Read More

Accelerate your data warehouse migration to Amazon Redshift – Part 3

This is the third post in a multi-part series. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and reduce your overall cost to migrate to Amazon Redshift. Check out all the posts in this series: Accelerate your data […]

Read More

Reduce costs and increase resource utilization of Apache Spark jobs on Kubernetes with Amazon EMR on Amazon EKS

Amazon EMR on Amazon EKS is a deployment option for Amazon EMR that allows you to run Apache Spark on Amazon Elastic Kubernetes Service (Amazon EKS). If you run open-source Apache Spark on Amazon EKS, you can now use Amazon EMR to automate provisioning and management, and run Apache Spark up to three times faster. […]

Read More
Rapid-7 Multi-tenant Architecture

How Rapid7 built multi-tenant analytics with Amazon Redshift using near-real-time datasets

This is a guest post co-written by Rahul Monga, Principal Software Engineer at Rapid7. Rapid7 InsightVM is a vulnerability assessment and management product that provides visibility into the risks present across an organization. It equips you with the reporting, automation, and integrations needed to prioritize and fix those vulnerabilities in a fast and efficient manner. […]

Read More

Analyze daily trading activity using transaction data from Amazon Redshift in Amazon FinSpace

Financial services organizations use data from various sources to discover new insights and improve trading decisions. Finding the right dataset and getting access to the data can frequently be a time-consuming process. For example, to analyze daily trading activity, analysts need to find a list of available databases and tables, identify its owner’s contact information, […]

Read More

Register now for Flink Forward Global, October 26-27, 2021

Flink Forward Global 2021 is a 2-day virtual conference for the Apache Flink and stream processing communities. Apache Flink is an open-source distributed engine for processing data streams that can support both streaming and batch workloads. Amazon Kinesis Data Analytics is a fully managed service for Apache Flink on AWS that reduces the complexity of […]

Read More

Embed multi-tenant dashboards in SaaS apps using Amazon QuickSight without provisioning or managing users

Amazon QuickSight is a fully-managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards, and share these with tens of thousands of users, either within QuickSight itself, or embedded in software as a service (SaaS) apps. QuickSight Enterprise Edition recently added row-level security (RLS) using tags, a […]

Read More