AWS Big Data Blog

Category: Compute

How SailPoint solved scaling issues by migrating legacy big data applications to Amazon EMR on Amazon EKS

This post is co-written with Richard Li from SailPoint. SailPoint Technologies is an identity security company based in Austin, TX. Its software as a service (SaaS) solutions support identity governance operations in regulated industries such as healthcare, government, and higher education. SailPoint distinguishes multiple aspects of identity as individual identity security services, including cloud governance, […]


Develop and test AWS Glue version 3.0 jobs locally using a Docker container

AWS Glue is a fully managed serverless service that allows you to process data coming through different data sources at scale. You can use AWS Glue jobs for various use cases such as data ingestion, preprocessing, enrichment, and data integration from different data sources. AWS Glue version 3.0, the latest version of AWS Glue Spark […]


Improved performance with AWS Graviton2 instances on Amazon OpenSearch Service

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) is a fully managed service at AWS for OpenSearch. It’s an open-source search and analytics suite used for a broad set of use cases, like real-time application monitoring, log analytics, and website search. While running an OpenSearch Service domain, you can choose from a variety of instances […]


Audit AWS service events with Amazon EventBridge and Amazon Kinesis Data Firehose

Amazon EventBridge is a serverless event bus that makes it easy to build event-driven applications at scale using events generated from your applications, integrated software as a service (SaaS) applications, and AWS services. Many AWS services generate EventBridge events. When an AWS service in your account emits an event, it goes to your account’s default […]


Doing more with less: Moving from transactional to stateful batch processing

Amazon processes hundreds of millions of financial transactions each day, including accounts receivable, accounts payable, royalties, amortizations, and remittances, from over a hundred different business entities. All of this data is sent to the eCommerce Financial Integration (eCFI) systems, where it is recorded in the subledger. Ensuring complete financial reconciliation at this scale is critical […]


Build a modern data architecture on AWS with Amazon AppFlow, AWS Lake Formation, and Amazon Redshift: Part 2

In Part 1 of this post, we provided a solution to build the sourcing, orchestration, and transformation of data from multiple source systems, including Salesforce, SAP, and Oracle, into a managed modern data platform. Roche partnered with AWS Professional Services to build out this fully automated and scalable platform to provide the foundation for their […]


Unify log aggregation and analytics across compute platforms

Our customers want to make sure their users have the best experience running their application on AWS. To make this happen, you need to monitor and fix software problems as quickly as possible. Doing this gets challenging with the growing volume of data needing to be quickly detected, analyzed, and stored. In this post, we […]


Optimize performance and reduce costs for network analytics with VPC Flow Logs in Apache Parquet format

VPC Flow Logs help you understand network traffic patterns, identify security issues, audit usage, and diagnose network connectivity on AWS. Customers often route their VPC flow logs directly to Amazon Simple Storage Service (Amazon S3) for long-term retention. You can then use a custom format conversion application to convert these text files into an Apache […]


Amazon QuickSight deployment models for cross-account and cross-Region access to Amazon Redshift and Amazon RDS

Many AWS customers use multiple AWS accounts and Regions across different departments and applications within the same company. However, you might deploy services like Amazon QuickSight using a single-account approach to centralize users, data source access, and dashboard management. This post explores how you can use different Amazon Virtual Private Cloud (Amazon VPC) private connectivity features to connect QuickSight […]


How NortonLifeLock built a serverless architecture for real-time analysis of their VPN usage metrics

This post presents a reference architecture and optimization strategies for building serverless data analytics solutions on AWS using Amazon Kinesis Data Analytics. In addition, this post shows the design approach that the engineering team at NortonLifeLock took to build out an operational analytics platform that processes usage data for their VPN services, consuming petabytes of […]
