AWS Big Data Blog

Category: Networking & Content Delivery

Scale AWS Glue jobs by optimizing IP address consumption and expanding network capacity using a private NAT gateway

As businesses expand, the demand for IP addresses within the corporate network often exceeds the supply. An organization’s network is often designed with some anticipation of future requirements, but as enterprises evolve, their information technology (IT) needs surpass the previously designed network. Companies may find themselves challenged to manage the limited pool of IP addresses. […]

Stream VPC Flow Logs to Datadog via Amazon Kinesis Data Firehose

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. It’s common to store the logs generated by customers’ applications and services in various tools. These logs are important for compliance, audits, troubleshooting, security incident responses, meeting security policies, and many other […]

Stream VPC flow logs to Amazon OpenSearch Service via Amazon Kinesis Data Firehose

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Amazon Virtual Private Cloud (Amazon VPC) flow logs enable you to track the IP traffic going to and from the network interfaces in your VPC for your workloads. Analyzing VPC logs helps […]

Simplify private network access for solutions using Amazon OpenSearch Service managed VPC endpoints

Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. Amazon OpenSearch is an open source, distributed search and analytics suite. Amazon OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (versions 1.5 to 7.10), as well as visualization capabilities […]

Amazon QuickSight deployment models for cross-account and cross-Region access to Amazon Redshift and Amazon RDS

Many AWS customers use multiple AWS accounts and Regions across different departments and applications within the same company. However, you might deploy services like Amazon QuickSight using a single-account approach to centralize users, data source access, and dashboard management. This post explores how you can use different Amazon Virtual Private Cloud (Amazon VPC) private connectivity features to connect QuickSight […]

How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink

August 2023: Amazon MSK now offers a managed feature called multi-VPC private connectivity to simplify connectivity of your Kafka clients to your brokers. Refer to this blog post to learn more. This guest post presents patterns for accessing an Amazon Managed Streaming for Apache Kafka cluster across your AWS account or Amazon Virtual Private Cloud (Amazon VPC) […]

Connect to and run ETL jobs across multiple VPCs using a dedicated AWS Glue VPC

In this blog post, we’ll go through the steps needed to build an ETL pipeline that reads data from a source in one VPC and writes it to a destination in a different VPC. We’ll use multiple VPCs to reproduce a situation where your database instances are spread across VPCs for isolation related to security, audit, or other purposes.

Create data science environments on AWS for health analysis using OHDSI

This blog post demonstrates how to combine some of the OHDSI projects (Atlas, Achilles, WebAPI, and the OMOP Common Data Model) with AWS technologies. By doing so, you can quickly and inexpensively implement a health data science and informatics environment.