AWS Big Data Blog

Federated access to Amazon Redshift clusters in AWS China Regions with Active Directory Federation Services

Many customers already manage user identities through identity providers (IdPs) for single sign-on access. With an IdP such as Active Directory Federation Services (AD FS), you can set up federated access to Amazon Redshift clusters as a mechanism to control permissions for the database objects by business groups. This provides a seamless user experience, and centralizes the governance […]

Accelerate your data warehouse migration to Amazon Redshift – Part 5

This is the fifth in a series of posts. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and potentially simplify your migrations from legacy data warehouses to Amazon Redshift. Check out all the posts in this series: […]

Back up and restore Kafka topic data using Amazon MSK Connect

This blog is only meant to be used as a reference for backing up and restoring data for an Amazon MSK cluster. AWS does not offer any support for it. You can use Apache Kafka to run your streaming workloads. Kafka provides resiliency to failures and protects your data out of the box by replicating […]

Migrate your Amazon Redshift cluster to another AWS Region

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS designed hardware and machine […]

Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost

Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost. This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how […]

Build a cost-effective extension to your Elasticsearch cluster with Amazon OpenSearch Service

During the past year, we’ve seen customers running self-managed Elasticsearch clusters on AWS who were running out of compute and storage capacity because of the non-elasticity of their clusters. They adopted Amazon OpenSearch Service to benefit from better flexibility for their logs and enhanced retention periods. In this post, we discuss how to build a […]

Ingest data from Snowflake to Amazon S3 using AWS Glue Marketplace Connectors

In today’s complex business landscape, organizations are challenged to consume data from a variety of sources and keep up with data that pours in throughout the day. There is a demand to design applications that enable data to be portable across cloud platforms and give organizations the ability to derive insights from one or more data […]

Make data available for analysis in seconds with Upsolver low-code data pipelines, Amazon Redshift Streaming Ingestion, and Amazon Redshift Serverless

Amazon Redshift is the most widely used cloud data warehouse. Amazon Redshift makes it easy and cost-effective to perform analytics on vast amounts of data. Amazon Redshift launched Streaming Ingestion for Amazon Kinesis Data Streams, which enables you to load data into Amazon Redshift with low latency and without having to stage the data in […]

Build and deploy custom connectors for Amazon Redshift with Amazon Lookout for Metrics

Amazon Lookout for Metrics detects outliers in your time series data, determines their root causes, and enables you to quickly take action. Built from the same technology used by Amazon.com, Lookout for Metrics reflects 20 years of expertise in outlier detection and machine learning (ML). Read our GitHub repo to learn more about how to […]

Query and visualize Amazon Redshift operational metrics using the Amazon Redshift plugin for Grafana

Grafana is a rich interactive open-source tool by Grafana Labs for visualizing data across one or many data sources. It’s used in a variety of modern monitoring stacks, allowing you to have a common technical base and apply common monitoring practices across different systems. Amazon Managed Grafana is a fully managed, scalable, and secure Grafana-as-a-service […]