AWS Big Data Blog
Category: Analytics
Federated access to Amazon Redshift clusters in AWS China Regions with Active Directory Federation Services
Many customers already manage user identities through identity providers (IdPs) for single sign-on access. With an IdP such as Active Directory Federation Services (AD FS), you can set up federated access to Amazon Redshift clusters as a mechanism to control permissions for the database objects by business groups. This provides a seamless user experience, and centralizes the governance […]
Accelerate your data warehouse migration to Amazon Redshift – Part 5
This is the fifth in a series of posts. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and potentially simplify your migrations from legacy data warehouses to Amazon Redshift. Check out the all the posts in this series: […]
Back up and restore Kafka topic data using Amazon MSK Connect
This blog is only meant to be used as a reference for backing up and restoring data for an Amazon MSK cluster. AWS does not offer any support for it. You can use Apache Kafka to run your streaming workloads. Kafka provides resiliency to failures and protects your data out of the box by replicating […]
Migrate your Amazon Redshift cluster to another AWS Region
Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS designed hardware and machine […]
Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost
Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost. This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how […]
Build a cost-effective extension to your Elasticsearch cluster with Amazon OpenSearch Service
During the past year, we’ve seen customers running self-managed Elasticsearch clusters on AWS who were running out of compute and storage capacity because of the non-elasticity of their clusters. They adopted Amazon OpenSearch Service. to benefit from better flexibility for their logs and enhanced retention periods. In this post, we discuss how to build a […]
Make data available for analysis in seconds with Upsolver low-code data pipelines, Amazon Redshift Streaming Ingestion, and Amazon Redshift Serverless
Amazon Redshift is the most widely used cloud data warehouse. Amazon Redshift makes it easy and cost-effective to perform analytics on vast amounts of data. Amazon Redshift launched Streaming Ingestion for Amazon Kinesis Data Streams, which enables you to load data into Amazon Redshift with low latency and without having to stage the data in […]
Build and deploy custom connectors for Amazon Redshift with Amazon Lookout for Metrics
Amazon Lookout for Metrics detects outliers in your time series data, determines their root causes, and enables you to quickly take action. Built from the same technology used by Amazon.com, Lookout for Metrics reflects 20 years of expertise in outlier detection and machine learning (ML). Read our GitHub repo to learn more about how to […]
Query and visualize Amazon Redshift operational metrics using the Amazon Redshift plugin for Grafana
Grafana is a rich interactive open-source tool by Grafana Labs for visualizing data across one or many data sources. It’s used in a variety of modern monitoring stacks, allowing you to have a common technical base and apply common monitoring practices across different systems. Amazon Managed Grafana is a fully managed, scalable, and secure Grafana-as-a-service […]
Build a serverless pipeline to analyze streaming data using AWS Glue, Apache Hudi, and Amazon S3
Organizations typically accumulate massive volumes of data and continue to generate ever-exceeding data volumes, ranging from terabytes to petabytes and at times to exabytes of data. Such data is usually generated in disparate systems and requires an aggregation into a single location for analysis and insight generation. A data lake architecture allows you to aggregate […]