AWS Big Data Blog
Automating deployment of Amazon Redshift ETL jobs with AWS CodeBuild, AWS Batch, and DBT
This post was last reviewed and updated in June 2022 to update the code and the service used in the AWS CloudFormation template. Data has become an essential part of every business, and its volume, velocity, and variety continue to increase. This has resulted in more complex ETL jobs with more interdependencies between them. There is also […]
ICBiome uses Amazon QuickSight to empower hospitals in dealing with harmful pathogens
In response to the COVID-19 pandemic, hospitals and healthcare organizations are increasingly employing genetic sequencing to screen, track, and contain harmful pathogens. ICBiome is a startup that has been working on this problem for several years, creating innovative data analytics products using AWS to help hospitals and researchers address both community-associated and hospital-acquired infections. Building […]
Enabling multi-factor authentication for an Amazon Redshift cluster using Okta as an identity provider
December 2022: This post was reviewed and updated for accuracy. Many organizations have started using single sign-on (SSO) with multi-factor authentication (MFA) for enhanced security. This additional authentication factor is the new normal, which enhances the security provided by the user name and password model. Using SSO reduces the effort needed to maintain and remember […]
Unified serverless streaming ETL architecture with Amazon Kinesis Data Analytics
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Businesses across the world […]
Normalize data with Amazon Elasticsearch Service ingest pipelines
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Amazon OpenSearch Service is a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost-effectively at scale. Search and log analytics are the two most popular use cases for Amazon OpenSearch Service. In log analytics […]
Enabling Amazon QuickSight federation with Microsoft Entra ID (formerly Azure AD)
June 2025: This post was reviewed and updated for accuracy. As of August 2023, Amazon QuickSight is an AWS IAM Identity Center enabled application. This capability allows administrators who subscribe to QuickSight to use IAM Identity Center to enable their users to log in with Azure AD and other external identity providers. For more […]
Federating single sign-on access to your Amazon Redshift cluster with PingIdentity
Single sign-on (SSO) enables users to have a seamless user experience while accessing various applications in the organization. If you’re responsible for setting up security and database access privileges for users and tasked with enabling SSO for Amazon Redshift, you can set up SSO authentication using ADFS, PingIdentity, Okta, Azure AD or other SAML browser […]
How Cookpad scaled its Amazon Redshift cluster while controlling costs with usage limits
This is a guest post by Shimpei Kodama, data engineer at Cookpad Inc. Cookpad is a tech company that builds a community platform where people share recipe ideas and cooking tips. The company’s mission is to “make everyday cooking fun.” It’s one of the largest recipe-sharing platforms in Japan with over 50 million users per […]
Making ETL easier with AWS Glue Studio
AWS Glue Studio is an easy-to-use graphical interface that speeds up the process of authoring, running, and monitoring extract, transform, and load (ETL) jobs in AWS Glue. The visual interface allows those who don’t know Apache Spark to design jobs without coding experience and accelerates the process for those who do. AWS Glue Studio was […]
Automating bucketing of streaming data using Amazon Athena and AWS Lambda
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. In today’s world, data plays a vital role in helping businesses understand and improve their processes and services to reduce cost. You can use several tools to […]