AWS Big Data Blog
Category: Advanced (300)
Securing Apache Kafka is easy and familiar with IAM Access Control for Amazon MSK
September 2025: This post was reviewed and updated for accuracy. AWS launched IAM Access Control for Amazon MSK, which is a security option offered at no additional cost that simplifies cluster authentication and Apache Kafka API authorization using AWS Identity and Access Management (IAM) roles or user policies to control access. This eliminates the need […]
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions
At re:Invent 2020, we announced the general availability of Amazon EMR on Amazon EKS, a new deployment option for Amazon EMR that allows you to automate the provisioning and management of open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With Amazon EMR on EKS, you can now run Spark applications alongside other […]
Using the Amazon Redshift Data API to interact from an Amazon SageMaker Jupyter notebook
June 2023: This post was reviewed for accuracy. The Amazon Redshift Data API makes it easy for any application written in Python, Go, Java, Node.js, PHP, Ruby, and C++ to interact with Amazon Redshift. Traditionally, these applications use JDBC connectors to connect, send a query to run, and retrieve results from the Amazon Redshift cluster. […]
Using the Amazon Redshift Data API to interact with Amazon Redshift clusters
June 2023: This post was reviewed and updated for accuracy. July 2021: This post was reviewed and updated to include multi-statement and parameterization support. Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL […]


