Posted On: Aug 1, 2023

AWS has open sourced the Amazon Redshift integration for Apache Spark, which helps Apache Spark developers build and run Spark applications on Amazon Redshift data. With this release, the Amazon Redshift contributions to the integration are publicly available, so Spark developers can review the source code, extend it, contribute features, and make modifications that meet their application needs.

This release follows the general availability announcement of the integration in November 2022. In addition to expanded pushdown functionality, it adds AWS Secrets Manager integration and support for Parquet writes.

Amazon Redshift integration for Apache Spark builds on an existing open source connector project and enhances it for performance and security, helping customers gain up to 10x faster application performance. We acknowledge all the contributors to the project, some of whom collaborated with us to make this happen. As we make further enhancements to the connector, we will continue to contribute them back to the open source project.

To get started with the open sourced Spark Redshift connector, go to your favorite open source Apache Spark service. From there, use DataFrame or Spark SQL code in an Apache Spark job or notebook to connect to the Amazon Redshift data warehouse, and start running queries in minutes.

Amazon Redshift integration for Apache Spark is available in all AWS Regions where Amazon Redshift is available. To learn more, see Amazon Redshift and Amazon Redshift Integration for Apache Spark.
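
As an illustration of the getting-started step described above, here is a minimal PySpark sketch of reading from and writing to Amazon Redshift with DataFrame code. It is not taken from this announcement. The data source name and the option names (url, tempdir, aws_iam_role, query, dbtable, tempformat) reflect our understanding of the open source community connector's documentation and should be checked against the connector version you install. The helper function names are hypothetical, and it assumes the connector and a Redshift JDBC driver are already on the Spark classpath.

```python
from typing import Any, Dict

# Data source name as we understand it from the community connector's docs
# (assumption: verify against the version of the connector you use).
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_options(jdbc_url: str, temp_dir: str, iam_role_arn: str) -> Dict[str, str]:
    """Build the connection options shared by reads and writes.

    jdbc_url     -- JDBC URL of the Redshift endpoint
    temp_dir     -- S3 location the connector uses to stage data
    iam_role_arn -- IAM role Redshift assumes to access that S3 location
    """
    return {
        "url": jdbc_url,
        "tempdir": temp_dir,
        "aws_iam_role": iam_role_arn,
    }


def read_query(spark: Any, options: Dict[str, str], query: str) -> Any:
    """Run a SQL query against Redshift and return the result as a DataFrame."""
    return (
        spark.read.format(REDSHIFT_FORMAT)
        .options(**options)
        .option("query", query)
        .load()
    )


def write_table(df: Any, options: Dict[str, str], table: str, mode: str = "append") -> None:
    """Write a DataFrame to a Redshift table, staging the data as Parquet.

    Assumption: "tempformat" set to "PARQUET" is how the connector selects
    Parquet for staged writes; confirm this in the connector documentation.
    """
    (
        df.write.format(REDSHIFT_FORMAT)
        .options(**options)
        .option("dbtable", table)
        .option("tempformat", "PARQUET")
        .mode(mode)
        .save()
    )
```

In a job or notebook that already has a SparkSession named spark, usage might look like read_query(spark, redshift_options("jdbc:redshift://example-host:5439/dev", "s3a://example-bucket/tmp/", "arn:aws:iam::123456789012:role/example-role"), "SELECT 1"). The host, bucket, and role ARN here are placeholders, not real resources.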