AWS Glue Data Integration Engines

Choose the right data integration engine for your user skillsets and analytics workloads

AWS Glue is a serverless data integration service that offers multiple data integration engines to support your users and workloads. With AWS Glue, you can use the appropriate engine for any workload, based on the characteristics of your workload and the preferences of your developers and analysts.

Key features

AWS Glue for Apache Spark

AWS Glue provides a performance-optimized, serverless infrastructure for running Apache Spark for data integration and extract, transform, and load (ETL) jobs. AWS Glue for Apache Spark supports both batch and stream processing and speeds up data ingestion, processing, and integration, so you can create and update your data lake and data warehouse and extract insights from your data more quickly.

AWS Glue for Ray

With AWS Glue for Ray, your data engineers and developers can process large datasets using Python and popular Python libraries. AWS Glue uses Ray (Ray.io), an open-source unified compute framework for scaling Python workloads. AWS Glue for Ray includes popular Python data processing libraries, and you can also bring your own libraries to customize your data integration job.

AWS Glue for Python Shell

With AWS Glue for Python Shell, you can run Python scripts on AWS Glue as Python Shell jobs. These jobs let you write complex data integration and analytics logic in Python. Python Shell jobs offer common analytics libraries out of the box, including pandas, NumPy, and AWS SDK for pandas (formerly AWS Data Wrangler). You can use the bundled functionality to connect to various databases, data warehouses, and AWS services.
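
As a rough illustration of how the engine choice shows up when you define a job, the sketch below builds a minimal request for the AWS Glue CreateJob API. The command names (glueetl, gluestreaming, glueray, pythonshell) are the ones that API uses for each engine. The helper build_job_request, the job name, the role ARN, and the S3 script path are hypothetical placeholders. Engine-specific settings such as the Glue version, worker type, and the Ray runtime are deliberately omitted, so treat this as a starting point under those assumptions rather than a complete job definition.

```python
from typing import Any, Dict

# Command names used by the AWS Glue CreateJob API for each engine.
ENGINE_COMMANDS = {
    "spark": "glueetl",
    "spark-streaming": "gluestreaming",
    "ray": "glueray",
    "python-shell": "pythonshell",
}


def build_job_request(
    name: str, engine: str, role_arn: str, script_location: str
) -> Dict[str, Any]:
    """Build a minimal CreateJob request body (hypothetical helper).

    Only the required top-level fields are filled in. A real job usually
    also needs engine-specific settings (Glue version, worker type, and,
    for Ray, a runtime), which are left out of this sketch.
    """
    if engine not in ENGINE_COMMANDS:
        raise ValueError(f"Unknown engine: {engine!r}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": ENGINE_COMMANDS[engine],
            "ScriptLocation": script_location,
        },
    }


# Placeholder values for illustration only.
request = build_job_request(
    name="example-job",
    engine="ray",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/job.py",
)
print(request["Command"]["Name"])
```

With boto3 installed and credentials configured, a request like this could be passed along as boto3.client("glue").create_job(**request) once the omitted engine-specific settings have been added.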