Posted On: Jun 5, 2023

AWS Glue for Ray, a data integration engine option on AWS Glue, is now generally available. AWS Glue for Ray helps data engineers and ETL (extract, transform, and load) developers scale their Python jobs. AWS Glue is a serverless, scalable data integration service used to discover, prepare, move, and integrate data from multiple sources. AWS Glue for Ray combines that serverless capability for data integration with Ray (ray.io), a popular new open-source compute framework that helps you scale Python workloads.

As with the Apache Spark and Python engines on AWS Glue, you pay only for the resources that you use while running code, and you don’t need to configure or tune those resources. AWS Glue for Ray distributes the processing of your Python code across multi-node clusters. You can create and run Ray jobs anywhere that you can run AWS Glue ETL jobs, including existing AWS Glue jobs, command line interfaces (CLIs), and APIs. You can select the AWS Glue for Ray engine locally or through notebooks on AWS Glue Studio and Amazon SageMaker Studio Notebook. When the Ray job is ready, you can run it on demand or on a schedule.
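As a rough illustration of the API path, the sketch below assembles a request for the AWS Glue CreateJob operation that selects the Ray engine, then submits it with boto3. The helper names (`build_ray_job_request`, `create_and_start`) and the default worker count are hypothetical. The command name `glueray`, runtime `Ray2.4`, Python version `3.9`, Glue version `4.0`, and worker type `Z.2X` are assumptions based on the AWS Glue documentation at the time of writing, so check them against the current documentation before use.

```python
def build_ray_job_request(name, role_arn, script_s3_uri, workers=5):
    """Assemble keyword arguments for a Glue CreateJob call using the Ray engine.

    All literal values below are assumptions; confirm them in the AWS Glue docs.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "4.0",          # assumed Glue version for Ray jobs
        "Command": {
            "Name": "glueray",         # assumed command name that selects Ray
            "PythonVersion": "3.9",    # assumed supported Python version
            "Runtime": "Ray2.4",       # assumed Ray runtime identifier
            "ScriptLocation": script_s3_uri,
        },
        "WorkerType": "Z.2X",          # assumed worker type for Ray jobs
        "NumberOfWorkers": workers,
    }


def create_and_start(request, region="us-east-1"):
    """Create the job and start one on-demand run; returns the run ID.

    Requires boto3 and AWS credentials, so it is imported lazily here.
    """
    import boto3

    glue = boto3.client("glue", region_name=region)
    glue.create_job(**request)
    run = glue.start_job_run(JobName=request["Name"])
    return run["JobRunId"]


if __name__ == "__main__":
    # Placeholder values; replace with your own role and script location.
    request = build_ray_job_request(
        name="example-ray-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        script_s3_uri="s3://example-bucket/scripts/ray_job.py",
    )
    print(request["Command"]["Name"])
```

Keeping request construction separate from the network call makes the job definition easy to inspect or validate before anything is created in your account.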

AWS Glue for Ray is generally available in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland).