Posted On: Jan 22, 2019
You can now use Python scripts in AWS Glue to run small to medium-sized generic tasks that are often part of an ETL (extract, transform, and load) workflow. Previously, AWS Glue jobs were limited to those that ran in a serverless Apache Spark environment. You can now use Python shell jobs, for example, to submit SQL queries to services such as Amazon Redshift, Amazon Athena, or Amazon EMR, or run machine-learning and scientific analyses.
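As an illustration of the kind of task described above, here is a minimal sketch of a script that submits a SQL query to Amazon Athena through Boto3 and waits for it to finish. The helper name `run_athena_query`, the database name, and the S3 output location are hypothetical, chosen only for this example. The code assumes the job's IAM role is allowed to call Athena and write to the results bucket.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and block until it reaches a terminal state.

    `client` is a Boto3 Athena client (or any object exposing the same two
    methods). Returns a (query_execution_id, final_state) tuple.
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until Athena reports that the query succeeded, failed, or was cancelled.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Example usage inside a job script (names below are placeholders):
#
#   import boto3
#   athena = boto3.client("athena")
#   query_id, state = run_athena_query(
#       athena, "SELECT 1", "example_db", "s3://example-bucket/athena-results/")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without AWS credentials.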
Python shell jobs in AWS Glue support scripts that are compatible with Python 2.7 and come pre-loaded with libraries such as Boto3, NumPy, SciPy, and pandas. You can run Python shell jobs using 1 DPU (Data Processing Unit) or 0.0625 DPU (which is 1/16 DPU). A single DPU provides processing capacity that consists of 4 vCPUs of compute and 16 GB of memory.
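To put the two capacity options side by side, the short sketch below works out the arithmetic. It assumes that vCPU and memory scale linearly with the DPU fraction; the announcement itself gives figures only for a full DPU, so the 1/16 DPU numbers are an estimate under that assumption. The function name `dpu_resources` is made up for this example.

```python
# Figures for one full DPU, as stated in the announcement.
VCPUS_PER_DPU = 4
MEMORY_GB_PER_DPU = 16


def dpu_resources(dpu):
    """Estimate resources for a DPU setting, assuming linear scaling."""
    return {
        "vcpus": dpu * VCPUS_PER_DPU,
        "memory_gb": dpu * MEMORY_GB_PER_DPU,
    }


for capacity in (1, 0.0625):
    print(capacity, dpu_resources(capacity))
```

Under the linear-scaling assumption, 0.0625 DPU works out to a quarter of a vCPU and 1 GB of memory.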