How can I run an AWS Glue job on a specific partition in Amazon S3?

Last updated: 2019-10-10

How can I run an AWS Glue job on a specific partition in an Amazon Simple Storage Service (Amazon S3) location?

Short Description

To filter on partitions in the AWS Glue Data Catalog, use a pushdown predicate. Unlike Filter transforms, pushdown predicates allow you to filter on partitions without having to list and read all the files in your dataset.

Resolution

Create an AWS Glue job and specify the pushdown predicate when you create the DynamicFrame. In the following example, the job processes data in the s3://awsexamplebucket/product_category=Video partition only:

datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "testdata",
    table_name = "sampletable",
    transformation_ctx = "datasource0",
    push_down_predicate = "(product_category == 'Video')")

Here's an example of a pushdown predicate that filters by date. In this example, the job processes data in the s3://awsexamplebucket/year=2019/month=08/day=02 partition only:

datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "testdata",
    table_name = "sampletable",
    transformation_ctx = "datasource0",
    push_down_predicate = "(year == '2019' and month == '08' and day == '02')")

Here's an example of a pushdown predicate that filters by date for non-Hive style partitions. When partition folders don't use the key=value naming convention, AWS Glue assigns default partition keys (partition_0, partition_1, and so on) in the order that the folders appear in the path. In this example, the job processes data in the s3://awsexamplebucket/2019/07/03 partition only:

datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "testdata",
    table_name = "sampletable",
    transformation_ctx = "datasource0",
    push_down_predicate = "(partition_0 == '2019' and partition_1 == '07' and partition_2 == '03')")
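Pushdown predicate strings use Spark SQL expression syntax, so you can also build them programmatically instead of hard-coding them. Here's a minimal sketch (the variable names and date values are illustrative) that produces the same predicate string as the earlier Hive-style date example:

```python
# Build a pushdown predicate string for a Hive-style date partition.
# Values are zero-padded to match the partition folder names
# (for example, month=08 and day=02).
year, month, day = 2019, 8, 2
predicate = f"(year == '{year}' and month == '{month:02d}' and day == '{day:02d}')"
print(predicate)  # (year == '2019' and month == '08' and day == '02')
```

You can then pass the resulting string as the push_down_predicate argument of create_dynamic_frame.from_catalog. Note that partition values are strings in the Data Catalog, so keep the single quotes around each value in the predicate.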
