Posted On: Apr 1, 2022

AWS Glue DataBrew customers can now clean and transform data stored in the Optimized Row Columnar (ORC) file format, a widely used format for storing Hive data. When creating a dataset in AWS Glue DataBrew, you can now use ORC files in addition to the already supported Apache Parquet, Microsoft Excel, CSV, and JSON file formats.

For a list of supported input formats, see Supported file types for data sources in the AWS Glue DataBrew Developer Guide.
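
As an illustration of creating a dataset from ORC files, the following is a minimal Python sketch using the AWS SDK for Python (boto3) and its DataBrew `create_dataset` operation. The dataset name, bucket, and object key are hypothetical placeholders, and the `"ORC"` value for `Format` is an assumption that should be confirmed against the current API reference. The sketch separates building the request from sending it, so the request can be inspected without AWS credentials.

```python
from typing import Any, Dict


def build_orc_dataset_request(name: str, bucket: str, key: str) -> Dict[str, Any]:
    """Build keyword arguments for the DataBrew CreateDataset operation."""
    return {
        "Name": name,
        # Assumption: "ORC" is the Format value that selects ORC input.
        "Format": "ORC",
        # Location of the ORC data in Amazon S3 (placeholder values).
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


def create_orc_dataset(name: str, bucket: str, key: str) -> Dict[str, Any]:
    """Send the request to DataBrew. Requires boto3 and AWS credentials."""
    # Imported lazily so the request builder above works without the SDK.
    import boto3

    client = boto3.client("databrew")
    return client.create_dataset(**build_orc_dataset_request(name, bucket, key))


if __name__ == "__main__":
    # Hypothetical names, shown only to illustrate the request shape.
    print(build_orc_dataset_request("sales-orc", "example-bucket", "data/sales.orc"))
```

Calling `create_orc_dataset("sales-orc", "example-bucket", "data/sales.orc")` would submit the request, provided the bucket and object exist and the caller has the necessary permissions.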

Updated April 11, 2022 - This post inaccurately listed Apache Avro as a supported input format. As of this date, AWS Glue DataBrew does not support Apache Avro as an input format.