Posted On: Jun 30, 2023

Amazon SageMaker Canvas now supports the Apache Parquet file format for tabular, time-series forecasting, and NLP datasets. SageMaker Canvas is a visual interface that enables business analysts to generate accurate ML predictions on their own, without requiring any machine learning experience or writing a single line of code.

Starting today, Canvas supports Apache Parquet, an open-source, column-oriented data file format designed for efficient data storage and retrieval. With this new capability, you can import data in the Parquet file format, in addition to CSV, for tabular, time-series forecasting, and NLP use cases, giving you greater flexibility. When creating a dataset in Canvas, you can choose multiple Parquet files from your local disk or your Amazon S3 bucket. Each Parquet file can be up to 5 GB in size. Because Parquet uses efficient compression and encoding schemes, Parquet files help Canvas use your data efficiently when you import data, build ML models, and generate predictions.
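Before choosing several local Parquet files for a Canvas dataset, it can help to screen them against the per-file size limit and confirm they really are Parquet files. The sketch below is a hypothetical pre-upload check, not part of any SageMaker or Canvas API. It relies on the fact that Parquet files begin and end with the 4-byte magic string `PAR1`. It assumes "5 GB" means 5 × 1024³ bytes; the announcement does not specify decimal or binary units, so adjust `MAX_PARQUET_BYTES` if needed. The function name `check_parquet_for_canvas` is illustrative only.

```python
import os
from pathlib import Path

# Assumption: "5 GB" is interpreted as 5 * 1024**3 bytes; the announcement
# does not state whether decimal or binary units are meant.
MAX_PARQUET_BYTES = 5 * 1024**3

# Parquet files start and end with this 4-byte magic string.
PARQUET_MAGIC = b"PAR1"


def check_parquet_for_canvas(path, max_bytes=MAX_PARQUET_BYTES):
    """Hypothetical pre-upload check (not a SageMaker API).

    Returns a list of problem descriptions; an empty list means the file
    passed both the size check and the magic-byte check.
    """
    problems = []
    p = Path(path)
    size = p.stat().st_size

    # Per-file size limit from the announcement (units assumed, see above).
    if size > max_bytes:
        problems.append(f"{p.name}: {size} bytes exceeds limit of {max_bytes} bytes")

    # Smallest valid layout: header magic + 4-byte footer length + footer magic.
    if size < 12:
        problems.append(f"{p.name}: too small to be a Parquet file")
        return problems

    # Read the first and last 4 bytes and compare them with the magic string.
    with p.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        problems.append(f"{p.name}: missing Parquet magic bytes")

    return problems
```

For example, running `check_parquet_for_canvas("sales.parquet")` on each candidate file and uploading only those that return an empty list catches oversized files and CSVs that were renamed with a `.parquet` extension. The magic-byte test is deliberately shallow. It does not validate the schema or the footer metadata, which a full Parquet reader would do.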

Support for Apache Parquet is available in all AWS Regions where SageMaker Canvas is supported today. To learn more, see the product documentation.