SageMaker Canvas unlocks no-code ML and data preparation at petabyte-scale

Posted on: Aug 16, 2024

Amazon SageMaker Canvas now supports petabyte-scale datasets, helping enterprises harness the full potential of their data. Starting today, you can interactively prepare large datasets, create end-to-end data flows, and trigger AutoML experiments on petabytes of data, a substantial leap from the previous 5 GB limit. With 50+ connectors, an intuitive "chat with data" interface, and petabyte support, Canvas provides a scalable, low-code/no-code ML solution for real-world enterprise use cases.

Canvas also adds new sampling techniques, including random and stratified sampling, and allows samples of up to 200K rows, a tenfold increase. This makes it easy to gather data quality insights and understand the impact of your data transformations interactively before processing your entire dataset through the new integration with EMR Serverless. For data larger than 5 GB, Canvas automatically scales sampling, preparation, model building, and inference to EMR Serverless. EMR Serverless usage incurs additional charges under EMR pricing.
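To make the difference between the two sampling techniques concrete, here is a minimal Python sketch. It is an illustration only and not how Canvas implements sampling: the function names, the `key` parameter, and the proportional-allocation rule are assumptions made for this example. Only the 200K-row cap comes from the announcement.

```python
import random
from collections import defaultdict

# Sample-size cap taken from the announcement (200K rows).
MAX_SAMPLE_ROWS = 200_000


def random_sample(rows, n, seed=0):
    """Draw up to n rows uniformly at random, without replacement."""
    n = min(n, MAX_SAMPLE_ROWS, len(rows))
    return random.Random(seed).sample(rows, n)


def stratified_sample(rows, key, n, seed=0):
    """Draw roughly n rows while preserving each group's share of the data.

    `key` maps a row to its group label (hypothetical helper for this sketch).
    Per-group counts are rounded, so the total can differ slightly from n.
    """
    n = min(n, MAX_SAMPLE_ROWS, len(rows))
    rng = random.Random(seed)

    # Bucket rows by group label.
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    sample = []
    for members in groups.values():
        # Proportional allocation, keeping at least one row per group.
        k = max(1, round(n * len(members) / len(rows)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Example data: a 90/10 class imbalance.
rows = [{"label": "a"}] * 900 + [{"label": "b"}] * 100
subset = stratified_sample(rows, key=lambda r: r["label"], n=100)
```

A uniform random sample can under-represent or miss a rare group by chance, while the stratified version keeps each group's proportion close to that of the full dataset, which matters when checking data quality on imbalanced columns.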

The new petabyte support and improved interactive experience are available in all AWS Regions where SageMaker Canvas is offered.

To get started with no-code ML and data preparation of large datasets, enable "large data processing configuration" in your Canvas domain and user profile using our technical documentation, and learn how to use the new capability from the AWS Machine Learning blog. Existing users should update their SageMaker domain settings per the documentation, log out from the Canvas workspace, and log back in to access the latest version.