Posted On: Jul 25, 2023

Amazon SageMaker Canvas now supports five new data transforms that help you prepare and analyze your data before building machine learning (ML) models. Data is the foundation of machine learning, and transforming raw data so that it is suitable for building models and generating predictions is key to better insights. Starting today, SageMaker Canvas lets you change the data type of a column to numeric, text, or datetime, and it displays the feature type associated with that data type, such as binary or categorical. This gives you the flexibility to manually set each column's data type based on the features it contains. Choosing the right data type helps ensure data integrity and accuracy before you build ML models. For example, using the datetime data type ensures that only valid dates are stored in that column.
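Canvas applies these transforms through its no-code interface, so no programming is required. For readers who want to see what a data type change does to a column, the following is a minimal sketch of the same idea in pandas. It is an analogy under stated assumptions, not how Canvas is implemented; the column names and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw data in which every column arrives as text.
raw = pd.DataFrame({
    "order_date": ["2023-07-01", "2023-07-02", "not a date"],
    "amount": ["10.5", "7", "abc"],
    "returned": ["0", "1", "0"],
})

typed = pd.DataFrame({
    # Text -> datetime: values that are not valid dates become NaT.
    "order_date": pd.to_datetime(
        raw["order_date"], format="%Y-%m-%d", errors="coerce"
    ),
    # Text -> numeric: values that are not valid numbers become NaN.
    "amount": pd.to_numeric(raw["amount"], errors="coerce"),
    "returned": pd.to_numeric(raw["returned"], errors="coerce"),
})

# A numeric column with exactly two distinct values behaves like a
# binary feature (an illustrative rule, not Canvas's actual logic).
is_binary = bool(typed["returned"].nunique() == 2)

print(typed.dtypes)
print("returned looks binary:", is_binary)
```

In this sketch the invalid date and the invalid number are converted to missing values rather than kept as arbitrary strings, which illustrates why a typed column holds only valid values of that type.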

Additionally, Canvas lets you resample time-series data to establish regular intervals between the observations in your dataset. This is particularly useful when the data contains irregularly spaced observations: resampling places the observations at equally spaced time intervals, which makes the data easier to use for downstream operations such as analytics and predictions. Finally, Canvas now provides better ways to manage the rows in your data: you can sort rows in ascending or descending order, randomly shuffle rows, and drop duplicate rows.
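As a rough illustration of what resampling and the row-management transforms do, here is a short pandas sketch. It is an analogy only and does not reflect Canvas internals; the sample data, the hourly interval, and the choice of averaging within each interval are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sensor readings taken at irregular timestamps.
readings = pd.DataFrame(
    {"value": [1.0, 3.0, 5.0, 7.0]},
    index=pd.to_datetime([
        "2023-07-25 00:03", "2023-07-25 00:47",
        "2023-07-25 02:10", "2023-07-25 02:50",
    ]),
)

# Resample to regular 60-minute intervals, averaging the observations
# that fall in each interval. Intervals with no observations are NaN.
hourly = readings.resample("60min").mean()

# Hypothetical table with one duplicated row (id 1 appears twice).
rows = pd.DataFrame({"id": [3, 1, 2, 1], "score": [0.3, 0.1, 0.2, 0.1]})

# Sort rows in descending order of score.
sorted_desc = rows.sort_values("score", ascending=False)

# Randomly shuffle all rows; random_state makes the shuffle repeatable.
shuffled = rows.sample(frac=1, random_state=0)

# Drop rows that exactly duplicate an earlier row.
deduped = rows.drop_duplicates()

print(hourly)
print(deduped)
```

The resampled frame has one row per hour, including an empty interval for the hour with no readings. How to fill such gaps (for example, by interpolation) is a separate choice that this sketch leaves open.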

These new data transformation capabilities are available in all AWS Regions where Canvas is supported today. To learn more, see the product documentation.