Posted On: Nov 7, 2022

Amazon SageMaker Canvas now supports correlation matrices for advanced data analysis, expanding the ways you can get insights from your data before building ML models. SageMaker Canvas is a visual point-and-click interface that enables business analysts to generate accurate ML predictions on their own, without requiring any machine learning experience or writing a single line of code.

SageMaker Canvas provides capabilities to analyze and explore your data, such as replacing missing values and outliers with standard or custom values, defining new features with mathematical functions and operators, and exploring data visually through box plots, bar graphs, and scatter plots. Starting today, SageMaker Canvas supports correlation matrices, which summarize a dataset as a matrix showing how strongly each pair of variables is related. This helps you identify and visualize patterns in a given dataset for advanced analysis.

You can now generate correlation matrices for numerical variables, categorical variables, or a combination of both. Datasets can be analyzed using Pearson or Spearman correlations for numerical values, or Mutual Information for categorical values, giving you choice and flexibility. You can use the output from these matrices to impute missing data, assign weights to values to understand variance, and perform other advanced analysis. Correlation matrices apply to many use cases, such as analyzing price variance based on supply and demand, forecasting the amount of rain based on weather patterns, and understanding the propensity to buy based on new capabilities of a product or service.
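
SageMaker Canvas computes these matrices without any code. For readers who want to see what the three measures named above capture, the following is a minimal Python sketch of Pearson correlation, Spearman correlation, and mutual information using only the standard library. It is not how Canvas implements them; the function names and the sample columns (`price`, `demand`, `region`, `plan`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship between two numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither column is constant; a constant column would divide by zero.
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (monotonic relationship)."""
    return pearson(ranks(x), ranks(y))


def mutual_information(x, y):
    """Mutual information, in nats, between two categorical columns."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log((c / n) / ((count_x[a] / n) * (count_y[b] / n)))
        for (a, b), c in joint.items()
    )


def correlation_matrix(columns, measure):
    """Apply a pairwise measure to every pair of columns, returned as a nested dict."""
    names = list(columns)
    return {a: {b: measure(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sample data, invented for this illustration.
numeric = {"price": [10, 12, 15, 19, 24], "demand": [100, 90, 70, 40, 5]}
categorical = {
    "region": ["east", "east", "west", "west"],
    "plan": ["basic", "basic", "pro", "pro"],
}

pearson_matrix = correlation_matrix(numeric, pearson)
spearman_matrix = correlation_matrix(numeric, spearman)
mi_matrix = correlation_matrix(categorical, mutual_information)
```

In this sample, demand falls whenever price rises, so the Spearman value for the pair is -1, while the Pearson value is close to but not exactly -1 because the relationship is not perfectly linear. Mutual information is zero for independent columns and grows as one column becomes more predictable from the other.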

Advanced data analysis using correlation matrices is now available in all AWS Regions where SageMaker Canvas is supported. To learn more about SageMaker Canvas and get started, see the product page and the FAQ page.