Posted On: Mar 5, 2021
AWS Glue DataBrew adds four new visual transformations (Binning, Skewness, Binarization, and Transpose) that help data analysts and data scientists apply these transformations without writing any code.
Binning is a data pre-processing technique used to reduce the effects of minor observation errors. The binning transformation groups more or less continuous numeric values into a smaller number of "bins". For example, if you have data about a group of people, you might arrange their ages into a smaller number of age intervals, such as one interval for every five years.
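The age example above can be sketched outside DataBrew as fixed-width binning. This is a minimal illustration, assuming non-negative integer ages and a five-year bin width; the function name `bin_ages` is hypothetical and is not part of the DataBrew API.

```python
def bin_ages(ages, width=5):
    """Map each age to a label for the fixed-width interval that contains it.

    Hypothetical helper for illustration; assumes non-negative integer ages.
    """
    labels = []
    for age in ages:
        lower = (age // width) * width  # start of the interval containing this age
        upper = lower + width - 1       # inclusive end of the interval for integer ages
        labels.append(f"{lower}-{upper}")
    return labels


ages = [23, 27, 31, 34, 35, 42]
print(bin_ages(ages))
# ['20-24', '25-29', '30-34', '30-34', '35-39', '40-44']
```

Six distinct ages collapse into five bins, and 31 and 34 now share the label `30-34`, which is the reduction in detail that binning trades for robustness to small variations.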
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. With the skewness transformation, you can reshape the distribution of the data and change its skew.
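To make the definition concrete, the sketch below computes the sample skewness using the Fisher-Pearson coefficient, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. It then applies a log(1 + x) transform, one common way to reduce right skew. This is an assumption for illustration only: the announcement does not say which transformations DataBrew's skewness step applies, and `skewness` and `log_transform` are hypothetical helper names.

```python
import math


def skewness(values):
    """Sample skewness via the Fisher-Pearson coefficient g1 = m3 / m2**1.5.

    Assumes at least two distinct values, so the variance m2 is non-zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def log_transform(values):
    """Apply log(1 + x) to compress a long right tail (assumes every x >= 0)."""
    return [math.log1p(x) for x in values]


data = [1, 2, 2, 3, 3, 3, 4, 10, 50]  # a few large values create a long right tail
print(round(skewness(data), 2))
print(round(skewness(log_transform(data)), 2))
```

The first value is positive because of the long right tail. The second value is smaller because the log transform compresses the large values far more than the small ones.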
Binarization divides data into two groups and assigns one of two values to every member of each group. You can use the Binarize transformation by defining a threshold t, then assigning the value 0 to all data points below the threshold and 1 to those above it. As a simple example, mapping each pixel of a grayscale image from the 0-255 range to either 0 or 1 is binarization. This can make classifier algorithms in machine learning more efficient.
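The threshold rule above can be sketched in a few lines, using grayscale pixel values as input. This is a minimal illustration: `binarize` is a hypothetical name, and the treatment of values exactly equal to the threshold (mapped to 0 here) is an assumption, because the announcement does not say how DataBrew handles ties.

```python
def binarize(values, threshold):
    """Map values above the threshold to 1 and all other values to 0.

    Assumption: a value equal to the threshold maps to 0.
    """
    return [1 if v > threshold else 0 for v in values]


pixels = [0, 64, 127, 128, 200, 255]  # grayscale intensities in the 0-255 range
print(binarize(pixels, threshold=127))
# [0, 0, 0, 1, 1, 1]
```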
Transpose rotates the data from columns to rows, or vice versa. By swapping columns and rows with the transpose transformation in DataBrew, you can create cleaner visualizations.
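As a rough illustration of what transposing does to a small table, the sketch below swaps rows and columns with Python's built-in `zip`. It assumes a rectangular table stored as a list of rows. `zip` silently truncates ragged rows to the shortest one. The helper name `transpose` and the sample table are hypothetical.

```python
def transpose(rows):
    """Swap the rows and columns of a rectangular table given as a list of rows."""
    # zip(*rows) yields the first element of every row, then the second, and so on.
    return [list(column) for column in zip(*rows)]


table = [
    ["metric", "Q1", "Q2"],
    ["sales", 100, 120],
    ["returns", 5, 7],
]
for row in transpose(table):
    print(row)
# ['metric', 'sales', 'returns']
# ['Q1', 100, 5]
# ['Q2', 120, 7]
```

After the transpose, each quarter becomes a row, which can be easier to plot when quarters belong on the x-axis.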
AWS Glue DataBrew is a visual data preparation tool that makes it easy to clean and normalize data using more than 250 pre-built transformations, without the need to write any code. To get started, visit the AWS Management Console or install the DataBrew plugin in your notebook environment, and refer to the DataBrew documentation.