What is feature engineering?

Model features are the inputs that machine learning (ML) models use during training and inference to make predictions. ML model accuracy depends on choosing the right set and composition of features. For example, in an ML application that recommends a music playlist, features could include song ratings, which songs were listened to previously, and song listening time. Creating features can take significant engineering effort. Feature engineering involves extracting and transforming variables from raw data, such as price lists, product descriptions, and sales volumes, so that you can use the resulting features for training and prediction. The steps required to engineer features are data extraction and cleansing, followed by feature creation and storage.
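As a minimal sketch of the playlist example above, the following Python turns raw listening events into one feature row per user. The event layout, the sample values, and the names `raw_events` and `build_user_features` are assumptions made for illustration; they do not come from any particular product.

```python
from collections import defaultdict

# Hypothetical raw listening events:
# (user_id, song_id, rating from 1 to 5, seconds listened)
raw_events = [
    ("u1", "s1", 5, 210),
    ("u1", "s2", 3, 45),
    ("u1", "s1", 4, 200),
    ("u2", "s3", 2, 30),
]


def build_user_features(events):
    """Aggregate raw events into one feature dictionary per user."""
    grouped = defaultdict(list)
    for user_id, song_id, rating, seconds in events:
        grouped[user_id].append((song_id, rating, seconds))

    features = {}
    for user_id, rows in grouped.items():
        features[user_id] = {
            # Song ratings, summarized as an average
            "avg_rating": sum(rating for _, rating, _ in rows) / len(rows),
            # Which songs were listened to previously, summarized as a count
            "distinct_songs": len({song_id for song_id, _, _ in rows}),
            # Song listening time, summed over all events
            "total_listen_seconds": sum(seconds for _, _, seconds in rows),
        }
    return features


user_features = build_user_features(raw_events)
print(user_features["u1"])
```

Each key in the resulting dictionary is one candidate feature. Whether an average, a count, or a sum is the right summary depends on the model and the business problem.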

What are the challenges of feature engineering?

Feature engineering is challenging because it combines data analysis, business domain knowledge, and some intuition. When creating features, it's tempting to go straight to the data that is already available. Often, though, you should first work out which data is actually required by speaking with experts, brainstorming, and doing third-party research. If you skip this exercise, you could miss important predictor variables.

Data extraction

Collecting data is the process of assembling all the data you need for ML. Data collection can be tedious because data resides in many places, including on laptops, in data warehouses, in the cloud, inside applications, and on devices. Finding ways to connect to different data sources can be challenging. Data volumes are also growing rapidly, so there is a lot of data to search through. Additionally, data formats and types vary widely depending on the source. For example, video data and tabular data are not easy to use together.
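To illustrate the format problem on a small scale, the sketch below merges two hypothetical exports, one CSV and one JSON, into a single list of records keyed by product. The sample data, the field names, and the function `extract_records` are assumptions for illustration only. Real sources would be read over their own connectors rather than from inline strings.

```python
import csv
import io
import json

# Hypothetical exports from two different sources
CSV_EXPORT = "product_id,price\np1,9.99\np2,24.50\n"
JSON_EXPORT = (
    '[{"product_id": "p1", "units_sold": 120},'
    ' {"product_id": "p3", "units_sold": 7}]'
)


def extract_records(csv_text, json_text):
    """Merge a CSV price list and a JSON sales export by product_id."""
    merged = {}

    # Source 1: tabular price list, where every value arrives as a string
    for row in csv.DictReader(io.StringIO(csv_text)):
        merged.setdefault(row["product_id"], {})["price"] = float(row["price"])

    # Source 2: JSON sales volumes, where values are already typed
    for item in json.loads(json_text):
        merged.setdefault(item["product_id"], {})["units_sold"] = int(item["units_sold"])

    # Emit one record per product; values missing from a source become None
    return [
        {
            "product_id": product_id,
            "price": fields.get("price"),
            "units_sold": fields.get("units_sold"),
        }
        for product_id, fields in sorted(merged.items())
    ]


records = extract_records(CSV_EXPORT, JSON_EXPORT)
for record in records:
    print(record)
```

Keeping the gaps as explicit `None` values, rather than dropping incomplete products, leaves the decision about how to handle missing data to the later cleansing step.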

Feature creation

Data labeling is the process of identifying raw data (images, text files, videos, and so on) and adding one or more meaningful and informative labels to provide context so an ML model can learn from it. For example, labels might indicate if a photo contains a bird or car, which words were mentioned in an audio recording, or if an X-ray discovered an irregularity. Data labeling is required for various use cases, including computer vision, natural language processing, and speech recognition.
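A minimal sketch of what a labeled record might look like in code follows. It attaches labels to a raw item and converts them to a multi-hot vector that a model could consume. The label set, the file path, and the names `LabeledItem`, `add_label`, and `to_multi_hot` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical label vocabulary, based on the examples in the text
ALLOWED_LABELS = {"bird", "car", "irregularity"}


@dataclass
class LabeledItem:
    """A raw data item (referenced by its location) plus its labels."""
    uri: str
    labels: list = field(default_factory=list)


def add_label(item, label):
    """Attach a label, rejecting anything outside the agreed vocabulary."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if label not in item.labels:
        item.labels.append(label)
    return item


def to_multi_hot(item, vocabulary):
    """Encode labels as 0/1 flags, one position per vocabulary entry."""
    return [1 if label in item.labels else 0 for label in vocabulary]


vocabulary = sorted(ALLOWED_LABELS)  # ['bird', 'car', 'irregularity']
photo = LabeledItem(uri="photos/0001.jpg")  # hypothetical path
add_label(photo, "bird")
add_label(photo, "car")
encoded = to_multi_hot(photo, vocabulary)
print(photo.labels, encoded)
```

Restricting labels to a fixed vocabulary is one simple way to keep annotations from different labelers consistent.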

Feature storage

After data is cleaned and labeled, ML teams often explore the data to make sure it is correct and ready for ML. Visualizations like histograms, scatter plots, box and whisker plots, line plots, and bar charts are all useful tools to confirm that the data is correct. Visualizations also help data science teams complete exploratory data analysis, a process that uses visualizations to discover patterns, spot anomalies, test a hypothesis, or check assumptions. Exploratory data analysis does not require formal modeling; instead, data science teams can use visualizations to decipher the data.
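The checks that a histogram or box and whisker plot supports visually can also be sketched numerically. The example below prints a text histogram and flags values that fall outside the usual box-plot whiskers (1.5 times the interquartile range beyond the quartiles). The sample values and the function names are assumptions for illustration, and the quartiles use the default method of Python's `statistics.quantiles`.

```python
import statistics
from collections import Counter

# Hypothetical feature values with one suspicious entry
values = [12, 15, 14, 13, 16, 15, 14, 90]


def text_histogram(data, bin_width=10):
    """Return (bin_start, count) pairs, the numbers a histogram would plot."""
    counts = Counter((value // bin_width) * bin_width for value in data)
    return sorted(counts.items())


def iqr_outliers(data):
    """Return values beyond 1.5 * IQR from the quartiles (box-plot whiskers)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [value for value in data if value < low or value > high]


histogram = text_histogram(values)
for bin_start, count in histogram:
    print(f"{bin_start:>3}-{bin_start + 9:<3} {'#' * count}")

outliers = iqr_outliers(values)
print("median:", statistics.median(values))
print("possible outliers:", outliers)
```

A flagged value is only a prompt for investigation. Whether it is a data-entry error or a genuine extreme case is a judgment that needs domain knowledge.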

How can AWS help with feature engineering?

With Amazon SageMaker Data Wrangler, you can simplify the feature engineering process using a single visual interface. Using the SageMaker Data Wrangler data selection tool, you can choose the raw data that you want from various data sources and import it with a single click. SageMaker Data Wrangler contains over 300 built-in data transformations so that you can quickly normalize, transform, and combine features without having to write any code. After your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines and save them for reuse in the Amazon SageMaker Feature Store. SageMaker Feature Store is a purpose-built repository where you can store and access features, so it’s easier to name, organize, and reuse them across teams. SageMaker Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent.
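The idea behind a shared feature repository can be sketched in a few lines of plain Python. The class below is a hypothetical in-memory stand-in, not the SageMaker Feature Store API. It only illustrates why reading training rows and inference inputs from the same stored records helps keep features consistent.

```python
class InMemoryFeatureStore:
    """Toy feature repository; hypothetical, for illustration only."""

    def __init__(self):
        self._groups = {}

    def put(self, group, record_id, features):
        """Store (or overwrite) the feature values for one record."""
        self._groups.setdefault(group, {})[record_id] = dict(features)

    def get(self, group, record_id):
        """Look up one record, as a real-time inference request would."""
        return dict(self._groups[group][record_id])

    def training_rows(self, group, feature_names):
        """Return all records in a group as rows for batch training."""
        records = self._groups.get(group, {})
        return [
            [record[name] for name in feature_names]
            for _, record in sorted(records.items())
        ]


store = InMemoryFeatureStore()
store.put("users", "u1", {"avg_rating": 4.0, "distinct_songs": 2})
store.put("users", "u2", {"avg_rating": 2.0, "distinct_songs": 1})

feature_names = ["avg_rating", "distinct_songs"]
train_rows = store.training_rows("users", feature_names)  # batch path
online = store.get("users", "u1")                          # real-time path
print(train_rows, online)
```

Because both paths read the same records, a feature computed once is served identically to training and to inference, which is the consistency property the paragraph above describes.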
