Amazon SageMaker Feature Store
A fully managed repository for machine learning features
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, retrieve, and share machine learning (ML) features.
Features are the attributes or properties that models use during training and inference to make predictions. For example, in an ML application that recommends a music playlist, features could include song ratings, which songs were listened to previously, and how long songs were listened to. The accuracy of an ML model depends on the set and composition of features it uses. Often, these features are used repeatedly by multiple teams training multiple models. Whichever feature set was used to train a model must also be available to make real-time predictions (inference). Keeping a single source of features that is consistent and up to date across these different access patterns is a challenge, because most organizations keep two different feature stores, one for training and one for inference.
Amazon SageMaker Feature Store is a purpose-built repository where you can store and access features, so it’s much easier to name, organize, and reuse them across teams. SageMaker Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent. SageMaker Feature Store keeps track of the metadata of stored features (e.g., feature name or version number) so that you can query the features for the right attributes in batches or in real time using Amazon Athena, an interactive query service. SageMaker Feature Store also keeps features current: as new data is generated during inference, the single repository is updated, so new features are always available for models to use during training and inference.
Ingest data from many sources
There are many ways to ingest features into Amazon SageMaker Feature Store. You can use streaming data sources such as Amazon Kinesis Data Firehose. You can also create features in data preparation tools such as Amazon SageMaker Data Wrangler and store them directly in SageMaker Feature Store with just a few clicks.
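As a rough illustration of programmatic ingestion, the sketch below converts a plain Python dict into the list-of-features record shape accepted by the put_record operation of the Feature Store runtime API, then sends it through a client supplied by the caller. The helper names (to_record, ingest_row), the feature group name, and the feature names are hypothetical. In practice the client would typically come from boto3.client("sagemaker-featurestore-runtime"), which requires AWS credentials and an existing feature group.

```python
from typing import Any, Dict, List


def to_record(row: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert a dict of feature values into the put_record 'Record' shape.

    Each feature becomes {"FeatureName": ..., "ValueAsString": ...}.
    Features whose value is None are skipped rather than written.
    """
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
        if value is not None
    ]


def ingest_row(client: Any, feature_group_name: str, row: Dict[str, Any]) -> None:
    """Write one row to a feature group through the supplied runtime client."""
    client.put_record(FeatureGroupName=feature_group_name, Record=to_record(row))


# Hypothetical row for the playlist example; names and values are illustrative.
example_row = {
    "listener_id": "u-123",
    "event_time": "2021-01-01T00:00:00Z",
    "avg_song_rating": 4.5,
    "favorite_genre": None,  # missing value, so it is left out of the record
}
example_record = to_record(example_row)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object before pointing it at a real feature group.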
Search and discovery
Amazon SageMaker Feature Store tags and indexes features so they are easily discoverable through a visual interface in SageMaker Studio. Browsing the feature catalog allows teams to understand features better and determine if a feature is useful for a particular model.
Ensure feature consistency
Amazon SageMaker Feature Store helps ensure models make accurate predictions by making the same features available for both training and inference. Training and inference are very different use cases with different storage requirements, and SageMaker Feature Store addresses both. Training uses a complete data set and often takes hours, while inference needs to happen in milliseconds and usually requires only a subset of the data. For example, in a model that predicts the next best song in a playlist, you train the model on thousands of songs, but during inference, SageMaker Feature Store only accesses the last three songs to predict the next song. SageMaker Feature Store allows models to access the same set of features for training runs (which are usually done offline and in batches) and for real-time inference.
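The two access patterns described above can be pictured with a small in-memory model. This is a toy sketch of the idea only, not how the service is implemented: the class ToyFeatureStore, its methods, and the field names are all hypothetical, and it assumes records arrive in time order. A single ingest call feeds both a full history (read in batches for training) and a short recent view (read quickly at inference time), so the two cannot drift apart.

```python
from collections import defaultdict
from typing import Dict, List


class ToyFeatureStore:
    """In-memory stand-in for a store with an offline and an online side."""

    def __init__(self) -> None:
        # Offline side: append-only history of every record, read for training.
        self.offline: List[Dict] = []
        # Online side: only the newest records per listener, for fast lookups.
        self.online: Dict[str, List[Dict]] = defaultdict(list)

    def ingest(self, record: Dict, keep_latest: int = 3) -> None:
        """Write one record to both sides (keep_latest must be positive)."""
        self.offline.append(record)
        recent = self.online[record["listener_id"]]
        recent.append(record)
        # Drop everything except the newest `keep_latest` records.
        del recent[:-keep_latest]

    def training_set(self) -> List[Dict]:
        """Full history, as a batch training job would read it."""
        return list(self.offline)

    def latest(self, listener_id: str) -> List[Dict]:
        """Small recent slice, as a real-time inference call would read it."""
        return list(self.online.get(listener_id, []))


store = ToyFeatureStore()
for position, song in enumerate(["a", "b", "c", "d", "e"]):
    store.ingest({"listener_id": "u-1", "song_id": song, "position": position})

# Inference sees only the last three songs; training sees all five.
last_three = [r["song_id"] for r in store.latest("u-1")]
history_size = len(store.training_set())
```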
It’s common to see different definitions for similar features across a business. For example, “temperature” could be defined in Celsius or Fahrenheit, or “dates” could be represented as date-month-year or month-date-year. Amazon SageMaker Feature Store eliminates confusion across teams by storing feature definitions in a single repository so that it’s clear how each feature is defined. Having features clearly defined makes it easier to reuse features for different applications.
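One way to picture a single shared set of definitions is a list that every team validates records against. In the sketch below, the FeatureName and FeatureType keys and the String, Fractional, and Integral type names mirror the feature definition shape used when creating a feature group. The specific features, the convention of putting the unit in the name, and the validate helper are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical shared definitions; the unit is part of the feature name
# so every team reads "temperature" the same way.
FEATURE_DEFINITIONS: List[Dict[str, str]] = [
    {"FeatureName": "sensor_id", "FeatureType": "String"},
    {"FeatureName": "temperature_celsius", "FeatureType": "Fractional"},
    {"FeatureName": "reading_count", "FeatureType": "Integral"},
]

# Python types accepted for each declared feature type.
_PYTHON_TYPES = {"String": (str,), "Fractional": (int, float), "Integral": (int,)}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = {d["FeatureName"]: d["FeatureType"] for d in FEATURE_DEFINITIONS}
    for name, value in record.items():
        if name not in known:
            problems.append(f"unknown feature: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, _PYTHON_TYPES[known[name]]):
            problems.append(f"{name}: expected {known[name]}")
    return problems


ok = validate({"sensor_id": "s1", "temperature_celsius": 21.5, "reading_count": 3})
wrong_unit = validate({"temperature_fahrenheit": 70.0})
```

A record that uses an undeclared name such as temperature_fahrenheit is flagged immediately, which is the kind of ambiguity a shared definition is meant to surface.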
Integrate with Amazon SageMaker Pipelines
Amazon SageMaker Feature Store integrates with Amazon SageMaker Pipelines, so it’s easy to add feature search, discovery, and reuse to your automated machine learning workflows.
“At Climate, we believe in providing the world’s farmers with accurate information to make data driven decisions and maximize their return on every acre. To achieve this, we have invested in technologies such as machine learning tools to build models using measurable entities known as features, such as yield for a grower’s field. With Amazon SageMaker Feature Store, we can accelerate the development of ML models with a central feature store to access and reuse features across multiple teams easily. SageMaker Feature Store makes it easy to access features in real-time using the online store or run features on a schedule using the offline store for different use cases. With the SageMaker Feature Store, we can develop ML models faster.”
Daniel McCaffrey, Vice President, Data and Analytics, Climate
“We chose to build Intuit’s new machine learning platform on AWS in 2017, combining Amazon SageMaker’s powerful capabilities for model development, training, and hosting with Intuit’s own capabilities in orchestration and feature engineering. As a result, we cut our model development lifecycle dramatically. What used to take six full months now takes less than a week, making it possible for us to push AI capabilities into our TurboTax, QuickBooks, and Mint products at a greatly accelerated rate. We have worked closely with AWS in the lead up to the release of Amazon SageMaker Feature Store, and we are excited by the prospect of a fully managed feature store so that we no longer have to maintain multiple feature repositories across our organization. Our data scientists will be able to use existing features from a central store and drive both standardization and reuse of features across teams and models.”
Mammad Zadeh, Intuit Vice President of Engineering, Data Platform
“At Experian, we believe it is our responsibility to empower consumers to understand and use credit in their financial lives, and assist lenders in managing credit risk. As we continue to implement best practices to build our financial models, we are looking at solutions that accelerate the production of products that leverage machine learning. Amazon SageMaker Feature Store provides us with a secure way to store and reuse features for our ML applications. The ability to maintain consistency for both real-time and batch applications across multiple accounts is a key requirement for our business. Using the new capabilities of Amazon SageMaker Feature Store enables us to empower our customers to take control of their credit and reduce costs in the new economy.”
Geoff Dzhafarov, Chief Enterprise Architect, Experian Consumer Services
“At DeNA, our mission is to deliver impact and delight using the internet and AI/ML. Providing value-based services is our primary goal and we want to ensure our businesses and services are ready to achieve that goal… We would like to discover and reuse features across the organization and Amazon SageMaker Feature Store helps us with an easy and efficient way to reuse features for different applications. Amazon SageMaker Feature Store also helps us in maintaining standard feature definitions and helps us with a consistent methodology as we train models and deploy them to production. With these new capabilities of Amazon SageMaker, we can train and deploy ML models faster, keeping us on our path to delight our customers with the best services.”
Kenshin Yamada, General Manager / AI System Dept System Unit, DeNA
“A strong care industry where supply matches demand is essential for economic growth from the individual family up to the nation’s GDP. We’re excited about Amazon SageMaker Feature Store as we believe it will help us scale better across our data science and development teams, by using a consistent set of curated data. With the newly announced capabilities of Amazon SageMaker, we can accelerate development and deployment of our ML models for different applications, helping our customers make better informed decisions through faster real-time recommendations.”
Clemens Tummeltshammer, Data Science Manager, Care.com
“Using ML, 3M is improving tried-and-tested products, like sandpaper, and driving innovation in several other spaces, including healthcare. As we plan to scale machine learning to more areas of 3M, we see the amount of data and models growing rapidly – doubling every year. We are enthusiastic about the new SageMaker features because they will help us scale. Amazon SageMaker Data Wrangler makes it much easier to prepare data for model training, and Amazon SageMaker Feature Store will eliminate the need to create the same model features over and over. Finally, Amazon SageMaker Pipelines will help us automate data prep, model building, and model deployment into an end to end workflow so we can speed time to market for our models. Our researchers are looking forward to taking advantage of the new speed of science at 3M.”
David Frazee, Technical Director at 3M Corporate Systems Research Lab
AWS Machine Learning Blog
Build accurate ML training datasets using point-in-time queries with Amazon SageMaker Feature Store and Apache Spark
Understanding the key capabilities of Amazon SageMaker Feature Store
Using streaming ingestion with Amazon SageMaker Feature Store to make ML-backed decisions in near-real time