SageMaker Batch Transform now enables associating prediction results with input attributes

Posted on: Jul 19, 2019

Amazon SageMaker Batch Transform enables you to run predictions on datasets stored in Amazon S3. It is ideal for scenarios where you are working with large batches of data and don’t need sub-second latency. You can now configure your Batch Transform jobs to exclude certain data attributes from prediction requests, and to join some or all of the input data attributes with prediction results. As a result, you no longer need additional pre-processing or post-processing steps to filter and rejoin attributes when running batch predictions on CSV or JSON data.

For example, consider a dataset that includes three attributes: ID, age, and height. The ID attribute is a randomly generated or sequential number that carries no signal for the ML problem and was not used when training the ML model. You can now configure your Batch Transform jobs to exclude the ID attribute from each record, and pass only the age and height attributes in the prediction requests sent to the model. You can also configure your Batch Transform jobs to associate the ID attribute with the prediction results in the final S3 output of the job. Retaining record-level attributes in this way can be useful for analyzing the prediction results.
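The ID, age, and height scenario above can be sketched as a CreateTransformJob request that uses the DataProcessing parameters (InputFilter, JoinSource, OutputFilter). This is a minimal illustration, not a definitive configuration: the job name, model name, S3 paths, and instance type are hypothetical placeholders, and it assumes CSV records with ID in the first column and a model that returns a single prediction value per record.

```python
def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri):
    """Build a CreateTransformJob request for CSV records shaped as: ID, age, height.

    All argument values are supplied by the caller; nothing here calls AWS.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            "Accept": "text/csv",
            "AssembleWith": "Line",
        },
        "TransformResources": {
            "InstanceType": "ml.m5.xlarge",  # placeholder instance type
            "InstanceCount": 1,
        },
        "DataProcessing": {
            # Drop column 0 (ID); send only age and height to the model.
            "InputFilter": "$[1:]",
            # Append the prediction to the original input record.
            "JoinSource": "Input",
            # Keep the ID (first column) and the prediction (last column),
            # assuming the model emits one value per record.
            "OutputFilter": "$[0,-1]",
        },
    }


# Hypothetical names and paths, for illustration only.
request = build_transform_request(
    job_name="example-batch-job",
    model_name="example-model",
    input_s3_uri="s3://example-bucket/input/",
    output_s3_uri="s3://example-bucket/output/",
)

# To submit the job (requires AWS credentials and an existing model):
#   import boto3
#   boto3.client("sagemaker").create_transform_job(**request)
```

Under these assumptions, each line of the S3 output would contain the record's ID followed by its prediction. Setting OutputFilter to "$" instead would keep every input attribute alongside the prediction.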

This new capability is available in all AWS Regions where Amazon SageMaker is available. To learn more about this feature, see the Amazon SageMaker Developer Guide.