Posted On: Nov 29, 2023

Today, we’re excited to announce the preview of a new smart sifting capability of Amazon SageMaker that inspects and evaluates training data on the fly and learns from only the most informative samples, reducing model training time and cost by up to 35%. You can get started with smart sifting in minutes without making changes to your existing data pipelines or training scripts.

Smart sifting uses your live model during training to analyze incoming data samples. It then automatically discards the low-loss samples, which contribute little to the model’s learning. By training on only the most informative samples, smart sifting reduces the time and cost of training deep learning models. Customers training deep learning models with PyTorch on accelerated GPU instances in SageMaker can reduce training time by up to 35%. Because the excluded samples have relatively low loss, there is minimal or no impact on the accuracy of your trained model. To get started with smart sifting, see our associated documentation.
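
As a rough illustration of the idea of discarding low-loss samples, the sketch below keeps only the highest-loss samples in a batch. It is not the SageMaker smart sifting implementation or API: the function name `sift_batch`, the `keep_fraction` parameter, and the sample loss values are all hypothetical, and a real training loop would compute the per-sample losses with a forward pass of the live model.

```python
import math
from typing import List, Sequence


def sift_batch(losses: Sequence[float], keep_fraction: float = 0.65) -> List[int]:
    """Return the indices of the highest-loss samples, in original order.

    Hypothetical helper for illustration only. `keep_fraction` is an
    assumed knob for the share of each batch that is kept for training.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Keep at least one sample so a training step always has data.
    n_keep = max(1, math.ceil(keep_fraction * len(losses)))
    # Rank sample indices from highest loss to lowest loss.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    # Restore the original batch order for the samples that are kept.
    return sorted(ranked[:n_keep])


# Made-up per-sample losses for a batch of six samples.
batch_losses = [0.02, 1.30, 0.75, 0.01, 0.40, 2.10]
kept = sift_batch(batch_losses, keep_fraction=0.5)
print(kept)  # [1, 2, 5]
```

In this sketch, only the samples at the returned indices would go on to the backward pass and optimizer step; the three lowest-loss samples are skipped.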