AWS Machine Learning Blog

Category: Amazon SageMaker

Part 1: How NatWest Group built a scalable, secure, and sustainable MLOps platform

This is the first post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS to build a scalable, secure, and sustainable machine learning operations (MLOps) platform. This initial post provides an overview of the AWS and NatWest Group joint team implemented Amazon SageMaker Studio as the standard for […]

Accelerate data preparation with data quality and insights in Amazon SageMaker Data Wrangler

Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that helps data scientists and data engineers quickly and easily prepare data for machine learning (ML) applications using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. Today, […]

Figure 1: Cost comparison for different Hugging Face models on SageMaker Serverless Inference vs. real-time inference

Host Hugging Face transformer models using Amazon SageMaker Serverless Inference

The last few years have seen rapid growth in the field of natural language processing (NLP) using transformer deep learning architectures. With its Transformers open-source library and machine learning (ML) platform, Hugging Face makes transfer learning and the latest transformer models accessible to the global AI community. This can reduce the time needed for data […]

Secure AWS CodeArtifact access for isolated Amazon SageMaker notebook instances

AWS CodeArtifact allows developers to connect internal code repositories to upstream code repositories like Pypi, Maven, or NPM. AWS CodeArtifact is a powerful addition to CI/CD workflows on AWS, but it is similarly effective for code-bases hosted on a Jupyter notebook. This is a common development paradigm for Machine Learning developers that build and train […]

Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker

Automatic speech recognition (ASR) is a commonly used machine learning (ML) technology in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic subtitling for videos and transcribing meetings, are all powered by this technology. These applications take audio clips as input and convert speech […]

Control access to Amazon SageMaker Feature Store offline using AWS Lake Formation

This post was last reviewed and updated June, 2022 with revised feature groups (tables) and features (columns) permissions. You can establish feature stores to provide a central repository for machine learning (ML) features that can be shared with data science teams across your organization for training, batch scoring, and real-time inference. Data science teams can […]

Customize the Amazon SageMaker XGBoost algorithm container

The built-in Amazon SageMaker XGBoost algorithm provides a managed container to run the popular XGBoost machine learning (ML) framework, with added convenience of supporting advanced training or inference features like distributed training, dataset sharding for large-scale datasets, A/B model testing, or multi-model inference endpoints. You can also extend this powerful algorithm to accommodate different requirements. […]

Detect adversarial inputs using Amazon SageMaker Model Monitor and Amazon SageMaker Debugger

Research over the past few years has shown that machine learning (ML) models are vulnerable to adversarial inputs, where an adversary can craft inputs to strategically alter the model’s output (in image classification, speech recognition, or fraud detection). For example, imagine you have deployed a model that identifies your employees based on images of their […]

Build an MLOps sentiment analysis pipeline using Amazon SageMaker Ground Truth and Databricks MLflow

As more organizations move to machine learning (ML) to drive deeper insights, two key stumbling blocks they run into are labeling and lifecycle management. Labeling is the identification of data and adding labels to provide context so an ML model can learn from it. Labels might indicate a phrase in an audio file, a car […]

Prepare data from Databricks for machine learning using Amazon SageMaker Data Wrangler

Data science and data engineering teams spend a significant portion of their time in the data preparation phase of a machine learning (ML) lifecycle performing data selection, cleaning, and transformation steps. It’s a necessary and important step of any ML workflow in order to generate meaningful insights and predictions, because bad or low-quality data greatly […]