AWS Machine Learning Blog

Category: Amazon SageMaker

Build accurate ML training datasets using point-in-time queries with Amazon SageMaker Feature Store and Apache Spark

This post is co-written with Raphey Holmes, Software Engineering Manager, and Jason Mackay, Principal Software Development Engineer, at GoDaddy. GoDaddy is the world’s largest services platform for entrepreneurs around the globe, empowering their worldwide community of over 20 million customers—and entrepreneurs everywhere—by giving them all the help and tools they need to grow online. GoDaddy […]

Create a large-scale video driving dataset with detailed attributes using Amazon SageMaker Ground Truth

Do you ever wonder what goes behind bringing various levels of autonomy to vehicles? What the vehicle sees (perception) and how the vehicle predicts the actions of different agents in the scene (behavior prediction) are the first two steps in autonomous systems. In order for these steps to be successful, large-scale driving datasets are key. […]

Improve newspaper digitalization efficacy with a generic document segmentation tool using Amazon Textract

We are living in a digital age. Information that used to be spread by printouts is disseminated at unforeseen speeds through digital formats. In parallel to the inventions of new types of media, an increasing number of archives and libraries are trying to create digital repositories with new technologies. Digitization allows for preservation by creating […]

Automate Amazon SageMaker Studio setup using AWS CDK

Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. You can quickly upload data, create new notebooks, train and tune models, move […]

Build patient outcome prediction applications using Amazon HealthLake and Amazon SageMaker

Healthcare data can be challenging to work with and AWS customers have been looking for solutions to solve certain business challenges with the help of data and machine learning (ML) techniques. Some of the data is structured, such as birthday, gender, and marital status, but most of the data is unstructured, such as diagnosis codes […]

Develop and deploy ML models using Amazon SageMaker Data Wrangler and Amazon SageMaker Autopilot

Data generates new value to businesses through insights and building predictive models. However, although data is plentiful, available data scientists are far and few. Despite our attempts in recent years to produce data scientists from academia and elsewhere, we still see a huge shortage that will continue into the near future. To accelerate model building, […]

Save costs by automatically shutting down idle resources within Amazon SageMaker Studio

July 2023: This post was reviewed for accuracy. The Github repository is maintained up to date. Amazon SageMaker Studio provides a unified, web-based visual interface where you can perform all machine learning (ML) development steps, making data science teams up to 10 times more productive. Studio gives you complete access, control, and visibility into each […]

Prepare data from Snowflake for machine learning with Amazon SageMaker Data Wrangler

Data preparation remains a major challenge in the machine learning (ML) space. Data scientists and engineers need to write queries and code to get data from source data stores, and then write the queries to transform this data, to create features to be used in model development and training. All of this data pipeline development […]

Unlock near 3x performance gains with XGBoost and Amazon SageMaker Neo

October 2021: This post has been updated with a new sample notebook for Amazon SageMaker Studio users.  When a model gets deployed to a production environment, inference speed matters. Models with fast inference speeds require less resources to run, which translates to cost savings, and applications that consume the models’ predictions benefit from the improved […]

Human-in-the-loop review of model explanations with Amazon SageMaker Clarify and Amazon A2I

Domain experts are increasingly using machine learning (ML) to make faster decisions that lead to better customer outcomes across industries including healthcare, financial services, and many more. ML can provide higher accuracy at lower cost, whereas expert oversight can ensure validation and continuous improvement of sensitive applications like disease diagnosis, credit risk management, and fraud […]