AWS Machine Learning Blog
Category: Amazon SageMaker
Artificial intelligence and machine learning continues at AWS re:Invent
A fresh new year is here, and we wish you all a wonderful 2021. We signed off last year at AWS re:Invent on the artificial intelligence (AI) and machine learning (ML) track with the first-ever machine learning keynote and over 50 AI/ML-focused technical sessions covering industries, use cases, applications, and more. You can […]
Accelerating MLOps at Bayer Crop Science with Kubeflow Pipelines and Amazon SageMaker
This is a guest post by the data science team at Bayer Crop Science. Farmers have always collected and evaluated a large amount of data with each growing season: seeds planted, crop protection inputs applied, crops harvested, and much more. The rise of data science and digital technologies provides farmers with a wealth of new […]
Implementing a custom labeling GUI with built-in processing logic with Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning. It offers easy access to Amazon Mechanical Turk and private human labelers, and provides them with built-in workflows and interfaces for common labeling tasks. A labeling team may wish to use […]
Extracting buildings and roads from AWS Open Data using Amazon SageMaker
Sharing data and computing in the cloud allows data users to focus on data analysis rather than data access. Open Data on AWS helps you discover and share public open datasets in the cloud. The Registry of Open Data on AWS hosts a large amount of public open data. The datasets range from genomics to climate to transportation […]
Control and audit data exploration activities with Amazon SageMaker Studio and AWS Lake Formation
May 2024: This post was reviewed and updated to use a new dataset and to reflect the updated Studio experience and AWS IAM Identity Center. Certain industries are required to audit all access to their data. This includes auditing exploratory activities performed by data scientists, who usually query data from within machine learning (ML) notebooks. This post […]
Monitoring in-production ML models at large scale using Amazon SageMaker Model Monitor
Machine learning (ML) models are impacting business decisions of organizations around the globe, from retail and financial services to autonomous vehicles and space exploration. For these organizations, training and deploying ML models into production is only one step towards achieving business goals. Model performance may degrade over time for several reasons, such as changing consumer […]
Training a reinforcement learning Agent with Unity and Amazon SageMaker RL
Unity is one of the most popular game engines and has been adopted not only for video game development but also by industries such as film and automotive. Unity offers tools to create virtual simulated environments with customizable physics, landscapes, and characters. The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables […]
Exploratory data analysis, feature engineering, and operationalizing your data flow into your ML pipeline with Amazon SageMaker Data Wrangler
According to The State of Data Science 2020 survey, data management, exploratory data analysis (EDA), feature selection, and feature engineering account for more than 66% of a data scientist’s time (see the following diagram). The same survey highlights that the three biggest roadblocks to deploying a model in production are managing dependencies and environments, […]
Identifying training bottlenecks and system resource under-utilization with Amazon SageMaker Debugger
At AWS re:Invent 2020, AWS released the profiling functionality for Amazon SageMaker Debugger. In this post, we expand on the importance of profiling deep neural network (DNN) training, review some of the common performance bottlenecks you might encounter, and demonstrate how to use the profiling feature in Debugger to detect such bottlenecks. In the context […]
Using streaming ingestion with Amazon SageMaker Feature Store to make ML-backed decisions in near-real time
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Businesses are increasingly using machine learning (ML) to make near-real-time decisions, such as placing an ad, assigning a driver, recommending a product, or even dynamically pricing […]