AWS Big Data Blog
The next generation of Amazon SageMaker: The center for all your data, analytics, and AI
This week on the keynote stages at AWS re:Invent 2024, you heard from Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI.
The relationship between analytics and AI is rapidly evolving. Our customers are telling us that they are seeing their analytics and AI workloads increasingly converge around a lot of the same data, and this is changing how they are using analytics tools with their data. They aren’t using analytics and AI tools in isolation. They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications.
We want to make it streamlined for our customers to work with their data, whether for analytics or AI, help them get to AI-ready data faster, and improve productivity of all data and AI workers. The next generation of SageMaker is set to do just that.
Introducing the next generation of SageMaker
The rise of generative AI is changing how data and AI teams work together. For example, when a retail data analyst creates customer segmentation reports, those same datasets are now being used by AI teams to train recommendation engines. Or customer service teams analyzing call logs to track common issues are now using that data to train AI chatbots to handle routine inquiries. Our customers tell us that they need tools that help data and AI teams collaborate seamlessly, but they face real challenges: data is siloed and scattered across systems, they have to build and maintain complex data pipelines, and teams struggle to access and use data efficiently due to inconsistent access controls. Customers also need to make sure that their data practices remain secure, reliable, and compliant with regulations. They need data that’s not just accessible, but also trustworthy and properly governed to keep up with growing business demands and AI opportunities.
The next generation of SageMaker, an integrated experience for data, analytics, and AI, addresses these challenges and more. SageMaker brings together widely adopted AWS ML and analytics capabilities—virtually all of the components you need for data exploration, preparation, and integration; petabyte-scale big data processing; fast SQL analytics; model development and training; governance; and generative AI development. SageMaker helps you work faster and smarter with your data and build powerful analytics and AI solutions that are deeply rooted in your unique data assets, giving you an edge over the competition.
Unified tools: Collaborate and build faster with one data and AI development environment
The rapid evolution of data and AI roles demands a revolution in the services and tools that power your work, driving a need for collaboration and teamwork across your entire organization. Amazon SageMaker Unified Studio (Preview) solves this challenge by providing an integrated authoring experience to use all your data and tools for analytics and AI. Collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics with Amazon Q Developer, the most capable generative AI assistant for software development, helping you along the way. All your favorite functionality and tools, like standalone studios, query editors, and visual tools, are now available in one place, helping you discover and prepare data with ease, author queries or code, and get to insights faster.
SageMaker also comes with built-in generative AI powered by Amazon Q Developer that guides you along the way of your data and AI journey, transforming complex tasks into intuitive conversations. Ask questions in plain English to find the right datasets, automatically generate SQL queries, or create data pipelines without writing code. This isn’t just about making data management effortless—it’s about using AI to make your data work harder for you, unlocking insights that might otherwise remain hidden, and enabling everyone in your organization to work with data confidently, regardless of their technical expertise.
SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more. Moving forward, we’ll refer to this set of AI/ML capabilities as SageMaker AI, and we’ll continue to innovate and expand on them to make sure the new SageMaker remains the premier center for building, training, and deploying AI models. With improved access and collaboration, you’ll be able to create and securely share analytics and AI artifacts and bring data and AI products to market faster.
Unified data: Reduce data silos with an open lakehouse to unify all your data
We see organizations embarking on digital transformations and needing to quickly adapt to ever-evolving customer demands. In doing so, a unified view across all their data is required—one that breaks down data silos and simplifies data usage for teams, without sacrificing the depth and breadth of capabilities that make AWS tools unbelievably valuable. This balance between unification and maintaining advanced capabilities is key to supporting our customers’ ongoing innovation and adaptability in a rapidly changing technological landscape.
Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. This innovation drives an important change: you’ll no longer have to copy or move data between data lake and data warehouses. SageMaker Lakehouse enables seamless data access directly in the new SageMaker Unified Studio and provides the flexibility to access and query your data with all Apache Iceberg-compatible tools on a single copy of analytics data. With this launch, you can query data regardless of where it is stored with support for a wide range of use cases, including analytics, ad-hoc querying, data science, machine learning, and generative AI. You’ll get a single unified view of all your data for your data and AI workers, regardless of where the data sits, breaking down your data siloes. We’ve simplified data architectures, saving you time and costs on unnecessary data movement, data duplication, and custom solutions.
Additionally, we are advancing towards a zero-ETL future by expanding integrations that make data from multiple operational, transactional, and application sources available in SageMaker Lakehouse and Amazon Redshift. Zero-ETL integrations simplify data movement and ingestion, enabling increased agility, reduced costs, and minimized operational overhead while providing near real-time insights for AI and ML initiatives. All the existing Amazon Redshift zero-ETL integrations are seamlessly available within SageMaker—you can move transactional data from databases like Amazon Aurora, Amazon Relational Database Service (Amazon RDS), and Amazon DynamoDB into Amazon Redshift without performance impact and ingest high-volume real-time data from Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK) with native streaming services integrations. We announced SageMaker Lakehouse and Amazon Redshift support for zero-ETL integrations from eight applications, including Salesforce, Zendesk, ServiceNow, Zoho CRM, Salesforce Pardot, SAP, Facebook Ads, and Instagram Ads. This new capability streamlines data replication and ingestion into a unified process, minimizing the need for custom data replication pipelines. With automatic pipeline maintenance, the solution minimizes the complexity of building in-house connectors, reduces implementation and operational costs, and accelerates insights by unifying data from diverse applications.
“We have spent the last 18 months working with AWS to transform our data foundation to use best-in-class solutions that are cost-effective as well. With advancements like SageMaker Unified Studio and SageMaker Lakehouse, we expect to accelerate our velocity of delivery through seamless access to data and services, thus enabling our engineers, analysts, and scientists to surface insights that provide material value to our business.”
– Lee Slezak, SVP of Data and Analytic, Lennar
Unified governance: Meet your enterprise security needs with built-in data and AI governance
When it comes to data and AI governance, discipline equals freedom. The right governance practices can enable your teams to move faster. Data teams struggle to find a unified approach that enables effortless discovery, understanding, and assurance of data quality and security across various sources. Our customers tell us that the fragmented nature of permissions and access controls, managed separately within individual data sources and tools, leads to inconsistent implementation and potential security risks.
SageMaker simplifies the discovery, governance, and collaboration for data and AI across your lakehouse, AI models, and applications. With Amazon SageMaker Catalog, built on Amazon DataZone, you can define and enforce access policies consistently using a single permission model with fine-grained access controls. This unified catalog enables engineers, data scientists, and analysts to securely discover and access approved data and models using semantic search with generative AI-created metadata. Collaboration is seamless, with straightforward publishing and subscribing workflows, fostering a more connected and efficient work environment.
Having confidence in your data is key. SageMaker Catalog provides comprehensive data quality capabilities, including data profiling, data quality recommendations, monitoring of data quality rules, and alerts. By combining rule-based and ML approaches, we help you reconcile entities and deliver high-quality data, giving you the tools to make confident business decisions. You’ll have trust in your data, with real-time visibility of data quality and data and ML lineage, allowing you to resolve hard-to-find quality challenges. Automate data profiling and data quality recommendations, monitor data quality rules, and receive alerts. Resolve hard-to-find data quality challenges by using rule-based and ML approaches to reconcile entities, enabling you to deliver high-quality data to make confident business decisions.
Beyond discovery and collaboration, SageMaker takes AI governance to the next level by providing robust safeguards and tools to develop responsible AI policies. This holistic approach not only streamlines operations, but also builds and maintains trust throughout the organization, setting a new standard for responsible and efficient AI development and deployment.
Innovate faster with the convergence of data, analytics and AI
The next generation of SageMaker delivers an integrated experience to access, govern, and act on all your data by bringing together widely adopted AWS data, analytics, and AI capabilities. Collaborate and build faster from a unified studio using familiar AWS tools for model development, generative AI, data processing, and SQL analytics, with Amazon Q Developer assisting you along the way. Access all your data, whether it’s stored in data lakes, data warehouses, or third-party or federated data sources. And move with confidence and trust with built-in governance to address enterprise security needs. The tools to transform your business are here. We’re excited to see what you’ll build next!
To learn more, check out the following AWS News blog announcements:
About the authors
G2 Krishnamoorthy is VP of Analytics, leading AWS data lake services, data integration, Amazon OpenSearch Service, and Amazon QuickSight. Prior to his current role, G2 built and ran the Analytics and ML Platform at Facebook/Meta, and built various parts of the SQL Server database, Azure Analytics, and Azure ML at Microsoft.
Rahul Pathak is VP of Relational Database Engines, leading Amazon Aurora, Amazon Redshift, and Amazon QLDB. Prior to his current role, he was VP of Analytics at AWS, where he worked across the entire AWS database portfolio. He has co-founded two companies, one focused on digital media analytics and the other on IP-geolocation.