AWS Big Data Blog
Category: Announcements
Announcing cross-account ingestion for Amazon OpenSearch Service
Amazon OpenSearch Ingestion is a powerful data ingestion pipeline that AWS customers use for many different purposes, such as observability, analytics, and zero-ETL search. Many customers today push logs, traces, and metrics from their applications to OpenSearch Ingestion to store and analyze this data. Today, we are happy to announce that OpenSearch Ingestion pipelines now […]
Tailor Amazon SageMaker Unified Studio project environments to your needs using custom blueprints
Amazon SageMaker Unified Studio is a single data and AI development environment that brings together data preparation, analytics, machine learning (ML), and generative AI development in one place. By unifying these workflows, it saves teams from managing multiple tools and makes it straightforward for data scientists, analysts, and developers to build, train, and deploy ML […]
Amazon SageMaker introduces Amazon S3 based shared storage for enhanced project collaboration
AWS recently announced that Amazon SageMaker now offers Amazon Simple Storage Service (Amazon S3) based shared storage as the default project file storage option for new Amazon SageMaker Unified Studio projects. This feature addresses the deprecation of AWS CodeCommit while providing teams with a straightforward and consistent way to collaborate on project files across the […]
Accelerate your data and AI workflows by connecting to Amazon SageMaker Unified Studio from Visual Studio Code
In this post, we demonstrate how to connect your local VS Code to SageMaker Unified Studio so you can build complete end-to-end data and AI workflows while working in your preferred development environment.
Introducing enhanced AI assistance in Amazon SageMaker Unified Studio: Agentic chat, Amazon Q Developer CLI, and MCP integration
In this post, we will walk through how you can use the improved Amazon Q Developer chat and the new built-in Amazon Q Developer CLI in SageMaker Unified Studio for coding ETL tasks, to fix code errors, and generate ML development workflows. Both interfaces use MCP to read files, run commands, and interact with AWS services directly from the IDE. You can also configure additional MCP servers to extend Amazon Q Developer’s capabilities with custom tools and integrations specific to your workflow.
Introducing restricted classification terms for governed classification in Amazon SageMaker Catalog
Security and compliance concerns are key considerations when customers across industries rely on Amazon SageMaker Catalog. Customers use SageMaker Catalog to organize, discover, and govern data and machine learning (ML) assets. A common request from domain administrators is the ability to enforce governance controls on certain metadata terms that carry compliance or policy significance. Examples […]
Announcing SageMaker Unified Studio Workshops for Financial Services
In this post, we’re excited to announce the release of four Amazon SageMaker Unified Studio publicly available workshops that are specific to each FSI segment: insurance, banking, capital markets, and payments. These workshops can help you learn how to deploy Amazon SageMaker Unified Studio effectively for business use cases.
Accelerate AWS Glue Zero-ETL data ingestion using Salesforce Bulk API
AWS Glue Zero ETL (extract, transform, and load) now supports Salesforce Bulk API, delivering substantial performance gains compared to Salesforce REST API for large-scale data integration for targets such as Amazon SageMaker lakehouse and Amazon Redshift. In this blog post, we show you how to use Zero-ETL powered by AWS Glue with Salesforce Bulk API to accelerate your data integration processes.
Achieve full control over your data encryption using customer managed keys in Amazon Managed Service for Apache Flink
Encryption of both data at rest and in transit is a non-negotiable feature for most organizations. Furthermore, organizations operating in highly regulated and security-sensitive environments—such as those in the financial sector—often require full control over the cryptographic keys used for their workloads. Amazon Managed Service for Apache Flink makes it straightforward to process real-time data […]
The Amazon SageMaker Lakehouse Architecture now supports Tag-Based Access Control for federated catalogs
We are now announcing support for Lake Formation tag-based access control (LF-TBAC) to federated catalogs of S3 Tables, Redshift data warehouses, and federated data sources such as Amazon DynamoDB, MySQL, PostgreSQL, SQL Server, Oracle, Amazon DocumentDB, Google BigQuery, and Snowflake. In this post, we illustrate how to manage S3 Tables and Redshift tables in the lakehouse using a single fine-grained access control mechanism of LF-TBAC. We also show how to access these lakehouse tables using your choice of analytics services, such as Athena, Redshift, and Apache Spark in Amazon EMR Serverless.