Posted On: Aug 21, 2023
Amazon SageMaker Data Wrangler now supports S3 access points as a source for previewing and importing data and as a destination for exported data. Preparing high-quality data for ML is often complex and time-consuming because it requires aggregating data across various sources and formats using different tools. With SageMaker Data Wrangler, you can explore and import data from a variety of popular sources, such as Amazon S3, Amazon Athena, Amazon Redshift, Snowflake, Databricks, and 40+ SaaS data sources. Customers increasingly use Amazon S3 to store shared datasets, where data is aggregated and accessed by different applications, teams, and individuals. S3 access points let organizations apply fine-grained access control to these shared datasets. Instead of maintaining a single bucket policy that covers every use case, organizations can create multiple access points, each with its own policy tailored to a specific use case, reducing the risk of misconfiguration or unintended access to sensitive data. Starting today, SageMaker Data Wrangler makes it easier for customers to prepare data from shared datasets stored in S3 while letting organizations securely control who can access that data.
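To illustrate the idea of one policy per access point, the following is a minimal sketch in Python that builds a read-only access point policy scoped to a single prefix. The account ID, Region, access point name, role ARN, and prefix are all hypothetical placeholders, and the helper functions are illustrative rather than part of any AWS SDK. The sketch only constructs the policy document and does not call AWS. Note that the underlying bucket policy must also permit access through the access point for such a policy to take effect.

```python
import json


def access_point_arn(region: str, account_id: str, name: str) -> str:
    """Build the ARN of an S3 access point from its parts."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def read_only_policy(region: str, account_id: str, name: str,
                     principal_arn: str, prefix: str) -> dict:
    """Return an access point policy that lets one principal list and
    read objects under a single prefix (hypothetical helper)."""
    ap_arn = access_point_arn(region, account_id, name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects under the prefix through this access point.
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"{ap_arn}/object/{prefix}/*",
            },
            {
                # List only keys under the same prefix.
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": ap_arn,
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}/*"}},
            },
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    policy = read_only_policy(
        region="us-east-1",
        account_id="111122223333",
        name="ml-team-ap",
        principal_arn="arn:aws:iam::111122223333:role/DataWranglerRole",
        prefix="datasets/churn",
    )
    print(json.dumps(policy, indent=2))
```

A second team could receive its own access point and its own policy built the same way with a different prefix, leaving the shared bucket policy unchanged.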
Once the data is imported, you can quickly understand data quality, clean the data, and create features with 300+ built-in analyses and data transformations. You can also train and deploy models with SageMaker Autopilot, and operationalize the data preparation process in feature engineering, training, or deployment pipelines using the integration with SageMaker Pipelines from SageMaker Data Wrangler.
SageMaker Data Wrangler supports Amazon S3 access points in all AWS Regions where Data Wrangler is currently available. To learn more, see the AWS technical documentation.