Posted On: Nov 30, 2022
Today, AWS announces the general availability of Amazon SageMaker Data Wrangler support for over 40 third-party applications as data sources for machine learning (ML) through integration with Amazon AppFlow. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for ML from weeks to minutes. Preparing high-quality data for ML is often complex and time-consuming because it requires aggregating data across various sources and formats using different tools. With SageMaker Data Wrangler, you can explore and import data from a variety of popular sources, such as Amazon S3, Amazon Athena, Amazon Redshift, Snowflake, Databricks, and Salesforce Customer Data Platform. Starting today, customers can more easily aggregate data for ML from over 40 third-party application data sources, including Salesforce Marketing, SAP, Google Analytics, LinkedIn, and more, via Amazon AppFlow.
Amazon AppFlow is a fully managed service that enables customers to securely transfer data from third-party applications to AWS services such as Amazon S3, and to catalog the data in the AWS Glue Data Catalog in just a few clicks. Once the data sources are set up in AppFlow, you can browse tables and schemas from these data sources using the Data Wrangler SQL explorer. You can write Athena queries to preview the data and confirm that it is relevant to your use case, then import it to prepare for ML model training. You can also join data from multiple sources after import to create the right dataset for ML. Once the data is imported, you can quickly assess data quality, clean the data, and create features with 300+ built-in analyses and data transformations. You can also train and deploy models with SageMaker Autopilot, and operationalize the data preparation process in a feature engineering, training, or deployment pipeline using the integration with SageMaker Pipelines from Data Wrangler.
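As a rough illustration of the preview step described above, the following Python sketch builds a small Athena preview query and submits it with the AWS SDK for Python (boto3). It is not taken from the Data Wrangler documentation. The helper names, the database and table names, and the S3 output location are all hypothetical placeholders, and the sketch assumes AppFlow has already cataloged the data in the AWS Glue Data Catalog.

```python
import re

# Conservative pattern for database and table names (an assumption of this
# sketch, not an Athena rule): letters, digits, and underscores only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_preview_query(database: str, table: str, limit: int = 10) -> str:
    """Return an Athena SQL statement that previews the first rows of a table."""
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f'SELECT * FROM "{database}"."{table}" LIMIT {limit}'


def start_preview(database: str, table: str, output_location: str) -> str:
    """Submit the preview query to Athena and return the query execution ID.

    output_location is an S3 URI where Athena writes query results,
    for example a bucket you own (hypothetical in this sketch).
    """
    import boto3  # imported lazily so the query builder works without the AWS SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=build_preview_query(database, table),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

For example, `build_preview_query("appflow_db", "salesforce_accounts", 5)` produces a query against a hypothetical `salesforce_accounts` table. Calling `start_preview` requires AWS credentials and permissions for Athena, AWS Glue, and the S3 output location.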
Data Wrangler supports 40+ third-party data sources in all the AWS Regions currently supported by AppFlow. This feature is available at no additional charge beyond the cost of Data Wrangler and AppFlow.
To get started, see the following resources:
- New — Amazon SageMaker Data Wrangler supports SaaS applications as data sources
- Import data from third-party applications in the AWS technical documentation