AWS Big Data Blog

Category: Application Integration

Manage and process your big data workflows with Amazon MWAA and Amazon EMR on Amazon EKS

Many customers are gathering large amount of data, generated from different sources such as IoT devices, clickstream events from websites, and more. To efficiently extract insights from the data, you have to perform various transformations and apply different business logic on your data. These processes require complex workflow management to schedule jobs and manage dependencies […]

Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow

As the industry grows with more data volume, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]

Easily ingest and analyze Google Analytics data with Upsolver and Amazon AppFlow

This post is co-written by Mei Long at Upsolver.  Software as a service (SaaS) based applications are in demand today, and customers have growing need for adopting many of them in their use cases. As adoption grows, extracting data within these various SaaS applications and running analytics across them gets complicated. Although there are several […]

The following diagram shows the flow of our solution.

Integrating Datadog data with AWS using Amazon AppFlow for intelligent monitoring

Infrastructure and operation teams are often challenged with getting a full view into their IT environments to do monitoring and troubleshooting. New monitoring technologies are needed to provide an integrated view of all components of an IT infrastructure and application system. Datadog provides intelligent application and service monitoring by bringing together data from servers, databases, […]

The following diagram illustrates the workflow.

Orchestrating analytics jobs on Amazon EMR Notebooks using Amazon MWAA

In a previous post, we introduced the Amazon EMR notebook APIs, which allow you to programmatically run a notebook on both Amazon EMR Notebooks and Amazon EMR Studio (preview) without accessing the AWS web console. With the APIs, you can schedule running EMR notebooks with cron scripts, chain multiple EMR notebooks, and use orchestration services […]

The state machine transforms data using AWS Glue.

Building complex workflows with Amazon MWAA, AWS Step Functions, AWS Glue, and Amazon EMR

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that makes it easy to run open-source versions of Apache Airflow on AWS and build workflows to run your extract, transform, and load (ETL) jobs and data pipelines. You can use AWS Step Functions as a serverless function orchestrator to build scalable […]

Analyzing Google Analytics data with Amazon AppFlow and Amazon Athena

This post demonstrates how you can transfer Google Analytics data to Amazon S3 using Amazon AppFlow, and analyze it with Amazon Athena. You no longer need to build your own application to extract data from Google Analytics and other SaaS applications. Amazon AppFlow enables you to develop a fully automated data transfer and transformation workflow and an integrated query environment in one place.