Apache AirFlow Quickstart v1.10.7

Cloudup | 1.10.7

Linux/Unix, Amazon Linux Amazon Linux 2 2.0 LTS - 64-bit Amazon Machine Image (AMI)

Reviews from AWS Marketplace

0 AWS reviews
  • 5 star: 0
  • 4 star: 0
  • 3 star: 0
  • 2 star: 0
  • 1 star: 0

External reviews

24 reviews
from G2

External reviews are not included in the AWS star rating for the product.


    Health, Wellness and Fitness

Currently using Airflow for complex ETL pipelines

  • January 13, 2019
  • Review verified by G2

What do you like best?
It is easy to configure a new pipeline once you have created a codebase. I currently use Airflow to manage our extraction, transformation, enrichment, and quality checks. Because most of our pipelines follow a similar format, we can create a standard pipeline and swap out the extraction and transformation code individually.
What do you dislike?
Because of the flexibility, the learning curve for developing an Airflow pipeline is a bit long. Using an existing pipeline is very easy; however, development requires knowledge of Python and bash.
What problems are you solving with the product? What benefits have you realized?
We manage data ingestion and modeling for multiple products and for the customers within each product. Each has its own pipeline with its own code.
Recommendations to others considering the product:
Be sure you have proper Python programming skills on your team before using it.
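
The "standard pipeline" idea in this review, where only the extraction and transformation code changes per customer, can be sketched in plain Python. This is an illustrative sketch only, not Airflow's API and not the reviewer's code: `build_pipeline`, the customer functions, and the sample records are all hypothetical names made up for the example.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def build_pipeline(
    extract: Callable[[], Iterable[Record]],
    transform: Callable[[Record], Record],
    enrich: Callable[[Record], Record] = lambda r: r,
    quality_check: Callable[[Record], bool] = lambda r: True,
) -> Callable[[], List[Record]]:
    """Return a runnable pipeline; only extract and transform must be supplied."""

    def run() -> List[Record]:
        # Shared stages: extract -> transform -> enrich -> quality check.
        records = [enrich(transform(r)) for r in extract()]
        bad = [r for r in records if not quality_check(r)]
        if bad:
            raise ValueError(f"{len(bad)} record(s) failed the quality check")
        return records

    return run


# Two hypothetical customers share the template and differ only in
# their extraction and transformation code.
def extract_customer_a() -> List[Record]:
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}]


def transform_customer_a(record: Record) -> Record:
    return {**record, "amount": float(record["amount"])}


def extract_customer_b() -> List[Record]:
    return [{"id": 7, "amount_cents": 1250}]


def transform_customer_b(record: Record) -> Record:
    return {"id": record["id"], "amount": record["amount_cents"] / 100}


def has_positive_amount(record: Record) -> bool:
    return record["amount"] > 0


pipeline_a = build_pipeline(extract_customer_a, transform_customer_a,
                            quality_check=has_positive_amount)
pipeline_b = build_pipeline(extract_customer_b, transform_customer_b,
                            quality_check=has_positive_amount)
```

In an actual Airflow deployment each stage would more likely be its own task inside a DAG; the factory function here only shows how the shared shape can be written once and reused.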


    Marketing and Advertising

Flexible tool to design and execute data flows

  • November 29, 2018
  • Review provided by G2

What do you like best?
Apache Airflow is a great, flexible tool that allows data engineers and data scientists to design data-intensive workflows efficiently. Airflow workflows are designed as DAGs (Directed Acyclic Graphs), and each node of the graph can be anything (e.g. a Python or bash script). Airflow manages the workflow DAGs, their scheduling, and the messages passed between the graph nodes efficiently. What I like best about Apache Airflow is that it is a simple Python tool that can design and execute very complicated workflows. In addition, Airflow can apply user-defined strategies to failed nodes in the graph. For example, the user can have a failed task node restarted after x seconds, or have an email with the error message sent to the administrator. Apache Airflow also has a user-friendly UI that lets users list the currently running and stopped graphs and display the logs of each task.
What do you dislike?
In rare cases, the Apache Airflow UI displays stale information about the currently running DAGs. For example, it sometimes shows that a certain graph is running when one of its tasks has already failed. It also sometimes shows that a workflow has already finished while one of its tasks is still running in the background. In these cases, the user has to restart the Airflow server and clear the stale logs.
What problems are you solving with the product? What benefits have you realized?
We used Airflow to design a very complicated data-intensive workflow that includes a lot of data pre-processing and machine learning modeling. That helped us deliver the results of the output machine learning models on a monthly basis. Airflow also helped us quickly recognize when there is a problem in one of the existing data tasks (e.g. the database is not reachable) and fix it accordingly.
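
The behaviour this review describes (tasks arranged as a DAG, a failed task retried after a delay, and an alert raised once retries run out) can be illustrated with a toy runner built on the Python standard library. This is a minimal sketch under stated assumptions, not Airflow's scheduler or API: `run_dag`, its parameters, and the task names are hypothetical, the runner stops at the first task that exhausts its retries, and `graphlib` requires Python 3.9 or later.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Optional


def run_dag(
    tasks: Dict[str, Callable[[], None]],
    dependencies: Dict[str, Iterable[str]],
    retries: int = 1,
    retry_delay: float = 0.0,
    on_failure: Optional[Callable[[str, Exception], None]] = None,
) -> List[str]:
    """Run tasks in dependency order; return the names that completed."""
    # Map every task to its upstream tasks (an empty tuple if it has none).
    graph = {name: dependencies.get(name, ()) for name in tasks}
    completed: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == retries:
                    # Retries exhausted: notify (e.g. email an admin) and stop.
                    if on_failure is not None:
                        on_failure(name, exc)
                    return completed
                time.sleep(retry_delay)  # wait before the next attempt
    return completed


# Hypothetical three-step workflow where preprocessing fails once.
attempts = {"preprocess": 0}


def extract() -> None:
    pass


def preprocess() -> None:
    attempts["preprocess"] += 1
    if attempts["preprocess"] < 2:
        raise ConnectionError("database is not reachable")


def train() -> None:
    pass


result = run_dag(
    tasks={"extract": extract, "preprocess": preprocess, "train": train},
    dependencies={"preprocess": {"extract"}, "train": {"preprocess"}},
    retries=1,
)
print(result)
```

Here `preprocess` fails on its first attempt and succeeds on the retry, so all three tasks complete. Passing a callback as `on_failure` is where a notification such as the email mentioned in the review would be hooked in.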


    Ritesh P.

A good tool for orchestrating your data workflow

  • November 23, 2018
  • Review provided by G2

What do you like best?
The best part of Airflow is that it is open source and can run in a distributed environment. It provides out-of-the-box operators (connectors) to plug in almost any data source.
What do you dislike?
Language compatibility. Airflow requires knowledge of the Python language. Its UI is also not powerful and sometimes has glitches. One has to write code to orchestrate workflows, as it does not provide a drag-and-drop feature.
What problems are you solving with the product? What benefits have you realized?
We have implemented our data workflow, which consists of fetching data from sources, processing it, and dumping the final data to a data mart.
Recommendations to others considering the product:
It would be better if it had a richer UI experience.
Support for languages other than Python would also help.


    Higher Education

Easy and flexible data flow management tool

  • April 27, 2018
  • Review provided by G2

What do you like best?
Airflow is a great data flow management tool that is easy and flexible. It is a Python-based tool; however, it is very well designed to manage both Python and non-Python applications.
What do you dislike?
Airflow setup is not a straightforward process, especially in a Linux environment. Many environment variables and configuration settings have to be set in advance to get it up and running.
Another issue is that, in some situations, the workflow hangs and its status goes stale. For example, the web UI may show that a certain workflow is running when it has already crashed or stopped. The reverse also happens.
What problems are you solving with the product? What benefits have you realized?
Airflow helped us design, manage, and monitor complex data flows, from collecting the data from different sources and cleaning it to applying machine learning algorithms to produce the final models.