
Overview
Apache NiFi is a visually programmed software tool that automates the movement and transformation of data between systems. It enables you to capture, move, enrich, and transform machine data, Internet of Things (IoT) data, and streaming data. Its drag-and-drop interface lets you build data pipelines from commercial data feeds, manufacturing equipment, IoT sensors, web servers, and business reporting, and move the data into a variety of systems such as S3, EMR, SQL databases, DynamoDB, Couchbase, MongoDB, HBase, Elasticsearch, Hive, Kinesis, Postgres, MySQL, and FTP servers, as well as tools such as Snowflake or BigQuery.
Calculated Systems Apache NiFi in the Cloud is a one-click deployment that automatically launches NiFi in AWS quickly and securely without any coding or complex configuration. This out-of-the-box, optimized deployment of Apache NiFi helps protect you from common pitfalls associated with open source software such as Java virtual machine (JVM) issues and logging configuration by taking care of all initialization, configuration and perimeter security needed. No need to become an expert in big data cloud architecture to migrate or manage your data.
To learn more about Apache NiFi, download our free ebook, Apache NiFi for Dummies, authored by several members of the Calculated Systems team: https://www.calculatedsystems.com/nifi-for-dummies
Highlights
- A visually programmed software tool that moves machine data/IoT data into the cloud
- Drag-and-drop software to move your data into S3, EMR, SQL databases, DynamoDB, Elasticsearch, Kinesis, FTP servers, Snowflake, and more.
- Easy, one-click installation of a fully functional Apache NiFi instance in AWS in minutes.
Details
Pricing
Dimension | Cost/hour |
---|---|
m4.xlarge (Recommended) | $0.16 |
t2.2xlarge | $0.32 |
r4.xlarge | $0.16 |
t2.large | $0.08 |
r4.2xlarge | $0.32 |
m4.2xlarge | $0.32 |
m4.large | $0.08 |
t2.xlarge | $0.16 |
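To put the hourly rates above in monthly terms, here is a minimal sketch. It assumes an instance that runs continuously for about 730 hours per month, and it only multiplies the listed rate by hours; whether separate AWS infrastructure charges apply is not stated in the table and is not modeled. The names `HOURLY_RATES`, `HOURS_PER_MONTH`, and `monthly_cost` are illustrative.

```python
# Hourly rates copied from the pricing table above (USD per hour).
HOURLY_RATES = {
    "m4.xlarge": 0.16,
    "t2.2xlarge": 0.32,
    "r4.xlarge": 0.16,
    "t2.large": 0.08,
    "r4.2xlarge": 0.32,
    "m4.2xlarge": 0.32,
    "m4.large": 0.08,
    "t2.xlarge": 0.16,
}

# Assumption: an instance running around the clock for an average month.
HOURS_PER_MONTH = 730


def monthly_cost(instance_type: str, hours: float = HOURS_PER_MONTH) -> float:
    """Return the listed hourly rate multiplied by hours, rounded to cents."""
    return round(HOURLY_RATES[instance_type] * hours, 2)


if __name__ == "__main__":
    for name in sorted(HOURLY_RATES):
        print(f"{name:12s} ${monthly_cost(name):8.2f} per month")
```

Passing a smaller `hours` value, for example `monthly_cost("t2.large", hours=160)`, estimates an instance that only runs during working hours.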
Vendor refund policy
For refund information, please read the EULA or contact Info@calculatedsystems.com.
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
- This is the first release of NiFi 2.x
- Fixed UI Elements
- Migrated to Ubuntu
Additional details
Usage instructions
A detailed launch guide is available at https://www.calculatedsystems.com/getting-started-aws
For NiFi-specific usage beyond starting the AMI, please see our ebook, Apache NiFi for Dummies: https://www.calculatedsystems.com/nifi-for-dummies
Support
Vendor support
Migration, implementation, and support services are available. Contact Info@calculatedsystems.com.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Customer reviews
The tool enables effective data transformation and integration
What is our primary use case?
I use NiFi as a tool for ETL, which stands for extract, transform, and load. It is particularly effective for integration methodologies.
The tool is useful for designing ETL pipelines and is an open-source product. Data is often stored in different forms and locations. If I want to integrate and transform it, NiFi can help load data from one place to another while making transformations.
I can handle stream or batch data and identify various data types on different platforms. NiFi can integrate with tools like Slack and perform required transformations before loading to the desired downstream.
It is primarily a pipeline-building tool with a graphical UI; however, I can also write custom JARs for specific functions. NiFi is an open-source tool effective for data migration and transformations, helping improve data quality from various sources.
What is most valuable?
NiFi works on data and file levels, streamlining real-time data processes. It is highly effective for handling real-time data by working with APIs for immediate and continuous data extraction. For real-time data tasks, this front-end UI-based tool is superior to back-end platforms.
What needs improvement?
There are some areas for improvement, particularly with record-level tasks that take a bit of time. The quality of JSON data processing could be improved, as JSON workloads require manual conversions without a specific process.
Enhancing features related to alerting would be helpful, including mobile alerts for pipeline issues. Integration with mobile devices for error alerts would simplify information delivery.
What do I think about the stability of the solution?
The product is stable for simple tasks, like using databases that are not distributed. However, for distributed environments like Hadoop or HBase, some vulnerabilities exist. While these are not major issues, they should not be ignored.
What do I think about the scalability of the solution?
Scaling works well, allowing cluster expansion. However, I have never encountered very large clusters, so it's uncertain how well it supports extensive scaling.
How was the initial setup?
The initial setup is fast, especially for communication stabilization. Although the product is open source, it functions as a cluster. For single-node environments, installation is simple. For company-wide or enterprise-level clusters, the initial stages may present issues with authentication and access. Stabilization, such as port communication, may not be immediately effective.
What other advice do I have?
I recommend the product for its data privacy features. It allows secure data handling because the data is stored on my nodes. However, a skilled technician is necessary due to the reliance on Java, especially for back-end operations and error debugging.
Enterprise versions may offer easier troubleshooting. As an open-source solution, good support is crucial.
I rate the overall product as eight out of ten.
Useful to transfer data from one service to another and is user-friendly
What is our primary use case?
We use the tool to transfer data from one service to another. It helps us to migrate data from one department to another.
What is most valuable?
Apache NiFi is user-friendly. Its most valuable features for handling large volumes of data include its multitude of integrated endpoints and clients and the ability to create cron jobs to run tasks at regular intervals.
What needs improvement?
The tool should incorporate more tutorials for advanced use cases. It has tutorials for simple use cases.
What do I think about the stability of the solution?
I rate the tool's stability an eight out of ten.
How are customer service and support?
I have relied on the documentation available on Apache NiFi's website for support.
How was the initial setup?
I tried to install the tool on my work laptop, and while it worked initially, it started to run slowly after some time. The department that handles the company's databases uses Apache NiFi on proper servers. I tried using it on my laptop to see if it worked, but it ran very slowly and consumed many resources from my machine.
What's my experience with pricing, setup cost, and licensing?
I used the tool's free version.
What other advice do I have?
I rate Apache NiFi an eight out of ten.
Good monitoring and metrics capabilities, and the ability to design processors with a single click
What is our primary use case?
As a DevOps engineer, my day-to-day task is to move files from one location to another, doing some transformation along the way. For example, I might pull messages from Kafka and put them into S3 buckets. Or I might move data from a GCS bucket to another location.
NiFi is really good for this because it has very good monitoring and metrics capabilities. When I design a pipeline in NiFi, I can see how much data is being processed, where it is at each stage, and what the total throughput is.
I can see all the metrics related to the complete pipeline. So, I personally like it very much.
What is most valuable?
The good thing about Apache NiFi is that it has a concept called a flow file, and there's something called a flow file processor. The processor is the building block of your entire job. They have close to 500 processors for different purposes.
For example, for reading from Kafka, NiFi has a processor called ConsumeKafka. To write to S3, it has a processor called PutS3Object. Now, if I read from Kafka and write my own application, I'd need to ensure the library I'm using tracks my messages. I'd also need to handle any failures by rereading messages and ensuring acknowledgment. But all this complexity is already handled by the NiFi processors.
They have around 500 processors, with a community investing significant effort into developing them. I can design my processor with a single click, export the entire workflow, and import it elsewhere. The exported format is ready to use, so NiFi is immediately set up.
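To illustrate the bookkeeping the reviewer describes (tracking messages, rereading after failures, acknowledging only on success), here is a minimal sketch of an at-least-once copy loop. It does not use a real Kafka or S3 client: `InMemoryLog` is a hypothetical stand-in for a Kafka partition, and the `write` callable stands in for an S3 upload. It is not how NiFi implements its processors.

```python
from typing import Callable, List


class InMemoryLog:
    """Hypothetical stand-in for a Kafka partition: messages plus a committed offset."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = messages
        self.committed = 0  # index of the next message that has not been acknowledged

    def poll(self) -> List[str]:
        # Always resume from the last committed offset, so unacknowledged
        # messages are read again after a failure.
        return self.messages[self.committed:]

    def commit(self, count: int) -> None:
        self.committed += count


def copy_messages(log: InMemoryLog, write: Callable[[str], None], max_attempts: int = 3) -> int:
    """Write every pending message with `write`, committing only after success.

    Returns the number of messages delivered. A message whose write keeps
    failing stops the loop without being committed, so the next run retries it.
    """
    delivered = 0
    for message in log.poll():
        for attempt in range(1, max_attempts + 1):
            try:
                write(message)
                break
            except IOError:
                if attempt == max_attempts:
                    return delivered
        log.commit(1)  # acknowledge only after the write succeeded
        delivered += 1
    return delivered


if __name__ == "__main__":
    stored: List[str] = []
    failures = {"b": 1}  # make the first write of "b" fail once

    def flaky_write(message: str) -> None:
        if failures.get(message, 0) > 0:
            failures[message] -= 1
            raise IOError("simulated sink failure")
        stored.append(message)

    log = InMemoryLog(["a", "b", "c"])
    print(copy_messages(log, flaky_write), stored, log.committed)
```

Because the offset is committed only after a successful write, a crash between the write and the commit would deliver that message twice on the next run; that is the usual trade-off of at-least-once delivery.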
It's also distributed in nature so that I can scale it across nodes based on the workload. These nodes share their state. If one node goes down during processing, that data might be lost, but any subsequent data is safe. Such occurrences are rare.
In essence, if you want a quick solution, Apache NiFi is a strong contender. There are other solutions like Airflow and some paid pipeline options.
Airflow is open-source but can be complicated. For ETL or ELT solutions, there are pricier options. But if I need a pipeline that I can monitor step by step, Apache NiFi is a good choice. It integrates with Prometheus metrics, allowing me to embed them in my workflow.
There's also a processor for integration with Slack, and I can receive notifications when the workflow is completed or fails.
Another feature I appreciate is "back pressure," which NiFi handles automatically. It maintains its own queues and addresses back-pressure issues. If, for instance, a downstream component isn't fast enough, items are held in a queue, managed internally by NiFi's back-pressure mechanism.
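The idea behind back pressure can be sketched with a bounded queue: when the queue between two steps reaches a threshold, the producing step is paused until the consuming step catches up. The following is a simplified illustration, not NiFi's implementation; the `Connection` class, `object_threshold`, and `run_flow` are hypothetical names, and only an object-count threshold is modeled.

```python
from collections import deque
from typing import Deque, List


class Connection:
    """Hypothetical bounded queue between two processing steps."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self.queue: Deque[str] = deque()

    def is_full(self) -> bool:
        return len(self.queue) >= self.object_threshold


def run_flow(items: List[str], connection: Connection, consume_every: int) -> int:
    """Push items through the connection with a slow consumer.

    The producer is skipped whenever the queue is at its threshold; the consumer
    takes one item every `consume_every` ticks. Returns the largest queue size seen.
    """
    pending = deque(items)
    largest = 0
    tick = 0
    while pending or connection.queue:
        tick += 1
        # Producer: only runs when back pressure is not applied.
        if pending and not connection.is_full():
            connection.queue.append(pending.popleft())
        largest = max(largest, len(connection.queue))
        # Consumer: slower than the producer.
        if tick % consume_every == 0 and connection.queue:
            connection.queue.popleft()
    return largest


if __name__ == "__main__":
    conn = Connection(object_threshold=3)
    peak = run_flow([f"item-{i}" for i in range(10)], conn, consume_every=3)
    print("peak queue size:", peak)
```

However slow the consumer is, the queue never grows past the threshold; the producer simply waits, which is the behavior the reviewer is describing.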
What needs improvement?
There is room for improvement in integration with SSO. In my experience, NiFi does not have any integration with SSO, and if I want to set up role-based access control across the organization, that is not possible.
So I have to create a separate username and password and then share it with each team. That is the pain point at the enterprise level.
For how long have I used the solution?
I have been using it for one and a half years.
What do I think about the stability of the solution?
I would rate the stability a seven out of ten because there are a lot of processes that need to be implemented.
What do I think about the scalability of the solution?
It's scalable. It can easily scale on multiple nodes. Depending on the workload, it also handles that internally; like the workers, they coordinate with each other, and they share the workload with each other. So, it's pretty good in terms of scalability.
How was the initial setup?
The initial setup is very easy, especially for users who are familiar with ETL or ELT.
NiFi is one of the easiest tools on the market to learn and use. It is also a quick-win solution, which is good for first-time users who are developing data pipelines for ETL. NiFi makes it easy to track and trace the status of your pipelines, so you can be sure that they are working properly.
What other advice do I have?
If I were to advise someone, I would ask the user what endpoints they want to touch. If I want to read something from Kafka and I want to put this thing on the S3 bucket, what is the alternative I have?
I have Kafka Connect, where I can connect to Kafka and write the data into an S3 bucket. Is it scalable? No. Can it be monitored? No.
We can't monitor it. We can't scale it. It's going to be a complete black box. The person who knows Kafka Connect, or Kafka, can understand what is happening there while using Kafka Connect. But if I compare it, I literally don't need to understand what Kafka is.
I know, "Okay, this is Kafka. These are the endpoints, and this is the URL I have to point to." That's it. My job is done. I can create a complete flow pipeline within, let's say, thirty minutes without having any prior knowledge. I can read, I can Google it, and I can just implement it.
For people who are new to big data technologies like Kafka and BigQuery, I would give this solution an eight out of ten.
Let's say you need to build a solution to read from Kafka and write to an S3 bucket. You could use Kafka Connect, but if your requirements change and you need to start reading from a database instead, Kafka Connect will not work. With Apache NiFi, you can easily modify your flow pipeline to start reading from the database instead.
Allows the creation and use of custom functions to achieve desired functionality, but is limited in handling monthly transactions due to a lack of date partitioning
What is our primary use case?
One example is how Apache NiFi has helped us to create data pipelines to migrate data from Oracle to Postgres, Oracle to Oracle, Oracle to Minio, or other databases, such as relational databases, NoSQL databases, or object storage. We create templates for these pipelines so that we can easily reuse them for different data migration projects.
For example, we have a template for migrating data from Oracle to Postgres. This template uses an incremental load process. The template also checks the source and destination databases for compatibility and makes any necessary data transformations.
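The reviewer's template is not shown, but the incremental load idea can be sketched with a high-water mark: remember the largest key already copied and fetch only rows above it on the next run. The example below is an assumption about what "incremental load" means here. It uses in-memory SQLite databases as stand-ins for Oracle and Postgres, and the `customers` table and `incremental_load` function are hypothetical.

```python
import sqlite3


def incremental_load(source: sqlite3.Connection, dest: sqlite3.Connection, last_id: int) -> int:
    """Copy rows with id greater than last_id and return the new high-water mark."""
    rows = source.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    dest.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
    dest.commit()
    # If nothing new arrived, keep the previous mark.
    return rows[-1][0] if rows else last_id


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    dest = sqlite3.connect(":memory:")
    for db in (source, dest):
        db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
    source.commit()

    mark = incremental_load(source, dest, last_id=0)     # first run copies both rows
    source.execute("INSERT INTO customers VALUES (3, 'Sam')")
    source.commit()
    mark = incremental_load(source, dest, last_id=mark)  # second run copies only row 3
    print(mark, dest.execute("SELECT COUNT(*) FROM customers").fetchone()[0])
```

A monotonically increasing key or timestamp column is required for this approach; updates to rows that were already copied are not detected.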
If our data is not more than ten terabytes, then NiFi is mostly used. But for a heavy table setup, I don't use NiFi for customers or enterprise solutions.
What is most valuable?
I use custom functions for specific features in Apache NiFi. I also use the processes available in NiFi. I can write custom functions to achieve the desired functionality, even if it is not explicitly available as a built-in NiFi feature.
What needs improvement?
Apache NiFi is slow to control and needs to be improved. I have to run many jobs and there are already large tables, which can make it difficult to control NiFi on time.
There is no one to tell me when there is an incident and my server is down. When we manually start the NiFi process, it is not always started correctly. We can write scripts to run when a message is received from Airflow saying that the firewall is not running. This script will automatically start all servers, including the application servers. It will also check the status of all my NiFi processes and send a callback message with the results. I have written down all the processes that are monitored.
We run many jobs, and there are already large tables. When we do not control NiFi on time, all reports fail for the day. So it's pretty slow to control, and it has to be improved.
In future releases, there are extra features I'd like to see added. For example, NiFi is not well suited to migration, and replication in NiFi is really not good, because when you process ten years of data, you can't manage all the transactions. Moreover, the handling of monthly transactions is inadequate due to a lack of partitioning by date. And, when we grade a monthly ticket, we must process all the data and then rerun our ETL jobs. If possible, enhancing date partitioning in NiFi would be beneficial.
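One possible workaround for the missing date partitioning, sketched here outside NiFi, is to split a long date range into monthly windows and run each window as its own job, so that a single month can be rerun without reprocessing everything. This is an illustration under that assumption, not a NiFi feature; `monthly_windows` is a hypothetical helper and the windows are half-open (start inclusive, end exclusive).

```python
from datetime import date
from typing import List, Tuple


def monthly_windows(start: date, end: date) -> List[Tuple[date, date]]:
    """Split [start, end) into half-open windows that break at month boundaries."""
    windows = []
    cursor = start
    while cursor < end:
        # First day of the month after `cursor`.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        window_end = min(next_month, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


if __name__ == "__main__":
    # Ten years of data become 120 independent monthly jobs.
    for window_start, window_end in monthly_windows(date(2014, 1, 1), date(2024, 1, 1))[:3]:
        print(window_start, "to", window_end)
```

Each pair can then be passed as query bounds, for example `WHERE txn_date >= :start AND txn_date < :end`, where `txn_date` is a placeholder column name.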
For how long have I used the solution?
I have been working with Apache NiFi for one year.
What do I think about the stability of the solution?
I would rate the stability an eight out of ten.
What do I think about the scalability of the solution?
I would rate the scalability a five out of ten because, in our experience, it doesn't scale correctly, especially if you don't use a Kubernetes system.
If you want it to be scalable, you must use Kubernetes, but in our system, it runs on VMs with VM disks, both external and non-external. Increasing disk space is a very hard process. NiFi is not easily scalable. You can increase capacity, but decreasing it is not possible. So, it is easy to scale up, but scaling down is difficult.
There are around ten end users in our company. We plan to increase usage further.
How was the initial setup?
The initial setup is very easy. I would rate my experience with the initial setup a ten out of ten, where one point is difficult, and ten points are easy.
But if you want its custom mode and control, it's five out of ten.
For the initial setup, if you configure to custom mode, it's five points. But if you use its single-mode configuration and installation, it's ten.
What about the implementation team?
The deployment takes one week due to network access and some VM installation. Then, we install NiFi and deploy it. But, if you have all the scripts written automatically, it’s five minutes for us.
One person is enough for the deployment process. It's all about script writing in CAC, and it's one-button quick for deployment.
What's my experience with pricing, setup cost, and licensing?
I use the open-source version, so it is free for me to use.
What other advice do I have?
If the volume is manageable, I would recommend it. Overall, I would rate the solution a six out of ten.