IBM StreamSets
IBM Software
External reviews
External reviews are not included in the AWS star rating for the product.
Overall good experience; I like its ease of use.
Streamlining Data Pipelines with ease
Good Product
Powerful and Flexible ETL solution with IBM StreamSets
Streaming data pipelines through the GUI is great
Efficient Data Pipeline Tool with Some Limitations
Enables effective batch loading with visual interface and enterprise support
What is our primary use case?
We are using StreamSets for batch loading.
What is most valuable?
StreamSets is GUI-based and takes care of load balancing. It supports a hybrid installation, rather than requiring a fully cloud-based or fully on-premises deployment. Additionally, StreamSets provides good enterprise support with a quick turnaround.
What needs improvement?
One issue I observed with StreamSets is that memory runs out quickly when processing large volumes of data. Because of this, we had to upgrade our EC2 instances on AWS: I had to move to a new, larger EC2 instance even though the processor was not fully utilized. It would be beneficial if StreamSets addressed any potential memory leaks so that such upgrades become unnecessary. It would also be a great enhancement if StreamSets could produce a lineage graph showing how data has passed through the system.
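A lineage graph of the kind requested above is essentially a directed graph that records which datasets each dataset was derived from. As an illustration only, here is a minimal Python sketch that stores lineage as a mapping from each dataset to its direct sources and walks it breadth-first to list every upstream source. The dataset names and the `upstream` function are hypothetical and are not part of any StreamSets API.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it was derived from.
# These names are illustrative only; they are not StreamSets objects.
LINEAGE: dict[str, list[str]] = {
    "redshift.sales_fact": ["transform.clean_orders"],
    "transform.clean_orders": ["s3.raw_orders", "jdbc.customers"],
    "s3.raw_orders": [],
    "jdbc.customers": [],
}


def upstream(dataset: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every dataset that feeds `dataset`, nearest sources first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(graph.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # a source reachable by two paths is listed once
            continue
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


if __name__ == "__main__":
    print(upstream("redshift.sales_fact", LINEAGE))
```

For the sample graph this lists `transform.clean_orders` first, then the two raw sources it reads from.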
For how long have I used the solution?
I started using StreamSets in 2022, so it's been almost four years now.
What do I think about the stability of the solution?
On a scale of one to ten, I would rate the stability of the product at eight point five.
What do I think about the scalability of the solution?
For scalability, I would also rate it at eight point five.
How are customer service and support?
IBM technical support sometimes transfers tickets between teams because of shift changes, which can be frustrating. These handoffs slow down resolution, as I have to explain the issue again each time. Overall, I would rate the technical support eight out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup of StreamSets isn't simple, but it's not too complex either. It’s a standard setup and is fine.
Which other solutions did I evaluate?
I consider StreamSets the leader in the market. There are many products, and the right choice depends on the features and use cases you need, but StreamSets stands out because of its capabilities.
What other advice do I have?
I would definitely recommend StreamSets to other users. My overall rating for the solution is nine out of ten.
Good integration tool
StreamSets: Review
Useful for data transformation and helps with column encryption
What is our primary use case?
We use StreamSets for data transformation rather than for end-to-end ETL. It focuses on transforming data coming directly from the sources and does not handle the extraction part of the process. The transformed data is then loaded into Amazon Redshift or other data warehousing solutions.
What is most valuable?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL, and you can customize it to do what you need. Many other tools have since adopted features similar to those StreamSets introduced, such as automated workflows that are easy to set up.
What needs improvement?
We often faced problems, especially with SAP ERP. We struggled because many columns were neither integers nor primary keys, which StreamSets couldn't handle, so we had to restructure our data tables, which was painful. Pipeline failures were also common, and data drift wasn't handled, which made things worse. Licensing was another issue we encountered.
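One plausible reason for the integer/primary-key problem, offered as an assumption rather than a confirmed diagnosis, is that incremental database reads usually resume from an offset column: a unique, increasing value such as an integer primary key. The minimal Python sketch below shows that pattern with the standard-library `sqlite3` module. The `read_incrementally` helper and the `orders` table are hypothetical, and this is not StreamSets' implementation; it only illustrates why a table without such a column has no reliable "resume after the last row" point.

```python
import sqlite3
from collections.abc import Iterator


def read_incrementally(
    conn: sqlite3.Connection,
    table: str,
    offset_column: str,
    last_offset: int,
    batch_size: int,
) -> Iterator[list[tuple]]:
    """Yield batches of rows whose offset_column is greater than last_offset.

    Assumes offset_column is a unique, monotonically increasing integer
    (for example an integer primary key). Table and column names are
    interpolated directly, so they must come from trusted configuration.
    """
    while True:
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {offset_column} > ? "
            f"ORDER BY {offset_column} LIMIT ?",
            (last_offset, batch_size),
        )
        rows = cur.fetchall()
        if not rows:
            return
        # Remember the offset of the last row so the next batch resumes after it.
        names = [d[0] for d in cur.description]
        last_offset = rows[-1][names.index(offset_column)]
        yield rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, item) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, 6)],
    )
    for batch in read_incrementally(conn, "orders", "id", 0, 2):
        print([row[0] for row in batch])
```

With five rows and a batch size of two, the reader returns ids 1 and 2, then 3 and 4, then 5. If the key were a non-sortable or non-unique column, the `> ?` comparison could skip or repeat rows.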
For how long have I used the solution?
I have been working with the product for five years.
What do I think about the scalability of the solution?
The tool's flexibility and performance are good. It supports task dependency management, so if one task fails, other tasks aren't affected. It can handle large volumes of data and supports features like change data capture (CDC) for tracking changes.
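The idea that one failing task should not affect unrelated tasks can be sketched in Python with the standard-library `graphlib` module: tasks run in dependency order, and a failure marks only the tasks downstream of it as skipped. This is a minimal sketch under stated assumptions; `run_tasks` and the task names are hypothetical, and it does not describe how StreamSets schedules jobs internally.

```python
from collections.abc import Callable, Iterable, Mapping
from graphlib import TopologicalSorter


def run_tasks(
    tasks: Mapping[str, Callable[[], None]],
    deps: Mapping[str, Iterable[str]],
) -> dict[str, str]:
    """Run each task after its dependencies; a failure only skips tasks downstream of it.

    `deps` maps a task name to the names of tasks it depends on. Every name
    used in `deps` is assumed to also be a key of `tasks`.
    """
    graph = {name: list(deps.get(name, ())) for name in tasks}
    status: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        # Skip this task if any upstream task failed or was itself skipped.
        if any(status.get(dep) != "ok" for dep in graph[name]):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
    return status


if __name__ == "__main__":
    def succeed() -> None:
        return None

    def fail() -> None:
        raise RuntimeError("source unavailable")

    print(run_tasks(
        {"load_orders": succeed, "load_customers": fail, "join": succeed, "report": succeed},
        {"join": ["load_orders", "load_customers"], "report": ["load_orders"]},
    ))
```

In the example, `load_customers` fails, so `join` is skipped, while `report` still runs because it depends only on `load_orders`.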
Around six months ago, many people in my company were using StreamSets: about 42 people across different projects on the US team. Similarly, in 2021, there were around 43 users. At my previous company, about 16 to 18 people in Mumbai used it.
How are customer service and support?
The tool's support is good.
How was the initial setup?
Installing StreamSets can take time because it has two components: Data Collector and Transformer. Data Collector is easier to install, but Transformer is more complicated and requires more steps, like setting up tasks and configurations.
It's best to make sure the environment is ready beforehand, including that it works well with other servers. The process can be both easy and difficult, but if you follow the documentation, it should be manageable.
What was our ROI?
Whether the tool is worth the money depends on the situation. If you don't want to spend a lot on competing products like Databricks or AWS Glue, StreamSets might be a better option. It's particularly valuable if you prefer not to invest heavily in training your team on new technologies. If your ETL developers or data engineers are comfortable with StreamSets, it can be worth the money.
What's my experience with pricing, setup cost, and licensing?
The licensing is expensive, and there are other costs involved too. From using the software, I know that you have to pay for new features whenever updates are released, which I don't really like. Initially, though, it was very good.
What other advice do I have?
We use various tools and alerting systems to notify us of pipeline errors or failures. StreamSets supports data governance and compliance by allowing us to encrypt incoming data based on specified rules. We can easily encrypt columns by providing the column name and hash key.
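Protecting a column given its name and a key can be illustrated with a keyed hash. The minimal Python sketch below replaces the listed columns of a record with HMAC-SHA256 digests using the standard-library `hmac` and `hashlib` modules. Note that a keyed hash is one-way: unlike encryption, the original value cannot be recovered, although equal inputs still produce equal outputs, so masked columns remain joinable. The `mask_columns` function, the column names, and the key are hypothetical; StreamSets configures its own protection rules in its UI and may use a different algorithm.

```python
import hashlib
import hmac


def mask_columns(record: dict[str, object], columns: set[str], key: bytes) -> dict[str, object]:
    """Return a copy of `record` with each listed column replaced by an HMAC-SHA256 hex digest.

    Columns not present in the record are ignored; the input record is not modified.
    """
    masked = dict(record)
    for column in record.keys() & columns:
        value = str(record[column]).encode("utf-8")
        masked[column] = hmac.new(key, value, hashlib.sha256).hexdigest()
    return masked


if __name__ == "__main__":
    key = b"example-key"  # placeholder only; a real key would come from a secrets store
    row = {"id": 7, "email": "user@example.com", "country": "IN"}
    print(mask_columns(row, {"email"}, key))
```

Only `email` is replaced with a 64-character hex digest; `id` and `country` pass through unchanged.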
If you're considering using StreamSets for the first time, I would advise first understanding why you want to use it and how it will benefit you. If you're dealing with change tracking or handling large amounts of data, it could be cost-effective compared to Amazon's services. It's easy to schedule and manage tasks with the tool, and you can enhance your skills as an ETL developer. You can easily migrate traditional pipelines built on platforms like Informatica or Talend to StreamSets. I rate the overall solution an eight out of ten.