Streamsets pre-configured by Miri Infotech Inc. on Ubuntu

Miri Infotech | 5

Linux/Unix, Ubuntu 16.04 LTS - 64-bit Amazon Machine Image (AMI)

Reviews from AWS Marketplace

0 AWS reviews
  • 5 star: 0
  • 4 star: 0
  • 3 star: 0
  • 2 star: 0
  • 1 star: 0

External reviews

36 reviews from G2

External reviews are not included in the AWS star rating for the product.


    Jered L.

StreamSets used to be a great open source tool, but has lost its niche

  • April 23, 2022
  • Review verified by G2

What do you like best?
The simplicity of creating data pipelines visually, with no clunky installation and no databases / metadata to manage, like with other ETL tools. All pipeline info is stored on the file system itself.
What do you dislike?
They have closed-sourced their software and locked it behind a SaaS model. This is a relatively recent change, and it has caused lots of headaches.
What problems are you solving with the product? What benefits have you realized?
We regularly use StreamSets to pull 220 million records daily from Salesforce, eliminating the need to write complicated Python code. We also use this tool for Oracle CDC, which has worked well at scale (20 million transactions / day). I have also used this tool to consume from over 5 different JDBC-based sources, with great performance and simple implementation.

The biggest benefit we have realized is that you can Dockerize the SDC service, deploy it in ECS or any container orchestration service, and run pipelines that scale horizontally, instead of having static servers hosting the service. If you can implement this properly, it makes writing ingestion pipelines EXTREMELY simple. By doing this, I can add a new data source to our ETL jobs within a day instead of weeks. And it scales to handle thousands of tables!
Recommendations to others considering the product:
There is no reason not to try the Data Collector. It is free to download the tarball and install. Try it using Docker on your local machine, then deploy it in a development environment for testing.


    nitin s.

Very Powerful and Easy Data Engineering Platform. Capable of handling multiple platforms and huge data.

  • January 30, 2022
  • Review verified by G2

What do you like best?
StreamSets is very lightweight. Since it is a containerized app, it is easy to use with Docker if you are an individual developer. Organizations can use Kubernetes.
They have a very easy and user-friendly interface. It takes only a few days for new developers to get started and deploy their first pipelines.
StreamSets provides easy and powerful stages (a kind of connector) to integrate StreamSets with different platforms such as Kafka, Salesforce, Oracle DB, REST APIs, HTTPS connections, data lakes and many more.
StreamSets uses regular expressions for data transformation operations, which is really easy.
Monitoring StreamSets pipelines is very easy: you can register your Data Collector with Control Hub using provisioning agents. After registering, you can deploy pipelines to SCH (StreamSets Control Hub) and create jobs. All of this can be done using their Python SDK, which can easily be integrated with ADO release pipelines.
After creating or deploying pipelines, users can use SCH subscriptions to create alerts when pipelines or jobs change status.
For individual alerts, pipelines have a built-in capability to do so.
After version 4.0.1, SDC is merged with their DataOps platform. This gives individual developers the feel of a Control Hub. It also removes platform dependency.
They have excellent security. Pipelines can be integrated with Azure Key Vault, which eliminates the need to share credentials with developers. The same goes for parameters and runtime parameters. Developers can easily replace any value in a pipeline with ADO library variables.
If you are an organization, they provide very extensive support and work immediately on any bug the organization finds. They also have a customer success team that will do anything to make sure your organization's experience with StreamSets is seamless.
What do you dislike?
A few of the stages are a bit unstable, such as the Oracle CDC Client. They work fine, but in some corner cases things become a bit tricky. The logging mechanism is excellent and extensive, but it could be simpler.
What problems are you solving with the product? What benefits have you realized?
I am in an organization where we are working on sharing data between multiple applications running on different platforms, so we needed a tool or platform that can easily integrate with a variety of technologies and adapt to this ever-changing era.
StreamSets allowed us to share real-time data between platforms, which also removed our dependency on heavier ETL tools like SSIS and Ab Initio.
Since it is easier to use, our talent development team can enable our developers to use StreamSets.


    Abhishek K.

StreamSets: A Powerful Data Engineering + DataOps Tool

  • January 20, 2022
  • Review verified by G2

What do you like best?
The easy-to-use canvas for creating data engineering pipelines with the required stages (Sources + Processors + Executors + Destinations).
Scheduling data pipelines was never this easy.
Fetching application secrets from Key Vault for enhanced security.
What do you dislike?
The built-in job monitoring and visualisation is not that user-friendly; StreamSets should include features to visualize things like "How many records were streamed from source to destination on a particular date?"
Better and more detailed logging and error information is needed.
A fragment drill-down feature while monitoring data flow in a running job would help.
What problems are you solving with the product? What benefits have you realized?
As part of a healthcare service provider account, our data engineering team uses StreamSets to design data pipelines that hydrate ADLS/GCS. These datasets in turn help data scientists and analysts generate patterns and insights for the healthcare benefit of customers.
Recommendations to others considering the product:
A product to consider for fast-paced Data Engineering pipeline development.


    Bishnu R.

Managing pipelines with StreamSets in a K8s environment

  • October 25, 2021
  • Review verified by G2

What do you like best?
I did not have any difficulty integrating StreamSets Control Hub with Kubernetes with the help of the StreamSets Controller Agent.
What do you dislike?
Sometimes an updated Docker image of the StreamSets agent comes with vulnerabilities, which they should take care of before release.
What problems are you solving with the product? What benefits have you realized?
I am managing the StreamSets control agent in a K8s environment and have not experienced any issues so far.
Recommendations to others considering the product:
Yes, I will always recommend StreamSets to others.


    Meghana V.

Very good data operations platform, hassle-free filtering of data and numerous options for the same

  • March 25, 2021
  • Review verified by G2

What do you like best?
Everything, right from ingestion and filtering to debugging by looking into previews or snapshots.

Decent data processing speed, and a lightweight Data Collector to configure pipelines, process data, preview data, and monitor pipelines.
Friendly user interface for adding or deleting connections and for stopping and starting pipelines.
What do you dislike?
The rate of consumption of real-time data could be improved to avoid lag and data loss.

Editing a single component should be more independent.
What problems are you solving with the product? What benefits have you realized?
We consume real-time raw data from our site, then filter and tag it to get the number of transactions. This helped us monitor the system as well as build our workload model.

Anomaly detection based on the traffic pattern.

Storing raw data increases cost. Using StreamSets, we filtered out unnecessary data and kept only the data required for analysis.


    sai s.

StreamSets review

  • March 21, 2021
  • Review provided by G2

What do you like best?
It was very useful when we used it for loading our tables into Hive databases, and it was easy to configure, as most of it was drag and drop with minimal customisation required. I found StreamSets much easier to use than NiFi.
What do you dislike?
It has been a while since I actually worked with StreamSets, but when I worked on the platform we faced some issues while mapping the components in the data flow, and some performance issues with huge datasets.
What problems are you solving with the product? What benefits have you realized?
We used StreamSets to build the data lake at our insurance company, where we received files at multiple times of the day and the pipelines were triggered whenever files arrived in the landing zone. We performed various transformations and data quality checks within StreamSets and loaded the data into on-premises Hive tables.


    Investment Banking

Best Workflow for Tracking and Processing Applications with Automation

  • March 21, 2021
  • Review provided by G2

What do you like best?
Integrating different components to create pipelines, and previewing a pipeline to see whether it works.
What do you dislike?
No specific dislikes as of now; everything was excellent.
What problems are you solving with the product? What benefits have you realized?
Real-time streaming helps me keep track of the data flow.


    Telecommunications

Data migration across RDBMS and NoSQL becomes very easy.

  • March 20, 2021
  • Review provided by G2

What do you like best?
I found it very flexible, and the GUI-based configuration makes it very user-friendly.
What do you dislike?
So far so good; I haven't found anything wrong with StreamSets as of now.
What problems are you solving with the product? What benefits have you realized?
Data migration from RDBMS to RDBMS and from RDBMS to NoSQL.
By using StreamSets I am able to migrate data without any downtime and without any help from a DBA. In the traditional way, we did import and export for RDBMS-to-RDBMS migration, which is no longer needed. For RDBMS to NoSQL, I was using custom scripts to export data as CSV from Oracle and import it into Cassandra, but now I have created a pipeline and all the work is taken care of.


    Information Technology and Services

StreamSets

  • March 20, 2021
  • Review provided by G2

What do you like best?
Its friendly environment and user interface
What do you dislike?
StreamSets should add more features and reduce latency.
What problems are you solving with the product? What benefits have you realized?
We are collecting data using StreamSets.
Recommendations to others considering the product:
You can use it


    Banking

StreamSets review

  • March 19, 2021
  • Review provided by G2

What do you like best?
Debugging and ease of use. StreamSets was a useful tool for ETL processes. What sets it apart from other tools is that it has a lot of transformations.
What do you dislike?
Lots of transformations, real-time processing.
What problems are you solving with the product? What benefits have you realized?
Banking problems. The benefit is in debugging: you can check the data passed at each stage.