    DataHub

    Sold by
    DataHub's vision is to bring clarity to your data through its next-generation multi-cloud metadata management platform. The technology is based on LinkedIn DataHub and Apache Gobblin, two successful open-source projects incubated at LinkedIn and battle-hardened in production at scale at major enterprises.

    Ratings and reviews

    4.3 (9 ratings)
    5 star: 56%
    4 star: 44%
    3 star: 0%
    2 star: 0%
    1 star: 0%
    7 AWS reviews | 2 external reviews
    External reviews are from PeerSpot.

    Reviews (9)
    Akashkhurana Hirana

    Metadata management has streamlined lineage tracking and data discovery for our teams

    Reviewed on Jun 04, 2026
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Data Hub involves connecting a lot of data that is available and coming from upstream data points or data lakes like Kafka, or in BigQuery itself. We usually connect this data to Data Hub as it is a modern data catalog designed to streamline metadata management. We can put all the metadata of our data inside Data Hub, setting who the owner is and tracking where this data is coming from and where it is consumed downstream. We can have data discovery and governance as well.

    My specific example of using Data Hub in my daily workflow involves an orders table, which is very large and is joined with several other tables. This data is populated by a Kafka consumer that consumes messages from a specific topic, and thereafter, a batch that runs once a day transfers this data to a history table in BigQuery. This allows us to manage visualizations and data management tasks. We usually put all this metadata in Data Hub to track the data lineage, profile datasets, and establish data contracts. This way, we know the lineage of each field, and if any batch fails the data contract check, it sends an email notification to the responsible person. We can add more contracts such as validations to the data as necessary.
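The contract check described above can be sketched roughly as follows. This is a minimal illustration only, not DataHub's data contract API: `FieldRule`, `validate_batch`, `notify_owner`, the column names, and the rules themselves are all hypothetical, and the mailer is replaced by a pluggable `send` callable.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FieldRule:
    """One hypothetical contract rule: a column name plus a predicate on its value."""
    column: str
    description: str
    check: Callable[[Any], bool]


def validate_batch(rows, rules):
    """Return human-readable violations; an empty list means the batch passes."""
    violations = []
    for index, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row.get(rule.column)):
                violations.append(f"row {index}: {rule.column} {rule.description}")
    return violations


def notify_owner(owner_email, violations, send=print):
    """Pass a failure summary to `send` (a stand-in for a real mailer); return True if sent."""
    if violations:
        send(f"To: {owner_email}\nContract check failed:\n" + "\n".join(violations))
    return bool(violations)


# Assumed rules for an orders table; real contracts would come from the dataset owners.
ORDER_RULES = [
    FieldRule("order_id", "must not be null", lambda v: v is not None),
    FieldRule("amount", "must be a non-negative number",
              lambda v: isinstance(v, (int, float)) and v >= 0),
]

batch = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": -5},
]
violations = validate_batch(batch, ORDER_RULES)
```

Calling `notify_owner("owner@example.com", violations)` would then print the summary; swapping `send` for an SMTP helper is left to the reader.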

    What is most valuable?

    The best features Data Hub offers include its integration capability with many popular tools like Apache Airflow, Snowflake, dbt, Looker, Apache Kafka, and BigQuery. These tools provide us with data in various places, and we commonly use Apache Airflow for the DAG, while utilizing BigQuery as our database and Apache Kafka for consuming messaging queues. Data Hub easily connects with all these tools and features excellent data discovery and visualization capabilities. We can see data visibility, where it comes from, its upstream and downstream relationships. If we remove a column, we can assess the impact of that change. Furthermore, if there are duplicate datasets being used by different teams that do not communicate regularly, onboarding all data to Data Hub allows us to identify these duplicates easily.

    Out of all those features, I believe data discovery and impact analysis are the most valuable for my team because when we want to add or drop a column, we can assess the impact analysis to understand the downstream effects. This helps us know who owns a dataset, and we can easily contact the owner. Tracking the data lineage back to the source table is also a key benefit.
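Impact analysis of this kind amounts to walking the lineage graph downstream from the asset being changed. The sketch below assumes a small in-memory edge map with made-up dataset names; it does not call DataHub and only illustrates the traversal.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
DOWNSTREAM = {
    "kafka.orders_topic": ["bq.orders"],
    "bq.orders": ["bq.orders_history", "looker.orders_dashboard"],
    "bq.orders_history": ["looker.revenue_report"],
}


def impacted_assets(start, edges):
    """Breadth-first walk returning every asset downstream of `start`, in visit order."""
    seen = {start}
    queue = deque([start])
    impacted = []
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


affected = impacted_assets("bq.orders", DOWNSTREAM)
```

Reversing the edge map and running the same walk gives the upstream trace back to the source table.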

    Data Hub has positively impacted my organization by significantly reducing manual work that was previously needed to identify upstream and downstream data relationships, as well as recognizing duplicate datasets. If a data contract is broken, we now easily get notified of those issues, making the process much easier and more efficient. It is particularly useful for data engineers and platform teams to check for problems directly within Data Hub.

    Data Hub has saved our team a lot of time. For example, in a large company like Porch, if I want to know whether a specific dataset exists, I can check Data Hub, as it serves as a centralized point for managing the metadata of our data. While it does not contain all data, it does contain the metadata necessary for understanding the dataset's origin. If a dataset exists, I can simply see who the owner is and reach out to them, which reduces the dependency on others by providing direct access to information in Data Hub.

    What needs improvement?

    Regarding improvements for Data Hub, I think there is no scope for improvement. It is the best tool in the market currently. I have reviewed some other tools as well, but Data Hub stands out.

    In terms of areas for improvement, I do not see anything lacking. Data Hub offers both cloud and self-hosted deployment options, and it has a robust community. They hold open Slack community sessions as well as webinars, typically once or twice a month, to share knowledge and updates, which is a significant benefit. I have not encountered any major issues with Data Hub.

    For how long have I used the solution?

    I have been using Data Hub for around three years.

    What do I think about the stability of the solution?

    I have not seen any downtime within Data Hub.

    What do I think about the scalability of the solution?

    In my experience, Data Hub's scalability is impressive. We have around 300 datasets from BigQuery, 400 from Kafka, and many more, yet I have not seen any downtime within Data Hub. We have successfully onboarded over 1000 datasets from various sources without any issues.

    How are customer service and support?

    Customer support for Data Hub is very genuine, and they are responsive and attentive. If I raise a ticket today, they usually respond by the next day. Additionally, they host webinars monthly to discuss new features and updates. They also have an open Slack community where responses tend to be immediate.

    Which solution did I use previously and why did I switch?

    I previously used OpenMetadata before adopting Data Hub, but I found Data Hub to be more user-friendly and easier to utilize than OpenMetadata.

    How was the initial setup?

    Data Hub exceeds expectations in user-friendliness and functionality. It features a great user interface, an available SDK, APIs, and GraphQL previews, all complemented by a responsive Slack community and helpful customer support. The ease of documentation, website usability, and setup contributes to its overall effectiveness.

    What other advice do I have?

    Additionally, we use some other data governance tools with Data Hub. We can add domains to any dataset, such as specifying that this is the orders domain or the customer domain. We can add more tags, manage data ownership by indicating which team owns specific data, and create glossary terms, which act as labels for different datasets.

    I find myself relying on Data Hub for lineage checks and data contracts once a week.

    Regarding Data Hub's AI capabilities, it exposes several MCP servers that easily integrate with LLMs such as Claude, Cursor, Gemini, or LangChain, along with the Agent Development Kit from Google. In terms of security, Data Hub ensures that no company data is exposed outside, and they maintain strict confidentiality regarding the metadata of the company, adhering to similar NDAs that prevent revealing sensitive information.

    In terms of accuracy and reliability of output with Data Hub's AI capabilities, I find it exceeds 95% accuracy. Having utilized the MCP connectors with Claude and the ADK, I can confidently say that it performs flawlessly and retrieves data effectively.

    My advice for others considering the use of Data Hub is to add more glossary labels and categorize datasets by domain. While it is manageable with a smaller dataset, as the amount of data scales, these glossary terms and domains become immensely helpful. Initially, we did not leverage them, but we found their value as we scaled up and needed to filter data efficiently. I would rate Data Hub a perfect 10 overall.

    PrashantGupta2

    Centralized lineage and catalog have transformed how we track incidents and classify sensitive data

    Reviewed on Jun 03, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is to catalog the datasets across my company and to get the lineage of data in my company's pipelines.

    To give an example of how I use Data Hub in my day-to-day work, suppose the data is flowing from a source to Kafka and then to some data storages. If some cross-team wants to use the data but there is a problem at the Kafka level, we are not sure who all are consuming that data. Data Hub is very useful for us in this scenario. It can generate the lineage from source to destination, and when there is an issue at the Kafka side, we will get to know what the end results and impacted data sources are.

    I would add that sometimes when we do not want to share the data or when the customer or another team wants to consume the data, we are not sure what kind of data is there. We have to look at the schema. Data Hub is useful for us as we are doing the cataloging of all the datasets across my company, allowing us to later use and see the table information and schema information so that the team can identify what data is PII or non-PII.

    What is most valuable?

    The best features Data Hub offers are cataloging and lineage, which it supports very well, since we get all the different types of connectors we need across my company's dataset pipeline. Apart from that, the GraphQL APIs provided by Data Hub are very good, allowing us to get all the information we need programmatically whenever we need it.

    Regarding how the GraphQL APIs help my team in day-to-day tasks, we sometimes use custom logic to check whether the data has PII or non-PII. We have some AI model running on top of it, which requires classification. Based on the dataset URL, we are getting information about the dataset using the GraphQL APIs. GraphQL APIs are very handy, allowing us to customize properties and pass on the necessary information. For example, if we need a structured property, we can get those structured properties. If we need tags or owners, we can retrieve that as well.
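A lookup like the one described might be sketched as below. The query shape, the `/api/graphql` path, and the bearer-token header are assumptions based on DataHub's public GraphQL documentation and should be checked against the schema of your own deployment; the dataset is identified here by a URN, and `build_request` and `summarize` are hypothetical helper names. The request is only built, not sent.

```python
import json
import urllib.request

# Assumed query shape; verify field names against your deployment's GraphQL schema.
DATASET_QUERY = """
query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    tags { tags { tag { urn } } }
    ownership { owners { owner { ... on CorpUser { urn } ... on CorpGroup { urn } } } }
  }
}
"""


def build_request(base_url, token, dataset_urn):
    """Build (but do not send) a POST to the assumed /api/graphql endpoint."""
    body = json.dumps({"query": DATASET_QUERY, "variables": {"urn": dataset_urn}})
    return urllib.request.Request(
        f"{base_url}/api/graphql",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def summarize(response):
    """Flatten a response of the shape requested above into tag and owner URN lists."""
    dataset = response["data"]["dataset"]
    tags = [t["tag"]["urn"] for t in (dataset.get("tags") or {}).get("tags", [])]
    owners = [o["owner"]["urn"] for o in (dataset.get("ownership") or {}).get("owners", [])]
    return {"urn": dataset["urn"], "tags": tags, "owners": owners}
```

Sending the request with `urllib.request.urlopen` and passing the decoded JSON to `summarize` would give a classifier the tags and owners it needs for one dataset.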

    Data Hub positively impacts my organization by enhancing collaboration. Previously, we had to ask the team to provide the schema information. My company operates in a cross-region environment, so a person in India could wait a day to receive information about the schema from someone in the US. However, with Data Hub, we have a centralized place where we can access the schemas of all the datasets, which is very helpful. Additionally, whenever there is a problem, using the lineage helps us quickly identify the impacted team or dataset.

    Whenever there is an incident, we first go to Data Hub to see the downstream teams impacted and stop any jobs running on those datasets. It helps us save around eighty percent of time, as we previously had to track down information manually to find the owners, but using Data Hub, we can tag the owners of the datasets directly in the tool.

    What needs improvement?

    For improvements to Data Hub, I feel the security is a bit on the weaker side. We have ingestion jobs that require exact permissions for different owners, but this setup does not align with my company's grouping system. We need to create some custom grouping to manage those permissions. I would appreciate it if there were a method to consolidate all the information on a single page, which would simplify sharing permissions for running ingestion jobs.

    Additionally, I do feel that the metadata test we run daily takes too long. Initially, it takes one day, which I find excessive. Ideally, we should get information within one hour. These are the two main issues that would benefit from improvement for our use case.

    For how long have I used the solution?

    I have been using Data Hub for one and a half years.

    What do I think about the stability of the solution?

    Data Hub is stable in my experience. However, there are times when we attempt to upgrade it, and it may go down for a couple of minutes, but not more than that.

    What do I think about the scalability of the solution?

    Data Hub handles scalability effectively, accommodating growing data and users.

    How are customer service and support?

    I have had to reach out to Data Hub customer support multiple times. For example, when we were setting up a private link to connect to Data Hub GraphQL APIs, we required our account to be whitelisted. I have also requested some future features for our use cases. For instance, when working with a metadata test scenario, I needed to have a range date column, which was not available. I requested the Data Hub team to make it public so we could use it.

    What was our ROI?

    I have seen a return on investment with Data Hub. For instance, I have noticed time savings during incidents and while looking up schemas. In terms of resources, Data Hub centralizes data cataloging and classification, saving us from having to disclose PII column information to teams not utilizing it. Regarding financial metrics, I do not have specific metrics available.

    Which other solutions did I evaluate?

    Before choosing Data Hub, we looked into Unity Catalog from Databricks, but we ultimately decided to stick with Data Hub.

    What other advice do I have?

    My advice for others looking into using Data Hub is to use it for cataloging, classification, and centralizing all your schema. Data Hub supports a variety of connectors and has excellent lineage options. Additionally, make sure to utilize the well-written documentation that can guide you in building your product solutions. I would rate this product a nine out of ten.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    reviewer2847339

    Centralized data library has boosted discovery, collaboration, and time savings across teams

    Reviewed on May 30, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is that we use it as a library for all the data assets that we generate. It serves as an internal data mart where people can search for whatever data they need, and they can search by tags, by roles, and then add more metadata to it. This provides visibility to the data.

    A specific example of how my team uses Data Hub in a real-world scenario is that we collect and manipulate a bunch of data layers. Because we have huge teams, the exposure to data that we have already manipulated can sometimes be hindered when using traditional systems. Data Hub acts as a search engine for all of the data. One example would be when the marketing team was looking for specific data around marketing. They discovered that once they searched it on Data Hub, it was easily visible. They did not have to retrieve it from the raw layer and manipulate it for their usage because another team had already built it.

    Regarding how my teams interact with Data Hub, we use Data Hub with a self-hosted system. We have connectors which look into multiple data sources, manipulation engines, and orchestration layers to gather the metadata, and then that is pulled into Data Hub. This is how we get data assets in Data Hub.

    What is most valuable?

    The best features that Data Hub offers include primarily data discovery and data governance. Data Hub has data catalogs, which helps with the business glossary, ownership tracking, and lineage. Lineage is something that we are strongly using at this point in time. It helps us understand the impact analysis, such as what breaks if I change this column. Data Hub also provides data observability, helping us understand what data is fresh, what is not, and what has changed schema recently. Additionally, it makes our system AI and LLM ready.

    The lineage feature has changed the way my team works and collaborates significantly. Because we now have data lineage through Data Hub, if we have a really huge dependent pipeline with multiple layers of upstream and downstream dependency, and something breaks in the downstream system, we can exactly pinpoint what all data assets would be affected. Having that lineage functionality helps us drill down what needs to be debugged and fixed and what exact part is breaking. It saves us time in remedying the issue.

    I really like the integrations that Data Hub provides. Data Hub has a very large set of integrations, including Snowflake, Databricks, BigQuery, Redshift, dbt, and Airflow.

    Data Hub has positively impacted my organization as teams can now be directly dependent on one source of truth for all their data needs. The time spent finding information has become significantly smaller, which is the real productivity improvement that I have seen, impacting multiple teams throughout the organization. I estimate that we save about thirty to forty percent of the time now since we do not have to read documents or message people for specific data assets. This results in a productivity increase of around thirty to forty percent in terms of time and efficiency.

    What needs improvement?

    I think Data Hub can be improved by supporting the open source version better. Many features have moved to the paid version now, making it difficult for small-scale companies to operate on Data Hub because we are required to pay, even though it started as an open source project that is now essentially behind a paywall.

    One needed improvement for Data Hub would be stronger AI-powered metadata discovery. I understand Data Hub has been investing in AI, but the natural language processing behind Data Hub search is not that good, and the search results are often inaccurate. Another improvement could be enhancing the dbt developer experience, such as surfacing dbt test failures directly in lineage. Additionally, when we change a schema, it would be beneficial if it could provide a risk score of some sort. Lastly, automated cleanup recommendations would help because managing orphan data assets on Data Hub currently takes a lot of manual time.

    For how long have I used the solution?

    I have been using Data Hub for a year.

    What do I think about the stability of the solution?

    Data Hub is pretty stable in my experience with no downtime or issues.

    What do I think about the scalability of the solution?

    Data Hub's scalability has been effective, handling our organization's growth and data volume well.

    How are customer service and support?

    I have not had to reach out to customer support.

    Which solution did I use previously and why did I switch?

    I did not previously use a different solution before Data Hub.

    What's my experience with pricing, setup cost, and licensing?

    My experience with pricing, setup cost, and licensing has been pleasant, and I have no complaints.

    Which other solutions did I evaluate?

    Before choosing Data Hub, we evaluated Atlan and decided on Data Hub because it has a cleaner UI and also a decent open source community to support it.

    What other advice do I have?

    Data Hub does most of the job it is designed to do, but there could still be improvement as the industry progresses, particularly around metadata discovery. Regarding Data Hub's AI capabilities, its governance and security do the job really well as of right now. I do not have any complaints, especially around data classification, as it allows us to have control over whatever data we are displaying, including customization for PII, sensitive, and financial data. Data Hub has met our expectations regarding its accuracy and reliability of output, and there have not been any issues.

    My advice to others looking into using Data Hub is that it is a pretty nice product right now with easy integration. The pricing model could be negotiated, so it is essential to keep that in mind. I would rate Data Hub a solid eight on a scale of one to ten.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)

    Shubham-Agarwal

    Centralized lineage has reduced onboarding time and improves tracking of complex data flows

    Reviewed on May 26, 2026
    Review from a verified AWS customer

    What is our primary use case?

    Our main use case for Data Hub is for data lineage and metadata governance for our UFC project, where we are utilizing multiple databases such as SQL Server, Databricks, and Snowflake. We have adopted Data Hub to create centralized metadata for all these databases.

    A specific example of how we use Data Hub for metadata governance in our UFC project involves getting data from multiple sources including Excel files, CSV files, APIs, and external databases, storing that data first in Amazon S3 buckets, and then in Snowflake staging areas. We transform the raw data using a dbt model, create a silver layer, and then load data into the gold layer for reporting. With Data Hub, we have a centralized view of the data flow, which makes it easier to track issues in downstream applications such as Power BI reporting.

    We also use Data Hub for onboarding new team members, as it was previously hectic to provide complete metadata details from our seven to eight data sources and over two hundred tables in our Snowflake database. Now, new team members can refer to the lineage of any table or column to understand the complete flow without relying solely on others.

    What is most valuable?

    One of the best features Data Hub offers is its ability to identify schema changes in the source side efficiently, especially when we pull data from multiple external databases such as SQL Server. It helps us quickly pinpoint necessary updates when columns are added or removed, streamlining what was previously a time-consuming manual process.
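Detecting added or removed columns comes down to comparing two snapshots of a table's schema. The sketch below assumes each snapshot is a plain column-to-type mapping with made-up columns; `diff_schema` is a hypothetical helper, not a DataHub function.

```python
def diff_schema(previous, current):
    """Compare two {column: type} mappings and report added, removed, and retyped columns."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    retyped = sorted(
        col for col in set(previous) & set(current) if previous[col] != current[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Assumed snapshots of one source table on two consecutive days.
yesterday = {"id": "int", "email": "varchar", "created_at": "datetime"}
today = {"id": "bigint", "email": "varchar", "signup_source": "varchar"}
changes = diff_schema(yesterday, today)
```

A non-empty `removed` or `retyped` list is the kind of change that would warrant checking downstream models before the next load.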

    I find Data Hub quite manageable in the downstream application within the UFC data mart, mainly when issues are reported in Power BI. It provides a complete view of the data lineage, allowing us to backtrace the source of any discrepancies easily.

    Data Hub has positively impacted our organization by reducing the knowledge transition period from three months to one month for new team members, enabling them to refer to the complete lineage without depending heavily on others, which is a substantial improvement.

    What needs improvement?

    In terms of improvements for Data Hub, it seems more useful for critical or large data pipelines, as small data architectures can be straightforward to understand without it.

    Regarding enhancements for complex projects, I have noticed that sometimes Data Hub does not provide a complete picture of the lineage, particularly in complex data pipelines such as when we fetch data from an API to S3 and subsequently to Snowflake. We have to review the metadata in Data Hub closely.

    For how long have I used the solution?

    I have been working in the data engineering field for over twelve years.

    What do I think about the stability of the solution?

    Data Hub is stable in my experience.

    What do I think about the scalability of the solution?

    Data Hub's scalability is advantageous, as we onboard data from over one hundred fifty tables in SQL Server to Snowflake, and adding new tables to Data Hub is not time-consuming.

    How are customer service and support?

    Customer support for Data Hub is quite good; our infrastructure team received ample support during the initial setup within the given timelines.

    Which solution did I use previously and why did I switch?

    Previously, we used the Snowflake inbuilt lineage graph to identify data flow, but we switched to Data Hub for its centralized governance capabilities across multiple databases.

    How was the initial setup?

    The initial setup of Data Hub was completed by our infrastructure team, and I do not have complete visibility of how they made the purchase.

    What about the implementation team?

    Regarding pricing, setup cost, and licensing for Data Hub, it was handled by our client infrastructure team, so I lack visibility into those aspects.

    What was our ROI?

    I have seen a return on investment with Data Hub, notably in reducing the knowledge transition period and improving our ability to troubleshoot production issues in Power BI, thus saving time.

    Which other solutions did I evaluate?

    We did not evaluate other options before choosing Data Hub since we were solely relying on the lineage functionalities of Databricks and Snowflake.

    What other advice do I have?

    My advice for others considering Data Hub is to utilize it, as it is free and can significantly reduce time for production support and addressing data issues, while simpler data models can benefit from the inbuilt functionalities of their respective databases. I would rate this product eight point five out of ten.

    Henrique dos Anjos

    Metadata governance has improved data lineage visibility but still needs simpler integrations

    Reviewed on Mar 31, 2026
    Review provided by PeerSpot

    What is our primary use case?

    I work with Data Hub as a user, but I also have some administrative responsibilities there. I'm not a final user; the final users are business users, and I play some administrative roles in the tool to have the metadata information available for all Uber users.

    I'm a Data Quality Engineer focused on data governance. I manage the metadata information for Uber, and I also use this to apply some data quality rules. My focus in my current job is to apply some rules and manage the metadata information and ensure it is accurate for the end users, which is why I'm using it.

    What is most valuable?

    One of the biggest advantages of Data Hub is the very good integration. For example, a department focused on development built the integration between Data Hub and BigQuery. When this integration is done well, it is possible to check data lineage, which I think is a very important subject in data governance. It's something that cannot be done manually, so having a tool that shows the data lineage from the source to the target tables helps us a lot. I think this is one of the best advantages that we have.

    Data Hub helps to analyze data from various sources in my case.

    What needs improvement?

    I know that the integrations are not easy to do, and I believe that is because it's a customized solution. Software developers always need to work on them. It's complicated: every time we want to integrate new things or new sources, we need to raise a ticket or a request with another department. When I worked with Atlan, for example, I was able to connect different sources in a very user-friendly way. I just needed to set up some configurations and connect to the source without having to be a software developer or write any back-end code. It was simply a feature in the data catalog that let me connect to different kinds of sources. That, I think, is the disadvantage of having a customized solution. I do think Data Hub itself is a very good tool; years ago I had the opportunity to work with the open-source version, which had a clear interface and was easy to connect. At Uber, we need to raise a request when we want to integrate new sources.

    Regarding Data Hub's intuitiveness, regarding analytics, I would say that some quality dimensions are available for us. For example, for each field name or each column in a table, it's possible to see the frequency, how many values we have for a specific type or category, and we can see if there are new or null values, whether the columns are empty or not, along with some metrics. This is regarding the data quality dimensions, such as nullables and things of that nature. That is all we have for features. I remember when I was working with Atlan, there was a feature I liked very much—the possibility to have a sample. When I clicked on a table, I could see a short sample without needing SQL skills. I just clicked the table and could see some values or what the table represents; the data catalog would show a screen with some rows of the table. This feature was very good, but we don't have it in Data Hub the way it is implemented at Uber. I think it would be a very good feature for analytics, and we don't have it at the moment.
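The per-column metrics mentioned above (value frequency, null counts, distinct values) can be illustrated with a few lines of standard-library Python. This is a sketch over an in-memory list with made-up values, not how DataHub computes its profiles; `profile_column` is a hypothetical helper.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, and the most frequent values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top": counts.most_common(2),
    }


# Assumed sample of a categorical column, with None standing in for SQL NULL.
status = ["paid", "paid", None, "refunded", "paid", None]
profile = profile_column(status)
```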

    The integration part could be better, but again, it's because it's a customized solution. I think if they used the native version of the tool, it would be simpler. The integration part and the process of setting up new data quality rules would be important for data governance players like me.

    For how long have I used the solution?

    I've been using Data Hub for a year and a half.

    What do I think about the stability of the solution?

    Since I've been using Data Hub, it has always been very stable; I can say it was one hundred percent stable. I never encountered issues trying to check datasets or columns and checking their numbers. It has always worked very well in that regard.

    What do I think about the scalability of the solution?

    I think Data Hub can scale fast in its native way, but with a customized solution, it takes more time.

    How are customer service and support?

    My support is internal when I have any questions or requests, so I direct it to a support team from Uber and not from the provider. When I was working with Atlan, and needed support, they were very good at attending to my requests directly. I had contact with the provider, so it was very fast. At the moment, I don't have that; I direct my requests to an internal department of Uber.

    Which solution did I use previously and why did I switch?

    I'm not using Atlan anymore because I'm no longer at the company where I worked with it. I moved to another consultancy group and now I'm working with other platforms.

    I am working with a different platform that is also for data governance and metadata management. The back end of the platform is Data Hub, but the user interface is customized for this client. I'm currently working for Uber.

    How was the initial setup?

    Because Data Hub is a customized solution, I don't have many details about the installation and deployment process. However, when I was using Atlan, I saw that they implemented very fast. In this way, I believe both tools have an easy way to implement, but because Uber chose to have a customized solution, it became more difficult and complex. However, in their native way, I think both tools are good.

    What was our ROI?

    In terms of ROI, I would say that Atlan is better. I had a very good experience using Atlan, and I believe it's faster. Velocity in organizations today is very important; people want to see things very fast. I believe Atlan has a better approach compared to Data Hub.

    The way Data Hub is implemented at the moment, Atlan is much better. It's much, much faster.

    Which other solutions did I evaluate?

    I worked with Databricks, but I'm not sure if it is from Amazon; I don't think so. I think Databricks is from Microsoft.

    What other advice do I have?

    I have experience with Data Hub to some extent.

    I believe Data Hub uses a lot of APIs, but I don't think I'm the right person to answer that because it relies a lot on a technical aspect that I don't understand. I cannot provide you with a curated answer about it, but I know that the software development team that works with this customized solution uses APIs; I just don't know how to speak about their performance, whether it's good or not.

    Real-time batch processing is very important for me and my organization because some datasets are very critical for the business. If we have batch processing, it's good for the organization to set up a very large dataset, for example, and have it available on the data catalog in a short time. I agree that this is important.

    In both of my experiences, with Atlan and now with Data Hub, the catalog was integrated with GCP. I don't have experience connecting it to any other data warehouse.

    I don't use anything else like CRM, storage, or any architecture management tools; just Data Hub.

    I would give Data Hub a score of seven out of ten, summarizing everything that I've discussed about the product.

    Azhagarasan Annadorai

    Catalog has centralized PII ownership and collaboration but still needs better automation and UI

    Reviewed on Feb 22, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is enriching metadata to classify PII data. As an administrator, I crawl a number of data sources and bring the metadata into a single place, then assign ownership, such as a data owner or steward, for all the data assets. With their help, we classify the data as direct PII, indirect PII, sensitive, non-sensitive, and so on. We add tags and glossary terms to the data elements. The main use case is GDPR DSAR compliance: we identify the PII data in the catalog so that we know where it sits in our data inventory.

    How has it helped my organization?

    The catalog helps with metadata discovery and with finding the owners and stewards of data sources. Without a data catalog, we scramble around and speak to multiple teams, which is time-consuming.

    What is most valuable?

    The best features that Data Hub offers include the management of ownership, with standard out-of-the-box ownership such as business owner, data steward, or technical owner, which is relevant for us. It also integrates with Active Directory. In our Active Directory, we maintain certain roles based on the scrum teams related to a team member, and by integrating with Active Directory, we are able to bring the same roles and map them to the corresponding ownership roles within Data Hub. Data Hub has integrations with Slack, Snowflake, BigQuery, and so on, which we use.

    Data Hub has positively impacted our organization by bringing the tribal knowledge that resides with team members into a single place where users can discover and understand the data elements before they make use of them. Users can ask questions via Slack to understand how a data element is defined and get the answers back. This definitely saves time; without a data catalog in place, users need to ask around to find out what a particular data element means and who owns it. Now, with the data catalog, searching and discovering data elements and the corresponding owners is easier, saving approximately thirty to forty percent of the time that would have been spent finding out the owners and definitions of the data elements.

    What needs improvement?

    Data Hub can be improved with more automation; there are some inbuilt automations, such as documenting definitions of data elements using AI, which is useful. I wonder if it can automate the classification exercise, possibly using AI to auto-classify PII direct and indirect items.

    For how long have I used the solution?

    Just started using it.

    What do I think about the stability of the solution?

    Data Hub is stable.

    What do I think about the scalability of the solution?

    Data Hub shows scalability in terms of the number of users and the number of new databases and data elements.

    How are customer service and support?

    We have not gone that far with customer support; as far as the POC is concerned, we received good support from the team and the sales team that helped us evaluate the tool.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    We previously used OvalEdge as our data catalog. We switched because we wanted a tool with more AI capability that could extend usage to non-technical users, and that was less clunky and more intuitive.

    How was the initial setup?

    The initial setup was slightly technical, but there is enough documentation available.

    What about the implementation team?

    No

    What was our ROI?

    I have not yet seen a return on investment, and I do not have that information to share.

    What's my experience with pricing, setup cost, and licensing?

    I think that with a budget of one hundred thousand US dollars, we would be able to deploy a reasonable version and connect to a number of data sources.

    Which other solutions did I evaluate?

    Before choosing Data Hub, we compared it with Atlan and Alation.

    What other advice do I have?

    I chose seven out of ten because there are better catalogs available in the market that offer more features. The UI, especially when setting up new data sources and crawling them, is a little cumbersome, but it is a one-time activity, so it is manageable; however, the UI could be improved concerning administration.

    My advice to others looking into using Data Hub, also known as Acryl, is that it is a reasonably stable product that satisfies most data catalog use cases; however, Atlan appears to be the closest competitor, while Alation is the market leader among the three. I believe Data Hub also has an open-source version, and it may be worth considering that option as well.

    I rate this product seven out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    reviewer2784462

    Centralized metadata has empowered governed data discovery and clarified ownership for all teams

    Reviewed on Dec 27, 2025
    Review from a verified AWS customer

    What is our primary use case?

    We adopted Data Hub in the context of a large enterprise customer operating in a regulated industry with a strong focus on data governance, data discoverability, and ownership clarity across multiple cloud-native platforms. The solution was deployed on AWS, and the main business problem was the lack of a centralized, reliable view of data assets, including poor data discoverability, unclear data ownership and stewardship, limited lineage visibility across ingestion and transformation layers, and high dependency on tribal knowledge held by a few individuals. Data Hub was selected as an enterprise data catalog and metadata backbone with the goal of enabling both technical teams and business users to easily understand, trust, and reuse data.

    We used Data Hub to make data easy to discover, assign data ownership and stewardship, improve data quality processes, and establish sound data governance for our customer across the data catalog, data lineage, and metadata management in general.

    What is most valuable?

    The key benefits we achieved include centralized metadata management across multiple AWS services and data platforms, and improved data discoverability, which significantly reduced the time required to find relevant data sets. Clear data ownership and stewardship improved accountability and collaboration between teams. End-to-end lineage visibility enabled faster impact analysis and safer changes, and self-service access to documentation and metadata sped up the onboarding of new data users. From a governance perspective, Data Hub became a single source of truth for metadata, supporting both compliance requirements that are very important in a data governance environment and day-to-day operational needs.

    The main strengths we experienced with Data Hub are the following. First, a strong and extensible metadata model: Data Hub offers a rich and flexible metadata model that adapts well to complex enterprise scenarios. Second, excellent lineage capabilities: the lineage visualization is clear, actionable, and extremely useful for impact analysis and governance workflows. Third, an open-source foundation with enterprise readiness: the open architecture avoids vendor lock-in while still being suitable for production-grade environments.

    Data Hub is very effective for us because we build the data lineage from the beginning, from origination to visualization, to the final use of the data. We follow and track the path of the data, which improves analysis and enables us to find where data is used and the impact of deleting data. This is also very important in a regulatory environment.

    What needs improvement?

    The impact is very positive, and using Data Hub brought many benefits: it made it easier to establish data governance, create centralized metadata management, improve data discoverability, and manage data in general. The areas for improvement, in my opinion, start with the initial setup and configuration, which can be complex without prior experience, especially in large-scale environments. The user experience for non-technical users could be further simplified, particularly around advanced metadata concepts. The out-of-the-box governance workflows, for example approvals and certification, could be more prescriptive for customers at early maturity stages.

    Beyond the somewhat complex initial setup and configuration, operational monitoring could benefit from more native dashboards and alerts. These are not blockers, but areas where additional guidance or product enhancements would further accelerate adoption.

    For how long have I used the solution?

    I have been using Data Hub since 2023.

    What other advice do I have?

    Based on internal measurement and feedback from the data teams, there are many impacts. Time to locate and understand a data set was reduced by approximately 40-50 percent. Manual documentation effort was reduced by around 40 percent. Dependency on senior data engineers for data explanation dropped significantly. Data onboarding time for new team members decreased from weeks to days.

    I would rate this product a 9 out of 10. I chose nine because Data Hub proved to be a robust, scalable, enterprise-ready data catalog that is well-suited for AWS-based architecture and complex organizational environments. I did not give a ten because there is always room to improve, and it is useful to leave space for further optimization.

    My advice is to use Data Hub to move from fragmented metadata and manual processes to a modern, governed, and self-service data ecosystem, delivering clear value in terms of efficiency, cost saving, and data trust. We would confidently recommend Data Hub to organizations looking to improve data governance, data discovery, and metadata management on AWS.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)

    reviewer2784771

    Analytics work has become more efficient and now processes large datasets with flexibility

    Reviewed on Dec 04, 2025
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Acryl Data is analytics.

    What is most valuable?

    Acryl Data helps with processing large amounts of data. It is a very good tool that offers the flexibility to store a huge amount of data, and it is easy to use. The UI is good.

    The best features Acryl Data offers include storage. When I mention storage, I refer to its scalability.

    The positive impact of Acryl Data is that it has increased efficiency.

    What needs improvement?

    I do not have comments on how Acryl Data can be improved.

    For how long have I used the solution?

    I have been using Acryl Data for two years.

    What do I think about the stability of the solution?

    Acryl Data is stable.

    What do I think about the scalability of the solution?

    Acryl Data's scalability is good.

    How are customer service and support?

    The customer support is good.

    How would you rate customer service and support?

    Which solution did I use previously and why did I switch?

    I did not previously use a different solution.

    How was the initial setup?

    My experience with pricing and setup was good.

    What was our ROI?

    I have seen a return on investment as it has saved time.

    Which other solutions did I evaluate?

    Before choosing Acryl Data, I did not evaluate other options.

    What other advice do I have?

    My advice to others looking into using Acryl Data is that they can use it. I gave this product a rating of 9.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    reviewer2784384

    Simple data insights platform has boosted development speed and revealed top purchasing customers

    Reviewed on Dec 04, 2025
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Acryl Data is to extract insights from customer data. I use Acryl Data for a project in order to identify all the customers and find out which customer buys a lot of items.

    What is most valuable?

    The best feature Acryl Data offers is the simplicity of the UI. The UI is simple for me because it is easy to navigate. Acryl Data has positively impacted my organization by speeding up all the development. It sped up development because the team can access data faster, improving speed by approximately 50%.

    What needs improvement?

    I cannot point to a single area of the product that needs improvement. Nothing in support or documentation requires improvement, and I have no other improvements to suggest for Acryl Data.

    For how long have I used the solution?

    I have been using Acryl Data for five months.

    What do I think about the stability of the solution?

    Acryl Data is stable.

    What do I think about the scalability of the solution?

    I think the scalability of Acryl Data is a good point.

    How are customer service and support?

    We have not needed any customer support, but I think it is fine.

    How would you rate customer service and support?

    Which solution did I use previously and why did I switch?

    I did not previously use a different solution; I have no experience with any other solutions.

    What was our ROI?

    I have seen a return on investment through time saved and also money saved. I do not have specific numbers or examples about the time or money saved.

    Which other solutions did I evaluate?

    I did not evaluate other options before choosing Acryl Data; I evaluated only this option.

    What other advice do I have?

    My advice to others looking into using Acryl Data is to start faster with the analytic insights. I would rate this product a 10.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?