    DataHub

    Sold by: Datahub 
    Deployed on AWS
    DataHub's vision is to bring clarity to your data through its next-generation multi-cloud metadata management platform. The technology is based on LinkedIn DataHub and Apache Gobblin, two successful open-source projects incubated at LinkedIn and battle-hardened in production at scale at major enterprises.
    4.2

    Overview

    DataHub is an AI & Data Context Platform adopted by over 3,000 enterprises including Apple, CVS Health, Netflix, and Visa. Innovated jointly with a thriving open-source community of 13,000+ members, DataHub's metadata graph provides in-depth context of AI and data assets with best-in-class scalability and extensibility. The company's enterprise SaaS offering, DataHub Cloud, delivers a fully-managed solution with AI-powered discovery, observability, and governance capabilities. Organizations rely on DataHub solutions to accelerate time-to-value from their data investments, ensure AI system reliability, and implement unified governance - enabling AI & data to work together and bring order to data chaos.

    For Data Analysts, developers, data scientists, and automated workflows:
    Easily find trusted datasets with the most current data

    • Access data where you work with a Chrome extension for BI tools
    • Discover data your way - personalization for multiple business and technical user profiles
    • Support AI models and automations with a metadata graph that keeps up with today's data volume and velocity
    • Understand data provenance with table, column, and job level lineage graphs
    • Auto-enrich metadata with no-code automation
    • Use AI-generated documentation and propagation to better understand context
    • Always stay up-to-date with subscriptions to assets, activity and notifications

    For Data Engineers:
    Deliver reliable data quality

    • Provide end-to-end observability with user-created data quality checks and reports
    • Surface data quality results and impact analysis across all points in lineage
    • Monitor freshness SLAs, data volume, table schemas, column quality, and custom SQL
    • Use AI Anomaly Detection for freshness, volume, and column stats
    • Easily keep an eye on data quality with assertions and AI-based smart assertions
    • Evaluate data contracts and quality checks on-demand with API
    • Get notified where you work (Slack, email, and more)
    • Easily manage data quality with a data health dashboard

    For Data Governance:
    Ensure continuous AI & data governance in production versus episodic compliance checks

    • Ensure every AI & data asset is accounted for by defining and enforcing documentation standards
    • Integrate governance practices early with automated shift-left governance
    • Automatically classify your data as it moves and transforms with lineage-driven compliance
    • Keep tags harmonized with seamless metadata flow between DataHub and source systems
    • Deliver continuous compliance monitoring with forms, impact analysis, and reporting
    • Create and implement bespoke compliance approval workflows

    Highlights

    • Search All Corners of Your Data Stack: DataHub's unified search experience surfaces results across databases, data lakes, BI platforms, ML feature stores, orchestration tools, and more.
    • Trace End-to-End Lineage: Quickly understand the end-to-end journey of data by tracing lineage across platforms, datasets, ETL/ELT pipelines, charts, dashboards, and beyond.
    • View Metadata 360 at a Glance: Combine technical, operational and business metadata to provide a 360 degree view of your data entities. Generate Dataset Stats to understand the shape & distribution of the data.

    Details

    Sold by

    Delivery method

    Deployed on AWS

    Features and programs

    Buyer guide

    Gain valuable insights from real users who purchased this product, powered by PeerSpot.

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Pricing is based on the duration and terms of your contract with the vendor. This entitles you to a specified quantity of use for the contract duration. If you choose not to renew or replace your contract before it ends, access to these entitlements will expire.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    12-month contract (1)

    Dimension           Description                     Cost/12 months
    Discover & Govern   Up to 20 Monthly Active Users   $75,000.00

    Vendor refund policy

    All fees are non-cancellable and non-refundable except as required by law.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    Software as a Service (SaaS)

    SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.

    Resources

    Support

    Vendor support

    Email support is offered Monday - Friday during regular business hours.
    marketplace@datahub.com 

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Product comparison

    Updated weekly

    Accolades

    Top 10 in Data Catalogs
    Top 10 in Data Catalogs, Data Governance, Master Data Management
    Top 10 in Data Catalogs, Data Governance

    Customer reviews

    Sentiment is AI generated from actual customer reviews on AWS and G2.
    [Chart: share of positive, mixed, and negative reviews for Reviews, Functionality, Ease of use, Customer service, and Cost effectiveness]

    Overview

    AI generated from product descriptions
    Unified Search Across Data Stack
    Search functionality that surfaces results across databases, data lakes, BI platforms, ML feature stores, and orchestration tools within a multi-cloud environment.
    End-to-End Lineage Tracing
    Lineage tracking capability that traces data journey across platforms, datasets, ETL/ELT pipelines, charts, and dashboards at table, column, and job levels.
    AI-Powered Metadata Management
    Metadata graph with AI-generated documentation, AI anomaly detection for freshness and volume metrics, and smart assertions for data quality monitoring.
    Data Quality Monitoring and Observability
    End-to-end observability with user-created data quality checks, freshness SLA monitoring, schema tracking, column quality assessment, and custom SQL evaluation through API.
    Automated Governance and Compliance
    Lineage-driven compliance classification, automated shift-left governance integration, continuous compliance monitoring with forms and impact analysis, and metadata harmonization across source systems.
    Metadata Centralization
    Centralizes metadata from disparate sources into a unified platform for discovering, describing, governing, and managing data assets including data, BI reports, and AI models.
    Behavioral Analysis Engine
    Incorporates a Behavioral Analysis Engine to provide advanced analytics and insights across data assets.
    Data Lineage and Tracking
    Enables documentation of insights and tracking of data lineage across teams for transparency and compliance purposes.
    Self-Service Analytics
    Supports self-service analytics capabilities allowing users to independently discover and analyze data assets.
    AI Governance Framework
    Provides an AI governance framework that ensures data quality, transparency, and compliance for AI initiatives.
    AI Governance Framework
    Active metadata-based governance with rules, processes and responsibilities to ensure ethical AI practices, mitigate risk, adhere to legal requirements, and protect privacy
    Automated Data Lineage
    End-to-end lineage tracking providing transparency into data transformation and flow across systems, including both summary-level business lineage and detailed technical lineage
    Unified Data Catalog
    Multi-cloud and hybrid environment data discovery with business context including data origin, ownership, usage patterns, and access to reports, AI models and data products
    Data Quality Automation
    Automated monitoring and rule management system for enterprise-wide data quality management replacing manual processes
    Privacy and Compliance Workflow
    Centralized automation of privacy workflows to operationalize privacy requirements and address global regulatory compliance

    Contract

    Standard contract
    No
    No

    Customer reviews

    Ratings and reviews

    4.2
    8 ratings
    5 star: 50%
    4 star: 50%
    3 star: 0%
    2 star: 0%
    1 star: 0%
    7 AWS reviews | 1 external review
    External reviews are from PeerSpot.
    PrashantGupta2

    Centralized lineage and catalog have transformed how we track incidents and classify sensitive data

    Reviewed on Jun 03, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is to catalog the datasets across my company and to get the lineage of data in my company's pipeline.

    To give an example of how I use Data Hub in my day-to-day work, suppose the data is flowing from a source to Kafka and then to some data storages. If some cross-team wants to use the data but there is a problem at the Kafka level, we are not sure who all are consuming that data. Data Hub is very useful for us in this scenario. It can generate the lineage from source to destination, and when there is an issue at the Kafka side, we will get to know what the end results and impacted data sources are.
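The impact analysis described in this review can be pictured as a walk over a lineage graph. The following is only a minimal sketch under stated assumptions: the asset names and the `LINEAGE` dictionary are hypothetical, and the code does not call DataHub itself. It simply shows how everything downstream of a failing Kafka topic could be listed.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed in its value.
LINEAGE = {
    "source_db.orders": ["kafka.orders_topic"],
    "kafka.orders_topic": ["warehouse.orders_raw", "lake.orders_archive"],
    "warehouse.orders_raw": ["dashboard.daily_sales"],
}


def downstream_assets(graph, start):
    """Return every asset reachable downstream of `start`, in breadth-first order."""
    seen = {start}
    ordered = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered


# Assets affected if the (hypothetical) Kafka topic has a problem.
impacted = downstream_assets(LINEAGE, "kafka.orders_topic")
print(impacted)
```

With this sample graph, the call lists the two storage targets first and then the dashboard that depends on one of them.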

    I would add that sometimes when we do not want to share the data or when the customer or another team wants to consume the data, we are not sure what kind of data is there. We have to look at the schema. Data Hub is useful for us as we are doing the cataloging of all the datasets across my company, allowing us to later use and see the table information and schema information so that the team can identify what data is PII or non-PII.

    What is most valuable?

    The best features Data Hub offers are its strong support for cataloging and lineage, with connectors for the different source types we use across my company's dataset pipeline. Apart from that, the GraphQL APIs provided by Data Hub are very good, allowing us to get all the information we need programmatically whenever we need it.

    Regarding how the GraphQL APIs help my team in day-to-day tasks, we sometimes use custom logic to check whether the data has PII or non-PII. We have some AI model running on top of it, which requires classification. Based on the dataset URL, we are getting information about the dataset using the GraphQL APIs. GraphQL APIs are very handy, allowing us to customize properties and pass on the necessary information. For example, if we need a structured property, we can get those structured properties. If we need tags or owners, we can retrieve that as well.
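A request of the kind the reviewer describes might look roughly like the sketch below. It only builds the JSON body for a GraphQL POST and sends nothing. The query's field names (`ownership`, `tags`, and the nested selections) and the sample identifier are assumptions made for illustration and should be checked against the GraphQL schema of the DataHub deployment in use. The reviewer speaks of a dataset URL, while DataHub identifies entities by URN, so the sketch uses a URN.

```python
import json

# Field names below are assumptions; verify them against the GraphQL schema
# exposed by your own DataHub deployment before relying on this query.
DATASET_QUERY = """
query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    ownership { owners { owner { ... on CorpUser { urn } ... on CorpGroup { urn } } } }
    tags { tags { tag { urn } } }
  }
}
"""


def build_request_body(dataset_urn):
    """Build the JSON body for a GraphQL POST request; nothing is sent."""
    return json.dumps({"query": DATASET_QUERY, "variables": {"urn": dataset_urn}})


# Hypothetical dataset URN used only for illustration.
body = build_request_body("urn:li:dataset:(urn:li:dataPlatform:kafka,orders_topic,PROD)")
print(json.loads(body)["variables"]["urn"])
```

Passing the identifier as a GraphQL variable rather than formatting it into the query string keeps the query text constant and avoids quoting problems.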

    Data Hub positively impacts my organization by enhancing collaboration. Previously, we had to ask the team to provide the schema information. My company operates in a cross-region environment, so a person in India could wait a day to receive information about the schema from someone in the US. However, with Data Hub, we have a centralized place where we can access all the schema of the datasets, making it very helpful. Additionally, whenever there is a problem, using the lineage helps us quickly identify the impacted team or dataset.

    Whenever there is an incident, we first go to Data Hub to see the downstream teams impacted and stop any jobs running on those datasets. It helps us save around eighty percent of time, as we previously had to track down information manually to find the owners, but using Data Hub, we can tag the owners of the datasets directly in the tool.

    What needs improvement?

    For improvements to Data Hub, I feel the security is a bit on the weaker side. We have ingestion jobs that require exact permissions for different owners, but this setup does not align with my company's grouping system. We need to create some custom grouping to manage those permissions. I would appreciate it if there were a method to consolidate all the information on a single page, which would simplify sharing permissions for running ingestion jobs.

    Additionally, I do feel that the metadata test we run daily takes too long. Initially, it takes one day, which I find excessive. Ideally, we should get information within one hour. These are the two main issues that would benefit from improvement for our use case.

    For how long have I used the solution?

    I have been using Data Hub for one and a half years.

    What do I think about the stability of the solution?

    Data Hub is stable in my experience. However, there are times when we attempt to upgrade it, and it may go down for a couple of minutes, but not more than that.

    What do I think about the scalability of the solution?

    Data Hub handles scalability effectively, accommodating growing data and users.

    How are customer service and support?

    I have had to reach out to Data Hub customer support multiple times. For example, when we were setting up a private link to connect to Data Hub GraphQL APIs, we required our account to be whitelisted. I have also requested some future features for our use cases. For instance, when working with a metadata test scenario, I needed to have a range date column, which was not available. I requested the Data Hub team to make it public so we could use it.

    What was our ROI?

    I have seen a return on investment with Data Hub. For instance, I have noticed time savings during incidents and while looking up schemas. In terms of resources, Data Hub centralizes data cataloging and classification, saving us from having to disclose PII column information to teams not utilizing it. Regarding financial metrics, I do not have specific metrics available.

    Which other solutions did I evaluate?

    Before choosing Data Hub, we looked into Unity Catalog from Databricks, but we ultimately decided to stick with Data Hub.

    What other advice do I have?

    My advice for others looking into using Data Hub is to use it for cataloging, classification, and centralizing all your schema. Data Hub supports a variety of connectors and has excellent lineage options. Additionally, make sure to utilize the well-written documentation that can guide you in building your product solutions. I would rate this product a nine out of ten.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    reviewer2847339

    Centralized data library has boosted discovery, collaboration, and time savings across teams

    Reviewed on May 30, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is that we use it as a library for all the data assets that we generate. It serves as an internal data mart where people can search for whatever data they need, and they can search by tags, by roles, and then add more metadata to it. This provides visibility to the data.

    A specific example of how my team uses Data Hub in a real-world scenario is that we collect and manipulate a bunch of data layers. Because we have huge teams, the exposure to data that we have already manipulated can sometimes be hindered when using traditional systems. Data Hub acts as a search engine for all of the data. One example would be when the marketing team was looking for specific data around marketing. They discovered that once they searched it on Data Hub, it was easily visible. They did not have to retrieve it from the raw layer and manipulate it for their usage because another team had already built it.

    Regarding how my teams interact with Data Hub, we use Data Hub with a self-hosted system. We have connectors which look into multiple data sources, manipulation engines, and orchestration layers to gather the metadata, and then that is pulled into Data Hub. This is how we get data assets in Data Hub.

    What is most valuable?

    The best features that Data Hub offers include primarily data discovery and data governance. Data Hub has data catalogs, which helps with the business glossary, ownership tracking, and lineage. Lineage is something that we are strongly using at this point in time. It helps us understand the impact analysis, such as what breaks if I change this column. Data Hub also provides data observability, helping us understand what data is fresh, what is not, and what has changed schema recently. Additionally, it makes our system AI and LLM ready.

    The lineage feature has changed the way my team works and collaborates significantly. Because we now have data lineage through Data Hub, if we have a really huge dependent pipeline with multiple layers of upstream and downstream dependency, and something breaks in the downstream system, we can exactly pinpoint what all data assets would be affected. Having that lineage functionality helps us drill down what needs to be debugged and fixed and what exact part is breaking. It saves us time in remedying the issue.

    I really like the integrations that Data Hub provides. Data Hub has a very large set of integrations that we can do with Snowflake, Databricks, BigQuery, Redshift, DBT, and Airflow.

    Data Hub has positively impacted my organization as teams can now be directly dependent on one source of truth for all their data needs. The time spent finding information has become significantly smaller, which is the real productivity improvement that I have seen, impacting multiple teams throughout the organization. I estimate that we save about thirty to forty percent of the time now since we do not have to read documents or message people for specific data assets. This results in a productivity increase of around thirty to forty percent in terms of time and efficiency.

    What needs improvement?

    I think Data Hub can be improved by supporting the open source version better. Many features have moved to the paid version now, making it difficult for small-scale companies to operate on Data Hub because we are required to pay, even though it started as an open source project that is now essentially behind a paywall.

    One needed improvement for Data Hub would be stronger AI-powered metadata discovery. I understand Data Hub has been investing in AI, but the natural language processing power on Data Hub search is not that good. The search itself is not accurate many times. Another improvement could be enhancing the DBT developer experience, such as surfacing DBT test failures directly in lineage. Additionally, when we change schema, if it could provide a risk scoring of some sort, that would also be beneficial. Lastly, automated cleanup recommendations would help because managing orphan data assets on Data Hub currently takes a lot of manual time.

    For how long have I used the solution?

    I have been using Data Hub for a year.

    What do I think about the stability of the solution?

    Data Hub is pretty stable in my experience with no downtime or issues.

    What do I think about the scalability of the solution?

    Data Hub's scalability has been effective, handling our organization's growth and data volume well.

    How are customer service and support?

    I have not had to reach out to customer support.

    Which solution did I use previously and why did I switch?

    I did not previously use a different solution before Data Hub.

    What's my experience with pricing, setup cost, and licensing?

    My experience with pricing, setup cost, and licensing has been pleasant, and I have no complaints.

    Which other solutions did I evaluate?

    Before choosing Data Hub, we evaluated Atlan and decided on Data Hub because it has a cleaner UI and also a decent open source community to support it.

    What other advice do I have?

    Data Hub does most of the job it is designed to do, but there could still be improvement as the industry progresses, particularly around metadata discovery. Regarding Data Hub's AI capabilities, its governance and security do the job really well as of right now. I do not have any complaints, especially around data classification, as it allows us to have control over whatever data we are displaying, including customization for PII, sensitive, and financial data. Data Hub has met our expectations regarding its accuracy and reliability of output, and there have not been any issues.

    My advice to others looking into using Data Hub is that it is a pretty nice product right now with easy integration. The pricing model could be negotiated, so it is essential to keep that in mind. I would rate Data Hub a solid eight on a scale of one to ten.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Shubham-Agarwal

    Centralized lineage has reduced onboarding time and improves tracking of complex data flows

    Reviewed on May 26, 2026
    Review from a verified AWS customer

    What is our primary use case?

    Our main use case for Data Hub is for data lineage and metadata governance for our UFC project, where we are utilizing multiple databases such as SQL Server, Databricks, and Snowflake. We have adopted Data Hub to create centralized metadata for all these databases.

    A specific example of how we use Data Hub for metadata governance in our UFC project involves getting data from multiple sources including Excel files, CSV files, APIs, and external databases, storing that data first into Amazon S3 buckets, and then into Snowflake staging areas. We transform the raw data using a DVT model, create a silver layer, and then load data into the gold layer for reporting. With Data Hub, we have a centralized view of the data flow, which makes it easier to track issues in downstream applications such as Power BI reporting.

    We also use Data Hub for onboarding new team members, as it was previously hectic to provide complete metadata details from our seven to eight data sources and over two hundred tables in our Snowflake database. Now, new team members can refer to the lineage of any table or column to understand the complete flow without relying solely on others.

    What is most valuable?

    One of the best features Data Hub offers is its ability to identify schema changes in the source side efficiently, especially when we pull data from multiple external databases such as SQL Server. It helps us quickly pinpoint necessary updates when columns are added or removed, streamlining what was previously a time-consuming manual process.
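The idea of spotting added or removed columns can be illustrated by comparing two snapshots of a table's column list. This is a minimal sketch, assuming hypothetical column names and a made-up helper `diff_columns`; it is not how Data Hub itself detects schema changes.

```python
def diff_columns(previous, current):
    """Compare two column-name lists and report additions and removals."""
    prev_set, curr_set = set(previous), set(current)
    return {
        "added": sorted(curr_set - prev_set),
        "removed": sorted(prev_set - curr_set),
    }


# Hypothetical snapshots of one source table's columns on two different days.
yesterday = ["id", "customer", "amount", "legacy_code"]
today = ["id", "customer", "amount", "currency"]
print(diff_columns(yesterday, today))
```

For these snapshots, `currency` is reported as added and `legacy_code` as removed, which is the kind of signal that tells a downstream team what to update.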

    I find Data Hub quite manageable in the downstream application within the UFC data mart, mainly when issues are reported in Power BI. It provides a complete view of the data lineage, allowing us to backtrace the source of any discrepancies easily.

    Data Hub has positively impacted our organization by reducing the knowledge transition period from three months to one month for new team members, enabling them to refer to the complete lineage without depending heavily on others, which is a substantial improvement.

    What needs improvement?

    In terms of improvements for Data Hub, it seems more useful for critical or large data pipelines, as small data architectures can be straightforward to understand without it.

    Regarding enhancements for complex projects, I have noticed that sometimes Data Hub does not provide a complete picture of the lineage, particularly in complex data pipelines such as when we fetch data from an API to S3 and subsequently to Snowflake. We have to review the metadata in Data Hub closely.

    For how long have I used the solution?

    I have been working in the data engineering field for over twelve years.

    What do I think about the stability of the solution?

    Data Hub is stable in my experience.

    What do I think about the scalability of the solution?

    Data Hub's scalability is advantageous, as we onboard data from over one hundred fifty tables in SQL Server to Snowflake, and adding new tables to Data Hub is not time-consuming.

    How are customer service and support?

    Customer support for Data Hub is quite good; our infrastructure team received ample support during the initial setup within the given timelines.

    Which solution did I use previously and why did I switch?

    Previously, we used the Snowflake inbuilt lineage graph to identify data flow, but we switched to Data Hub for its centralized governance capabilities across multiple databases.

    How was the initial setup?

    The initial setup of Data Hub was completed by our infrastructure team, and I do not have complete visibility of how they made the purchase.

    What about the implementation team?

    Regarding pricing, setup cost, and licensing for Data Hub, it was handled by our client infrastructure team, so I lack visibility into those aspects.

    What was our ROI?

    I have seen a return on investment with Data Hub, notably in reducing the knowledge transition period and improving our ability to troubleshoot production issues in Power BI, thus saving time.

    Which other solutions did I evaluate?

    We did not evaluate other options before choosing Data Hub since we were solely relying on the lineage functionalities of Databricks and Snowflake.

    What other advice do I have?

    My advice for others considering Data Hub is to utilize it, as it is free and can significantly reduce time for production support and addressing data issues, while simpler data models can benefit from the inbuilt functionalities of their respective databases. I would rate this product eight point five out of ten.

    Henrique dos Anjos

    Metadata governance has improved data lineage visibility but still needs simpler integrations

    Reviewed on Mar 31, 2026
    Review provided by PeerSpot

    What is our primary use case?

    I work with Data Hub as a user, but I also have some administrative responsibilities there. I'm not a final user; the final users are business users, and I play some administrative roles in the tool to have the metadata information available for all Uber users.

    I'm a Data Quality Engineer focused on data governance. I manage the metadata information for Uber, and I also use this to apply some data quality rules. My focus in my current job is to apply some rules and manage the metadata information and ensure it is accurate for the end users, which is why I'm using it.

    What is most valuable?

    One of the biggest advantages of Data Hub is the very good integration. For example, a department focused on development built the integration between Data Hub and BigQuery. When this integration is very well done, it is possible to check data lineage, which I think is a very important subject in data governance. It's something that cannot be done manually, so having a tool that shows the data lineage from the source to the target tables helps us a lot. I think this is one of the best advantages that we have.

    Data Hub helps to analyze data from various sources in my case.

    What needs improvement?

    I know that the integrations are not easy to do, and I believe it happens because it's a customized solution. Software developers always need to work on this. It's complicated; every time we want to integrate new things or new sources, we need to generate a ticket or a request to another department. When I had my experience with Atlan, for example, I was able to connect different sources in a very user-friendly way. I just needed to set up some configurations and connect to the source without having to be a software developer or develop any code in the back end. It was just a feature in the data catalog that enabled me to connect with different kinds of sources. That is what I see as the disadvantage of having a customized solution. I think Data Hub itself is a very good tool: years ago I had the opportunity to work with the open-source solution, which had a clear interface and was easy to connect. At Uber, we need to file a request when we want to integrate new sources.

    Regarding Data Hub's intuitiveness for analytics, I would say that some quality dimensions are available for us. For example, for each field name or each column in a table, it's possible to see the frequency, how many values we have for a specific type or category, and we can see if there are new or null values, whether the columns are empty or not, along with some metrics. This is regarding the data quality dimensions, such as nullables and things of that nature. That is all we have for features. I remember when I was working with Atlan, there was a feature I liked very much: the possibility to have a sample. When I clicked on a table, I could see a short sample without needing SQL skills. I just clicked the table and could see some values or what the table represents; the data catalog would show a screen with some rows of the table. This feature was very good, but we don't have it in Data Hub the way it is implemented at Uber. I think it would be a very good feature for analytics, and we don't have it at the moment.
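The column-level metrics mentioned here (value frequencies, null counts, whether a column is empty) can be sketched in a few lines. The example below is illustrative only: the `profile_column` helper and the sample values are hypothetical and do not reflect how Data Hub computes its statistics.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, top frequencies."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top": counts.most_common(2),
    }


# Hypothetical sample column with two missing values.
status = ["active", "active", None, "closed", "active", None]
print(profile_column(status))
```

For this sample, the profile reports six rows, two nulls, two distinct values, and `active` as the most frequent value.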

    The integration part could be better, but again, it's because it's a customized solution. I think if they used the native version of the tool, it would be simpler. The integration part and the process of setting up new data quality rules would be important for data governance players like me.

    For how long have I used the solution?

    I've been using Data Hub for a year and a half.

    What do I think about the stability of the solution?

    Since I've been using Data Hub, it has always been very stable; I can say it was one hundred percent stable. I never encountered issues trying to check datasets or columns and checking their numbers. It has always worked very well in that regard.

    What do I think about the scalability of the solution?

    I think Data Hub can scale fast in its native way, but with a customized solution, it takes more time.

    How are customer service and support?

    My support is internal: when I have any questions or requests, I direct them to a support team at Uber, not to the provider. When I was working with Atlan and needed support, they were very good at attending to my requests directly. I had contact with the provider, so it was very fast. At the moment, I don't have that; I direct my requests to an internal department at Uber.

    Which solution did I use previously and why did I switch?

    I'm not using Atlan anymore because I'm no longer at the company where I used it. I moved to another consultancy group and now work with other platforms.

    I am now working with a different platform that is also for data governance and metadata management. The back end of the platform is Data Hub, but the user interface is customized for this client. I'm currently working for Uber.

    How was the initial setup?

    Because Data Hub is a customized solution here, I don't have many details about the installation and deployment process. However, when I was using Atlan, I saw that they implemented it very fast. I believe both tools are easy to implement, but because Uber chose a customized solution, it became more difficult and complex. In their native form, I think both tools are good.

    What was our ROI?

    In terms of ROI, I would say that Atlan is better. I had a very good experience using Atlan, and I believe it's faster. Velocity in organizations today is very important; people want to see things very fast. I believe Atlan has a better approach compared to Data Hub.

    Compared with Data Hub as it is implemented here at the moment, Atlan is much better. It's much, much faster.

    Which other solutions did I evaluate?

    I worked with Databricks, but I'm not sure if it is from Amazon; I don't think so. I think Databricks is from Microsoft.

    What other advice do I have?

    I have experience with Data Hub to some extent.

    I believe Data Hub uses a lot of APIs, but I don't think I'm the right person to answer that because it relies a lot on a technical aspect that I don't understand. I cannot provide you with a curated answer about it, but I know that the software development team that works with this customized solution uses APIs; I just don't know how to speak about their performance, whether it's good or not.

    Real-time batch processing is very important for me and my organization because some datasets are very critical for the business. If we have batch processing, it's good for the organization to set up a very large dataset, for example, and have it available on the data catalog in a short time. I agree that this is important.

    In both experiences I had, the integration with the catalog was with GCP. I don't have experience working with another data warehouse, so both in Atlan and now in Data Hub, it is connected with GCP.

    I don't use anything else like CRM, storage, or any architecture management tools; just Data Hub.

    I would give Data Hub a score of seven out of ten, summarizing everything that I've discussed about the product.

    Azhagarasan Annadorai

    Catalog has centralized PII ownership and collaboration but still needs better automation and UI

    Reviewed on Feb 22, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is to enrich the metadata in order to classify PII data. As an administrator, I crawl a number of data sources and bring the metadata into a single place, then assign ownership, such as a data owner or steward, for all the data assets. With their help, we classify the data into direct PII, indirect PII, sensitive, non-sensitive, and so on. We add tags and glossary terms to the data elements. The main use case is GDPR DSAR compliance: we try to identify the PII data in the catalog so that we know where the PII data is in our data inventory.

    How has it helped my organization?

    The catalog helps with metadata discovery and with finding the owners/stewards of data sources. Without a data catalog, we scramble around and speak to multiple teams, which is time-consuming.

    What is most valuable?

    The best features that Data Hub offers include the management of ownership, with standard out-of-the-box ownership types such as business owner, data steward, or technical owner, which is relevant for us. It also integrates with Active Directory. In our Active Directory, we maintain certain roles for team members based on their scrum teams, and by integrating with Active Directory, we are able to bring in the same roles and map them to the corresponding ownership roles within Data Hub. Data Hub has integrations with Slack, Snowflake, BigQuery, and so on, which we use.

    Data Hub has positively impacted our organization by bringing the tribal knowledge that resides with team members into a single place where users can discover and understand the data elements before they make use of them. Users can ask questions via Slack to understand how a data element is defined and get the answers back. This definitely saves time; without a data catalog in place, users need to ask around to find out what a particular data element means and who owns it. Now, with the data catalog, searching and discovering data elements and the corresponding owners is easier, saving approximately thirty to forty percent of the time that would have been spent finding out the owners and definitions of the data elements.

    What needs improvement?

    Data Hub can be improved with more automation. There are some built-in automations, such as documenting definitions of data elements using AI, which is useful. I wonder if it could also automate the classification exercise, possibly using AI to auto-classify direct and indirect PII items.
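
    As a rough illustration of what automated classification could look like, here is a minimal rule-based Python sketch that suggests a tag from a column name. It is not a Data Hub feature or API and is far simpler than an AI-based classifier; the patterns, tag names, and the `suggest_tag` function are all hypothetical, and any suggestion would still need review by the data owner or steward.

```python
import re

# Hypothetical name patterns; a real rule set would be agreed with data owners.
RULES = [
    ("PII-direct", re.compile(r"(email|phone|ssn|passport|full_?name)", re.I)),
    ("PII-indirect", re.compile(r"(zip|postal|birth|gender|ip_?addr)", re.I)),
]

def suggest_tag(column_name):
    """Return the first matching tag for a column name, else 'non-sensitive'."""
    for tag, pattern in RULES:
        if pattern.search(column_name):
            return tag
    return "non-sensitive"

# Hypothetical column names from a crawled table.
columns = ["customer_email", "birth_date", "order_total"]
suggestions = {name: suggest_tag(name) for name in columns}
print(suggestions)
```

    Name-based rules miss PII stored under unhelpful column names, which is one reason to also inspect sample values or use a trained model.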

    For how long have I used the solution?

    Just started using it.

    What do I think about the stability of the solution?

    Data Hub is stable.

    What do I think about the scalability of the solution?

    Data Hub shows scalability in terms of the number of users and the number of new databases and data elements.

    How are customer service and support?

    We have not gone that far with customer support; as far as the POC is concerned, we received good support from the team and the sales team that helped us evaluate the tool.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    We previously used OvalEdge as our data catalog. We switched to find a tool that has more AI capability, extends usage to non-technical users, and is less clunky and more intuitive.

    How was the initial setup?

    Slightly technical, but there is enough documentation available.

    What about the implementation team?

    No

    What was our ROI?

    I have not yet seen a return on investment, and I do not have that information to share.

    What's my experience with pricing, setup cost, and licensing?

    Regarding experience with pricing, setup cost, and licensing, I think if we have a budget of one hundred thousand US dollars, we will be able to deploy a reasonable version and connect to a number of data sources.

    Which other solutions did I evaluate?

    Before choosing Data Hub, we compared it with Atlan and Alation.

    What other advice do I have?

    I chose seven out of ten because there are better catalogs available in the market that offer more features. The UI for setting up new data sources and crawling them is a little cumbersome, but that is a one-time activity, so it is manageable. Still, the administration UI could be improved.

    My advice to others looking into using Data Hub, also known as Acryl, is that it is a reasonably stable product that satisfies most data catalog use cases; however, Atlan appears to be the closest competitor, while Alation is the market leader among the three. Data Hub has an open-source version, I believe, and it may be worth considering that option as well.

    I rate Data Hub seven out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
